Monday, February 21, 2011

Using DFS and GPO in FIM High Availability Scenarios PowerPoint


FIM 2010: Build 4.0.3573.2 Performance Improvements, part 3

In our previous two installments (parts 1 and 2, below) I covered my Hyper-V test rig and the baseline results for the patched RTM build, 4.0.3531.2. In this installment I'll show you how new features in this build can be enabled to improve your initial load times by a factor of 3.

Installation

To download the new hotfix, follow the link and file the request:

http://support.microsoft.com/kb/2417774

All of the FIM components get updates here, and FIM CM gets a bunch of fixes that you can read about in the hotfix article (sorry Brian). The two packages we will focus on here are:

  • FIMService_x64_KB2417774
  • FIMSyncService_x64_KB2417774

Don't forget that there are updates to PCNS and the FIM Add-ins to deploy as well! As always, remember to take a good backup of your databases in case you need to roll these back. I didn't encounter any issues during the installation other than the fact you must manually stop the service before the installer can continue.
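
For reference, this is roughly what I ran from an elevated PowerShell prompt before launching each installer (the service names below are the defaults on my servers; verify yours with Get-Service before relying on them):

    # Stop the FIM services so the hotfix installers can proceed
    Stop-Service FIMService
    Stop-Service FIMSynchronizationService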

Case Bug Fixed

One small but significant fix resolves an issue where updates to string attributes would be marked as Completed but never actually applied. This happened whenever you were only changing the case of the string, not the value itself:

Case-only changes that are made to existing attributes are not applied to the FIM service database even though the Requests are marked as Completed.

I've tested this and it works: no more exported-change-not-reimported errors due to case-only updates!

Initial Load Performance Improvements

The new hotfix provides an asynchronous export mode from the FIM MA to the FIM Service. While in some configurations this could lead to a slowdown in portal usability, if you have deployed your Service Partitions correctly then you should be pointing your FIM Sync to either a dedicated Admin Portal instance of the FIM Service, or a dedicated instance for FIM Sync as I do. However, I should note that even in these situations, if you completely load up the SQL Server that is hosting the FIMService database, you will experience a slowdown on all service partitions regardless.

There are two additions you can make to the configuration files in order to enable the new 'async' mode:

Microsoft.ResourceManagement.Service.exe.config


Add the new throttle attribute to the existing resourceManagementService element, keeping its other attributes exactly as they are (only the last attribute here is new):

    <resourceManagementService ...
        synchronizationExportThrottle="Unlimited"/>

The resourceManagementService section has three new settings:

  • synchronizationExportThrottle="Single" – the default mode; do nothing and you keep the existing behavior of ~0.6 Exports/sec
  • synchronizationExportThrottle="Unlimited" – new mode where each export is confirmed immediately, allowing the next export to proceed right away; the exports to the FIM Service are cached and evaluated asynchronously
  • synchronizationExportThrottle="Limited" requestRecoveryMaxPerMinute="60" – new mode that gives you the capabilities of the async mode in a responsible manner so as not to degrade portal performance; use the second parameter to gate the rate (example after this list)
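
As a concrete example, Limited mode with the 60-requests-per-minute gate from the bullet above would sit on the same element (again, keep your existing attributes as they are; this is just a sketch of the two new attributes):

    <resourceManagementService ...
        synchronizationExportThrottle="Limited"
        requestRecoveryMaxPerMinute="60"/>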

There is an excellent note here I'll reiterate:

We expect that customers will have to optimize this setting for their environment, and to accommodate their hardware capabilities and portal load. To tune this setting, monitor the FIM database SQL CPU usage and the Windows Workflow Foundation Workflows In Memory performance counters. Adjust the throttle up or down until you obtain a maximum throughput state. Example target metrics include SQL CPU usage of about 70% and Windows Workflow Foundation not building up a large queue in the Workflows in Memory performance counter.
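
A quick way to watch both of those from PowerShell while you tune (the computer names are placeholders for your SQL and FIM Service hosts, and the counter paths are assumed from the counter names in the note above):

    # Sample SQL CPU and the WF "Workflows In Memory" counter every 15 seconds for 5 minutes
    Get-Counter -ComputerName SQL01 -Counter "\Processor(_Total)\% Processor Time" -SampleInterval 15 -MaxSamples 20
    Get-Counter -ComputerName FIM01 -Counter "\Windows Workflow Foundation(*)\Workflows In Memory" -SampleInterval 15 -MaxSamples 20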

This setting can be changed dynamically; you do not have to restart the FIM Service.

It's nice that you don't have to bounce the service for this change to take effect, but I agree with Craig's Connect suggestion that these should have been Run Profile options, which would let us call a predetermined configuration whenever we needed it without resorting to changing the configuration file between runs.

miiserver.exe.config

In the FIM Sync configuration file there are two new lines you need to insert, but only if you wish to change the default behavior once either the "Unlimited" or "Limited" mode is enabled.
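
The first of the two, per the hotfix article, is a section declaration that registers the new element under configSections. A sketch of its shape follows; I've deliberately elided the type string, so copy it exactly from the hotfix article:

    <configSections>
      ...
      <section name="resourceSynchronizationClient" type="..." />
    </configSections>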

I didn't notice this line initially and I ended up with a nasty BAIL error when running any of the FIM MA run profiles, so make sure you insert this under the first.

In the main body of the configuration file you can insert a resourceSynchronizationClient tag now, the default of which is:
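
(Reconstructed here from the defaults documented in the table below.)

    <resourceSynchronizationClient
        exportFetchResultsPollingTimerInSeconds="5"
        exportRequestsInProcessMaximum="50"
        exportWaitingForRequestsToProcessTimeoutInSeconds="600" />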

This allows you to tune the three parameters if you have a high-performance disk array:

  • exportFetchResultsPollingTimerInSeconds (default 5) – When the Synchronization service is exporting objects in asynchronous mode, this property controls the frequency of polling results that are returned from the FIM service by the FIM MA. Changing this value may give a higher processing rate, depending on your system configuration.
  • exportRequestsInProcessMaximum (default 50) – When the Synchronization service is exporting objects in asynchronous mode, this property controls how many requests can be queued up in the FIM service for processing. If this limit is met, FIM MA will wait until asynchronous results are sent back before resuming additional exports. Setting this value higher may provide additional processing throughput during export. However, during system failures, these objects may have to be re-exported from the synchronization engine when the FIM-Export process restarts.
  • exportWaitingForRequestsToProcessTimeoutInSeconds (default 600) – This is the time-out value that FIM MA will use to wait for the FIM service to process a request. If no response is received from the FIM service within this time, FIM MA will end the export with a “cd-error” error.

Note that, unlike the throttle setting above, these settings require a restart of the FIM Sync service before they apply.
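
For example (the service name below is the default on my servers; check yours with Get-Service):

    # Restart the FIM Synchronization Service so the new miiserver.exe.config values are read
    Restart-Service FIMSynchronizationService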

Performance Tuning Results

I ran two tests, both with the new "Unlimited" switch enabled – the first with the default Sync settings and the second with increased values to see if I could eke out any additional performance; here are the results:

Test 1: 4.0.3573.2, Disk Configuration 2, synchronizationExportThrottle="Unlimited" (default Sync settings)
Test 2: 4.0.3573.2, Disk Configuration 2, synchronizationExportThrottle="Unlimited", exportFetchResultsPollingTimerInSeconds="15", exportRequestsInProcessMaximum="100", exportWaitingForRequestsToProcessTimeoutInSeconds="600"

Metric (Test 1 / Test 2)
Records (8 attributes/record): 11,251 / 11,251
FIM MA Export Only Elapsed Time (mins): 92 / 103
FIM MA Objects Exported/sec: 2.031 / 1.818
Processor Time (%) - miiserver: 2.071 / 1.901
Processor Time (%) - fimservice: 2.13 / 1.89
Processor Time (%) - SQL: 66.231 / 60.545
Logical Disk (SQL) - Average Disk Queue Length: 0.986 / 1.594
Logical Disk (SQL) - Average Disk sec/Transfer (ms): 7 / 11
Objects Exported/sec Improvement Factor over previous configuration: 2.97 / 0.90
Elapsed Time improvement over previous configuration (mins): 122 / -11

In the first test we can see a huge improvement – a 3x gain over our best run with the previous build, breaking 2 Objects Exported/sec, and we came in 122 minutes under our prior time! Note that our disk latency and queue length are now beginning to show the signs of another bottleneck. In both tests the CPU on the SQL Server was above 60%, indicating there was still room to push the system, but disk got in the way.

In the second test, I increased the default settings and we ran 11 minutes over, reducing our Objects Exported/sec to 1.818 and exposing disk as our bottleneck again, with an Average Disk Queue Length of 1.594 and latency up to 11 ms. Again, this is a home Hyper-V setup with desktop components, so a good Enterprise-class deployment should be able to exceed these numbers. It's encouraging that the new asynchronous mode lets us stress the disk a bit more, which seems to indicate that the caching can further expand performance on well-tuned systems.

I would encourage everyone to start with the defaults and get a good grasp on what your overall disk performance is like so that you know when to back off some of these settings. If you can keep your queue lengths to 1 or below then you should be at the right mark. In future tests, I hope to move some of the VHD's onto the SSD and see if I can eke out any more performance on this system.
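
A simple way to keep an eye on that while a run is in flight is to sample the logical disk counters on the SQL host; the one-minute sample window here is just where I'd start:

    # Watch disk queue length and latency on the SQL Server during an export run
    Get-Counter -Counter "\LogicalDisk(*)\Avg. Disk Queue Length","\LogicalDisk(*)\Avg. Disk sec/Transfer" -SampleInterval 5 -MaxSamples 12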

Sunday, February 20, 2011

FIM 2010: Build 4.0.3573.2 Performance Improvements, part 2

In the previous installment, FIM 2010: Build 4.0.3573.2 Performance Improvements, part 1, I documented the base configuration of my Hyper-V test machine. Now I'll document the configuration of the virtual machines themselves and share the results of the initial disk tuning for the patched RTM release, build 4.0.3531.2.

Virtual Machine Configuration

  • Dedicated AD DC (2008 R2)
  • Dedicated SQL Server (2008 R2 10.50.1600) w/Dual Processors and 4GB RAM
    • Separate OS (4k), DB (64k), Logs (64k), and TempDB (64k) drives within the VM, but all VHD’s on a single 4-drive RAID set
    • All VHD files were dedicated (fully expanded), not dynamic
  • FIM Sync/Service Server (2008 R2) w/Dual Processors and 2GB RAM
  • FIMService and FIMSynchronization databases set to Simple recovery and pre-grown to 4GB (DB and Logs) (see the sketch after this list)
  • No autogrowth observed throughout the load on either DB
  • All NIC's (virtual and physical) have Large Send offload disabled
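
For the database settings called out above, here is a minimal sketch run from the SQL PowerShell prompt (sqlps). The instance name and the logical file names are assumptions on my part; substitute the ones from your own install:

    # Switch to simple recovery and pre-grow data and log files to 4GB before the initial load
    Invoke-Sqlcmd -ServerInstance "SQL01" -Database "master" -Query "ALTER DATABASE [FIMService] SET RECOVERY SIMPLE;"
    Invoke-Sqlcmd -ServerInstance "SQL01" -Database "master" -Query "ALTER DATABASE [FIMService] MODIFY FILE (NAME = N'FIMService', SIZE = 4096MB);"
    Invoke-Sqlcmd -ServerInstance "SQL01" -Database "master" -Query "ALTER DATABASE [FIMService] MODIFY FILE (NAME = N'FIMService_log', SIZE = 4096MB);"
    # Repeat the same three statements for the FIMSynchronizationService database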

Initial Load Scenario

In my initial load scenario testing I have the FIM Service loaded bare, with no additional sets, policies or workflows added, the same as you'd expect prior to migrating any policy over. In my personal testing, I've seen 44% faster load times simply by importing all of your objects into a pristine system before loading your policy.

So, we have all of the FIM Services running on a single VM and all of the databases hosted on a single SQL Server, both joined to a domain hosted by a dedicated AD Domain Controller. Next, I will illustrate the disk configuration.

In the first example we have a poor disk I/O configuration, no caching and RAID 5 – this configuration leads to high disk queue length and disk latency making the disk configuration a clear bottleneck. In Configuration 2 we have a somewhat tuned configuration where we've added disk caching, moved the System partition to an SSD and moved to a more efficient RAID 10; from the results below we can see that the disk is no longer a bottleneck.

Configuration 1: 4.0.3531.2, Disk Configuration 1
Configuration 2: 4.0.3531.2, Disk Configuration 2

Metric (Configuration 1 / Configuration 2)
Records (8 attributes/record): 11,251 / 11,251
FIM MA Export Only Elapsed Time (mins): 585 / 214
FIM MA Objects Exported/sec: 0.319 / 0.684
Processor Time - miiserver: 0.40% / 0.72%
Processor Time - fimservice: 14.35% / 0.63%
Logical Disk (SQL) - Average Disk Queue Length: 2.256 / 0.001
Logical Disk (SQL) - Average Disk sec/Transfer (ms): 108 / 3
Objects Exported/sec Improvement Factor over previous configuration: n/a / 2.14
Elapsed Time improvement over previous configuration (mins): n/a / 371

Baseline Results

The results from the baseline tests clearly show that the disk subsystem can have adverse effects on your FIM performance, especially when it comes to the initial load scenario. With some simple disk tuning we were able to reduce the run time by 371 mins and achieve a 2.14x improvement in Objects Exported/sec for the same records. Average disk queue lengths <1 should not indicate a bottleneck, and the fact that our overall latency dropped from 108 ms to 3 ms backs this up. We generally want to keep the latency under 10 ms, and no more than 20 ms. I would like to point out that while the SQL Server disk is broken down into separate volumes, all of the VHD's from all of the VM's are on the same RAID volume in both configurations, which would be typical of SAN deployments that split LUN's across all spindles.

In your deployments you should be dealing with at least workgroup-class hardware with real servers and performance-class SAS/SCSI drives in the 10k-15k RPM range with caching RAID array controllers, and you should be able to achieve similar results in your initial baseline. In fact, the improved numbers I see here match very closely what I've obtained running on IBM production-class hardware and fibre-attached SAN (NetApp). I have not been able to personally break 0.7 Objects Exported/sec for an initial load scenario on any configuration running 4.0.3531.2 (RTM with Update 1). I believe these results indicate that the FIM Service now becomes the clear bottleneck, as there are no other counters indicating a processor, memory, or network bottleneck.

In the next installment I'll look at how loading the new 4.0.3573.2 hotfix improves times on the same disk configuration 2.

Saturday, February 19, 2011

FIM 2010: Build 4.0.3573.2 Performance Improvements, part 1

I've had some time recently to set up a test rig at home and begin performing some baseline performance tests of our biggest performance problem, the initial load experience with the FIM MA. Here is some information on my test system:

  • Windows Server 2008 R2 Datacenter running Hyper-V
  • Intel Core i5-760 2.8GHz Quad Core
  • 16GB RAM (4 x 4GB DDR3 1333)
  • EVGA P55V (120-LF-E651-TR) – Intel P55 1156 motherboard with onboard RAID (non-caching)
  • 4 Seagate Barracuda 7200 RPM 1.5TB SATA II (3Gb/s) in RAID 5 array hosting the OS and VM’s
  • 1 Samsung Spinpoint 5400 RPM 2TB SATA II (3Gb/s) as dedicated backup (Volume Shadow Copy) volume

Now, for a point of reference, this is probably the worst disk configuration you can have when it comes to SQL Server and FIM performance. As we'll see, the low disk I/O, RAID level, and lack of RAID cache really cause the numbers to fall in my initial test. Shortly after performing my initial test, I upgraded my test rig with the following components:

  • LSI MegaRAID SAS 9260-4i 512MB Caching RAID controller
  • OCZ Vertex 2 60GB SATA II SSD

The disk configuration for the subsequent tests is as follows:

  • 1 OCZ Vertex 2 60GB SATA II SSD (3Gb/s) hosting the Hyper-V Host OS (AHCI with TRIM)
  • 4 Seagate Barracuda 7200 RPM 1.5TB SATA II (3Gb/s) in RAID 10 array hosting the VM’s
  • 1 Samsung Spinpoint 5400 RPM 2TB SATA II (3Gb/s) as dedicated backup (Volume Shadow Copy) volume

Hosting the Hyper-V host on the SSD really screams and the addition of the caching array controller running in RAID 10 mode made a measurable difference to the performance of all of the guest VM's. In future tests I'll try moving the database VHD's to the SSD.

The subsequent posts will focus on performance improvements through disk upgrades as well as the ones introduced in the new 4.0.3573.2 hotfix rollup.

Friday, February 18, 2011

SOAP security negotiation with 'http://fim:5725/ResourceManagementService/Resource'

We finally got to the bottom of a problem we were having with the Public Client with regard to this odd SOAP security negotiation error. The inner exception might look something like this:

Inner Exception: Security Support Provider Interface (SSPI) authentication failed. The server may not be running in an account with identity 'FIMService/fim.test.com'. If the server is running in a service account (Network Service for example), specify the account's ServicePrincipalName as the identity in the EndpointAddress for the server. If the server is running in a user account, specify the account's UserPrincipalName as the identity in the EndpointAddress for the server.

Oddly enough, our error contained the SPN reference of 'host/' and not 'FIMService', but the real problem here has to do with the way your Kerberos delegation is set up for your FIM Service account – the account that is running the FIM Service itself. The 'Before You Begin' section of the Install Guide correctly instructs you to configure the Service Principal Names for this account; however, it leaves out one bit of clarifying information when instructing you how to configure the Constrained Delegation. The instructions are:

Turn on Kerberos delegation for the FIM Service service account in AD DS. You can turn on delegation for all services either by selecting Trust this user for delegation to any service (not recommended) or by using constrained delegation (recommended) by selecting Trust this user for delegation to the specified services only. If you use constrained delegation, search for the FIM Service service account, and then select the entry that you added in the previous step.
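
For reference, the SPN registration the guide refers to ("the entry that you added in the previous step") looks roughly like this; the domain and account name are placeholders, and the SPN value is taken from the error above:

    # Register the FIM Service SPN on the service account and then list what is registered
    setspn -S FIMService/fim.test.com TEST\svc-fimservice
    setspn -L TEST\svc-fimservice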

Now, here is how we had our FIM Service account configured on the Delegation tab:

Note the setting "Use Kerberos only" – this configuration restricts the service from delegating when the initial authentication uses any protocol other than Kerberos. In this configuration, FIM itself works just fine, and the first time I saw it cause an issue was when testing Henrik Nilsson's FIM Attribute Store for ADFS. I kept getting errors and was assured they were Kerberos issues, to which I stubbornly replied that everything was configured properly and working on my side. :)

So, there are three types of delegation with respect to Kerberos:

1. Unconstrained delegation – the "old" way
2. Constrained delegation – the "new recommended" way
3. Constrained delegation with Protocol Transition – for when the initial authN is not Kerberos based

When you configure constrained delegation in this manner using the Use Kerberos Only setting, you are preventing protocol transition from occurring. For reasons I don't completely understand, the FIM Public Client leverages protocol transition and the internal FIM classes do not.

So, how do I fix this thing? Easy: set the account to the "Use any authentication protocol" setting and then restart your FIM Services.
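
If you want to confirm which mode an account is in without clicking through the GUI, the AD PowerShell module exposes the relevant flags. The account name here is a placeholder; TrustedToAuthForDelegation = True corresponds to "Use any authentication protocol", i.e. protocol transition is allowed:

    # Check the delegation flags and the list of services the account may delegate to
    Import-Module ActiveDirectory
    Get-ADUser svc-fimservice -Properties TrustedForDelegation, TrustedToAuthForDelegation, "msDS-AllowedToDelegateTo" |
        Select-Object Name, TrustedForDelegation, TrustedToAuthForDelegation, "msDS-AllowedToDelegateTo"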
