Thursday, 28 June 2012

OpsMgr by Example: The Active Directory 2008 Management Pack

Installation

1) Download the Active Directory Management Pack (http://www.microsoft.com/downloads/details.aspx?FamilyId=008F58A6-DC67-4E59-95C6-D7C7C34A1447&displaylang=en). The Active Directory Management Pack Guide is included in the download and labeled "OM2007_MP_AD2008.doc."

2) Read the Management Pack guide – cover to cover. This document spells out in detail some important pieces of information you will need to know.

3) Import the AD Management Pack (using either the Operations console or PowerShell).

4) Deploy the OpsMgr agent to all domain controllers (DCs). The agent must be deployed to all DCs. Agentless configurations will NOT work for the AD Management Pack.

5) Get a list of all domain controllers from the Operations console. In the Authoring space, navigate to Authoring -> Groups -> AD Domain Controller Group (Windows 2008 Server). Right-click on the group(s) and select View Group Members.

6) Enable Agent Proxy configuration on all Domain Controllers identified from the groups. This is in the Administration space, under Administration -> Device Management -> Agent Managed. Right-click each domain controller, select Properties, click the Security tab, and then check the box labeled "Allow this agent to act as a proxy and discover managed objects on other computers." Perform this action for every domain controller, even if the DC is added after your initial configuration of OpsMgr.

7) Configure the Replication account in the Operations console, under Administration -> Security (full details for this are in the AD MP Guide). Do this for every domain controller, even if a DC is added after your initial OpsMgr configuration.

8) Validate the existence of the "OpsMgrLatencyMonitors" container. Within this container, create sub-folders for each DC, using the name of each domain controller. If the container does not exist, the cause is often insufficient permissions. (See the information on configuring the Replication account in the AD MP Guide for details.)

9) Open the Operations console. Go to the Monitoring node and navigate to Monitoring -> Microsoft Windows Active Directory -> Topology Views and validate functionality. (You may have to set the scope to the AD Domain Controllers Group to get these views to populate).

10) Check to make sure Active Directory shows up under Monitoring -> Distributed Applications as a distributed application that is in the Healthy, Warning or Critical state. If it is in the "Not Monitored" state, check for domain controllers that do not have the agent installed or are in a "gray" state.

11) Create a MicrosoftWindowsActiveDirectory_Overrides management pack to contain any overrides required for the MP (hey, if it's not created now we'll never remember to create it and we'll end up using the default MP and that's not good – see http://cameronfuller.spaces.live.com/blog/cns!A231E4EB0417CB76!1152.entry or System Center Configuration Manager 2007 Unleashed for details there).
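
Steps 6 and 8 both have to be repeated for every domain controller, including ones added later, so it helps to compare the group membership from step 5 against what is actually configured. The following is a minimal sketch of that comparison in Python. It assumes the three name lists were copied out of the consoles by hand; the function name and the sample DC names are hypothetical and not part of OpsMgr.

```python
def missing_entries(domain_controllers, configured):
    """Return the DCs (in their original spelling) that do not appear in
    `configured`. Names are compared case-insensitively."""
    configured_lower = {name.strip().lower() for name in configured}
    return [dc for dc in domain_controllers
            if dc.strip().lower() not in configured_lower]


if __name__ == "__main__":
    # Hypothetical data: group members from step 5, agents with proxy
    # enabled (step 6), and sub-folders under OpsMgrLatencyMonitors (step 8).
    dcs = ["DC01", "DC02", "DC03"]
    proxy_enabled = ["dc01", "DC03"]
    latency_folders = ["DC01", "DC02"]

    print("Agent proxy not enabled:", missing_entries(dcs, proxy_enabled))
    print("No latency sub-folder:", missing_entries(dcs, latency_folders))
```

Re-running the comparison after each new DC is promoted is a cheap way to catch a domain controller that was skipped.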

Deploying the Active Directory 2008 Management Pack was relatively painless. After importing the management pack, there was no significant impact on processors seen on the domain controllers. The Active Directory Topology Root appeared as a distributed application and showed a health state of green. The Active Directory diagram view also worked as expected.

Tuning/Alerts to Look For

We encountered and resolved the following alerts while tuning the Active Directory management pack.

Alert: The AD Last Bind latency is above the configured threshold.

Issue: One domain controller had consistently high AD Last Bind Latency. Logging on to the system showed that it was extremely unresponsive.

From product knowledge, we used the suggested tasks to validate that the bind was not slow, and no high-CPU processes were identified on the system. The view available in product knowledge pointed to a large spike in the time required for the LDAP query (checking the Active Directory Last Bind counter). The spike occurred during a period of very heavy processor utilization on one of the domain controllers. This monitor checks every 5 minutes. The alert auto-resolved once the LDAP query was responding within an acceptable timeframe.

Resolution: Attempts to debug the issue were inconclusive and extremely difficult due to the performance issue with the system. We rebooted the domain controller, it came back online, and the AD Last Bind Latency returned to normal values.

 

Alert: A problem has been detected with the trust relationship between two domains.

Issue: A server in one location (site 1) lost communication with domain controllers in a second location (site 2). This critical alert did NOT auto-resolve. It was detected by the alert rule "A problem has been detected with the trust relationship between the two domains." We verified that the Last Modified date fell within the outage (add this column by personalizing the Active Alerts view to include the field) and that the Repeat Count was not incrementing.

Resolution: We used the Active Directory Domain Controller Server 2008 Computer Role Task of Enumerate Trusts to validate all trusts were working after site connectivity was re-established. We then logged into the domain controller reporting the error and used the Active Directory Domains and Trusts UI to validate each of the trusts. We closed the alert manually.

 

Alert: A problem with the inter-domain trusts has been detected.

Issue: A server in one location (site 1) lost communication with domain controllers in a second location (site 2). This critical alert did NOT auto-resolve. It was detected by the AD Trust Monitoring monitor, which runs every 5 minutes using the AD Monitor Trusts script. We verified that the Last Modified date fell within the outage (add this column by personalizing the Active Alerts view to include the field) and that the Repeat Count was not incrementing.

Resolution: We used the Active Directory Domain Controller Server 2008 Computer Role Task of Enumerate Trusts to validate all trusts were working after site connectivity was re-established. We next logged into the domain controller reporting the error and used the Active Directory Domains and Trusts UI to validate each of the trusts. This alert should auto-resolve when the trust relationships are working, but that functionality does not appear to work. We manually closed the alert.
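
The check we applied before closing the two trust alerts by hand (Last Modified falls inside the outage, Repeat Count no longer climbing) can be written down as a small rule. This is only a sketch of that reasoning in Python; the function name, the timestamps, and the idea of sampling the Repeat Count twice are illustrative assumptions, not something the management pack provides.

```python
from datetime import datetime


def is_leftover_outage_alert(last_modified, outage_start, outage_end,
                             repeat_count_before, repeat_count_now):
    """True when an alert was last touched during the outage window and its
    Repeat Count has not increased since the earlier sample."""
    modified_during_outage = outage_start <= last_modified <= outage_end
    still_firing = repeat_count_now > repeat_count_before
    return modified_during_outage and not still_firing


if __name__ == "__main__":
    # Hypothetical outage window and alert values.
    start = datetime(2009, 1, 10, 2, 0)
    end = datetime(2009, 1, 10, 3, 30)
    print(is_leftover_outage_alert(datetime(2009, 1, 10, 2, 45),
                                   start, end, 4, 4))
```

An alert that passes this check still deserves the trust validation described above before it is closed.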

 

Alert: AD Op Master is inconsistent.

Issue: We tested using the alert monitor "AD Replication Partner Op Master Consistency," which runs every minute, to verify that the incoming replication partners for the domain controller show the same operations masters. We also used the REPADMIN Replsum task in the Active Directory MP.

Resolution: The REPADMIN Replsum command validated that replication was functioning correctly (we had to override the "Support Tools Install Dir" on Windows 2008 to %windir%\system32 to make the task work correctly). The link between the domain controllers had been running close to full saturation. The alert auto-resolved once network utilization dropped.

 

Alert: AD Client Side – Script Based Test Failed to Complete.

Issue: This alert is generated by the "AD Replication Partner Op Master Consistency" monitor. The system reporting the error was logging event ID 45 in the Operations Manager log from the Health Service Script source.

This event is occurring on an hourly basis (12:57, 1:58, and so on):

AD Replication Partner Op Master Consistency : The script 'AD Replication Partner Op Master Consistency' failed to execute the following LDAP query: '<LDAP://servername.contoso.com/CN=Configuration,DC=CONTOSO,DC=COM>;(&(objectClass=crossRefContainer)(fSMORoleOwner=*));fSMORoleOwner;Subtree'.

The error returned was 'Table does not exist.' (0x80040E37)

This alert is linked to "Could not determine the FSMO role holder." alerts that are occurring.

Resolution: We believe this was related to a misconfiguration of the anti-virus settings on the domain controllers in the environment.
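
When reading event 45, it helps to split the failing query into its parts so the same search can be retried by hand against the server named in the event. The query string in the event follows the ADO LDAP dialect, which is laid out as <base>;filter;attributes;scope. The Python sketch below only parses that string and does not contact a domain controller; the function and class names are hypothetical.

```python
from typing import List, NamedTuple


class LdapQuery(NamedTuple):
    base: str
    filter: str
    attributes: List[str]
    scope: str


def parse_ado_ldap_query(query: str) -> LdapQuery:
    """Split an ADO LDAP-dialect query of the form
    '<base>;filter;attributes;scope' into its four parts."""
    base, rest = query.split(";", 1)
    # Split from the right so the filter is kept in one piece.
    filter_and_attrs, scope = rest.rsplit(";", 1)
    ldap_filter, attrs = filter_and_attrs.rsplit(";", 1)
    return LdapQuery(
        base=base.strip().strip("<>"),
        filter=ldap_filter.strip(),
        attributes=[a.strip() for a in attrs.split(",") if a.strip()],
        scope=scope.strip(),
    )


if __name__ == "__main__":
    event_query = ("<LDAP://servername.contoso.com/CN=Configuration,"
                   "DC=CONTOSO,DC=COM>;(&(objectClass=crossRefContainer)"
                   "(fSMORoleOwner=*));fSMORoleOwner;Subtree")
    for field, value in parse_ado_ldap_query(event_query)._asdict().items():
        print(f"{field}: {value}")
```

The base, filter, and scope can then be pasted into whichever LDAP browsing tool is available on the domain controller to see whether the query fails the same way outside the script.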

Alert: DC has failed to synchronize its naming context with replication partners.

Issue: A server in a location (site 1) lost communication with domain controllers that existed in a second location (site 2). The rule generating this alert is "DC has failed to synchronize naming context with its replication partner".

Resolution: The alerts occurred when connectivity was lost between the sites. These alerts had a Repeat Count of 0. We used the REPADMIN Replsum command to validate that replication was functioning correctly (we had to override the "Support Tools Install Dir" on Windows 2008 to %windir%\system32 to make the task work correctly). We closed the alerts manually.

Alert: Could not determine the FSMO role holder.

Issue: Each domain controller in the environment reported the error when trying to determine the Schema Op Master on the various domain controllers. The rule generating this was "Could not determine the FSMO role holder".

Resolution: We used the NETDOM Query FSMO task (changing the Support Tools Install Dir to %windir%\system32) to validate the FSMO role holders on each domain controller.

 

Alert: DC has failed to synchronize its naming context with replication partners.

Issue: One of the domain controllers in the environment went to a grayed out status.

The server having the issues reported the alerts "DC has failed to synchronize its naming context with replication partners," "A problem has been detected with the trust relationship between two domains," "AD Replication is occurring slowly," and "Script Based Test Failed to Complete" (for multiple AD-related scripts).

Other domain controllers reported "Could not determine the FSMO role holder" and "AD Client Side – Script Based Test Failed to Complete".

Events also occurred on the client system (21006 OpsMgr Connector, 20057 OpsMgr Connector, 21001 OpsMgr Connector).

Resolution: We installed the Telnet client feature to test connectivity to the management server. Telnet connectivity failed from this system but not from others. We then restarted the OpsMgr Health service, but it had no effect on the gray status. After we rebooted the system, the gray status cleared.
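
The Telnet test above is just a TCP connection attempt, so the same check can be done without installing the Telnet client wherever Python is available. This is a minimal sketch: the host name is hypothetical, and port 5723 (the default OpsMgr agent-to-management-server port) is an assumption, since the post does not record which port was tested.

```python
import socket


def can_connect(host: str, port: int = 5723, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within
    `timeout` seconds, False on refusal, timeout, or name-resolution failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical management server name; replace with your own.
    server = "opsmgr-ms.contoso.com"
    state = "reachable" if can_connect(server) else "NOT reachable"
    print(f"{server}:5723 is {state}")
```

Running it from the gray domain controller and from a healthy one gives the same comparison we made with Telnet.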

 

Alert: AD Client Side – Script Based Test Failed to Complete.

Issue: AD Replication Partner Op Master Consistency: The script 'AD Replication Partner Op Master Consistency' could not create object 'McActiveDir.ActiveDirectory'. This is an unexpected error. The error returned was 'ActiveX component can't create object' (0x1AD)

Resolution: In MOM 2005, this was resolved by changing the Action account. In OpsMgr 2007, this alert occurred in a different domain than the one with the OpsMgr RMS server. To resolve this, we created a Run As Account for the domain (DMZ) and assigned the Run As Account to the AD domain controllers in the DMZ domain.

 

Alert: Script Based Test Failed to Complete.

Issue: AD Lost And Found Object Count: The script 'AD Lost And Found Object Count' failed to create object 'McActiveDir.ActiveDirectory'. This is an unexpected error. The error returned was 'ActiveX component can't create object' (0x1AD)

Resolution: We configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

 

Alert: Script Based Test Failed to Complete.

Issue: AD Database and Log : The script 'AD Database and Log' failed to create object 'McActiveDir.ActiveDirectory'. The error returned was 'ActiveX component can't create object' (0x1AD).

Resolution: We configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

 

Alert: Performance Module could not find a performance counter.

Issue: In PerfDataSource, could not resolve counter DirectoryServices, KDC AS Requests, Module will be unloaded.

Resolution: We created a Run As Account and configured the AD MP Account (Administration -> Security -> Run As Profiles) for each of the two servers in the domain that were reporting errors.

 

Alert: Script Based Test Failed to Complete.

Issue: AD Database and Log : The script 'AD Database and Log' failed to create object 'McActiveDir.ActiveDirectory'. The error returned was 'ActiveX component can't create object' (0x1AD)

Resolution: We installed OOMADS (the Active Directory Management Pack Helper Object) from the OpsMgr 2007 SP1 CD.

 

Alert: This domain controller has been promoted to PDC.

Issue: No issue; this was an informational message. The message was generated when the PDC emulator role was moved between domain controllers.

Resolution: No action required. This message is provided for situations where the PDC emulator role was moved unexpectedly.

 

Alert: The Domain Changes report has data available.

Issue: No issue; this was an informational message. It was generated when the PDC emulator role was moved between domain controllers in the environment.

Resolution: No action required. This message is provided for situations where the PDC emulator role was moved unexpectedly.

 

Alert: AD Domain Performance Health Degraded.

Issue: More than 60% of the DCs contained in this AD Domain report a Performance Health problem.

Resolution: This alert indicates that alerts are occurring on more than 60% of the domain controllers in a domain. The alert does not require action by itself, but it does require analysis to determine what is causing the domain controllers to be in a degraded state.

 

Alert: AD Site Performance Health Degraded.

Issue: More than 60% of the DCs contained in this AD Site report a Performance Health problem.

Resolution: This alert indicates that alerts are occurring on more than 60% of the domain controllers in a site. The alert does not require action by itself, but it does require analysis to determine what is causing the domain controllers to be in a degraded state.
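
To make the "more than 60%" wording of the domain and site rollup alerts concrete, here is a small Python sketch of the arithmetic. It is a simplified reading of the alert text, not the management pack's actual rollup logic; the function name and the sample DC names are hypothetical.

```python
def rollup_degraded(dc_healthy, threshold_percent=60):
    """dc_healthy maps a DC name to True (healthy) or False (reporting a
    Performance Health problem). Returns True when strictly more than
    threshold_percent of the DCs are unhealthy."""
    total = len(dc_healthy)
    if total == 0:
        return False
    unhealthy = sum(1 for healthy in dc_healthy.values() if not healthy)
    # Integer comparison avoids floating-point rounding at the boundary.
    return unhealthy * 100 > threshold_percent * total


if __name__ == "__main__":
    # Hypothetical site with five DCs.
    site = {"DC01": False, "DC02": False, "DC03": False,
            "DC04": True, "DC05": True}
    print(rollup_degraded(site))  # 3 of 5 is exactly 60%, not more than 60%
    site["DC04"] = False
    print(rollup_degraded(site))  # 4 of 5 is 80%
```

Under this reading, a five-DC site needs four unhealthy domain controllers before the rollup alert fires, which is why the alert points to a widespread problem rather than a single server.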

 

Alert: Account Changes Report Available.

Issue: Informational alert, which can be accessed in the AD SAM Account Changes report (available on the right side under Active Directory Domain reports).

Resolution: No resolution required. We checked the AD SAM Account Changes report (available on the right-side under Active Directory Domain reports) to see the changes that were available.

 

During our testing, we lost network connectivity for a period of time to a site that had one of the domain controllers. The result was the flurry of alerts listed below:

 

Alerts:

Critical Alerts:

  • A problem with the inter-domain trusts has been detected
  • DNS 2008 Server External Addresses Resolution Alert
  • OleDB: Results Error

Warnings:

  • A problem has been detected with the trust relationship between two domains
  • AD Client Side – Script Based Test Failed to Complete (multiple)
  • Could not determine the FSMO role holder. (multiple)
  • DC has failed to synchronize its naming context with replication partners (multiple)

Issue: Loss of network connectivity between one site and another, both of which had domain controllers.

Resolution: Once network connectivity was re-established, we resolved all issues identified above.

 

UPDATE: 02/25/09

Alert: The Op Master Schema Master Last Bind latency is above the configured threshold.

 

Issue: A large number of alerts are generated at > 5 seconds for warning and > 15 seconds for error.

 

Resolution: Per http://technet.microsoft.com/en-us/library/cc749936.aspx, the effective thresholds should be changed to warning at > 15 seconds and error at > 30 seconds. We created an override for all types of Active Directory Domain Controller Server 2008 Computer role to change Threshold Error Sec to 30 and Threshold Warning (sec) to 15, and stored it in the ActiveDirectory2008_Overrides management pack.

 

Alert: The Op Master Domain Naming Master Last Bind latency is above the configured threshold.

Issue: A large number of alerts are generated at > 5 seconds for warning and > 15 seconds for error.

 

Resolution: Per http://technet.microsoft.com/en-us/library/cc749936.aspx, the effective thresholds should be changed to warning at > 15 seconds and error at > 30 seconds. We created an override for all types of Active Directory Domain Controller Server 2008 Computer role to change Threshold Error Sec to 30 and Threshold Warning (sec) to 15, and stored it in the ActiveDirectory2008_Overrides management pack.
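
To see why the two Op Master Last Bind overrides cut the alert volume, the Python sketch below classifies a few sample bind latencies against the original thresholds (warning > 5 seconds, error > 15 seconds) and the overridden ones (warning > 15, error > 30). It is a simplified single-sample comparison; the sample latencies are hypothetical, and the real monitor may evaluate its samples differently.

```python
def classify_bind_latency(seconds, warning_sec, error_sec):
    """Return 'error', 'warning', or 'healthy' for one latency sample."""
    if seconds > error_sec:
        return "error"
    if seconds > warning_sec:
        return "warning"
    return "healthy"


DEFAULT_THRESHOLDS = {"warning_sec": 5, "error_sec": 15}
OVERRIDE_THRESHOLDS = {"warning_sec": 15, "error_sec": 30}

if __name__ == "__main__":
    # Hypothetical Last Bind latencies, in seconds.
    for sample in (4, 10, 20, 35):
        before = classify_bind_latency(sample, **DEFAULT_THRESHOLDS)
        after = classify_bind_latency(sample, **OVERRIDE_THRESHOLDS)
        print(f"{sample:>2}s  default: {before:<8} override: {after}")
```

A 10-second bind, for example, moves from a warning to healthy, and a 20-second bind drops from an error to a warning.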

Source - MajiAai