To deploy and configure your Gateway Server (see my previous article), you need to approve it with Microsoft.EnterpriseManagement.GatewayApprovalTool.exe. When you run this tool, you may get an error saying that “The gateway name already exists as a computer instance”. If you check the Operations Manager event log, you will find this error reported as:
Event Id 20000 on the Management Server
Event Ids 20070 and 21016 on the Gateway Server
You cannot deploy the Gateway Server role on a server that already has the Operations Manager agent installed. You have to uninstall the agent locally and remove the computer object from the Agent Managed list in the Administration pane of the console. In some cases, you also have to delete the managed object directly in the OperationsManager database, more precisely in the dbo.MT_Computer table. To do so, run this query against the SQL instance dedicated to SCOM.
WHERE NetbiosComputerName = 'NETBIOS NAME OF YOUR GATEWAY SERVER'
SCOM 2012 requires mutual authentication be performed between agents and management servers prior to the exchange of information between them. To secure the authentication process between the two, the process is encrypted. When the agent and the management server reside in the same Active Directory domain or in Active Directory domains that have established trust relationships, they make use of Kerberos V5 authentication mechanisms provided by Active Directory.
When the agents and management servers do not lie within the same trust boundary (non-trusted domain, DMZ, workgroup, etc.), X.509 certificates are used to satisfy the secure mutual authentication requirement. If there are many agent-monitored computers, this results in high administrative overhead for managing all of those certificates. In addition, if there is a firewall between the agents and management servers, multiple authorized endpoints must be defined and maintained in the firewall rules to allow communication between them.
To reduce this administrative overhead, Operations Manager has a server role called the gateway server. Gateway servers are located within the trust boundary of the agents and can participate in the mandatory mutual authentication. Because they lie within the same trust boundary as the agents, the Kerberos V5 protocol for Active Directory is used between the agents and the gateway server. Each agent then communicates only with the gateway servers that it is aware of. The gateway servers communicate with the management servers. To support the mandatory secure mutual authentication between the gateway servers and the management servers, certificates must be issued and installed, but only for the gateway and management servers. This reduces the number of certificates required, and in the case of an intervening firewall, it also reduces the number of authorized endpoints to be defined in the firewall rules.
Now let’s see how to deploy and configure these Gateway Servers!
The first thing to do is check that all servers can resolve each other's FQDN. To check FQDN resolution, run the nslookup command in CMD or PowerShell on each computer. Typically this is done through DNS, but if you can’t use DNS, you can use the local hosts file on each server.
To edit the hosts file, open %SystemRoot%\System32\drivers\etc\hosts in a text editor from an elevated CMD/PowerShell session, then:
Enter the IP address and FQDN of all Gateway Servers on all members of the All Management Servers Resource Pool
Enter the IP address and FQDN of all members of the All Management Servers Resource Pool on all Gateway Servers
Save the file and close
Then, if there are any firewalls between the two endpoints, make sure that TCP port 5723 is open. You can validate this by running telnet from the management servers.
Note: You can install the Telnet Client from the Features section of Server Manager.
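Where telnet is not available, the same two checks (name resolution and reachability of port 5723) can be approximated with a short script. This is a minimal sketch using only the Python standard library; the function names are illustrative, and a successful TCP connection only shows that the port is reachable, not that SCOM authentication will succeed.

```python
import socket

SCOM_PORT = 5723  # port named in this article


def resolve(fqdn):
    """Return the sorted list of IP addresses a name resolves to (empty if none)."""
    try:
        infos = socket.getaddrinfo(fqdn, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_open(host, port=SCOM_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check(fqdn, port=SCOM_PORT):
    """Return a one-line summary of resolution and port reachability for one server."""
    addresses = resolve(fqdn)
    if not addresses:
        return f"{fqdn}: does not resolve"
    state = "open" if port_open(fqdn, port) else "closed or filtered"
    return f"{fqdn}: {', '.join(addresses)}; port {port} {state}"
```

For example, `print(check("gw01.dmz.example.com"))` run on a management server would report on a (hypothetical) gateway named gw01.dmz.example.com.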
Deploying gateway servers requires certificates on all management servers in the management group and on all gateway servers. These can be issued internally by your own CA or externally by a third-party vendor like VeriSign.
If you are using an internal CA, follow these instructions:
Open an MMC console
Then click on file, add/remove snap-in
In the Add or Remove Snap-ins, add Certificate Templates and Certification Authority
Expand Certificate Templates. In the Certificate Templates Console Right Click IPSec (Offline request) and then select duplicate template
On the General Tab, type a name for the SCOM template
On the Request Handling Tab:
Select Allow private key to be exported
For 2000 & 2003 Domains:
For Windows 2003 Check Microsoft RSA SChannel Cryptographic provider
For Windows 2000 Check Microsoft Enhanced Cryptographic provider 1.0
On the Extensions Tab:
Select the Applications Policies and Click Edit
Remove IP security IKE intermediate
Add Client Authentication and Server Authentication
On the Security Tab:
Verify that users have Read and Enroll rights
Now we need to add the Template to the Certificate Authority
Expand Certification Authority
Right Click on Certificate Templates, then New, then Certificate Template to Issue
Select the template you just created and Click OK
The template you just created should now show up in the Templates list.
Now that we have our certificate template, we need to request a certificate based on it for the Gateway server. Create a .inf file and edit it as below.
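As a rough illustration of such a file, the Python sketch below builds the text of a certreq policy file. The section and key names follow the documented certreq INF format, but the specific values (2048-bit key, key usage 0xf0, the example host name) are assumptions rather than settings taken from this article, so adjust them to your own PKI policy.

```python
def build_request_inf(fqdn, key_length=2048):
    """Return certreq policy (.inf) text for a client/server authentication certificate.

    The values below are common choices, assumed for illustration only.
    """
    if not fqdn or " " in fqdn:
        raise ValueError("fqdn must be a non-empty host name without spaces")
    lines = [
        "[NewRequest]",
        f'Subject="CN={fqdn}"',   # subject must match the server FQDN
        "Exportable=TRUE",        # private key can be exported
        f"KeyLength={key_length}",
        "KeySpec=1",              # AT_KEYEXCHANGE
        "KeyUsage=0xf0",
        "MachineKeySet=TRUE",     # store the key in the machine store
        "",
        "[EnhancedKeyUsageExtension]",
        "OID=1.3.6.1.5.5.7.3.1 ; Server Authentication",
        "OID=1.3.6.1.5.5.7.3.2 ; Client Authentication",
    ]
    return "\r\n".join(lines) + "\r\n"
```

Writing the result of `build_request_inf("gw01.dmz.example.com")` (a hypothetical gateway name) to cer.inf gives a file that can be passed to certreq.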
Then execute the command from a CMD/PowerShell window:
certreq -new -f cer.inf cer.req
Open the .req file and copy its contents to the clipboard. Then go to the web portal of your CA at https://<servername>/certsrv
You must be logged on with an account that has permission to enroll certificates with the template created previously.
Paste the request into the saved request field and select the template created previously. Export the certificate from the CA and import it on the gateway server, again using the MMC on the local server, and place it in Personal Certificates. This needs to be done for all management servers and all gateway servers.
Gateway Approval Tool
Before installing the gateway, the Gateway Approval Tool must be run to provision the new gateway. The tool must first be copied from the setup sources to a local folder on the management server. Copy the files below:
Note: You will need to do this for every Gateway server you are installing. As a result, the newly provisioned gateway server will be visible in the SCOM console.
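When several gateways have to be provisioned, it can help to generate the approval command for each one. The sketch below only builds the argument list; it assumes the tool's documented /ManagementServerName, /GatewayName and /Action switches, and the server names in the usage note are hypothetical.

```python
TOOL = "Microsoft.EnterpriseManagement.GatewayApprovalTool.exe"


def approval_command(management_server, gateway, action="Create"):
    """Return the argument list that approves (or removes) one gateway server."""
    if action not in ("Create", "Delete"):
        raise ValueError("action must be 'Create' or 'Delete'")
    return [
        TOOL,
        f"/ManagementServerName={management_server}",  # FQDN of the management server
        f"/GatewayName={gateway}",                     # FQDN of the gateway server
        f"/Action={action}",
    ]
```

For instance, `" ".join(approval_command("ms01.corp.example.com", "gw01.dmz.example.com"))` yields a command line that can be pasted into an elevated prompt in the folder where the tool was copied.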
Install Gateway Service
Now that you have all of the prerequisites done, you need to install the Gateway service. Right Click on Setup.exe and Run as administrator. As always, you are greeted with the System Center 2012 screen. Under Optional Installations, Click Gateway management server.
For the Gateway Action Account, you can use the Local System account or a domain account with local administrator privileges in the untrusted domain.
The next bit of configuration is to run the MOMCertImport.exe tool. Copy the MOMCertImport.exe tool from the \SupportTools\ folder of the Operations Manager 2012 distribution files, under the folder for your processor architecture. Run the .exe and choose the certificate that we generated before.
The final step is to make sure that the management server and the gateway server can properly communicate through the use of a run-as account. In the Administration space, choose Accounts under Run As Configuration.
In the Actions pane, create Run As Account
On the General Properties page, make sure that the Run As account type is Windows. Give the account a display name and choose Next
You will need to provide account credentials for this run-as account. This should be the action account in the untrusted domain where the gateway server resides. This account needs to be a local administrator on all of the gateway servers in this domain and needs local logon rights.
On the window Select a distribution security option, choose More secure
Then, go into the account and assign the gateway servers
Everything should be good at this point. You can validate your installation by looking at the Operations Manager log in Event Viewer on the Management Server and the Gateway Server. You should see successful events on both computers.
You ran your standard availability report in SCOM and got an incomplete report, with missing data replaced by the Monitoring Unavailable state? This article can help you resolve the problem.
To generate reports, SCOM uses the Operations Manager Data Warehouse database, which contains all collected data. To keep this database from growing too large with overly detailed data, SCOM has a complicated process called the Data Warehouse Availability Aggregation Process.
In brief, the SCOM Management Server Data Warehouse Writer data source writes the performance data and state data to a raw staging table; the DW staging process then copies the raw data into the raw data partition tables. The Standard Maintenance Process generates the aggregation sets that have to be processed. During this process, the aggregation history table contains rows with a dirty indicator (DirtyInd) of 1. The raw partition data is aggregated into hourly and daily data. When an aggregation is complete, its dirty indicator is set to 0. Finally, a stored procedure reads the aggregated data that is used to generate reports for the end user.
Inside the Data Warehouse, a table called dbo.HealthServiceOutage keeps the outage data for all objects. However, sometimes an outage end time is not written, and that is the key to our current issue. The result shows you the number of aggregations that are not yet completed for each data set. If you have a high number like this, you have an issue.
So let’s take a look at the Health Service Outage table thanks to this query:
You can see that each managed entity has a start time, an end time, and a reason for the outage given by ReasonCode. However, in some cases, the end time will be NULL.
There are two reasons for this.
The object that is “unavailable” is truly still unavailable and this should be NULL.
The object is now healthy, but an EndDateTime did not get written. Unless you have a big problem in your environment, you should have very few of these.
A bunch of Ids doesn’t get us very much information, so let’s enhance the query a bit. This query shows us ONLY the items with a NULL EndDateTime. It is also joined to the ManagedEntity table so we can see the actual names of the objects.
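To make the shape of that query concrete, here is a minimal sketch that runs the same kind of join against an in-memory SQLite database from Python. It is a stand-in, not the Data Warehouse itself: the ManagedEntityRowId and DisplayName column names, the sample rows, and the reason codes are assumptions chosen for illustration.

```python
import sqlite3


def demo_connection():
    """Create a toy copy of the two tables with one closed and one open outage."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ManagedEntity (
            ManagedEntityRowId INTEGER PRIMARY KEY,
            DisplayName TEXT
        );
        CREATE TABLE HealthServiceOutage (
            ManagedEntityRowId INTEGER,
            StartDateTime TEXT,
            EndDateTime TEXT,
            ReasonCode INTEGER
        );
        INSERT INTO ManagedEntity VALUES
            (1, 'srv01.example.com'), (2, 'srv02.example.com');
        INSERT INTO HealthServiceOutage VALUES
            (1, '2015-01-10 08:00:00', '2015-01-10 09:00:00', 17),
            (2, '2015-01-11 12:00:00', NULL, 25);
        """
    )
    return conn


def open_outages(conn):
    """Return (display name, start time, reason code) for outages with no end time."""
    return conn.execute(
        """
        SELECT me.DisplayName, hso.StartDateTime, hso.ReasonCode
        FROM HealthServiceOutage AS hso
        JOIN ManagedEntity AS me
          ON me.ManagedEntityRowId = hso.ManagedEntityRowId
        WHERE hso.EndDateTime IS NULL
        ORDER BY hso.StartDateTime
        """
    ).fetchall()
```

On the toy data, `open_outages(demo_connection())` returns only the srv02.example.com row, because it is the only one whose EndDateTime is NULL.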
Notice that for some objects EndDateTime is NULL. Assuming the object is actually available, this column should contain the actual end date.
The Health Service Outage data is considered in the availability calculations of the Standard Data Set in the Data Warehouse. If you look at any of the State.StateDaily_[GUID] tables, you will see the “HealthServiceUnavailableMilliseconds” column is always maxed out, assuming the other columns are 0. We can make the Data Warehouse recalculate this data by modifying the Health Service Outage table.
If you want to make your objects available again, you can follow the steps below. Please note:
This is NOT supported by Microsoft
You SHOULD backup your Data Warehouse before making any changes
After we make the changes, your Data Warehouse might have to do A LOT of recalculating, causing a performance hit for a short period of time, or it might cause your Data Warehouse to fall behind for a little while. If you are having performance issues with your Data Warehouse, you should address them first.
The first thing we want to do is make sure the Data Warehouse is not behind. Run the query below. If you get more than one or two rows, continue reading. Otherwise, skip ahead to the check for NULL EndDateTime entries.
WHERE (d.DatasetDefaultName = 'State data set')
In this case, we have a serious problem. To fix it, we will have to run a specific query.
But first, we must override the “Enabled” parameter to False on the Standard Data Warehouse Data Set maintenance rule for all instances of Standard Data Set.
Now we are really sure no maintenance process is running. We then run our own maintenance process every minute with the query below, because we know that catching up on the state data aggregation will take some time and we don’t want to create problems for other data sets (performance, event, etc.).
PRINT 'Starting loop of StandardDatasetMaintenance jobs'
If you have results, then you have an EndDateTime that is NULL. Before assuming that it should not be NULL, you should go into SCOM and verify the state of the object first to make sure it is available. If the state of the object is available, but your query returned one or more NULL EndDateTime entries, then let’s continue.
Now we need to update the HealthServiceOutage table and enter an EndDateTime for the results above. The query below does the following:
Gets the rows from the query above
Updates the DWLastModifiedDateTime to the current UTC Date and Time
Updates the EndDateTime to match the StartDateTime
We are modifying DWLastModifiedDateTime because we want the data warehouse to recalculate the states, so that the reports accurately reflect availability. We must update this column; otherwise, recalculation will not happen. To do so, run this query after replacing the mandatory fields.
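The logic of that update can be sketched against a toy table before touching the real Data Warehouse. The example below uses Python with an in-memory SQLite database as a stand-in; the ManagedEntityRowId filter, the text timestamp format, and the sample rows are assumptions for illustration, and the real statement has to be written in T-SQL against your own database.

```python
import sqlite3
from datetime import datetime, timezone


def toy_warehouse():
    """Build an in-memory stand-in for the outage table (toy data, not real SCOM rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE HealthServiceOutage (
            ManagedEntityRowId INTEGER,
            StartDateTime TEXT,
            EndDateTime TEXT,
            DWLastModifiedDateTime TEXT
        );
        INSERT INTO HealthServiceOutage VALUES
            (1, '2015-01-10 08:00:00', '2015-01-10 09:00:00', '2015-01-10 09:05:00'),
            (2, '2015-01-11 12:00:00', NULL, '2015-01-11 12:05:00');
        """
    )
    return conn


def close_open_outages(conn, managed_entity_row_id, now=None):
    """Close open outages of one entity and return the number of rows changed.

    EndDateTime is set to StartDateTime and DWLastModifiedDateTime to the
    current UTC time, mirroring the two updates described in the article.
    """
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d %H:%M:%S")
    cursor = conn.execute(
        """
        UPDATE HealthServiceOutage
        SET EndDateTime = StartDateTime,
            DWLastModifiedDateTime = ?
        WHERE EndDateTime IS NULL
          AND ManagedEntityRowId = ?
        """,
        (stamp, managed_entity_row_id),
    )
    conn.commit()
    return cursor.rowcount
```

Restricting the update to one entity at a time is a deliberate choice in this sketch: it keeps the change limited to objects whose state has already been verified in the console.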
After we make the change, the next time Standard Data Set Maintenance runs, it will recalculate Availability. It will make the DW look like it is behind, but just give it time to catch up and calculate the state. This could take several hours. You can run the query below periodically and verify that your row count is getting smaller.
WHERE (d.DatasetDefaultName = 'State data set')
Once the Standard Data Set calculations are finished, run your reports and verify that there are no longer any gray parts.
The number of asynchronous responses defines how many notifications SCOM can send at exactly the same time. By default, this number is set to 5. To change this value, on the management servers that are members of the Notifications Resource Pool, use Regedit.exe to navigate to: HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft Operations Manager\3.0\Modules
Under this key, create a new subkey called Global
Under the new Global subkey, create another subkey called Command Executer
Under the Command Executer subkey create a new DWORD value AsyncProcessLimit
Set the AsyncProcessLimit value as needed.
Note: You can set a minimum of 0x00000001 (strongly not recommended) and a maximum of 0x00000064 (100) (again, definitely not recommended)
Do not forget to restart the SCOM services on each Management Server where you changed registry values.
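If the same value has to be applied on several management servers, the steps above can be captured in a .reg file instead of being repeated by hand. The sketch below only generates the file text and enforces the 1 to 100 range mentioned in the note; the function name is illustrative, and importing the file (for example with regedit) remains a manual step.

```python
KEY_PATH = (
    r"HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft Operations Manager"
    r"\3.0\Modules\Global\Command Executer"
)


def async_limit_reg(limit):
    """Return .reg file text that sets AsyncProcessLimit to the given value."""
    if not 1 <= limit <= 100:
        raise ValueError("AsyncProcessLimit must be between 1 and 100")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY_PATH}]\r\n"
        # DWORD values are written as eight hexadecimal digits.
        f'"AsyncProcessLimit"=dword:{limit:08x}\r\n'
    )
```

For example, `async_limit_reg(10)` produces a file whose value line reads `"AsyncProcessLimit"=dword:0000000a`. Importing the file creates the Global and Command Executer subkeys if they do not exist yet.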