Orion environment Health Checks

Updated November 22nd, 2016

Overview

This article will show you how to perform Health Checks to confirm that your Orion environment is functioning correctly. 

Environment

All NPM versions

Detail

Note: Before performing any of the Health Check tests listed below, please consult the Orion Minimum Requirements documentation.

 

Active Diagnostics

Active Diagnostics is a self-help tool created for Health Check tests.

This can be launched from the Orion Diagnostics application available on the Start menu.

 

Clicking on this link will launch a separate application.

 

Using the default option produces an easy-to-read report showing only Warning or Failed tests.

 

You can also run each test individually.

 

Each test will output results based on the test definition. The definitions are updated in the Active Diagnostics Self-Test.

 

All results can be exported in a JSON file format that can be sent to SolarWinds support. 

 

Orion SAM Template

 

 

This template assesses the status of Windows services related to SolarWinds Orion servers.

Prerequisites: WMI access to the target server.
Credentials: Windows Administrator on the target server.

 

Monitored components 

SolarWinds Orion Job Engine

This monitor returns the CPU and memory usage of the SolarWinds Orion Job Engine service. This service is used to perform recurring work. 
This service creates various Job Engine Worker processes for scalability and robustness. The job engine writes information about each job to its database.

 

SolarWinds Orion Module Engine

This monitor returns the CPU and memory usage of the SolarWinds Orion Module Engine service. 
This service is used to talk to the database.

 

SolarWinds Orion Job Scheduler

This monitor returns the CPU and memory usage of the SolarWinds Orion Job Scheduler service. 
The Job Scheduler service dispatches work to local and/or remote job engines.

 

SolarWinds Syslog Service

This monitor returns the CPU and memory usage of the SolarWinds Syslog service. 
This service is responsible for logging events in log files.

 

SolarWinds Alerting Engine

This monitor returns the CPU and memory usage of the SolarWinds Alerting Engine service. 
This service is responsible for Advanced Alerting.

 

SolarWinds Website

This component monitor tests a web server's ability to accept incoming sessions and transmit the requested page. 
The component monitor can optionally search the delivered page for specific text strings and pass or fail the test based on that search. By default, it monitors TCP port 80.

 

SolarWinds Job Engine v2

This monitor returns the CPU and memory usage of the SolarWinds Job Engine v2 service. This service is used to perform recurring work. 
This service creates various Job Engine Worker processes for scalability and robustness. The job engine writes information about each job to its database.

 

SolarWinds Collector Data Processor

This monitor returns the CPU and memory usage of the SolarWinds Collector Data Processor service. 
This service is responsible for volume and node data synchronization between the Collector and the Standard Poller.

 

SolarWinds Collector Management Agent

This monitor returns the CPU and memory usage of the SolarWinds Collector Management Agent service. 
This service takes part in data synchronization between the Collector and the Standard Poller.

 

SolarWinds Collector Polling Controller

This monitor returns the CPU and memory usage of the SolarWinds Collector Polling Controller service. 
This service takes part in data synchronization between the Collector and the Standard Poller.

 

SolarWinds Information Service

This monitor returns the CPU and memory usage of the SolarWinds Information service. 
This service is used by websites to talk to the database. 
This service is also responsible for how the pollers talk to each other.

 

SolarWinds Information Service V3

This monitor returns the CPU and memory usage of the SolarWinds Information service V3. 
This service is used by websites to talk to the database. 
This service is also responsible for how the pollers talk to each other.

 

SolarWinds JMX Bridge

This monitor returns the CPU and memory usage of the SolarWinds JMX Bridge service. 
The JMX Bridge is only used if you are monitoring Java Application Servers such as WebSphere, WebLogic, or Apache Tomcat via JMX.

Note: By default, this monitor is disabled.

 

SolarWinds Trap Service

This monitor returns the CPU and memory usage of the SolarWinds Trap service. 
This service is responsible for catching and logging trap events.

 

File Count Monitor - JET Files

This monitor returns the number of JET files in C:\Windows\Temp. Too many JET files in this folder prevents new DB connections and causes polling to halt.
The returned value should be less than 65,530.
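The check above can be sketched as a short script. This is only an illustration, not the template's own logic: the file name pattern `JET*.tmp` and both function names are assumptions, because this article does not say how JET files are named.

```python
from pathlib import Path

# Threshold stated in this article.
JET_FILE_LIMIT = 65_530


def count_jet_files(temp_dir, pattern="JET*.tmp"):
    """Count regular files in temp_dir whose names match pattern.

    The default pattern is an assumption; adjust it to match the
    files actually present on your server.
    """
    return sum(1 for p in Path(temp_dir).glob(pattern) if p.is_file())


def jet_count_ok(count, limit=JET_FILE_LIMIT):
    """Return True when the file count is below the limit."""
    return count < limit
```

On an Orion server this might be called as `jet_count_ok(count_jet_files(r"C:\Windows\Temp"))`.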

 

MSMQ Messages in Queue

This is the total number of Message Queuing messages that currently reside in the selected queue. 
When the Data Processor receives more results into MSMQ than it is able to process and pass to the Standard Poller, MSMQ continues growing. 
The size of MSMQ should be near 0 almost all of the time. Some spikes may appear, but the Data Processor needs to be able to clean up the MSMQ quickly; otherwise, it will not be able to handle DB blackouts or maintenance.
(Standard Poller performance is affected significantly by DB performance.)

Note: Before using this counter, set the correct instance, beginning with: <HOSTNAME>\private$\solarwinds\collector\processingqueue
where <HOSTNAME> is the hostname of the target server (written without the < >), for example: APMhost.

By default, the instance is set to: <HOSTNAME>\private$\solarwinds\collector\processingqueue\solarwinds.node.hardwarehealth.wmi

All available instances can be found by running the perfmon utility and searching for “Messages in Queue” counter in the “MSMQ Queue” category.

Note: This monitor is disabled by default.
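As a small illustration of the instance naming described above, the following sketch builds the instance string from a hostname. The function name and the optional `queue_suffix` parameter are hypothetical and exist only for this example.

```python
def msmq_instance(hostname, queue_suffix=""):
    """Build the MSMQ counter instance name for a target server.

    hostname is written without angle brackets, for example "APMhost".
    queue_suffix optionally names a specific queue, such as the default
    "solarwinds.node.hardwarehealth.wmi" mentioned in this article.
    """
    if "<" in hostname or ">" in hostname:
        raise ValueError("hostname must not include < or >")
    base = rf"{hostname}\private$\solarwinds\collector\processingqueue"
    return rf"{base}\{queue_suffix}" if queue_suffix else base
```

For example, `msmq_instance("APMhost", "solarwinds.node.hardwarehealth.wmi")` yields the default instance form shown above with `APMhost` as the hostname.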

 

Perfmon DPPL Avg. Time to Process Item

This monitor returns the time needed to process one item. 
If this number is 1, it means you are able to process one item per second. 0.01 means 100 items per second. 
The returned value should be as low as possible.
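The relationship between the returned value and throughput is a simple reciprocal. A minimal sketch, with a hypothetical function name:

```python
def items_per_second(avg_seconds_per_item):
    """Convert the average time to process one item into items per second."""
    if avg_seconds_per_item <= 0:
        raise ValueError("average time per item must be positive")
    return 1.0 / avg_seconds_per_item
```

A value of 1 gives one item per second, and a value of 0.01 gives about 100 items per second, matching the figures above.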

 

Perfmon DPPL Waiting Items

This monitor returns items in the queue pulled from the message queue but waiting for other results to be processed. 
This should be less than 40. If this number is holding at or above 40, this may indicate issues concerning DB response time, performance issues, or many down elements.
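One way to tell a brief spike from a value that is "holding" at or above 40 is to look at several consecutive samples. The sketch below assumes a window of five samples; that window size and the function name are illustrative choices, not values from this article.

```python
# Threshold stated in this article.
WAITING_ITEMS_LIMIT = 40


def is_holding_at_limit(samples, limit=WAITING_ITEMS_LIMIT, window=5):
    """Return True when the most recent `window` samples are all at or above `limit`.

    samples is a sequence of counter readings, oldest first.
    The window size is an assumption; pick one that fits your polling interval.
    """
    recent = list(samples)[-window:]
    return len(recent) == window and all(value >= limit for value in recent)
```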

 

MSMQ Folder Size

This monitor returns the MSMQ folder size. 
The returned value should be less than 800 MB. The MSMQ maximum size is 1 GB.
If the 1 GB limit is reached, polling will stop working correctly.

To increase the MSMQ size, open Computer Management > Features > Message Queuing.
From here, right-click and change the MSMQ 1 GB limit to 1.5 GB. For Windows Server 2003, this is found under the Storage section.

See the Microsoft Message Queue Fills Directory with Orphaned Files article for more information.
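The folder size check can be approximated outside the template with a short script. This is a sketch under stated assumptions: the MSMQ folder path is supplied by the caller because this article does not state it, 1 MB is taken as 1024 × 1024 bytes, and the function names are hypothetical.

```python
import os

# 800 MB threshold from this article, assuming 1 MB = 1024 * 1024 bytes.
MSMQ_WARN_BYTES = 800 * 1024 * 1024


def folder_size_bytes(path):
    """Return the total size, in bytes, of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Skip files that disappear or cannot be read while scanning.
                pass
    return total


def msmq_folder_ok(size_bytes, limit=MSMQ_WARN_BYTES):
    """Return True when the folder size is below the limit."""
    return size_bytes < limit
```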

 

Process Monitor - SWJobEngineWorker2.exe

This monitor returns the number of Job Engine worker processes and their CPU and memory usage.
A value of 10 or lower is acceptable. If the returned value is 100 or greater, there may be problems with jobs hanging.

 

Job Engine v2: Jobs Queued

This monitor returns the number of jobs waiting for execution due to insufficient resources. 
This value should be zero at all times.

 

Job Engine v2: Jobs Lost

This monitor returns the number of lost jobs. 
This value should be zero at all times.

 

Job Engine v2: Jobs Running

This monitor returns the number of jobs currently running.

 

Job Engine v2: Worker Processes

This monitor returns the number of worker processes used. A value of 10 or lower is acceptable. 
If the returned value is 100 or greater, there may be problems with jobs hanging.

 

Job Scheduler v2: Average Execution Delay

This monitor returns the average delay, in seconds, between the time when the job is supposed to be executed and the time that it actually is executed. 
This value should be less than 100.

 

Job Scheduler v2: Results Notified Error

This monitor returns the number of errors that occurred when sending the results back. This value should be zero at all times.
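The Job Engine v2 and Job Scheduler v2 thresholds above can be combined into a single check. The following is a minimal sketch with hypothetical names. It flags only the conditions this article calls out; worker process counts between 10 and 100 are not flagged because the article gives no guidance for that range.

```python
def job_engine_findings(jobs_queued, jobs_lost, avg_execution_delay_s,
                        results_notified_errors, worker_processes):
    """Return a list of findings for values outside the stated thresholds."""
    findings = []
    if jobs_queued != 0:
        findings.append("Jobs Queued is nonzero")
    if jobs_lost != 0:
        findings.append("Jobs Lost is nonzero")
    if avg_execution_delay_s >= 100:
        findings.append("Average Execution Delay is 100 seconds or more")
    if results_notified_errors != 0:
        findings.append("Results Notified Error is nonzero")
    if worker_processes >= 100:
        findings.append("Worker Processes is 100 or greater; jobs may be hanging")
    return findings
```

An empty list means none of the stated thresholds were exceeded.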

 

Attached template: Orion Server.apm-template (249.1 K)

Other Performance Counters

These are Performance Counters that can be loaded and monitored from the SolarWinds Orion servers.

System Counters

| Counter | GOOD | WARNING | CRITICAL | Note |
|---|---|---|---|---|
| \Processor\% Processor Time | <60 | >60 | >80 | |
| \Memory\Committed Bytes | | | | Depends on installed modules |
| \Memory\Available MBytes | >2000 | <1000 | <100 | |
| \LogicalDisk(*)\Avg. Disk Queue Length | <2 | >2 | >10 | |
| \LogicalDisk(*)\Current Disk Queue Length | <10 | >10 | >32 | |
| \LogicalDisk(*)\Avg. Disk sec/Read | <0.012 | >0.012 | >0.020 | |
| \LogicalDisk(*)\Avg. Disk sec/Write | <0.012 | >0.012 | >0.020 | |
| \LogicalDisk(*)\% Idle Time | <80 | <50 | <10 | |
| \Network Interface(*)\% Network Utilization | <50 | >50 | >80 | % Network Utilization doesn't exist as a normal performance counter. Multiply \Network Interface(*)\Bytes Total/sec by 8 (to convert it to bits total/sec), divide by \Network Interface(*)\Current Bandwidth, and multiply the result by 100 to get %. |
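The % Network Utilization calculation described above can be written out as follows. The function names are hypothetical. Treating a value of exactly 50 as GOOD is an assumption, since the thresholds only cover values below and above 50.

```python
def network_utilization_percent(bytes_total_per_sec, current_bandwidth_bps):
    """Compute % Network Utilization from the two underlying counters.

    bytes_total_per_sec: \\Network Interface(*)\\Bytes Total/sec
    current_bandwidth_bps: \\Network Interface(*)\\Current Bandwidth (bits per second)
    """
    if current_bandwidth_bps <= 0:
        raise ValueError("current bandwidth must be positive")
    bits_per_sec = bytes_total_per_sec * 8  # convert bytes to bits
    return bits_per_sec / current_bandwidth_bps * 100


def classify_utilization(percent):
    """Map a utilization percentage onto the GOOD / WARNING / CRITICAL thresholds."""
    if percent > 80:
        return "CRITICAL"
    if percent > 50:
        return "WARNING"
    return "GOOD"  # exactly 50 is treated as GOOD by assumption
```

For example, 6,250,000 bytes per second on a 100,000,000 bits-per-second interface works out to 50 percent.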

 

Other useful system counters used for performance optimizations

| Counter | GOOD | WARNING | CRITICAL | Note |
|---|---|---|---|---|
| \System\Context Switches/sec | <35000 | >35000 | >45000 | |
| \System\Processor Queue Length | <=12 | >12 | >20 | Microsoft counter documentation: the expected range of processor queue length on a system with high CPU activity is 4 to 12. |
| \System\System Calls/sec | <300000 | >300000 | >500000 | |
| \System\Threads | <3500 | >3500 | >=5000 | |

IIS Counters

| Counter | GOOD | WARNING | CRITICAL | Note |
|---|---|---|---|---|
| WebService\Current Connections | | | | Threshold: No specific value. To be defined based on baseline measurement. |
| WebService\Current NonAnonymous Users | | | | Threshold: No specific value. To be defined based on baseline measurement. |
| ASP.NET Applications\Sessions Active | | | | Threshold: No specific value. To be defined based on baseline measurement. |
| WebService\Bytes Total/sec | | | | Threshold: No specific value. To be defined based on baseline measurement. |
| ASP.NET Applications\Requests/Sec | | | | Threshold: No specific value. To be defined based on baseline measurement. |
| ASP.NET Applications\Requests Executing | | | | Threshold: No specific value. To be defined based on baseline measurement. |
| ASP.NET Applications\Request in Application Queue | | | | Threshold: No specific value. To be defined based on baseline measurement. |
| ASP.NET Applications\Request Execution Time | < 1000 | | > 10000 | The number of milliseconds that it took to execute the most recent request. |
| ASP.NET Applications\Requests Total | N/A | N/A | N/A | Threshold: No specific value. Depends on load and test duration. |
| ASP.NET Applications\%Requests Succeeded | > 99.5% | | < 98% | It doesn't exist as a normal performance counter. Calculate it as ASP.NET Applications\Requests Succeeded divided by ASP.NET Applications\Requests Total. |
| ASP.NET Applications\%Requests Rejected | < 0.5% | | > 2% | It doesn't exist as a normal performance counter. Calculate it as ASP.NET Applications\Requests Rejected divided by ASP.NET Applications\Requests Total. |
| ASP.NET Applications\%Requests Failed | < 0.5% | | > 2% | It doesn't exist as a normal performance counter. Calculate it as ASP.NET Applications\Requests Failed divided by ASP.NET Applications\Requests Total. |
| ASP.NET Applications\%Requests Timed Out | < 0.5% | | > 2% | It doesn't exist as a normal performance counter. Calculate it as ASP.NET Applications\Requests Timed Out divided by ASP.NET Applications\Requests Total. |
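The calculated request percentages can be derived from the raw counters with one helper. This is a sketch with hypothetical function names. The division described above yields a fraction, so the helper multiplies by 100 to express it as a percentage. Values that fall between the GOOD and CRITICAL thresholds are labeled "BETWEEN", because no WARNING range is given for them.

```python
def request_percent(part, total):
    """Express `part` as a percentage of ASP.NET Applications\\Requests Total."""
    if total <= 0:
        raise ValueError("Requests Total must be positive")
    return part * 100 / total


def classify_succeeded(percent):
    """Classify %Requests Succeeded: above 99.5 is GOOD, below 98 is CRITICAL."""
    if percent > 99.5:
        return "GOOD"
    if percent < 98:
        return "CRITICAL"
    return "BETWEEN"  # no WARNING threshold is defined for this range


def classify_error_rate(percent):
    """Classify %Requests Rejected, Failed, or Timed Out: below 0.5 is GOOD, above 2 is CRITICAL."""
    if percent < 0.5:
        return "GOOD"
    if percent > 2:
        return "CRITICAL"
    return "BETWEEN"  # no WARNING threshold is defined for this range
```

For example, 9,990 succeeded requests out of 10,000 total gives 99.9 percent, which falls in the GOOD range.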

 

 

Last modified
10:50, 7 Mar 2017
