


Poller metrics in NPM 10.2 and later

Created by Adrian Cook, last modified by MindTouch on Jun 23, 2016


Overview

This article describes the new polling mechanism introduced in NPM 10.2, which is responsible for scanning nodes, volumes, and interfaces with SNMP, ICMP, and WMI queries.

The old poller used a single thread, which limited its performance and scalability. The new poller takes advantage of multi-threaded (multi-core) environments and is highly scalable.

Environment

NPM version 10.2 or later

Detail

Because the new poller is highly scalable, this article also provides a few tips on how to measure and control its overall performance.

Collector - The "new poller" is a generic term for the new polling mechanism, which you can think of as a set of services and components. Within it, the Collector is the controller of the general polling activity: it maintains all polling schedules and stores poll results in the database.

To execute polling (SNMP, ICMP, and WMI requests), the Collector communicates with the Job Engine, a service that runs the polling jobs and returns the results to the Collector.
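This division of responsibilities can be pictured with a minimal sketch. The class and method names below are purely illustrative assumptions and do not correspond to the actual Orion service APIs:

    # Illustrative sketch only: names do not match the real Orion services.
    from dataclasses import dataclass

    @dataclass
    class PollJob:
        node_id: int
        protocol: str          # "SNMP", "ICMP", or "WMI"
        interval_minutes: int

    class JobEngine:
        """Runs polling jobs and returns the results to the caller."""
        def execute(self, job: PollJob) -> dict:
            # The real service would issue the SNMP/ICMP/WMI request here.
            return {"node_id": job.node_id, "protocol": job.protocol, "status": "Up"}

    class Collector:
        """Owns the polling schedule and persists poll results."""
        def __init__(self, job_engine: JobEngine):
            self.job_engine = job_engine
            self.schedule = []             # list of PollJob entries

        def poll_once(self) -> None:
            for job in self.schedule:
                result = self.job_engine.execute(job)  # delegate execution
                self.store(result)                     # write to the database

        def store(self, result: dict) -> None:
            print("storing", result)       # stand-in for the database write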

 

What are the new metrics I can use to monitor the new poller?

When we introduced the new poller, we also added important new metrics that you can monitor in the main Orion web console and that give you control over overall polling performance.

Three new metrics:

Click Settings > Polling Engines to see these metrics.
 

 

  • Polling Completion - Represents the percentage delay in the primary polling mechanism (it covers all product polling - NPM, APM, UDT, and so on). The metric is computed from the delay of every single polling job; as long as the value is below 100, polling jobs are delayed but NOT discarded.
    For example, if the polling completion is 80%, polling is on average 20 seconds behind: in the 10.2 release, 1% represents 1 second of delay in the polling schedule (this ratio may change in the future). Polling completion changes frequently because it reflects the average of the last 100 polling jobs. It is affected mostly by CPU and memory resources, so if it is significantly lower than 100%, check your CPU and memory utilization. (A short sketch of this calculation follows this list.)
     
  • Total Job Weight (Total Job Weight for NPM Jobs) - Represents the complexity of the current polling plan and serves as the base value for calculating the polling rate. It is a combination of all elements managed by Orion and their polling frequencies.

    Total Job Weight is the sum of the "True Job Weight" of each job in the Job Engine. To get the "True Job Weight", we use the following formula, where every type of polling job has a predefined weight and a polling frequency (see the second sketch after this list).
    Assuming a job has a weight of 100 and a polling interval of 10 minutes:
    10 minutes (interval) / 1 minute (throttle ratio unit of time) = 10
    100 (job weight) / 10 (interval / throttle ratio unit of time) = 10 (True Job Weight)

    Apply this to each job defined in the throttle group and sum the true weights of all jobs.
    For example, with 300 jobs that all have the same job weight and polling interval:

    10 (True weight for Job 1) + ... + 10 (True weight for Job 300) = 3000 (Total Job Weight)

    Currently, only NPM jobs have a weight larger than 1. For instance, APM polling jobs will not have a big impact on the scale factor unless there are a lot of them. Right now, this is simply a number that tells us how much "weight" is going through the system per unit of time; there is no figure yet that represents the high end of what can be run per unit of time.
     

  • Polling Rate - The current utilization of your nodes, volumes, and interfaces polling capacity. Any value below 100% is acceptable, but if it exceeds 85% you are approaching the maximum amount of polling your server can handle and a notification banner is displayed in the Orion web console. If the polling load is more than the new poller can handle (that is, more than 100%), the polling intervals are automatically increased to absorb the higher load. This means that even if the CPU is not fully used, the new poller will stretch the polling intervals once you reach the polling rate limit. Polling rate uses Total Job Weight as the base value. For example, for NPM polling jobs with a total weight of 3000, the polling rate is calculated as follows:

    (3000 (Total Job Weight) / 2600 (maximum polling load for NPM jobs)) x 100 = 115% (polling rate)

    In this example the value exceeds 100%, so throttling is applied: the polling intervals of all NPM jobs are multiplied by 1.15.
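As a rough illustration of how Polling Completion relates to job delays, here is a minimal sketch based only on the description above; the 1% = 1 second ratio is the documented 10.2 behavior, while the names and the handling of the averaging window are assumptions:

    # Sketch: Polling Completion as described above, where 1% of completion
    # corresponds to 1 second of schedule delay (10.2 ratio) and the metric
    # averages the delay of the last 100 polling jobs.
    from collections import deque

    recent_delays = deque(maxlen=100)    # delay, in seconds, of the last 100 jobs

    def record_job_delay(delay_seconds: float) -> None:
        recent_delays.append(delay_seconds)

    def polling_completion() -> float:
        if not recent_delays:
            return 100.0
        average_delay = sum(recent_delays) / len(recent_delays)
        return max(0.0, 100.0 - average_delay)

    # Jobs running about 20 seconds late give a completion of roughly 80%.
    for _ in range(100):
        record_job_delay(20.0)
    print(polling_completion())          # 80.0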
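The Total Job Weight and Polling Rate arithmetic can be condensed into a second sketch. The 1-minute throttle ratio unit, the job weight of 100, the 10-minute interval, and the maximum polling load of 2,600 are taken from the worked examples above; the function names are illustrative:

    # Sketch of the Total Job Weight and Polling Rate arithmetic worked above.
    THROTTLE_RATIO_UNIT_MINUTES = 1      # unit of time used to normalize intervals
    MAX_NPM_POLLING_LOAD = 2600          # maximum polling load for NPM jobs (example value)

    def true_job_weight(job_weight: float, interval_minutes: float) -> float:
        # weight 100 with a 10-minute interval -> 100 / (10 / 1) = 10
        return job_weight / (interval_minutes / THROTTLE_RATIO_UNIT_MINUTES)

    def total_job_weight(jobs):
        # jobs is a list of (job_weight, interval_minutes) pairs
        return sum(true_job_weight(w, i) for w, i in jobs)

    def polling_rate(total_weight: float) -> float:
        return (total_weight / MAX_NPM_POLLING_LOAD) * 100

    # 300 identical jobs, weight 100, 10-minute interval -> Total Job Weight 3000.
    jobs = [(100, 10)] * 300
    total = total_job_weight(jobs)       # 3000.0
    rate = polling_rate(total)           # ~115%

    # Above 100%, throttling stretches every NPM polling interval by rate/100.
    throttle_factor = max(1.0, rate / 100)        # ~1.15 in this example
    throttled_interval = 10 * throttle_factor     # the 10-minute job now runs every ~11.5 minutes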

 

The new poller is multi-thread capable, and a throttling mechanism prevents it from consuming all available resources for node, volume, and interface polling alone. Throttling leaves your system enough resources for other applications such as SolarWinds APM or UDT, as well as for Orion Web Console performance.
 

 

 

Advanced monitoring of new poller performance

The best way to monitor the performance of polling, results processing, and storing results to the database is through performance counters.

The complete list of available counters related to the new poller is provided in the counter list attached at the end of this article.

 

 

 

To double-check your performance status, monitor the following counters closely:

  • Data Processor Pipe Line (DPPL) Waiting Items – This counter should not grow constantly over time; ideally it returns to zero between polling intervals. If the value keeps growing, your polling results are not being processed in the expected time and you may see gaps in charts or in poll reports. This is usually caused by slow hardware.
     
  • DPPL Avg. Time to Process Item – Reflects the time it takes to write polling results to the database. The optimal value is less than 0.500 ms; otherwise you will experience noticeable delays between result processing and storage in the database.

 

  • Scale Factor: Orion.Standard.Polling - Represents the scale factor/polling rate described above. It tells you whether the system is currently throttling and what the current utilization is.

 

  • Messages in Queue – If this value grows persistently, you are not able to process all poll results on time. This is usually caused by slow hardware, but database connectivity and performance also play a key role in the Collector's ability to store results and can cause the queue to back up.
     

If you run perfmon from the Windows Start menu or a command prompt and paste the attached counter list into the Performance Monitor window on your Orion server, you will see all of the counters related to polling performance:

View:/cfs-file.ashx/__key/CommunityServer.Components.UserFiles/00.00.03.32.84/countersProperties.txt
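If you would rather capture these counters from a script than from the Performance Monitor GUI, a minimal sketch like the one below shells out to the built-in typeperf utility on the Orion server. The local path for the saved counter list and the output file name are assumptions; the counter list itself is the attachment referenced above.

    # Sketch: sample the attached counter list with typeperf and write a CSV.
    # Assumes the attachment has been saved locally as countersProperties.txt.
    import subprocess

    COUNTER_LIST = r"C:\Temp\countersProperties.txt"   # saved copy of the attachment (assumed path)
    OUTPUT_CSV = r"C:\Temp\poller_counters.csv"        # where the samples are written (assumed path)

    subprocess.run(
        [
            "typeperf",
            "-cf", COUNTER_LIST,   # read counter paths from the attached list
            "-si", "15",           # sample every 15 seconds
            "-sc", "20",           # collect 20 samples (5 minutes total)
            "-f", "CSV",
            "-o", OUTPUT_CSV,
            "-y",                  # overwrite the output file without prompting
        ],
        check=True,
    )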

 

 
