Use SolarWinds NPM to identify and troubleshoot an interface that has a problem

Created by Chris.Moyer_ret, last modified by Chris.Moyer_ret on Oct 03, 2016

Updated: December 4, 2018

By default, devices monitored by NPM are polled for data every nine minutes. It might take some time before all the nodes you added have data you can review.

Step 1: Determine there is a problem

The topic Identify and troubleshoot a node that has a problem describes alerts that are triggered when a node goes down. Alerts can also be triggered when an interface has a problem, such as high utilization or the interface going down.

The Nodes with Problems widget provides information about the interfaces associated with each node. A square in the bottom-right corner of the node icon indicates that the node has an interface with a problem:

interfacedownicon.png - In this example, a red square indicates that one or more interfaces are down.

interfaceunknownicon.png - In this example, a gray square indicates that the status of one or more interfaces is unknown.

nodeswithproblem.png

In your environment, you might not have any down interfaces. To find an interface with issues that need to be investigated, click My Dashboards > Network > Network Top 10 to open the Network Top 10 view. Review the following widgets on this page.

Top 10 Interfaces by Percent Utilization

This widget shows each interface’s transmit and receive utilization as a percentage of total interface speed. By default, utilization from 70% to 90% is yellow (warning), and utilization over 90% is red (danger). These thresholds are configurable.

Any interface with high utilization deserves more investigation.

top10iterfacesutilization.png
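
The utilization calculation and the default bands described above can be sketched in Python. This is a minimal illustration, not NPM's implementation: the function names, the handling of values at exactly 70% and 90%, the "normal" label, and the sample interface are all assumptions.

```python
def percent_utilization(bits_per_second: float, interface_speed_bps: float) -> float:
    """Return traffic as a percentage of the interface's configured speed."""
    if interface_speed_bps <= 0:
        raise ValueError("interface speed must be positive")
    return 100.0 * bits_per_second / interface_speed_bps


def utilization_band(percent: float, warning: float = 70.0, critical: float = 90.0) -> str:
    """Map a utilization percentage to a band.

    Assumption: 70% up to and including 90% is a warning; above 90% is critical.
    """
    if percent > critical:
        return "critical"
    if percent >= warning:
        return "warning"
    return "normal"


# Hypothetical 1 Gbps interface transmitting 750 Mbps.
tx = percent_utilization(750_000_000, 1_000_000_000)
print(tx, utilization_band(tx))
```

Because the thresholds are configurable, the sketch takes them as parameters instead of hard-coding them.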

Top 10 Interfaces by Traffic

This widget shows how much actual traffic is on an interface. Usually, WAN interfaces will be on this list because of the volume of traffic they process.

top10interfacestraffic..png

Top 10 Errors & Discards Today

This widget shows the interfaces with the most errors and discards today:

  • Errors: Packets that were received but could not be processed because there was a problem with the packet.

  • Discards: Packets that were received without errors but were dropped, usually because interface utilization is near 100%.

    top10errorsdiscards..png
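
As a rough sketch of how a list like this could be assembled from per-interface counters, the Python below ranks interfaces by their combined errors and discards. How NPM orders the widget is not stated here, so the ranking by sum, the `InterfaceCounters` class, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class InterfaceCounters:
    name: str
    errors_today: int    # packets received but not processable
    discards_today: int  # valid packets that were dropped


def top_errors_and_discards(interfaces, limit=10):
    """Return up to `limit` affected interfaces, highest errors + discards first."""
    affected = [i for i in interfaces if i.errors_today + i.discards_today > 0]
    affected.sort(key=lambda i: i.errors_today + i.discards_today, reverse=True)
    return affected[:limit]


# Hypothetical counters for three interfaces.
sample = [
    InterfaceCounters("Gi0/1", errors_today=0, discards_today=1200),
    InterfaceCounters("Gi0/2", errors_today=35, discards_today=0),
    InterfaceCounters("Gi0/3", errors_today=0, discards_today=0),
]
for iface in top_errors_and_discards(sample):
    print(iface.name, iface.errors_today, iface.discards_today)
```

Interfaces with no errors or discards are left out, so a healthy network produces an empty list.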

Step 2: Get more details about the interface

If an interface is down (red), that generally means there is no connection:

  1. Check the parent device to ensure it is operating.
  2. Check the cable for physical connectivity problems.

When you have found an interface with a problem (or, if all your interfaces are healthy, an interface with high utilization, errors, or discards), troubleshoot the issue:

  • Click the interface name in any widget. The Interface Details page opens.

    interfacename.png

  • Check the Percent Utilization widget for the last-polled value of transmit and receive utilization. If those values are high, you can also check the Percent Utilization – Line Chart to see the duration of the problem.

    percentutilizationlinechart.png

  • The Interface Downtime widget displays the interface status for the last 24 hours. If the interface status changed, you can see it in this widget. In the following example, the interface had one period when its status was unknown during the last 24 hours, but it is currently up.

    interfacedowntime.png

  • The Interface Errors & Discards widget can also indicate problems. In this example, the interface has high discards. Because high discards are generally caused by a full buffer, check the Node Details page for the device and determine whether the buffer is full.

    instanceerrorsanddiscards.png
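
The kind of summary that the Interface Downtime widget presents can be approximated from a series of polled status samples. The sketch below collapses samples into periods when the interface was not up. The status strings, the function name, and the sample times are hypothetical, and this is not how NPM stores or computes downtime.

```python
from datetime import datetime, timedelta


def non_up_periods(samples):
    """Collapse sorted (timestamp, status) samples into (start, end, status)
    periods where the status is anything other than "Up".

    A period ends at the first later sample with a different status. A period
    that is still open at the last sample ends at that sample's timestamp.
    """
    periods = []
    start = current = last_time = None
    for when, status in samples:
        if status != current:
            if current is not None and current != "Up":
                periods.append((start, when, current))
            start, current = when, status
        last_time = when
    if current is not None and current != "Up":
        periods.append((start, last_time, current))
    return periods


# Hypothetical samples, nine minutes apart.
t0 = datetime(2018, 12, 4, 9, 0)
statuses = ["Up", "Up", "Unknown", "Unknown", "Up", "Up"]
samples = [(t0 + timedelta(minutes=9 * n), s) for n, s in enumerate(statuses)]
for start, end, status in non_up_periods(samples):
    print(status, start.time(), "to", end.time())
```

With these samples, the result is a single Unknown period, matching the situation in the example where the interface was briefly unknown but is currently up.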

Step 3: Get more details about the problem

The Node Details page can help you diagnose an interface problem. Click the node name at the top of the Interface Details page to open the Node Details page.

nodename.png

Examine the following widgets.

Min/Max/Average of Average CPU Load

This widget shows the average load on the CPU for this node. In this case, the load spiked dramatically around 1:30 PM, which warrants further investigation.

maxavgresponsetimepacketloss.png

Network Latency & Packet Loss

This widget shows the latency (response time) and packet loss for the entire node. A spike in response time occurred at the same time as the spike in the average CPU load (shown above), implying a correlation between the events.

networklatency.png
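
The visual comparison of CPU load and response time can also be sketched numerically. The Python below flags samples that exceed twice the series median and reports the polling times at which both series spike. The spike rule, the factor of 2, and all sample values are assumptions chosen for illustration; coinciding spikes suggest a relationship but do not prove one.

```python
from statistics import median


def spike_indexes(values, factor=2.0):
    """Return indexes of samples greater than `factor` times the series median."""
    baseline = median(values)
    return {i for i, v in enumerate(values) if v > factor * baseline}


# Hypothetical samples taken at the same polling times.
times = ["13:03", "13:12", "13:21", "13:30", "13:39", "13:48"]
cpu_load_percent = [12, 14, 13, 55, 48, 15]
response_time_ms = [4, 5, 4, 21, 9, 5]

both = sorted(spike_indexes(cpu_load_percent) & spike_indexes(response_time_ms))
print([times[i] for i in both])
```

Using the median as the baseline keeps a single large spike from inflating the threshold, which a mean-based baseline would do on a short series.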

These widgets indicate that an unexplained increase in traffic occurred at approximately 1:30 PM, leading to higher interface utilization, CPU load, and dropped packets. The values are not yet critical and no alerts have been triggered, so this might not be a concern. To continue troubleshooting, you can perform the following actions:

  • Determine if there were any configuration changes around that time. If you have Network Configuration Manager, you can use it to look up configuration changes.
  • If you are monitoring traffic (for example, with NetFlow Traffic Analyzer), explore the cause of the traffic spike.