Use SolarWinds NPM to identify and troubleshoot an interface that has a problem

Created by Chris.Moyer, last modified by Chris.Moyer on Oct 03, 2016

Before you begin:

By default, devices monitored by NPM are polled for data every nine minutes. It might take some time before all the nodes you added have data you can review.

Step 1: Determine there is a problem

The topic Identify and troubleshoot a node that has a problem describes alerts that are triggered when a node goes down. Alerts can also be triggered when an interface has a problem, such as high utilization or the interface going down.

The Nodes with Problems resource provides information about the interfaces associated with each node. A square in the bottom-right corner of the node icon indicates that the node has an interface with a problem:

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/interfaceDownIcon.png - In this example, a red square indicates that one or more interfaces are down.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/interfaceUnknownIcon.png - In this example, a gray square indicates that the status of one or more interfaces is unknown.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/nodesWithProblem.png

In your environment, you might not have any down interfaces. To find an interface with issues that need to be investigated, click My Dashboards > Network > Network Top 10 to open the Network Top 10 view. Review the following resources on this page.

Top 10 Interfaces by Percent Utilization

This resource shows each interface's transmit and receive utilization as a percentage of total interface speed. By default, utilization from 70% to 90% is shown in yellow (warning), and utilization over 90% is shown in red (danger). These thresholds are configurable.

Any interface with high utilization deserves more investigation.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/Top10IterfacesUtilization.png
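
The default color thresholds described above can be sketched in a few lines of Python. This is only an illustration, not NPM's API: the function names are hypothetical, the formula (traffic rate divided by interface speed) is an assumption about how "percent of total interface speed" is derived, and the treatment of exactly 90% as a warning is one reading of "70% to 90%".

```python
# Hypothetical sketch: classify interface utilization with the default
# thresholds quoted in this topic (70% to 90% warning, over 90% danger).
WARNING_PCT = 70.0  # assumed lower bound of the yellow band
DANGER_PCT = 90.0   # anything above this is treated as red


def percent_utilization(bits_per_second: float, interface_speed_bps: float) -> float:
    """Return traffic as a percentage of the interface speed (assumed formula)."""
    if interface_speed_bps <= 0:
        raise ValueError("interface speed must be positive")
    return 100.0 * bits_per_second / interface_speed_bps


def classify(pct: float) -> str:
    """Map a utilization percentage to 'ok', 'warning', or 'danger'."""
    if pct > DANGER_PCT:
        return "danger"
    if pct >= WARNING_PCT:
        return "warning"
    return "ok"


# Example: 850 Mbps of traffic on a 1 Gbps interface falls in the warning band.
status = classify(percent_utilization(850e6, 1e9))
```

Because the thresholds are configurable in NPM, the two constants would need to match whatever values are set in your environment.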

Top 10 Interfaces by Traffic

This resource shows how much actual traffic is on an interface. Usually, WAN interfaces will be on this list because of the volume of traffic they process.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/Top10InterfacesTraffic..png

Top 10 Errors & Discards Today

This resource shows the following counts:

  • Errors: Packets that were received but could not be processed because there was a problem with the packet.
  • Discards: Packets that were received without errors but were dropped, usually because interface utilization is near 100%.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/Top10ErrorsDiscards..png
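
As a rough illustration of what a "top 10" list of this kind involves, the following Python sketch ranks interfaces by their combined error and discard counts. The data structure, the function name, and the choice to sort by the sum of both counters are assumptions made for the example; NPM may order the resource differently.

```python
# Hypothetical sketch: rank interfaces by combined errors and discards.
from typing import List, NamedTuple


class InterfaceCounts(NamedTuple):
    name: str
    errors: int    # packets received but not processable
    discards: int  # error-free packets that were dropped


def top_errors_and_discards(counts: List[InterfaceCounts], limit: int = 10) -> List[InterfaceCounts]:
    """Return up to `limit` interfaces with at least one error or discard,
    ordered from the highest combined count to the lowest."""
    flagged = [c for c in counts if c.errors + c.discards > 0]
    return sorted(flagged, key=lambda c: c.errors + c.discards, reverse=True)[:limit]


# Made-up sample data for illustration only.
sample = [
    InterfaceCounts("Gi0/1", errors=0, discards=0),
    InterfaceCounts("Gi0/2", errors=3, discards=120),
    InterfaceCounts("Gi0/3", errors=40, discards=0),
]
ranked = top_errors_and_discards(sample)
```

Here `ranked` lists Gi0/2 first, then Gi0/3, and omits Gi0/1 because it has no errors or discards.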

Step 2: Get more details about the interface

If an interface is down (red), it generally has no connection. Check the following:

  1. Check the parent device to ensure it is operating.
  2. Check the cable for physical connectivity problems.

Once you have found an interface with a problem (or, if all your interfaces are healthy, an interface with high utilization, errors, or discards), click the interface name in any resource. The Interface Details page opens.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/InterfaceName.png

  • Check the Percent Utilization resource for the last-polled value of transmit and receive utilization. If those values are high, you can also check the Percent Utilization – Line Chart to see the duration of the problem.

    File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/percentUtilizationLineChart.png

  • The Interface Downtime resource displays the interface status for the last 24 hours. If the interface status changed, you can see it in this resource. In the following example, the resource shows that the interface had one period when its status was unknown during the last 24 hours, but it is currently up.

    File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/interfaceDowntime.png

  • The Interface Errors & Discards resource can also indicate problems. In this example, the device has high discards. Because high discards are generally caused by a full buffer, check the Node Details page for this device to determine whether the buffer is full.

    File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/instanceErrorsAndDiscards.png
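
The kind of summary that the Interface Downtime resource presents can be approximated by collapsing a series of polled status values into continuous periods. The sketch below assumes a hypothetical list of time-ordered (timestamp, status) samples; it does not read data from NPM, and the status strings are placeholders.

```python
# Hypothetical sketch: collapse polled status samples into status periods.
from datetime import datetime
from itertools import groupby
from typing import List, Tuple

Sample = Tuple[datetime, str]
Period = Tuple[datetime, datetime, str]


def status_periods(samples: List[Sample]) -> List[Period]:
    """Group time-ordered samples into (first_seen, last_seen, status) runs."""
    periods = []
    for status, run in groupby(samples, key=lambda s: s[1]):
        run = list(run)
        periods.append((run[0][0], run[-1][0], status))
    return periods


# Made-up samples: up, then a stretch of unknown status, then up again.
polls = [
    (datetime(2016, 10, 3, 9, 0), "up"),
    (datetime(2016, 10, 3, 9, 9), "up"),
    (datetime(2016, 10, 3, 9, 18), "unknown"),
    (datetime(2016, 10, 3, 9, 27), "unknown"),
    (datetime(2016, 10, 3, 9, 36), "up"),
]
periods = status_periods(polls)
unknown_periods = [p for p in periods if p[2] == "unknown"]
```

With this sample data there is a single period of unknown status, and the last period shows the interface as up, which mirrors the example described in the Interface Downtime bullet.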

Step 3: Get more details about the problem

The Node Details page can help you diagnose an interface problem. Click the node name at the top of the Interface Details page to open the Node Details page.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/nodeName.png

Examine the following resources on this page.

Min/Max/Average of Average CPU Load

This resource shows the average load on the CPU for this node. In this case, the load spiked dramatically around 1:30 PM, which warrants further investigation.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/maxAvgResponseTimePacketLoss_490x418.png

Network Latency & Packet Loss

This resource shows the latency (response time) and packet loss for the entire node. A spike in response time occurred at the same time as the spike in the average CPU load (shown above), which suggests that the two events are related.

File:Success_Center/New_Articles/NPM-Getting-Started-CHM/040/030/networkLatency.png
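
Comparing two charts by eye, as described above, amounts to checking whether spikes in two metrics occur close together in time. The following Python sketch shows one simple way to do that under stated assumptions: a "spike" is any value more than twice the series median, times are expressed as minutes since midnight, and the sample values are invented. None of this reflects how NPM itself evaluates data.

```python
# Hypothetical sketch: find spikes that occur at about the same time in
# two metric series, each given as (minutes_since_midnight, value) pairs.
from statistics import median
from typing import List, Tuple

Series = List[Tuple[int, float]]


def spike_times(series: Series, factor: float = 2.0) -> List[int]:
    """Return the times whose value exceeds `factor` times the series median."""
    baseline = median(value for _, value in series)
    return [t for t, value in series if value > factor * baseline]


def coinciding_spikes(a: Series, b: Series, tolerance_minutes: int = 10,
                      factor: float = 2.0) -> List[int]:
    """Return spike times in `a` that fall within the tolerance of a spike in `b`."""
    spikes_b = spike_times(b, factor)
    return [t for t in spike_times(a, factor)
            if any(abs(t - u) <= tolerance_minutes for u in spikes_b)]


# Made-up data with a spike at 810 minutes (1:30 PM) in both series.
cpu_load = [(780, 20.0), (790, 22.0), (800, 21.0), (810, 75.0), (820, 30.0)]
response_ms = [(780, 10.0), (790, 12.0), (800, 11.0), (810, 48.0), (820, 14.0)]
shared = coinciding_spikes(cpu_load, response_ms)
```

A shared spike time only shows that the events coincide. It does not establish which one caused the other, so it is a starting point for further investigation rather than a diagnosis.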

These resources indicate that an unexplained increase in traffic occurred at approximately 1:30 PM, leading to higher interface utilization, CPU load, and dropped packets. Because the values are not yet critical and no alerts have been triggered, this might not be a concern. If you want to continue troubleshooting, you can perform the following actions:

  • Determine if there were any configuration changes around that time. If you have Network Configuration Manager, you can use it to look up configuration changes.
  • If you are monitoring traffic (for example, with NetFlow Traffic Analyzer), explore the cause of the traffic spike.