


NCM configuration backups fail intermittently

Created by Milton Harris, last modified by Melanie Boyd on Sep 13, 2017


 Objectives 

 
In this document you will learn:

 

  • Which logs to examine when NCM configuration backups fail intermittently
  • Which services are involved 
  • Which support troubleshooting tools to use, and why
  • What steps to take to resolve NCM configuration backup jobs failing intermittently

 

 Description 

One of the more common SolarWinds customer support calls involves NCM configuration backups failing intermittently. Many factors come into play, so it is important that you understand how to address this issue in a logical manner.

 

Customers might not always provide a lot of detail, but the issue will almost certainly be described as one of the following:

  • NCM configuration backups fail intermittently
  • Nightly Config Backup Jobs fail intermittently

 

 

 Scoping the Issue 

What to Ask

 

Here are a few questions to ask before jumping into the problem. You don't have to stop at these questions. Different scenarios might need you to probe differently, but this is a good starting point.

 

  • When was this issue first observed?
    • Try to establish what happened before the onset of the problem
    • Were there any changes made before the problem started?

 

  • Reproduce the issue
    • If you can't reproduce the issue, you can't confirm whether it's been fixed
    • If it's a one-off occurrence, the most you can do is check the log files and try to determine what occurred at that time.
    • Can it be easily reproduced or is it intermittent? Is there any pattern?
    • What are the expected results?

 

  • Narrow the focus
    • How many nodes are in your Nightly Config Backup Jobs?
    • Do you have Session Tracing turned on?
    • Which logs do you have turned on?

 

  • Other Tips
    • If Session Tracing is turned on, NCM downloads the configs and must also write a session trace for every node whose config it downloads. The session trace files can be found here: "%ALLUSERSPROFILE%\Application Data\SolarWinds\Logs\Orion\NCM\Session-Trace"
    • If Scheduled Jobs Logging is turned on, NCM downloads the configs and also writes job logs for every node in the job. The log files can be found here: "%ALLUSERSPROFILE%\Application Data\SolarWinds\Logs\Orion\NCM\Logging"
    • If Session Tracing is enabled together with Scheduled Job logging, Inventory Monitor logging, Database update logging, RTCN logging and Security logging, the combined load can become too burdensome and cause NCM jobs to fail.
    • Support's recommendation is to turn logging and session tracing OFF unless you are troubleshooting a specific issue. Some users believe this is how NCM gets its information, but that is not true.
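
To gauge how much trace and log output has accumulated, a minimal Python sketch like the one below could count the files under the two directories quoted in the tips above. The helper name `dir_footprint` is hypothetical and not part of any SolarWinds tool, and the script assumes it is run on the Orion server itself, where `%ALLUSERSPROFILE%` resolves.

```python
import os
from pathlib import Path

# Directories quoted in this article. %ALLUSERSPROFILE% is only expanded on Windows.
NCM_LOG_DIRS = [
    r"%ALLUSERSPROFILE%\Application Data\SolarWinds\Logs\Orion\NCM\Session-Trace",
    r"%ALLUSERSPROFILE%\Application Data\SolarWinds\Logs\Orion\NCM\Logging",
]


def dir_footprint(path):
    """Return (file_count, total_bytes) for all files under path.

    Hypothetical helper for illustration; returns (0, 0) if the directory is missing.
    """
    root = Path(path)
    if not root.is_dir():
        return (0, 0)
    count = 0
    total = 0
    for entry in root.rglob("*"):
        if entry.is_file():
            count += 1
            total += entry.stat().st_size
    return (count, total)


if __name__ == "__main__":
    for raw in NCM_LOG_DIRS:
        expanded = os.path.expandvars(raw)
        files, size = dir_footprint(expanded)
        print(f"{expanded}: {files} files, {size / 1024:.1f} KiB")
```

A large or fast-growing file count in either directory suggests that tracing or logging was left on after an earlier troubleshooting session.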
       

 

Services Involved

 


 

Service Name | Description | Software Package | Role
SolarWinds Job Engine | JobEngineV1 | JEV1 | Legacy polling (15-20% of polling)
SolarWinds Job Engine v2 | JobEngineV2 | JEV2 | 80-85% of polling


Examine the Logs

Orion produces numerous log files. These are the log files that would likely yield clues for this particular issue, in order of importance:


Log File Name

Located in:
  • NPM Server: C:\ProgramData\Solarwinds\Logs\Orion
  • Diagnostics: SolarwindsDiagnostics.zip\LogFiles\Orion

  • Reproduce the issue first before looking at this log file
  • The newest entries are at the bottom
  • Handy search phrase: "] ERROR"
    • This will jump to the next error in the log (you can generally ignore warnings)
    • Take note of the time stamps so you can correlate them with Orion.Information.Service.log
  • Example errors:
2015-01-27 01:45:29,251 [19] ERROR SolarWinds.InformationService.Contract2.InfoServiceProxy - nodeId=22
 
2015-01-27 01:45:29,267 [19] ERROR SolarWinds.InformationService.Contract2.InfoServiceProxy - Error closing exception. System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state<snipped stack trace>
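
Searching for "] ERROR" by hand works for short logs. For longer ones, a small Python sketch along these lines could pull out only the error entries with their time stamps. It assumes every entry follows the layout of the example errors above (timestamp, thread in brackets, level, logger, dash, message). The names `LINE_RE` and `find_errors` are hypothetical.

```python
import re
from datetime import datetime

# Matches lines shaped like the example errors:
# "2015-01-27 01:45:29,251 [19] ERROR <logger> - <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] ERROR (?P<logger>\S+) - (?P<message>.*)$"
)


def find_errors(lines):
    """Return a list of (timestamp, logger, message) for every ERROR line.

    Warnings and lines that do not match the assumed layout are skipped.
    """
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            results.append((ts, match.group("logger"), match.group("message")))
    return results


if __name__ == "__main__":
    sample = [
        "2015-01-27 01:45:29,251 [19] ERROR "
        "SolarWinds.InformationService.Contract2.InfoServiceProxy - nodeId=22",
    ]
    for ts, logger, message in find_errors(sample):
        print(ts.isoformat(), logger, message)
```

To run it against a real log, pass an open file object instead of the sample list, for example `find_errors(open(path, encoding="utf-8", errors="replace"))`, where `path` is the log file you are examining.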

Windows Event Logs

Located in:
  • NPM Server: Go to Start > Run > eventvwr
  • Diagnostics: SolarWindsDiagnostics.zip\EventLogs

  • Filter Windows Events so you can focus on Application, System and SolarWinds Events
  • Correlate errors found in Windows Events with other problems found in the other log files
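
Correlating by time stamp can be sketched as follows, assuming you have already collected the error times from two sources (for instance a log file and the Windows Event Log) as Python `datetime` values. The function name `correlate` and the default five-second window are assumptions for illustration, not SolarWinds guidance.

```python
from datetime import datetime, timedelta


def correlate(primary, secondary, window_seconds=5):
    """Pair each primary timestamp with nearby secondary timestamps.

    Returns a list of (primary_ts, [secondary_ts, ...]) for every primary
    timestamp that has at least one secondary timestamp within the window.
    """
    window = timedelta(seconds=window_seconds)
    pairs = []
    for p in primary:
        near = [s for s in secondary if abs(s - p) <= window]
        if near:
            pairs.append((p, near))
    return pairs


if __name__ == "__main__":
    log_errors = [datetime(2015, 1, 27, 1, 45, 29), datetime(2015, 1, 27, 3, 0, 0)]
    event_errors = [datetime(2015, 1, 27, 1, 45, 31), datetime(2015, 1, 27, 2, 0, 0)]
    for ts, matches in correlate(log_errors, event_errors):
        print(ts, "->", matches)
```

Widen the window if the two sources record the same failure several seconds apart.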



 

 

 Likely Root Causes 

 

Explore each of the potential root causes below.

 

Root Cause 1 

Description - Session Tracing could be turned on, so that after downloading the configs NCM must also write a session trace of how it connected to each individual device

Root Cause 2

Description - Job Logging, Inventory Monitor logging, Database Updates logging, Real-Time Config Change Detection logging and Security logging could be turned on

 When to advance the case 

 

After exhausting all the possible solutions referenced in this module, you will need to seek the help of the advancement team.
 
Before advancing the case, make sure you have captured and documented the following information:
  • Debug diagnostics, reproducing the issue. Enable debugging for: 
    •  
    •  
  • Detailed scope of the issue
  • Everything you have done so far
  • Helpful screen captures that will help the advancement team to further troubleshoot the issue

 

 

 
