How to properly and safely collect debugs on an IOS router


Prepared By Steve Holl, CCIE#22739
Purpose:
People are often concerned that running debugs on a production gateway will impact performance. The purpose of this document is to demystify debugging and clear up these misconceptions.
If you don't want to understand the concepts and just want to know what commands to enter, use the 'Recommended Configuration' section below.
Is running debugs safe to do on production routers?
Debugs can be run safely in almost all environments where voice runs on an IOS router.  That being said, due diligence is recommended when turning debugs on in production, so enable debugs one at a time and keep an eye on 'show processes cpu history' to ensure there is minimal impact.
To prove the point that it is safe to run IOS debugs in production with this recommended configuration: in my lab, I am able to run 'debug all' (yes, that's every debug IOS is capable of running) on a CME with 300 registered phones and raise CPU utilization by only 40% when default rate limiting and queuing are enabled.  I'd never recommend enabling 'debug all' in any real-life circumstance, since the majority of the messages will be rate limited and dropped before being logged, but it demonstrates how effectively disabling console/monitor logging and keeping rate/queue limiting enabled allows verbose debugs to run stably on a busy router.  Typically, even the most verbose debugs need only 10-20% CPU overhead.
I don't think I need this; I debug to a syslog server.
In my experience, 80% of the time I receive debugs from a syslog server, they contain dropped messages and render an accurate analysis impossible; one can observe this behavior when sequence numbers are enabled on the IOS side.  This is because syslog is UDP by default, and the messages usually end up being rate-limited heavily.  There are ways to debug to the syslog more reliably (use TCP for syslog, or Reliable Delivery and Filtering via BEEP) but those concepts are outside the scope of this document.
Why can't I just log to the monitor with 'terminal monitor'?
We want to debug to the log buffer to prevent debug messages that are sent in bursts from being dropped.  For example, an H.245 debug may produce 40-50 lines all arriving in the same millisecond, which is usually too fast for the monitor to print, so large chunks of the debug output end up missing from the screen.  By logging to the buffer, we eliminate scenarios where the terminal monitor would have dropped messages.
Recommended Configuration:
Router(config)# service sequence-numbers
Router(config)# service timestamps debug datetime localtime msec
Router(config)# logging buffered 10000000 debug
Router(config)# no logging console
Router(config)# no logging monitor
Router(config)# default logging rate-limit
Router(config)# default logging queue-limit
Router(config)# voice iec syslog
<Enable debugs, then wait for issue to occur.>
...
<Enable session capture to txt file in terminal program.>
Router# terminal length 0
Router# show logging
What do these commands do?
Now we will run down through each of these commands and explain how they behave, since scenarios may dictate some deviation from this template.
service timestamps debug datetime localtime msec- Ensures that local router time is written to all debugs, with millisecond accuracy.  This is useful for finding calls based on time.  Generally speaking, millisecond timestamps also let you group debug lines into logically related events, since lines that belong together often occur within the same millisecond.
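The grouping described above can be sketched in a few lines of Python run against a saved capture. This is a minimal sketch, not a TAC tool: the function name and the sample lines are made up for illustration, and it assumes the timestamp format produced by the command above (for example "Apr 27 14:29:25.867").

```python
import re
from itertools import groupby

# Matches an IOS "datetime msec" timestamp such as "Apr 27 14:29:25.867".
TIMESTAMP = re.compile(r"[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d{3}")

def group_by_millisecond(lines):
    """Return [(timestamp, [lines]), ...] for consecutive lines sharing a timestamp."""
    current = None
    stamped = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            current = match.group(0)
        # Continuation lines carry no timestamp, so they inherit the previous one.
        if current is not None:
            stamped.append((current, line))
    return [(ts, [text for _, text in grp])
            for ts, grp in groupby(stamped, key=lambda pair: pair[0])]

# Hypothetical capture lines, for illustration only.
sample = [
    "*Apr 27 14:29:25.867: first line of an event",
    "*Apr 27 14:29:25.867: second line of the same event",
    "    continuation line without a timestamp",
    "*Apr 27 14:29:25.871: a later, separate event",
]
groups = group_by_millisecond(sample)
for timestamp, members in groups:
    print(timestamp, len(members), "line(s)")
```

Lines that share a timestamp are only candidates for belonging to the same event; on a busy router, unrelated calls can land in the same millisecond.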
logging buffered 10000000 debug- Tells the router to send debugs to its local log buffer in system memory.  The buffer size is set in bytes, and is 10 MB here.  The buffer size you need depends on call volume, how much time the buffer needs to cover, and how much system memory is still available (leverage 'show memory statistics history' and 'show memory summary' for this).
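As a rough way to reason about buffer size, the arithmetic can be sketched as below. The message rate, average line length, and retention time are hypothetical numbers chosen for illustration; measure your own from a short capture. The estimate ignores any per-message overhead in the buffer.

```python
def buffer_bytes_needed(lines_per_second, avg_line_bytes, seconds_to_keep):
    """Estimate the log buffer size needed to hold a given time window."""
    return lines_per_second * avg_line_bytes * seconds_to_keep

def seconds_retained(buffer_bytes, lines_per_second, avg_line_bytes):
    """Estimate how much time a buffer of the given size covers before it wraps."""
    return buffer_bytes / (lines_per_second * avg_line_bytes)

# Hypothetical load: 200 debug lines per second, 120 bytes per line.
needed = buffer_bytes_needed(200, 120, 300)       # keep the last 5 minutes
covered = seconds_retained(10_000_000, 200, 120)  # the 10 MB buffer from this document

print(f"5 minutes of debugs needs roughly {needed:,} bytes")
print(f"a 10,000,000 byte buffer covers roughly {covered:.0f} seconds")
```

If the window you need does not fit in the memory you can spare, reduce the number of debugs enabled rather than shrinking the window.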
no logging console- By default, the router sends debugs to the console.  In IOS, the console has the highest priority of any process.  It also runs at very slow speeds (commonly 9600 bps).  Because of this, if debugs are generated faster than the console can print them, they can starve console input and/or drive the CPU to 100%.
To avoid this behavior, it is imperative to disable console logging with this command before running any debugs in IOS.
no logging monitor- This command prevents the router from sending debugs in real-time to the router's VTY (telnet/SSH) session.  Since we will be pulling debugs reactively, we don't want anything to scroll in real-time.  Also, the terminal monitor has a habit of dropping messages if they arrive in bursts, like most voice debugs do.
default logging rate-limit- By default, the router rate limits log messages.  It is usually recommended to leave this on to ensure router stability.  If a TAC engineer suspects that the router is dropping debugs before they make it to the router's logging buffer, they may ask for the limit to be raised or disabled.  Note that raising or disabling the limit in environments with high traffic volume may cause CPU instability, since the router will then try to write every debug message to the logging buffer.
default logging queue-limit- By default, the router also queues messages.  The queue that holds messages waiting to be written to the logging buffer has a finite size.  It is usually recommended to leave this at the default to ensure router stability.  If a TAC engineer suspects that the router is dropping debugs before they make it to the router's logging buffer, they may ask for the queue limit to be raised or disabled.  Note that changing this in busy environments may cause CPU instability for the same reason as with the rate limit.
service sequence-numbers- This command writes the sequence number of each debug message into the line.  This is useful (essentially required) when sending to a syslog server, to identify whether any debug messages have been dropped in the network.  The sequence number is the first item in the line, before the timestamp and the actual message.  Note that this is different from the timestamp/sequence number that the syslog server may write to its own log files, if applicable.

001033:  *Apr 27 14:29:25.867: %IPPHONE-6-REG_ALARM: NAME=SEP000A10000075  Load=P00308000500 Parms=Status/IPaddr  Last=CM-closed-TCP
Note that sequence numbers are written after rate-limiting, so they won't be useful to identify if IOS is dropping debugs before writing them to the buffer log.
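Checking a syslog server file for dropped messages by eye is tedious, so here is a minimal Python sketch that reports jumps in the sequence numbers. The function name and the second and third sample lines are hypothetical; only the first sample line is the example shown above. It assumes the sequence number is immediately followed by a colon and the IOS timestamp, and it skips lines that do not match (such as continuation lines of a multi-line debug).

```python
import re

# A sequence number, a colon, then an IOS timestamp such as "*Apr 27 14:29:25".
# Searching anywhere in the line allows for a prefix added by the syslog server.
SEQ = re.compile(
    r"(?:^|\s)(\d+):\s+[*.]?[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}"
)

def find_sequence_gaps(lines):
    """Return (last_seen, next_seen, missing_count) for each jump in sequence numbers."""
    gaps = []
    previous = None
    for line in lines:
        match = SEQ.search(line)
        if not match:
            continue  # no sequence number on this line, e.g. a continuation line
        number = int(match.group(1))
        if previous is not None and number > previous + 1:
            gaps.append((previous, number, number - previous - 1))
        previous = number
    return gaps

sample = [
    "001033:  *Apr 27 14:29:25.867: %IPPHONE-6-REG_ALARM: NAME=SEP000A10000075",
    "001034:  *Apr 27 14:29:25.900: hypothetical message",
    "001037:  *Apr 27 14:29:26.120: hypothetical message after a drop",
]
for last_seen, next_seen, missing in find_sequence_gaps(sample):
    print(f"{missing} message(s) missing between {last_seen} and {next_seen}")
```

A clean result only shows that nothing was lost after the numbers were assigned; as noted above, it says nothing about messages dropped by rate limiting inside IOS.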
voice iec syslog- This command prints out extra information in scenarios where the router is the origin of a disconnect.  This is specific to voice, and is useful to TAC engineers.
%VOICE_IEC-3-GW: C SCRIPTS: Internal Error (Incompatible protocols): IEC=1.1.47.11.23.0 on callID 31102

The output can be decoded with:
Router# show voice iec description 1.1.47.11.23.0
    IEC Version: 1
    Entity: 1 (Gateway)
    Category: 47 (no resource (47))
    Subsystem: 11 (C SCRIPTS)
    Error: 23 (Incompatible protocols)
    Diagnostic Code: 0
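When a capture contains many disconnects, it can help to pull every IEC out of the log first and then decode the distinct ones on the router. The following Python sketch does only the extraction; the class and function names are made up for illustration, and the field order is taken from the 'show voice iec description' output above.

```python
import re
from typing import NamedTuple

class IEC(NamedTuple):
    """The six dotted fields of an IEC, in the order shown by the router."""
    version: int
    entity: int
    category: int
    subsystem: int
    error: int
    diagnostic: int

# "IEC=" followed by exactly six dot-separated numbers.
IEC_PATTERN = re.compile(r"IEC=(\d+(?:\.\d+){5})")

def extract_iecs(lines):
    """Return an IEC tuple for every IEC=... value found in the given log lines."""
    found = []
    for line in lines:
        for match in IEC_PATTERN.finditer(line):
            found.append(IEC(*map(int, match.group(1).split("."))))
    return found

log = [
    "%VOICE_IEC-3-GW: C SCRIPTS: Internal Error (Incompatible protocols): "
    "IEC=1.1.47.11.23.0 on callID 31102",
]
for iec in extract_iecs(log):
    dotted = ".".join(str(field) for field in iec)
    print(f"show voice iec description {dotted}")
```

The script prints the lookup command for each code rather than trying to translate the numbers itself, since the router is the authority on what each value means.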
How do I run these debugs?
Once you have logging set up, enable the relevant debugs that have been requested by the TAC engineer or CSC member, or suggested by the Multiservice Voice Debug Lookup tool.
What debugs should I be running?
While we are talking about how to collect debugs, you may wonder what debugs you should collect to troubleshoot your call issues.  We have a very useful tool to assist you with this, called the Multiservice Voice Debug Lookup.
Personally, here is what I like to collect as a starting point for the common call failures across IOS gateways:
H.323
debug voip ccapi inout
debug h225 asn1
debug h245 asn1
debug cch323 all
debug ip tcp transaction
SIP
debug voip ccapi inout
debug ccsip messages
debug voip rtp session named-event
MGCP
debug voip ccapi inout
debug mgcp packet
debug ip tcp transaction
<Be sure to enable appropriate POTS debugs, too.>
ISDN
debug voip ccapi inout
debug isdn q931
Analog or Non-ISDN POTS
debug voip ccapi inout
debug vpm signal
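Calls often cross more than one of these protocols (for example SIP on one leg and ISDN on the other), so the lists overlap. As a small convenience, this Python sketch merges the starting-point lists above without repeating a command. The dictionary keys and the function name are made up for illustration; the debug commands themselves are exactly the ones listed above.

```python
# Starting-point debugs from the lists above, keyed by a short protocol label.
DEBUGS = {
    "h323": [
        "debug voip ccapi inout",
        "debug h225 asn1",
        "debug h245 asn1",
        "debug cch323 all",
        "debug ip tcp transaction",
    ],
    "sip": [
        "debug voip ccapi inout",
        "debug ccsip messages",
        "debug voip rtp session named-event",
    ],
    "mgcp": [
        "debug voip ccapi inout",
        "debug mgcp packet",
        "debug ip tcp transaction",
    ],
    "isdn": ["debug voip ccapi inout", "debug isdn q931"],
    "analog": ["debug voip ccapi inout", "debug vpm signal"],
}

def debug_commands(*protocols):
    """Merge the debug lists for the given protocols, keeping order and dropping repeats."""
    merged = []
    for protocol in protocols:
        for command in DEBUGS[protocol.lower()]:
            if command not in merged:
                merged.append(command)
    return merged

# Example: a SIP-to-ISDN gateway call.
for command in debug_commands("sip", "isdn"):
    print(command)
```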
How do I collect the debugs?
Once you have the buffer configured and debugs enabled, it will write all output to the buffer.  The buffer is a rolling buffer, so when you reach the limit of the defined buffer size, then the oldest information at the top of the buffer is removed to make way for the newest information to be added at the bottom of the log.
After the issue occurs, issue 'terminal length 0' to avoid having to hit <enter> or <space> on every page.  Then, issue 'show logging' and the current buffer content will dump to the screen.  Telnet is preferred over a console connection because it transmits faster.  Ideally, have the terminal application on your desktop configured to log all of this session's terminal output to a local .txt file.  My favorite applications for this are PuTTY and SecureCRT.  Mac's Terminal application handles large buffers very well; there, select all/copy/paste into a text editor after the log is dumped to the screen works best for me.
How do I disable debugs?
After debugs have been collected for an issue, you can disable them all with:
Router# undebug all
This isn't necessary, but if you want to restore real-time logging to the console and VTY sessions, enable them again with:
Router(config)# logging console
Router(config)# logging monitor
What do I do if the router is too busy, and I can't pull debugs for an issue before the log overwrites itself?
There are some advanced techniques we use in TAC, based on Tcl and EEM, to work around this issue.  Consult TAC if you feel this step is necessary.