Debugging Managed Production Applications with WinDbg / by Matt Wrock

Yesterday our issue tracking software was hanging and the vendor was not responding to our ticket requests (they are nine hours ahead of us). The application is a .NET application, so I decided to capture a crash dump and dive in with windbg. I have a love/hate relationship with windbg. I love it because it provides a vast wealth of information, telling me virtually EVERYTHING that's going on with my process. It has saved my behind several times. I hate it because I don't use it frequently enough to have all of the cryptic commands memorized, so I often have to relearn and re-research the commands I need to solve my problem. Windbg is not for the faint of heart. There is no drag and drop here. But if you have an app bugging out on a production server and don't want to attach a debugger to it, windbg is the tool for you.

This post is an adaptation of a document I created for my team and me a few years back. I use it like a cheat sheet to help me get started quickly.

When servers start crashing and/or hanging in production, often the only recourse you have is to capture a memory dump of the ailing process and analyze it using Microsoft’s native debugger, WinDbg. Without this tool, you may just be shooting in the dark. These techniques apply not only to web applications but to any application, managed or unmanaged.

A memory dump will allow you to see everything going on in the captured process: the executing threads and how much CPU time each has used, stack traces of all threads and even the values of parameters passed to functions. Dumps can also be used to troubleshoot memory leaks, allowing you to see what is in the heap.

A word of caution is in order: windbg is a pain to use. At least that has been my experience. There is almost no documentation included and the commands are very unintuitive, and this is compounded by the fact that you (hopefully) rarely use it.

There are three basic steps to this process:

  1. Preparing the debugging environment on the problem server.
  2. Actually capturing the dump while your process is crashing or crashed.
  3. Analyzing the dump in windbg

 

Preparing the Debugging Environment

There are a few steps to complete to get the server ready:

  1. Install Microsoft’s Debugging toolkit. Get the latest version at http://www.microsoft.com/whdc/DevTools/Debugging/default.mspx. Note that there is a 32-bit and a 64-bit version. If you are running on a 64-bit server but you have a managed app that is compiled for 32-bit, you will need to use the 32-bit version of windbg to debug.
  2. Create an environment variable for the path to the symbol files (.pdb files that contain the information that maps native instructions to function calls). Create a system environment variable called _NT_SYMBOL_PATH with the value: srv*C:\symbols\debugginglabs*http://msdl.microsoft.com/download/symbols;C:\symbols\debugginglabs;C:\Program Files\Microsoft.Net\FrameworkSDK\symbols;C:\windows\system32
  3. Copy sos.dll from the .NET Framework directory to the same directory where you installed the debugging toolkit. This file provides extensions to windbg for analyzing managed code.
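
The symbol path value is easy to mistype, so here is a minimal Python sketch that assembles it from its parts. It assumes the usual symbol-server form srv*<local cache>*<server URL>, followed by plain directories separated by semicolons. build_symbol_path is a hypothetical helper of mine, not part of the toolkit.

```python
import os

MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, extra_dirs=(), server=MS_SYMBOL_SERVER):
    """Return a symbol path: one cached symbol-server entry plus local directories."""
    # Assumed syntax: "srv*<cache>*<server>" fetches missing symbols into <cache>.
    entries = [f"srv*{cache_dir}*{server}"]
    entries.extend(extra_dirs)
    return ";".join(entries)


if __name__ == "__main__":
    path = build_symbol_path(
        r"C:\symbols\debugginglabs",
        [r"C:\symbols\debugginglabs", r"C:\windows\system32"],
    )
    # This only affects the current process and its children; the system-wide
    # variable still has to be created in Windows itself.
    os.environ["_NT_SYMBOL_PATH"] = path
    print(path)
```

Setting the variable from a script like this is handy when you launch the debugger from the same script; for interactive use the system environment variable is still the simplest route.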

 

Capturing a Memory Dump

This step can be a bit tricky depending on the circumstances of your crashing behavior, because the dump has to be taken while the process is actually misbehaving. There are typically 3 ways to tell that it is:

  1. Call up a test URL to see if the app has crashed or is hanging
  2. Use Task Manager to see if the CPU is pinned
  3. Use Performance Monitor and look for queueing threads. If threads are queueing, that means that all available .NET worker threads are busy, which usually means something is wrong.
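
The test URL check is the easiest of the three to automate. Below is a minimal Python sketch that requests a page with a timeout and reports whether the app answered, hung or failed. The probe function and the URL in the comment are made up for illustration, and the 10 second default is an arbitrary choice, not a recommendation.

```python
import socket
import urllib.error
import urllib.request


def probe(url, timeout_seconds=10.0, fetch=urllib.request.urlopen):
    """Return 'ok', 'hung' (no answer within the timeout) or 'error'."""
    try:
        with fetch(url, timeout=timeout_seconds) as response:
            response.read()
    except (socket.timeout, TimeoutError):
        # The read timed out: the server accepted the request but never answered.
        return "hung"
    except urllib.error.URLError as exc:
        # urlopen wraps connect-time timeouts in URLError.
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "hung"
        return "error"
    except OSError:
        return "error"
    return "ok"


# Hypothetical usage on the server:
#   probe("http://localhost/myapp/default.aspx")
```

A "hung" result is the cue to grab a dump right away; an "error" result may just mean the process has already died and been recycled.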

Once you have determined that the process has crashed or is hanging, bring up a command prompt and navigate to the directory where you installed the debugging toolkit. Next type:

adplus.vbs -hang -p [process ID of problem process]

If there is more than one worker process running and you are not sure which one is causing problems, repeat the above command for each process.

This command will launch the debugger in a separate window to attach to the process and write the dump. Just let it run and it will close when it completes.
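
If you end up repeating the command for several worker processes, a small script can generate the command lines for you. This is a minimal Python sketch. build_adplus_commands is a hypothetical helper, the PIDs are made up, and it assumes that -p is ADPlus's process ID switch and that the script is run through cscript.

```python
def build_adplus_commands(pids, script="adplus.vbs"):
    """Return one hang-mode ADPlus command (as an argument list) per process ID."""
    commands = []
    for pid in pids:
        # -hang takes a one-off snapshot; -p (assumed switch) names the target process.
        commands.append(["cscript", script, "-hang", "-p", str(int(pid))])
    return commands


if __name__ == "__main__":
    for command in build_adplus_commands([4012, 5120]):  # made-up PIDs
        # Printed rather than executed, so the lines can be reviewed first.
        print(" ".join(command))
```

I print the commands instead of running them so I can check the process IDs before touching a production box.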

Analyzing the Dump

  1. Open windbg.exe, which is inside the directory where you installed the debugging toolkit.
  2. Go to File/Open Crash Dump and find the dump (.DMP) file you just captured. It will be in a subfolder of the debugging toolkit directory.
  3. Type .load sos.dll to load the managed code extensions.

 

You are now ready to start troubleshooting. Below are some commands I commonly use to get useful information. At the end of this document are some links to some MS white papers with more detailed information on performance debugging.

Listing all threads and how much CPU time they have used

!runaway

Note the thread IDs of any threads that have used a particularly large amount of CPU time. If you have several threads that have each consumed minutes of CPU time, that could point to a never-ending loop that is eating CPU or just a long-running background thread.
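
When a process has hundreds of threads, I find it easier to paste the !runaway output into a script than to eyeball it. The Python sketch below assumes the usual "number:OS id    D days H:MM:SS.mmm" line layout. The sample text is made up, and parse_runaway and busy_threads are hypothetical helpers.

```python
import re
from datetime import timedelta

# Matches lines such as "  12:1a4c      0 days 0:03:27.515"
RUNAWAY_LINE = re.compile(
    r"^\s*(?P<num>\d+):(?P<osid>[0-9a-fA-F]+)\s+"
    r"(?P<days>\d+) days (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<ms>\d{3})\s*$"
)


def parse_runaway(text):
    """Return (debugger thread number, OS thread id, CPU time) for each thread line."""
    threads = []
    for line in text.splitlines():
        match = RUNAWAY_LINE.match(line)
        if match is None:
            continue  # header or blank line
        cpu = timedelta(
            days=int(match["days"]),
            hours=int(match["h"]),
            minutes=int(match["m"]),
            seconds=int(match["s"]),
            milliseconds=int(match["ms"]),
        )
        threads.append((int(match["num"]), match["osid"], cpu))
    return threads


def busy_threads(text, threshold=timedelta(minutes=1)):
    """Keep only the threads whose CPU time is at or above the threshold."""
    return [thread for thread in parse_runaway(text) if thread[2] >= threshold]


# Made-up sample in the assumed layout.
SAMPLE = """\
 User Mode Time
  Thread       Time
  12:1a4c      0 days 0:03:27.515
   0:0e10      0 days 0:00:00.046
"""

if __name__ == "__main__":
    for number, os_id, cpu in busy_threads(SAMPLE):
        print(number, os_id, cpu)
```

The one minute threshold is arbitrary; pick whatever counts as suspicious for your app.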

Listing Managed Threads

!threads

There are several noteworthy tidbits here:

Lock Count: If this is greater than 0, it means that the thread is waiting (blocking) for another thread. For instance, it might be waiting for a DB query to come back or a response from a socket. If you have a bunch of these, it could be a tip that there is a bad query. See below on how to get the call stack of an individual thread to see exactly what it is doing.

Domain: This is the address of the app domain that the thread is running in. This is very helpful if you have several web sites running in the same worker process. Once you find the problem thread(s), you can use this to see which web app is causing the problem. Keep in mind that every ASP.NET worker process has a “default” app domain used for launching new app domains (there is one per web app) and handling GC.

Determine which Web Application a thread is running in

!dumpdomain [address]

This dumps a list of assemblies loaded into the domain which should tip you off as to which web app it is running in.

Get summary information on the thread pool

!threadpool

This tells you how many threads are free/in use and what the CPU utilization was at the time of the capture.

Get the stack trace of a single thread including passed parameters

~[thread id]e !clrstack -p

Get the thread ID from the first column of the !threads output (the debugger's thread number, not the OS thread ID), or use “*” to get a dump of ALL threads.
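
Since I can never remember where the thread number goes, here is a tiny Python sketch that expands a list of thread numbers into ready-to-paste command lines. clrstack_commands is a hypothetical helper and the thread numbers are made up.

```python
def clrstack_commands(thread_numbers):
    """Return one '~<n>e !clrstack -p' line per debugger thread number ('*' means all)."""
    lines = []
    for number in thread_numbers:
        if number != "*":
            number = int(number)  # reject anything that is not a plain thread number
        lines.append(f"~{number}e !clrstack -p")
    return lines


if __name__ == "__main__":
    # Made-up thread numbers, e.g. the ones flagged while reading !runaway.
    print("\n".join(clrstack_commands([12, 15])))
```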

Get detailed information on an object

!dumpobj [address]

This gives info on all fields in the object.

More Resources

http://msdn.microsoft.com/en-us/library/ms954594.aspx
This is an old link but has good and thorough information.

http://blogs.msdn.com/tess/ This is Tess Ferrandez's blog. She has tons of great posts on this subject and also on analyzing memory leaking problems.