We’ve had a few performance issues lately on the website I’m working on – while digging through code we’ve discovered a lot of technical debt which we’ve started paying more attention to. We also discovered that IIS on our farm servers went down on a daily basis – roughly once or twice a day IIS on each server would recycle the application pool for reasons unknown to us. There was an entry in the Event log telling us that IIS had deadlocked and hence a reset was issued. Our monitoring didn't work as expected, so we never got anything but random users once or twice a month complaining that "The site is slow sometimes"...
  We developed quite a lot of theories along the way, but based on gut feeling and what we could see in the log files we suspected either thread pool starvation – caused by the way we made some synchronous long-running web service calls to e.g. clear the server’s CMS cache – or an infinite loop spinning up and eating all available resources until IIS decided it needed to recycle the application pool to come back to life. We had little else to go on apart from Performance Monitor telling us a few minutes in advance that a server was going down due to a massive number of SQL user connections suddenly consuming all resources.
  Enter WinDbg – the coolest and geekiest debugging tool I have ever worked with. Until last week I had no knowledge of debugging with WinDbg apart from reading the occasional blog post written by people who have made debugging their niche in the development community. I decided to take a dive into it because we needed some facts on the table instead of just creating theories. First of all I knew we had to collect some sort of dump of the process while it was in an unhealthy condition, so we started collecting memory dumps using ADPlus, and after a few hours of waiting the IIS on one server went down and generated a large 500+ MB dump named [something]_Process_Shutdown_[timestamps-something].dmp. I spent the next three days deep inside the dump, scratching my head, reading lots and lots of blog posts found on Google, peeking and poking and trying out commands which didn’t output anything useful. At first I blamed my own lack of competence for not finding anything, until I came across a blog post about false positive dumps… which was exactly what I had. A false positive means that the dump was generated when the process was in a different state than expected – something you can only find out by debugging the content of the dump. What I had wasn’t a dump from a deadlocked process, oh no – I had a dump from the process right after IIS had decided to recycle the application pool, so every thread except the Finalizer had no ID, which means it had been shut down nicely and was basically just waiting for the garbage collector to put an end to its misery. That was also why there were no call stacks on the threads and nothing useful in the heap… Eureka!!! Nothing but a lot of useless bytes in a big file, basically.
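  For the curious: getting that first dump required nothing more exotic than pointing ADPlus at the worker process. It looked something along these lines – the process name and output folder below are just examples, not our exact setup:

rem attach to the worker process in crash mode and write a dump when it throws or dies
adplus -crash -pn w3wp.exe -o C:\dumps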
  A lot wiser but still with no data, I started looking for a way to collect a dump while IIS was in the deadlocked state. I managed to locate an MSDN article which described how to set the metabase property OrphanWorkerProcess on IIS to “true”, so instead of IIS automatically recycling the application pool I could kick off a script invoking WinDbg from the command line to write a dump of the process to disk for later analysis. It took a while, but after a bit of trial and error I managed to get the script working. The script worked fine apart from IIS not calling filename.cmd with a parameter whenever the worker process was orphaned, so whenever IIS executed the script, the script invoking the debugger failed because there was no process ID named %1… It took only another two days to figure out what was going on, thank you very much Microsoft. MSDN is an invaluable resource for me, but I didn't have anything nice to say about the author of the article for about 10-12 minutes   :o)
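  For reference, the IIS 6 side of it boils down to a couple of metabase properties set with adsutil.vbs plus a tiny script that writes the dump. The app pool name, paths and the exact OrphanActionParams value below are illustrative only – the parameter passing was exactly the bit that cost me two days, so verify the property values against the MSDN article:

rem orphan the worker process instead of recycling it (app pool name is an example)
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs SET W3SVC/AppPools/MyAppPool/OrphanWorkerProcess TRUE
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs SET W3SVC/AppPools/MyAppPool/OrphanActionExe "C:\dumps\orphan.cmd"
cscript %SystemDrive%\Inetpub\AdminScripts\adsutil.vbs SET W3SVC/AppPools/MyAppPool/OrphanActionParams "%1%"

rem orphan.cmd – %1 is the process ID of the orphaned worker process; write a full dump and detach
"C:\Program Files\Debugging Tools for Windows\cdb.exe" -p %1 -c ".dump /ma C:\dumps\w3wp_orphan.dmp; qd"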
  As of tonight I managed to get a true positive dump, and it took only 15 minutes to figure out exactly what was going on. The last week’s trial and error, peeking and poking, proved to be a good investment – here’s what I did:
  I downloaded the dump generated by WinDbg to my local machine and fired up WinDbg there. I had earlier downloaded the .NET Framework files from the farm server, because even though you have version v2.0.50727 installed on your local machine there is a chance it isn’t the exact same revision as the one installed on your server. In my case my local workstation’s copy of aspnet_isapi.dll is v2.0.50727.4016 while the server has 2.0.50727.3053 – I figure it has something to do with the fact that I’m running Vista while the farm server is running Windows 2003. It has no effect on your daily life, but when debugging a crash dump it is essential that you have the *exact* same assemblies and symbol files as the machine where the dump was generated; otherwise you might not be able to extract information from the dump or – even worse – you might get corrupted or incorrect data which could very easily lead you in a completely wrong direction. So I zipped and downloaded the .NET Framework files from the farm server to my local machine and used the command “.exepath+ [path]” to make WinDbg search [path] for matching assemblies when none were found in the default locations.
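  In practice the first few commands in the WinDbg session looked something like the lines below – the local paths are placeholders for wherever you cache symbols and the server’s framework files:

$$ pull matching symbols from the Microsoft symbol server, cached locally
.symfix+ C:\symbols
$$ also search the folder holding the framework files copied from the farm server
.exepath+ C:\dumps\server-framework
$$ load the SOS extension that matches the CLR version loaded in the dump
.loadby sos mscorwks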
  I loaded the dump into WinDbg and thought back on our initial suspects. Our first suspicion was the infinite loop, because it was the strongest theory we had. Debugging threading issues such as two threads deadlocking each other would also be harder, while an infinite loop would without a doubt reveal itself in a very long stack trace on one of the active worker threads – given that I didn't have a false positive crash dump, that is(!). So I used !threads to get a list of threads – here’s the output:
  0:028> !threads 
ThreadCount: 52 
UnstartedThread: 0 
BackgroundThread: 39 
PendingThread: 0 
DeadThread: 6 
Hosted Runtime: no 
                                      PreEmptive   GC Alloc           Lock 
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception 
  13    1  52c 000ebc28   1808220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Worker) 
  15    2 1008 000f7ce0      b220 Enabled  00000000:00000000 000db760     0 MTA (Finalizer) 
  16    3 164c 00110208    80a220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Completion Port) 
  17    4 11c0 00113840      1220 Enabled  00000000:00000000 000db760     0 Ukn 
  11    5  314 0012f818   880a220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Completion Port) 
  18    6 12a0 0015ff88   180b220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Worker) 
  19    7 1760 0015a5c8   200b020 Enabled  00000000:00000000 001147a0     0 MTA 
  20    8 11fc 00164c40   200b020 Enabled  00000000:00000000 001147a0     0 MTA 
  21    9  f80 00158dc8   200b020 Enabled  00000000:00000000 001147a0     0 MTA 
  22    a 14dc 001597a8   200b020 Enabled  00000000:00000000 001147a0     0 MTA 
  23    b  170 0012a638   200b020 Enabled  00000000:00000000 001147a0     0 MTA 
  24    c 152c 00170c58   200b020 Enabled  00000000:00000000 001147a0     0 MTA 
  25    d 1668 06b33c10   200b020 Enabled  00000000:00000000 001147a0     0 MTA 
  27    e  e70 06b6fb40   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  28    f  5e4 06b5e420   180b220 Enabled  00000000:00000000 001147a0     1 MTA (Threadpool Worker) System.Data.SqlClient.SqlException (1ee19b2c) (nested exceptions) 
  29   10 1470 06b6a650   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  30   11  fa4 06b7dbe8   180b220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Worker) 
  10   12 148c 06b80e60       220 Enabled  00000000:00000000 000db760     0 Ukn 
   7   14 1640 06b8e1b0       220 Enabled  00000000:00000000 000db760     0 Ukn 
  37   16  fd4 08f340d8   180b220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Worker) 
  39   18  8bc 09097448   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  40   17 104c 090532b8   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  41   19 1238 090967e0   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  42   1a 13fc 090a8bd8   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  43   1b 1408 09055108   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  44   1c 1580 090559c0   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  45   1d  184 09056278   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  46   1e 13dc 09056b30   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  47   1f 12e8 090573e8   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  48   20 1458 06c04298   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  49   21  93c 06c04878   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  50   22  d5c 06c052a8   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  51   23 15ec 06c05cd8   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  52   24  29c 06c06738   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  53   25 1098 06c07198   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  54   26 13cc 06c07bf8   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  55   27 12f0 06c08658   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  56   28  ce4 06c090b8   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  57   29  e60 06c09b00   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
  58   2a  d98 06c0a4e0   200b220 Enabled  00000000:00000000 001147a0     0 MTA 
   6   15 1060 090a6980       220 Enabled  00000000:00000000 000db760     0 Ukn 
   4   2b  590 090ddae8       220 Enabled  00000000:00000000 000db760     0 Ukn 
   3   13  cd4 1712e4f8       220 Enabled  00000000:00000000 000db760     0 Ukn 
XXXX   2c    0 17133d50   1801820 Enabled  00000000:00000000 000db760     0 Ukn (Threadpool Worker) 
XXXX   2d    0 17104970   1801820 Enabled  00000000:00000000 000db760     0 Ukn (Threadpool Worker) 
XXXX   2e    0 090ac858   1801820 Enabled  00000000:00000000 000db760     0 Ukn (Threadpool Worker) 
  69   2f  99c 090ad248   180b220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Worker) 
XXXX   30    0 090b5dd0   1801820 Enabled  00000000:00000000 000db760     0 Ukn (Threadpool Worker) 
  70   31  fcc 090b70e8   180b220 Enabled  00000000:00000000 000db760     0 MTA (Threadpool Worker) 
XXXX   32    0 090b83e0   1801820 Enabled  00000000:00000000 000db760     0 Ukn (Threadpool Worker) 
XXXX   33    0 090bb380   1801820 Enabled  00000000:00000000 000db760     0 Ukn (Threadpool Worker) 
  75   35 1134 1718fb70   200b220 Enabled  00000000:00000000 001147a0     1 MTA
There was an immediate suspect – thread #28 appeared to have a nested SqlException in it… So I switched to the thread by executing the command “~28s” and now I could get a callstack on the thread by using “!clrstack” with the following output:
…074fe5d4 0fc5c6f5 dgiCore.Log.ExceptionLogPublisher.Publish(System.Exception, System.Collections.Specialized.NameValueCollection, System.Collections.Specialized.NameValueCollection) 
[snip] 
074fe920 0fc5c6f5 dgiCore.Log.ExceptionLogPublisher.Publish(System.Exception, System.Collections.Specialized.NameValueCollection, System.Collections.Specialized.NameValueCollection) 
[snip] 
074fec6c 0fc5c6f5 dgiCore.Log.ExceptionLogPublisher.Publish(System.Exception, System.Collections.Specialized.NameValueCollection, System.Collections.Specialized.NameValueCollection) 
[snip] 
074fee20 0f97d9e2 DGI.Web.layouts.DGI.Loginbox.btnLogin_Click(System.Object, System.EventArgs) 
[snip] 
074ff09c 01e1adf5 ASP.layouts_main_aspx.ProcessRequest(System.Web.HttpContext) 
[snip] 
074ff180 66083e1c System.Web.HttpRuntime.ProcessRequestInternal(System.Web.HttpWorkerRequest) 
074ff1b4 66083ac3 System.Web.HttpRuntime.ProcessRequestNoDemand(System.Web.HttpWorkerRequest) 
074ff1c4 66082c5c System.Web.Hosting.ISAPIRuntime.ProcessRequest(IntPtr, Int32) 
074ff3d8 79f68cde [ContextTransitionFrame: 074ff3d8] 
074ff40c 79f68cde [GCFrame: 074ff40c] 
074ff568 79f68cde [ComMethodFrame: 074ff568] 
I was quite surprised to find out that a click on the Submit button on our login page proved to be the cause of an infinite loop… What parameters were the methods being called with? Enter the command “!clrstack -p”, and somewhere in the massive output of nested exceptions I found the call invoked by the user clicking the “Login” button:
074fed68 0d6a755b dgiCore.Security.Authenticate(System.String, System.String, System.String, dgiCore.Log.Exceptions.Fejl ByRef) 
    PARAMETERS: 
        username =  
        password =  
        applicationName =  
        fejl = 0x074fee24
Empty username and password? Normally there would be a reference to an address in memory… Just for the fun of it I tried to issue a login on the website under suspicion with an empty username and password, and what do you know? The website hung until IIS decided to recycle the application pool and come back to life with a new process ID... Imagine my surprise, sitting late in the evening and watching every little piece of the puzzle fall into place as the IIS on the server choked harder and harder before its inevitable death and immediate resurrection. It was in that exact split second I fell in deep, unconditional love with WinDbg.
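Had I needed more convincing I could have dug further with SOS – for example dumping the nested SqlException that !threads flagged on thread 28 (the address is the one from the thread listing above), or listing the managed objects on that thread’s stack:

$$ dump the SqlException instance reported by !threads for thread 28
!do 1ee19b2c
$$ list the managed objects found on the current thread's stack
!dso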
I won’t go into detail about the error itself and how it was fixed (I won’t say too much, but part of it might include a bit of JavaScript form validation), but I have had a major success using WinDbg (and Google) to track down and fix an error which by default only revealed itself through an entry in the server’s Event log. WinDbg is indeed a very useful tool in a developer’s toolbox, and now that I’ve got some experience with it I can only imagine how many hours I’ve wasted earlier trying to come up with grand unified theories about the behaviour of a piece of code, where I would now just attach a debugger / create a dump and check for myself what’s actually going on. It took me four to five days of effective work to get to know the mindset and the commands necessary to make it useful, but boy... they will pay for themselves a hundred times over the next three decades of programming. Try it out for yourself  :o)