I encountered the following error this week:
Event Description: SQL Server failed with error code 0xc0000000 to spawn a thread to process a new login or connection. Check the SQL Server error log and the Windows event logs for information about possible related problems.
This SQL Server hosts the databases for a few utility apps and has always ticked along without any concerns. However, more and more databases had been added over time and most recently Altiris was added and this is when the problems began.
Searching online, I came across this question and answer in which Jonathan Kehayias suggested the problem could be caused by threadpool starvation. It doesn’t seem to be a common problem so I had to look into to it to find out more.
In his first answer, Jonathan suggested that scaling out or splitting the workload across more SQL servers would work better than scaling up because the max number of worker threads increase would not be sufficient by scaling up. I looked into the server but realised it was had a very low spec of only 2 CPUS and 4GB memory to support 20 databases of varying sizes so in this case, maybe scaling up would help. Had I read on, I would have seen that Jonathan didn’t expect the guy’s server to have such low spec (1041 databases 1CPU 2GB RAM) and he also advised scaling up.
Measuring the threadpool waits
Querying the DMV sys.dm_os_wait_stats returns a count of waiting tasks since the last restart. I ran the query below and found that this system had experienced 18,000 threadpool waits per day since its last restart. 300,000 in total.
where wait_type = ‘THREADPOOL’
As the stats are cumulative, I wanted to know how often the threadpool waits were happening right now. I set up a sql agent job to collect the waiting_tasks_count from sys.dm_os_wait_stats along with the date of collection and scheduled this to run every minute. I found that I was receiving threadpool waits at a rate of 822 per hour, roughly 14 per minute.
How to fix
Fortunately for me, this box was a virtual machine (VM) and as such, adding CPU and memory was a fairly seamless process. I doubled the CPUs and increased memory to 16GB to give SQL Server a fighting chance. I could go further if needed.
Once this was added, I continued to monitor the count of threadpool waits every minute and found they had reduced from 822 per hour to <1 per hour.
To summarise, although I’m sure it’s not always the answer, I found that beefing up a very low spec server had a dramatic improvement on threadpool waits. I should add this was a 64bit SQL Server running 2008 R2.
Whilst deciding what to do and getting more information on threadpool starvation, I read a number of other resources which are listed below.
As always, your comments and suggestions for improvement are welcome.