John McCormack DBA

SQL Server Databases and Cloud


Firefighting – When your SQL Server is in big trouble

6th January 2021 By John McCormack

When your SQL Server is in big trouble

It’s so important to stay calm when your SQL Server is in big trouble. I’m talking about when blocking is through the roof, when CPU is pinned and people are yelling from all different directions. Staying calm isn’t just a state of mind: you need a process that you work through, that you have practised and that you know inside out.

Part 1

How bad is it?

[Image: Silhouette of a firefighter in front of a blaze]

In this post, I want to describe what I call a priority 2 problem. It is serious and business impacting but the server is online. It needs immediate attention, however, or it could escalate to a P1.

P1 – Business critical system(s) are unavailable
P2 – Serious impairment of functionality on a critical system
P3 – Performance is slower than usual but requests are completing
P4 – Performance is generally sub optimal and should be improved

Get a colleague on comms

One thing that is guaranteed to happen is that people who are not directly involved in fixing the issue will want updates. They have good reason to want updates and are well meaning, but the constant need to reply to emails or Teams messages will seriously impact the speed at which you are working.

Back in the good old pre-Covid days of office working, this could easily be achieved by someone sitting near you and the two of you communicating verbally. With remote working more and more common now, I recommend having some kind of audio channel open so that you can speak when you need to, and silence is OK too. This could be a phone call on speaker, or a Teams call. The technology isn’t the main thing here; the idea is that you can express updates vocally to someone capable of communicating with the wider organisation.

Where possible, your colleague should be technically capable of answering related questions. They should open a chat in Teams or Slack or whatever software your organisation prefers and provide regular updates there. They should answer questions, keep to the update schedule and most importantly, divert people away from you.

A pre-practised routine

The scenarios can be different; in this one I want to talk about how I would handle a SQL Server which appears to be functioning (barely) but is extremely slow. In other words, it is a priority 2 incident as defined above. Helpdesk supervisors are calling tech support to advise that customers are calling them because the website is unusable. On top of that, the MI team and data scientists are contacting the DBAs directly because their queries won’t run.

Have a script or checklist

In my script, I tend to use well-established community stored procedures; the sort of ones that most DBAs know about and many use. If you start trying to write new queries in the middle of a slowdown, that is going to cost you time. Stick with what works, what is established, and what you have used before and know well.

I’ve mentioned these in previous posts but the main things I am going to run are:

  1. sp_whoisactive – https://github.com/amachanic/sp_whoisactive
  2. sp_blitzfirst – https://www.brentozar.com/first-aid/
  3. sp_blitzlock – https://www.brentozar.com/first-aid/

sp_whoisactive

I always run this first because it is designed specifically for showing you current database activity, and it has saved my bacon so many times in the past. Simply running it parameter-free will show you what is running on your SQL Server right now. It is ordered by query duration descending, so the long runners are at the top. It can give you blocking information, information on wait types and open transactions as well. For advanced features, you need to use some parameters.
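
For example, here are a couple of parameter combinations I find useful (a minimal sketch; see the sp_whoisactive documentation for the full parameter list):

-- Include the query plan and lock detail for each active request
EXEC sp_whoisactive @get_plans = 1, @get_locks = 1

-- Bring the lead blockers to the top of the results
EXEC sp_whoisactive @find_block_leaders = 1, @sort_order = '[blocked_session_count] DESC'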

sp_blitzfirst

If you can find what you need with sp_whoisactive, you may not even need to use this stored procedure. Where this procedure comes into its own is in telling you when certain things have recently changed or are out of the ordinary.

For example, sp_blitzfirst will tell you:

  1. If the plan cache has been recently erased
  2. If you have high CPU utilization from a program other than SQL Server
  3. How long each wait type has been waiting during the previous 5 seconds

If something stands out that you don’t understand, there will be data in the URL column that you can copy/paste into a web browser. The web page will give you an explanation of the problem.
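
As a rough sketch of how I call it during an incident (parameter names as documented in the First Responder Kit):

-- Sample a full 60 seconds instead of the default 5 when the server can spare the time
EXEC sp_BlitzFirst @Seconds = 60, @ExpertMode = 1

-- Wait stats accumulated since the instance last restarted, useful for comparison
EXEC sp_BlitzFirst @SinceStartup = 1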

sp_blitzlock

This proc is all about deadlocks. You might not need to run it if the first two have given you enough information to fix the issue in hand. However, if deadlocks are at the root of your problems, this will tell you which queries are involved and when. It’s also easier to read than deadlock graphs.

There are more things that I’ll run in the fullness of time but we’ll leave them for later. I only want to focus on these 3 things for now to zero in on the immediate problem.

The script

USE dba

/*
	To run this, you also need to install:
	- sp_whoisactive
	- sp_blitzfirst
	- sp_blitzlock
*/

-- Safety net: raises a severity 20 error to stop execution if you accidentally run the whole script at once
RAISERROR ('Don''t run it all at once',20,-1) WITH LOG

-- sp_whoisactive with various parameters
EXEC sp_whoisactive @find_block_leaders = 1 --,@get_locks = 1
EXEC sp_whoisactive @sort_order = 'sql_text' -- Tells at a glance if you have a lot of the same query running. For the F5 report runner troublemakers


-- What has been hurting us in the last 5 seconds. Look for wait stats, and anything out of the ordinary, such as the plan cache has been recently erased.
EXEC dbo.sp_BlitzFirst @expertmode = 1


-- Are we experiencing deadlocks 
EXEC sp_BlitzLock

-- Deadlocks in last hour
DECLARE	@StartDateBlitz datetime = (SELECT DATEADD(HH,-1,GETDATE())), @EndDateBlitz DATETIME = (SELECT GETDATE())
EXEC sp_BlitzLock @EndDate = @EndDateBlitz, @StartDate = @StartDateBlitz
GO



/*  Some other things to consider

	Have the usual optimisation jobs (stats updates, index maintenance etc.) run as expected?
	If one proc has regressed badly, could it help to clear only that plan from the cache or to recompile it?
	EXEC sp_blitz -- Not as handy as the others for real time issues. Better used as a health check
	EXEC sp_blitzcache -- More useful for helping you identify the resource hungry queries, allowing you to target them for tuning
	EXEC sp_readerrorlog 0,1, 'memory'
*/

The calm after the storm

I will dedicate a full post to this, but for now it is important to say that slow or resource-intensive queries should be identified and added to a backlog for fixing. If one particular query is likely to go bad again, it should be treated as a priority to fix, so that we don’t see the same issue escalating again.
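
A reasonable starting point for building that backlog is sp_blitzcache (mentioned in the script above), sorted by whichever resource was under the most pressure. For example:

-- Top plans by CPU and by logical reads; candidates for the tuning backlog
EXEC sp_BlitzCache @SortOrder = 'cpu', @Top = 10
EXEC sp_BlitzCache @SortOrder = 'reads', @Top = 10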

You should also do a SQL Server health check to satisfy yourself that you don’t have a sub-optimal configuration causing you issues.
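
sp_blitz, also from the First Responder Kit, covers a lot of that ground. For example:

-- Broad health check once the fire is out; @CheckServerInfo adds hardware and configuration detail
EXEC sp_Blitz @CheckServerInfo = 1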


Related to: When your SQL Server is in big trouble

Locks, blocks and deadlocks in SQL Server
Zero Budget DBA – SQLBITS 2020

Filed Under: front-page, Performance Tuning, SQL Server Tagged With: firefighting, sql server troubleshooting, triage

Your SQL Server indexes can’t always be perfect

27th November 2020 By John McCormack

Perfection is not the aim

Your SQL Server indexes can’t always be perfect. What I mean by that is you can’t index for every query on a busy SQL Server. In this post I’m going to describe just one table from a server I have been tuning recently. It had 26 indexes, and that was not unusual for this database. This most likely happened over time as a result of developers creating an index per query that used the table, as well as following some perceived best practices (with the best of intentions).

Did you say 26 indexes?

Yes, one of the heavily used tables on this database had 26 indexes. It had one clustered index on an ID column, which is fairly standard. There were 25 further non clustered indexes, including an index per foreign key. A shared code base runs across several customers. Some customers use all the functionality, some only use parts of it, but they all get the same database schema. And all of those indexes.

Current usage and how to fix

Remove unused indexes

9 of the non clustered indexes on this table were read from less than once per day but were written to up to 18,000 times per day. The servers had been online for 180 days at the time of checking and some of the index usage counts were in single or double figures. These needed to be dropped as they were not bringing anything to the party.

I got this information by running sp_BlitzIndex and comparing the index usage stats against the server uptime. (I took care to run it on all production servers for all customers, including read-only secondaries.)
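
If you prefer to query the DMVs directly, something along these lines surfaces the same write-heavy, rarely read indexes (a sketch only; the thresholds are illustrative and the stats reset when the instance restarts):

[sql]
-- Non clustered indexes in the current database that are written to far more than they are read
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS total_reads,
       s.user_updates AS total_writes
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
    AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
AND i.index_id > 1 -- non clustered only
AND s.user_updates > 1000 -- written to a lot
AND (s.user_seeks + s.user_scans + s.user_lookups) < 10 -- barely read
ORDER BY total_writes DESC;
[/sql]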

Combine similar indexes

This left 16 remaining non clustered indexes to review.

  • 9 of which had been used several million times with a maximum of 32 Million seeks.
    • 5 out of those 9 appeared to be heavily overlapping other indexes.
      • I needed to see if these can be combined into 1 or 2 indexes.
    • 4 out of those 9 are fairly unique.
      • Those 4 will be retained.
  • 7 have been used hundreds of thousands of times and don’t appear to have much overlap.

This leaves us with 4 non clustered indexes being retained, 9 being dropped and 12 to be reviewed in more detail. This is the most difficult stage of the process as I need to decide which indexes can be merged and which can be dropped. It’s always best to be cautious with this part, because removing the wrong index could slow down an important process.

Of the 12 indexes that needed a more detailed look, I decided to drop 8 and create 3 new ones: a net reduction of 5 indexes for this stage. Multiple groups of indexes had the same keys but different include columns. Some of the include column ranges were very wide and included large data types such as large NVARCHARs. The replacement indexes used the same keys, in the same order, but had a more targeted range of include columns.
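
As a purely hypothetical illustration of that consolidation (the table, index and column names are made up), two indexes sharing the same key but carrying different include lists become one index with a targeted include list:

[sql]
-- Two overlapping indexes with the same key column but different include columns
DROP INDEX IX_Orders_CustomerId_IncludeA ON dbo.Orders;
DROP INDEX IX_Orders_CustomerId_IncludeB ON dbo.Orders;

-- One replacement with the same key, in the same order, and only the include columns the queries actually need
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
INCLUDE (OrderDate, OrderStatus);
[/sql]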

Perfection? No. Better? Yes.

In total, we have a net reduction of 14 non clustered indexes, which still leaves us with 11. Your SQL Server indexes can’t always be perfect, but 11 is better than 25. I’d really have loved to get that count down to 5, but it is a lot better than it was. This process should be repeated after a month of the new index structure being in place, with the aim of further consolidation if possible. The new indexes won’t be perfect; some queries may not be served as well by the consolidated indexes or by having a rarely used index removed, but I was trying to improve the overall health of the SQL Server. Reducing the number of writes that were hammering the storage is a step in the right direction.

How did we get so many indexes?

Index creation was all developer-led, so when new functionality was rolled out, indexes were created for each new process. This resulted in a lot of overlapping indexes. I don’t blame the developers; there should just have been a process for reviewing index requests and comparing them to what we already had.

How to avoid in future

I see two key approaches to helping avoid a repeat of this death by indexing.

  1. Don’t hoard the analysis. Sharing the results with the development team will help increase awareness of this death by indexing scenario.
  2. Implement an approval process for pull requests where a DBA is added to the approvers list if the change includes a new index.

Verify improvements

Index changes can affect multiple queries all using the same table so just testing one query or stored procedure is not enough to release with confidence. Where possible, use a load testing environment which mirrors production, and replay a repeatable workload. By baselining a typical workload, then making the changes and taking a new benchmark, you can measure the improvements overall and for specific database calls.

With this client, there was a load testing environment with synthetic API calls to the application that attempted to mirror production. Not everyone has this luxury when testing, but capturing and replaying a workload and measuring the differences is vital for this type of performance tuning. It can help you catch any serious regressions as well as giving you the confidence that your performance will improve.

If you don’t already have a suitable load testing environment, the tool I recommend to get you going is an open source project called WorkloadTools by Gianluca Sartori.

More handy links

A successful performance tuning project
How DBATools can help with performance tuning

Filed Under: front-page, Guides, Performance Tuning Tagged With: indexes, Performance tuning, SQL Performance, SQL server

How DBATools can help with performance tuning

20th May 2020 By John McCormack


DBATools is well known in the SQL Server community for the ease with which it allows you to automate certain tasks, but did you know that DBATools can also help with performance tuning your SQL Server? As my job recently changed to have more of a performance focus, I wanted to investigate which commands would be useful in helping me with performance tuning. It turned out there are quite a few.

Compression

Page and row compression is a hidden gem when it comes to performance tuning SQL Server. Simply compressing some large indexes can sometimes be enough on its own to get you over the line in terms of performance improvement. With compression, it shouldn’t be a guessing game: SQL Server has a built-in stored procedure to let you know whether a table or index would benefit from page, row or no compression.

[sql]

EXEC sp_estimate_data_compression_savings 'dbo', 'DB1', NULL, NULL, 'ROW';

EXEC sp_estimate_data_compression_savings 'dbo', 'DB1', NULL, NULL, 'PAGE';

[/sql]

But if you want to test a whole database or instance at a time, DBATools makes this a lot easier.
[shell]
Test-DbaDbCompression -SqlInstance localhost -Database WideWorldImporters | Select-Object TableName, IndexName, IndexId, PercentScan, PercentUpdate, RowEstimatePercentOriginal, PageEstimatePercentOriginal, CompressionTypeRecommendation, SizeCurrent, SizeRequested, PercentCompression | Out-GridView
[/shell]

Piping to Out-GridView is optional, but I love the way the output is tabulated and easy to read. I use this output to create a list of objects to be compressed. I then do some load testing before and after the changes and compare the results. This gives me the confidence to roll out the changes to production.
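
Applying the recommendations is then a standard index rebuild. For example (the object names here are placeholders):

[sql]
-- Rebuild an index with the recommended compression setting
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = PAGE);
[/sql]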

Hint: Having a way of replaying a consistent load and measuring results before and after is essential when making wholesale changes for performance tuning. If you don’t have a way of doing this, I recommend looking into WorkloadTools.

[Image: Example output of Test-DbaDbCompression]

Find unwanted code hints

When I took over a new environment, there was A LOT of blocking. Part of this was down to the code, which was in need of optimisation, but it was also due to a lot of hints peppered throughout the stored procedures that made blocking inevitable. I found several examples of REPEATABLE READ and SERIALIZABLE throughout the code base. When I asked developers why we needed this level of isolation, no one thought we did; it had been added by developers who had since moved on.

So, what is the easiest way to find every example of REPEATABLE READ in your code base? DBATools, of course.

[shell]
Find-DbaStoredProcedure -SqlInstance SQL123 -Database WideWorldImporters -Pattern 'REPEATABLE READ' | Select-Object * | Out-GridView
[/shell]

Just like before, if you can get a replay running that simulates production, then you can measure whether your blocking and response times go down.

[Image: Example output of Find-DbaStoredProcedure]

Check disk speeds

Knowing that your disks are performing well gives you the confidence to go out and work on your code to seek performance improvements. But if you have slow disks, this can also be a bottleneck. Knowing you have a problem and rectifying it means you can see performance improvements before you have even looked at your indexes or stored procedures.

A common bottleneck can be tempdb, and it’s always good to know that you have tempdb on high-performing disks. (If you’re using Azure VMs, you can utilise the D drive for tempdb.)

[shell]
Test-DbaDiskSpeed -SqlInstance SQL123 -Database tempdb | Out-GridView
[/shell]

[Image: Example output of Test-DbaDiskSpeed]
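
If the results show tempdb sitting on slow storage, moving it is a one-off change. A sketch, assuming the default logical file names and a D drive that already has a suitable folder (the move only takes effect after the instance restarts):

[sql]
-- Point the tempdb data and log files at the faster drive, then restart SQL Server
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\TempDB\templog.ldf');
[/sql]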

Community tools

Easily install some of the best free community tools around which will help you evaluate and monitor your SQL Server.

[shell]

# Install the First responder kit from Brent Ozar Unlimited. Includes sp_blitz, sp_blitzcache and sp_blitzIndex which all help with performance tuning. See https://www.brentozar.com/ for more info.
Install-DbaFirstResponderKit -SqlInstance SQL123 -Database dba

# Installs sp_whoisactive by Adam Machanic which is a priceless tool in terms of evaluating what is running on your instance right now. See http://whoisactive.com/ for more info.
Install-DbaWhoIsActive -SqlInstance SQL123 -Database dba

# Install the free community monitoring tool SQLWatch. See https://sqlwatch.io/ for more info.
Install-DbaSqlWatch -SqlInstance SQL123

[/shell]

External Links

A full list of commands can be reviewed on https://dbatools.io/commands/.

  • First Responder Kit
  • sp_whoisactive
  • SQLWatch

Filed Under: front-page, Performance Tuning Tagged With: DBATools, Performance, Performance tuning, SQL server

Privacy & Cookies: This site uses cookies. By continuing to use this website, you agree to their use.
To find out more, including how to control cookies, see here: Cookie Policy

John McCormack · Copyright © 2025

 

Loading Comments...