SQL Performance Archives - John McCormack DBA

Perfection is not the aim

Your SQL Server indexes can’t always be perfect. What I mean by that is you can’t index for every query on a busy SQL server. In this post I’m going to describe just one table from a server I have been tuning recently. It had 26 indexes, and that was not unusual for this database. This most likely happened over time as a result of developers creating an index per query using the table, as well as following some perceived best practices (with the best of intentions).

Did you say 26 indexes?

Yes, one of the heavily used tables on this database had 26 indexes. It had one clustered index on an ID column which is fairly standard. There were 25 further non clustered indexes including an index per foreign key. A shared code base runs across several customers. Some customer’s use all the functionality, some only use parts of it, but they all get the same database schema. And all of those indexes.

Current usage and how to fix

Remove unused indexes

9 of the non clustered indexes on this table were read from less than once per day but were written to up to 18,000 times per day. The servers had been online for 180 days at the time of checking and some of the index uses were in single or double figures. These needed to be dropped as they were not bringing anything to the party.

I got this info by running sp_blitzindex and comparing the index usage stats against the server uptime. (I took care to run on all production servers for all customers, including read only secondaries).

Combine similar indexes

This left 16 remaining non clustered indexes to review.

9 of which had been used several million times with a maximum of 32 Million seeks.
- 5 out of those 9 appeared to be heavily overlapping other indexes.
  - I needed to see if these can be combined into 1 or 2 indexes.
- 4 out of those 9 are fairly unique.
  - Those 4 will be retained.
7 have been used hundreds of thousands of times and don’t appear to have much overlap.

This leaves us with 4 non clustered indexes being retained, 9 being dropped and 12 to be reviewed in more detail. This is the most difficult stage of the process as I need to decide which indexes can be merged and which can be dropped. Its always best to be cautious with this part, because removig the wrong index could slow down an important process.

Of the 12 indexes that needed a more detailed look, I decided to drop 8, and create 3 new ones. A net reduction of 5 indexes for this stage. Multiple groups of indexes had the same keys but different include columns. Some of the include column ranges were very wide and included large data types such as large NVARCHARs. Replacement indexes used the same keys, in the same order, but had a more targeted range of include columns.

Perfection? No. Better? Yes.

In total, we have a net reduction of 14 non clustered indexes. It still leaves us with 11 non clustered indexes. Your SQL Server indexes can’t always be perfect but 11 is better than 25. I’d really have loved to get that count down to 5 but it is a lot better than it was. This process should be repeated after a month of the new index structure being in place with the aim of further consolidation if possible. The new indexes won’t be perfect, some queries may not be served as well by the new consolidated indexes or by having a rarely used index removed but I was trying to improve the overall health of the SQL Server. Reducing the number of writes that were hammering the storage is a step in the right direction.

How did we get so many indexes?

Index creation was all developer led so when new functionality was rolled out, indexes were created for each new process. This resulted in a lot of overlapping indexes. I don’t blame the developers, there should just have been a process for reviewing index requests, and comparing them to what we already had.

How to avoid in future

I see two key approaches to helping avoid a repeat of this death by indexing.

Don’t hoard the analysis. Sharing the results with the development team will help increase awareness of this death by indexing scenario.
Implementing an approval process for pull requests where a DBA should be in the approvers list if the change includes a new index.

Verify improvements

Index changes can affect multiple queries all using the same table so just testing one query or stored procedure is not enough to release with confidence. Where possible, use a load testing environment which mirrors production, and replay a repeatable workload. By baselining a typical workload, then making the changes and taking a new benchmark, you can measure the improvements overall and for specific database calls.

With this client, there was a load testing environment with synthetic API calls to the application that attempted to mirror production. Not everyone has this luxury when testing but capturing and replaying a workload, and measuring the differences is vital for this type of performance tuning. It can help you catch any serious regressions as well as giving you the confidence that your performance will improve.

If you don’t alreqady have a suitable load testing environment, the tool I recommend to get you going is an open source project called WorkloadTools by Gianluca Sartori.

More handy links

A successful performance tuning project

How DBATools can help with performance tuning

Performance tuning project

I’m coming to the end of what has been a successful performance tuning project for SQL Server. I wanted to share some of the things that made it a success.

Corporate buy in

The company had a goal to improve the page load times of a number of key pages within our application. This was to improve the user experience for our customers. They acknowledged that the database code, both indexes and stored procedures needed optimsation but so too did aspects of the application code. It was good to see this acknowledged as I’ve been in many situations where the database takes all the blame.

The company approved a considerable amount of resource in terms of personnel to work solely on this stability and optimisation project. It included senior developers, testers and a project manager. I was brought in to look at the database performance. Whilst some business as usual (BAU) priorities did come in from time to time, a large core of the team was protected from this and allowed to get on with the work of making the system go faster, thus improving the customer experience.

Daily standups

We held daily standups where we covered what we had done since the last meeting, what we were working on and if anything was blocking our progress. These were kept short so as to not get in the way of the development work, but allowed everyone an overview of what the other team members were working on. Often, as a result of these, side conversations spun up and team members helped out others who were looking for a bit of assistance. (Or simply to bounce ideas around)

Collaboration

The team were willing to help each other. When Pull Requests (PRs) were submitted, these were swiftly approved where there were no objections, or challenged in a positive way which helped get the best overall result. When the API calls were showing as slow, but nothing was obvious on the SQL server, heads were put together to use the tools at our disposal to get to the root cause. This often included Azure App Insights which I had not previously used, and this helped us get the end to end transaction details. We could pull out the SQL for any areas which were slow and work on making it perform better.

Measuring improvements

The Azure Instance class for the SQL Server had previously been doubled so there was no appetite to scale it up again. The hope was that we may eventually be able to scale back down after a period of stability.

The system previously had issues with blocking, high CPU and slow durations so I wanted to reduce page reads, CPU and duration for all of the SQL calls I was working on. I wouldn’t consider a PR if at least 2 of these metrics were not improved. My main focus was on reducing duration of calls but I didn’t want to improve one thing, and make others worse as a consequence. In my own tickets, I always made sure to document the before and after metrics from my standalone testing so as to give confidence that they would be included in upcoming releases.

We also used Apdex which is a standardised way of measuring application performance. It ranks page views on whether the user is satisfied, tolerating or frustrated. The more we move users out of the frustrated and tolerating groups, and in to satisfied, the higher the Apdex score will be. As our project moved through release cycles, we were able to see steady increases in our Apdex scores. Apdex also allowed us to identify what was hurting us most and create tickets based on this information.

Top Quality Load Test Environment

We had a top quality load test environment which used production masked backups for the databases. I set up the availability groups to match production, the servers were all sized the same as production and had the same internal settings such as tempdb size, sp_configure settings and trace flags etc. We were able to replay the same tests over and over again using Gatling, and our testers made really useful reports available to help us analyse the performance of each hotfix. If it was a proven fix, it was promoted to a release branch, if it wasn’t, it was binned.

End Game

This intensity was kept up for the almost 2 months and it was ultimately transformative for the business. Whilst there are still many further improvements that can be made, the specialised squad is being disbanded and team members are being reallocated to other squads. Performance should be a way of life now, rather than an afterthought or another performance tuning project.

We can be happy that we improved the Apdex scores, sped up a huge number of regularly used SQL transactions, and removed the large CPU peaks that dominated our core business hours.

If you enjoyed this, you may also enjoy some of these other posts.