John McCormack DBA

SQL Server Databases and Cloud

Free SQL Tools to make your life easier

9th February 2021 By John McCormack 2 Comments

I’ve written and spoken about free SQL tools to make your life easier on a few occasions. It was the subject of my presentation at SQLBits 2020, so I was delighted to see Mikey Bronowski (b|t) choose tooling as the subject for February’s T-SQL Tuesday #135. #tsql2sday.

In this month’s T-SQL Tuesday, Mikey asks us to “Write a blog post about the most helpful and effective tools you use or know of.” I use quite a few free tools. I love the way the SQL Community embraces sharing, and I know I wouldn’t have achieved a fraction of my career success without free tools. In my SQLBits talk, I discussed the following free tools:

  1. sp_whoisactive
  2. First Responder Kit
  3. Statistics Parser
  4. Ola Hallengren’s Maintenance Solution
  5. DBATools
  6. DLM Dashboard

For this post, I will focus on Statistics Parser, written by Richie Rump (b|t). My blog has info on some other free SQL tools as well.

Statistics Parser

Legend

Legend has it that Richie Rump wrote it during a Brent Ozar conference session. I asked him about this and he told me:

Well, I started it in one of his training classes. It was in Atlanta. It was the last day and it was mentioned that there was an Excel spreadsheet that parsed out statistics io output. I found that as interesting but odd. So I started out writing out a web based parser right there in class. The core of it was finished on the plane ride home.

Richie Rump 2021

So what does Statistics Parser do?

If you run SET STATISTICS TIME,IO ON before you run a query in SSMS, you will get information back on how much data was accessed and how long it took: things like logical reads, physical reads, CPU time and elapsed time.
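For example (a minimal sketch, where dbo.Orders is just a placeholder table name), you would run something like this in SSMS and then copy everything from the Messages tab:

SET STATISTICS TIME, IO ON;

-- Any query you want to tune; dbo.Orders is a hypothetical table
SELECT TOP (100) *
FROM dbo.Orders
WHERE OrderDate >= '2021-01-01';

SET STATISTICS TIME, IO OFF;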

If you are only querying one or two tables, it is easy enough to just read this in the messages window. But what about those complex stored procedures, or queries hitting multiple tables and views? The output can be long and intimidating, and certainly hard to understand at a glance.

Statistics Parser is a web page which allows you to paste in the statistics time,io output from the SSMS messages tab, and it formats the output into neat tables showing how much IO happens for each table. It is immediately easier to read, and you get a handy % column on the right hand side showing you which tables are being read the most. I find this really useful for query tuning because it lets me know where my biggest pain points are. For complex queries which touch a lot of tables, it makes it easy to see at a glance where you should initially focus your attention. It also shows worktable and workfile entries, which serve as a handy hint that tempdb is in play.

A really handy feature is that you can open multiple browser tabs and give each tab its own name. You can paste the output from the original query into a tab named “before” or “original”, then give a name to each tab as you try something new, e.g. “index_abc_added” or “fixed_cursors”. I like to do this when working through query tuning options against a restored copy of production. I use it for tuning CPU and reads. I quite often find that if I can make a big enough impact on the reads, the CPU will also come down.

How to use Statistics Parser

  1. Run SET STATISTICS TIME,IO ON in the same SSMS window as the query you are about to troubleshoot
  2. Run your query
  3. Copy/paste the message from SSMS into statisticsparser.com
  4. Click the Parse button
  5. Open more tabs as needed

Thanks Richie

Thanks for making a great tool, free and easy to use. And thanks for answering my questions.

FYI

Attendees of SQLBits 2020 can still log in using their personal link and see all the recorded content, so my talk is there. In it, I spend a few minutes demonstrating Statistics Parser. I’m not sure if they will make it free to everyone else in future, like they’ve done in previous years.

Featured Image

Image by Lu Lettering from Pixabay

Filed Under: front-page, T-SQL Tuesday Tagged With: #tsql2sday, community, free tools, statistics parser

Automate Your Stress Away – Run sp_whoisactive on a schedule

8th September 2020 By John McCormack 1 Comment

Automate collection of sp_whoisactive on a schedule

This post is part of the September 2020 edition of #tsql2sday. Thanks to Elizabeth Noble for hosting (blog|twitter) and for coming up with a great topic. SQL Server has a great tool for scheduling repeatable tasks called SQL Agent. SQL Server also has a great community, who have written SQL scripts so established and well known that some might think they come with SQL Server (or at least should). This post combines using SQL Agent with one of those invaluable community scripts to help solve a real world problem.

When I took over a fairly troublesome SQL Server, I was always being asked what was running when the SQL Server was slow. That was fine during the day when I could look, but often I would be asked, why was the server slow at 9pm last Friday? Or some other time. It’s not the easiest question to answer, as SQL Server doesn’t natively keep this information for you.

A quick but effective way to automate this is to download and schedule Adam Machanic’s sp_whoisactive stored procedure. I run this every 10 minutes but I have seen it scheduled as often as every 30-60 seconds. It depends on your system and your needs.

How to automate running sp_whoisactive on a schedule

  1. Download sp_whoisactive from GitHub if you don’t have it already
  2. Run the code to create the stored procedure on your SQL Server
  3. Use my T-SQL below to create an agent job that collects the data (and deletes older rows)
  4. Change any of the parameters to suit your own needs
  5. Review the data next time you are asked "What was running at X time?" by simply querying the table

Create SQL Agent Job

! Important ! I use a utility database called DBA for stored procedures such as sp_whoisactive. If you don’t have a database called DBA, please either create one or change the reference to the name of the database you want to create your table in.
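If you need to create one, a minimal sketch would be:

IF DB_ID(N'DBA') IS NULL
    CREATE DATABASE DBA;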

USE [msdb]

GO

BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0

IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

END

DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'sp_whoisactive_log_to_table',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'Log output of sp_whoisactive every x minutes',
@category_name=N'[Uncategorized (Local)]',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

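-- Job step 1: run sp_whoisactive and log the output to dba.dbo.sp_whoisactive_log
-- (creates the table and its index on first run; on_success_action = 3 means "go to the next step")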
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Write output to table',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=3,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'IF NOT EXISTS ( SELECT 1 FROM dba.sys.tables WHERE NAME = ''sp_whoisactive_log'')
BEGIN
CREATE TABLE dba.dbo.sp_whoisactive_log
(
id INT IDENTITY(1,1) PRIMARY KEY,
[dd hh:mm:ss:mss] VARCHAR(20) NULL,
session_id SMALLINT NOT NULL,
sql_text XML NULL,
login_name sysname NULL,
wait_info NVARCHAR(4000) NULL,
CPU VARCHAR(30) NULL,
tempdb_allocations VARCHAR(30) NULL,
tempdb_current VARCHAR(30) NULL,
blocking_session_id SMALLINT NULL,
blocked_session_count VARCHAR(30) NULL,
reads VARCHAR(30) NULL,
writes VARCHAR(30) NULL,
physical_reads VARCHAR(30) NULL,
query_plan XML NULL,
locks XML NULL,
used_memory VARCHAR(30) NULL,
[status] VARCHAR(30) NULL,
open_tran_count VARCHAR(30) NULL,
percent_complete VARCHAR(30) NULL,
[host_name] sysname NULL,
[database_name] sysname NULL,
[program_name] sysname NULL,
start_time datetime NULL,
login_time datetime NULL,
request_id SMALLINT NULL,
collection_time datetime NULL
)
CREATE INDEX idx_collection_time ON dba.dbo.sp_whoisactive_log (collection_time)
END

/* Load data into table.
If you want to change parameters, you will likely need to add columns to table
*/
EXEC sp_whoisactive @get_locks = 1, @find_block_leaders = 1, @get_plans = 1, @destination_table = ''sp_whoisactive_log''',
@database_name=N'DBA',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback

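-- Job step 2: purge logged rows older than one month (on_success_action = 1 means "quit reporting success")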
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Delete old rows',
@step_id=2,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'DELETE dba.dbo.sp_whoisactive_log
WHERE collection_time < DATEADD(MONTH,-1,GETDATE())',
@database_name=N'master',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
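-- Schedule: freq_type = 4 (daily), freq_subday_type = 4 (minutes), freq_subday_interval = 10 -> run every 10 minutes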
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every x minutes',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=10,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20200209,
@active_end_date=99991231,
@active_start_time=0,
@active_end_time=235959
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO

This job will run sp_whoisactive on a schedule every 10 minutes and record the output in the table dba.dbo.sp_whoisactive_log. It also deletes any rows older than 1 month.
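For example, to answer “what was running at 9pm last Friday?”, you can query the log table directly. A minimal sketch (adjust the time window to suit):

SELECT collection_time, session_id, [database_name], login_name, [host_name], [status], wait_info, blocking_session_id, sql_text
FROM dba.dbo.sp_whoisactive_log
WHERE collection_time >= '2020-09-04 20:50'
AND collection_time < '2020-09-04 21:10'
ORDER BY collection_time, session_id;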

Hope this helps
John

Filed Under: front-page, T-SQL Tagged With: #tsql2sday, sp_whoisactive, tsqltuesday

Deleting data from a data warehouse

9th January 2018 By John McCormack Leave a Comment

This post about deleting data from a data warehouse is my first post in the #tsql2sday series. This month’s host is Arun Sirpal (blog).

T-SQL Tuesday #98 – Your Technical Challenges Conquered

Please write about and share with the world a time when you faced a technical challenge that you overcame.

Deleting data from a data warehouse

I was tasked with deleting a large amount of data from our data warehouse. This was because we had sold off a small part of our company (based in a different country) and, as such, all associated data had to be removed from our data warehouse. The warehouse pooled the data from our various OLTP data sources and stored it all in one place.

Challenges

  • Find out the scope of the data
  • Identify the data to be purged
  • Identify the order for the purge to take place
    • Why was this important?
  • Create a process for deleting data that will not interfere with our daily loads
  • Test, and then deploy with confidence

Find out the scope of the data & Identify the data to be purged
  1. I had to identify which tables held the data. I created a tracking table and inserted the names of all of the tables in our two databases which held customer data.
  2. Looping through all of the tables, I identified which tables had 0 rows where sourceid = 5 and marked these as completed in my tracking table (a sketch of this check follows this list).
  3. The remaining tables containing rows where sourceid = 5 would be the tables where the purge was needed.
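A minimal sketch of that check, assuming a hypothetical tracking table dbo.PurgeTracking (SchemaName, TableName, IsComplete) and that each customer table has a sourceid column:

DECLARE @SchemaName sysname, @TableName sysname, @sql nvarchar(max), @rows bigint;

-- dbo.PurgeTracking is a hypothetical stand-in for the tracking table described above
DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
SELECT SchemaName, TableName FROM dbo.PurgeTracking WHERE IsComplete = 0;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @SchemaName, @TableName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'SELECT @rows = COUNT_BIG(*) FROM ' + QUOTENAME(@SchemaName) + N'.' + QUOTENAME(@TableName) + N' WHERE sourceid = 5;';
    EXEC sys.sp_executesql @sql, N'@rows bigint OUTPUT', @rows = @rows OUTPUT;

    -- Tables with no rows for the sold-off source need no purge
    IF @rows = 0
        UPDATE dbo.PurgeTracking SET IsComplete = 1 WHERE SchemaName = @SchemaName AND TableName = @TableName;

    FETCH NEXT FROM table_cursor INTO @SchemaName, @TableName;
END

CLOSE table_cursor;
DEALLOCATE table_cursor;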
Identify the order for the purge to take place

It was not possible or practical to just delete the data based on a list of table names where data was held. I had to consider referential integrity and identify a precedence order in which to carry out the deletes, table by table. Many of our tables had Foreign Key constraints, so SQL Server would simply prevent deletes done out of order. Some of our tables had multiple Foreign Key relationships and some went down through many levels, which made determining the order of deletions a difficult task.

Explanation:
If you try to delete a row from a primary key (parent) table, the delete will fail when the key value is still referenced by a foreign key in another (child) table. To make this change, you must delete the referencing rows from the foreign key (child) table first, and then delete from the primary key table.

I wrote a stored procedure which reviewed all of the foreign keys in the database and identified the correct order in which tables could be deleted from. Some tables had no foreign key relationships, so these could be done in any order; they were the lowest precedence and were done first. The next set were the child tables: those whose foreign keys referenced another table’s primary key but which were not themselves referenced by any other tables. These were processed next. The process continued, building a hierarchy which eventually allowed me to identify the correct order in which the tables could be processed.
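This is not the original procedure, but a minimal sketch of the idea: walk sys.foreign_keys and give every table a delete level equal to the deepest chain of tables referencing it, then delete level by level starting at 0. (Circular foreign key chains would need extra handling.)

WITH fk AS (
    SELECT DISTINCT
        f.parent_object_id     AS referencing_table,  -- the table that owns the foreign key (child)
        f.referenced_object_id AS referenced_table    -- the table it points at (parent)
    FROM sys.foreign_keys AS f
    WHERE f.parent_object_id <> f.referenced_object_id -- ignore self-referencing keys for simplicity
),
levels AS (
    -- Level 0: tables that no foreign key references, so they can be deleted from first
    SELECT t.object_id, 0 AS delete_level
    FROM sys.tables AS t
    WHERE NOT EXISTS (SELECT 1 FROM fk WHERE fk.referenced_table = t.object_id)
    UNION ALL
    -- A referenced table can only be emptied after every table that references it
    SELECT fk.referenced_table, l.delete_level + 1
    FROM fk
    JOIN levels AS l ON l.object_id = fk.referencing_table
)
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id) AS table_name,
       MAX(delete_level) AS delete_level  -- the deepest referencing chain wins
FROM levels
GROUP BY object_id
ORDER BY MAX(delete_level), schema_name, table_name;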

Create a process for deleting data that will not interfere with our daily loads

Over the years I have used SSIS for many large administration projects. It is useful where a complex workflow has to be identified and where there are many moving parts. SSIS can do so much more than just ETL. With this in mind, I created an SSIS package which would contain all of the code.

A main outline of the SSIS package

Steps that create objects only do so if the object doesn’t already exist.

  • Create and pre-populate (with table names and expected row counts) a metadata tracking table. As the package ran, this would allow me to tell at a glance how many rows had been deleted, how many tables had been completed and how far the task had progressed overall. This information was important and also allowed me to provide regular updates to management.
  • Create a stored procedure which would determine the order in which tables should be purged.
  • Run the stored procedure and update the metadata tracking table with the precedence values.
  • Delete the rows. The deletes would be done one table at a time, in optimised batches, following the precedence order set out in the metadata tracking table to prevent errors (a sketch of the batching pattern follows this list). The number of rows deleted for each table was updated in the tracking table.
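As an illustration, the batching pattern for a single table might look something like this (a sketch only: the table, tracking table and batch size are hypothetical, and the real work was driven from the SSIS package):

DECLARE @BatchSize int = 50000, -- passed in from the SSIS user variable via the agent job step
        @RowsDeleted int = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.FactSales -- hypothetical table, processed in precedence order
    WHERE sourceid = 5;

    SET @RowsDeleted = @@ROWCOUNT;

    -- Keep the metadata tracking table up to date so progress is visible at a glance
    UPDATE dbo.PurgeTracking
    SET RowsDeleted = RowsDeleted + @RowsDeleted
    WHERE SchemaName = 'dbo' AND TableName = 'FactSales';
END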

With the package created, I set up a SQL Agent Job and set a schedule for times outside of the data loads. I also added in a step to ensure the loads were not still running when the job went to run.

Test, and then deploy with confidence

With a process created that worked, I had to set about testing it on a large scale to ensure a smooth process by the time my package was going to target our production database. Fortunately, we had 4 full scale dev and QA environments, each of them had a full set of data. There were slight differences in data volumes due to refresh dates, and also some newer tables existed that weren’t in production.

Having so many environments allowed me to focus on batch size to get the most efficient number of rows to delete per run. I set this as a user variable in SSIS, which allowed me to pass in the value via the agent job step. If I felt it needed adjusting, I could just adjust the job step without amending the package.

Overall Result

The production run completed over a couple of weeks. In that time, several billion rows of data over hundreds of tables were deleted.

Alternative options

As I was confident all related data would be deleted, I could have disabled the Foreign Key constraints and deleted in any order. I decided against this for the following reasons.

  • The full purge would take two weeks. As each of the tables also held data that should remain, and also because we would still be running daily loads, I felt this was unacceptable as it could lead to periods of time where the referential integrity was in doubt.
  • I could forget to re-enable some constraints.
  • Once re-enabled, the foreign key constraints would be untrusted. They would need to be checked to reset the is_not_trusted value to 0 in sys.foreign_keys (see the example after this list).
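For completeness, this is what that clean-up would have looked like (with hypothetical table and constraint names):

-- Re-validate a re-enabled constraint so it becomes trusted again
ALTER TABLE dbo.FactSales WITH CHECK CHECK CONSTRAINT FK_FactSales_DimCustomer;

-- Any foreign keys that are still untrusted show up here
SELECT OBJECT_NAME(parent_object_id) AS table_name, name AS constraint_name
FROM sys.foreign_keys
WHERE is_not_trusted = 1;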

Filed Under: front-page, Guides, T-SQL Tagged With: #tsql2sday, tsqltuesday
