John McCormack DBA

SQL Server Databases and Cloud


When should I stripe SQL Server database backups

23rd March 2020 By John McCormack

What is a striped backup?

First of all, a definition of what we mean when we say stripe SQL Server database backups. Striping a backup simply means the overall backup is distributed across several files, often on different drives. Microsoft's documentation describes this in more detail.

What are the benefits?

  1. Time
     Writing to multiple files can save time because the files can be written to the file system in parallel. This is useful if your backups are taking too long and you need them to complete faster.
  2. Storage
     If a backup drive is getting full, you can split your backup across multiple drives, saving space on each drive. The total space used will be similar to a single-file backup, though you may see slightly more overall space taken up.

Striped vs Single File

It’s a balancing act. My advice is stick with the defaults until you have an issue or would like something to perform better. Then it’s a good time to experiment and see how you get on. As a rule, the bigger your backups become, the more useful you may find striping your database backups.

How do I stripe my backups?

We will discuss T-SQL and PowerShell. For PowerShell, I'm using the dbatools module. dbatools covers far more scenarios than Microsoft's own module for SQL Server and is being used and tested all day, every day. If you wish to run the PowerShell commands below, you will need to have dbatools installed. The T-SQL commands all run natively in SSMS.
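If dbatools isn't already installed, a quick way to get it is from the PowerShell Gallery (shown here scoped to the current user so it doesn't need admin rights):

```powershell
# Install the dbatools module from the PowerShell Gallery
Install-Module -Name dbatools -Scope CurrentUser
```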

T-SQL

-- Split file across two drives as not enough space on E:
BACKUP DATABASE StackOverFlow2010
TO DISK = N'E:\Backup\StackOverFlow2010\StackOverFlow2010_1.bak',
DISK = N'F:\Backup\StackOverFlow2010\StackOverFlow2010_2.bak'
WITH COMPRESSION;

T-SQL (Using Ola Hallengren’s DatabaseBackup procedure)

EXECUTE dbo.DatabaseBackup
@Databases = 'StackOverFlow2010',
@Directory = 'E:\Backup\StackOverFlow2010\,F:\Backup\StackOverFlow2010\',
@BackupType = 'FULL',
@Compress = 'Y',
@NumberOfFiles = 2

Powershell (DBATools)

Backup-DbaDatabase -SqlInstance localhost -Database StackOverflow2010 -Path E:\Backup\StackOverflow2010\,F:\Backup\StackOverflow2010\ -CompressBackup

Restoring striped backups

To restore a striped backup, every file in the stripe set is needed. It is therefore worth testing your backups and restore scripts every so often. A backup that cannot be restored is not really a backup.
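As a quick check short of a full restore, RESTORE VERIFYONLY will confirm that a striped set is complete and readable (paths here assume the backup examples above):

```sql
-- Verify a striped backup set without restoring it; all stripes must be listed
RESTORE VERIFYONLY
FROM DISK = N'E:\Backup\StackOverFlow2010\StackOverFlow2010_1.bak',
DISK = N'F:\Backup\StackOverFlow2010\StackOverFlow2010_2.bak';
```

This doesn't prove the data restores cleanly end to end, so it complements rather than replaces periodic test restores.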

T-SQL

RESTORE DATABASE StackOverFlow2010
FROM DISK = N'E:\Backup\StackOverFlow2010\StackOverFlow2010_1.bak',
DISK = N'F:\Backup\StackOverFlow2010\StackOverFlow2010_2.bak'

Powershell (DBATools)

Restore-DbaDatabase -SqlInstance localhost -DatabaseName StackOverflow2010 -Path E:\Backup\StackOverflow2010\,F:\Backup\StackOverflow2010\ -WithReplace

Filed Under: front-page, Guides, PowerShell, T-SQL

The curious case of the space in char(1)

10th August 2019 By John McCormack

What happens when you store a space in a char(1) column?

I was asked by a colleague why his WHERE clause wasn't being selective when filtering on a space value. The column was a char(1) data type. To understand the curious case of the space in char(1), we need to understand how the char data type works and also a bit more about the need for it in this scenario.

Scenario

My colleague had built a user table which included a gender column. A char(1) was chosen because one letter could be used to record the gender of the user. It was also small, which helped with performance and the overall size of the table. All non-specified or non-binary genders were assigned a ' ' (a space). The assumption was that the space character would behave the same as an M or F when used in the WHERE clause.

How the char column works

A char is a fixed-length data type and any value that falls short of the fixed length is padded out with spaces. This means that inserting '' into the gender column has the same effect as inserting ' '. This can be tested using the LEN function:

[sql]

DECLARE @tbl TABLE (name char(10))

INSERT INTO @tbl VALUES ('') -- empty char(1)
INSERT INTO @tbl VALUES (' ') -- a space in char(1)
INSERT INTO @tbl VALUES ('John')
INSERT INTO @tbl VALUES ('John      ')

SELECT name,LEN(name) as len_name FROM @tbl

[/sql]

The spaces do not affect the LEN of the value of the char column
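A related check: LEN ignores trailing spaces, but DATALENGTH reports the actual storage used, so it exposes the padding:

```sql
-- LEN ignores trailing spaces; DATALENGTH shows the full fixed length
DECLARE @c char(10) = 'John';
SELECT LEN(@c) AS len_c, DATALENGTH(@c) AS datalength_c; -- 4 and 10
```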

Build a test table – including putting a space in char(1)

[sql]

CREATE TABLE dbo.People (Description NVARCHAR (20),Gender CHAR(1))

INSERT INTO dbo.People(Description,Gender) VALUES(N'Male', 'M')
INSERT INTO dbo.People(Description,Gender) VALUES(N'Female', 'F')
INSERT INTO dbo.People(Description,Gender) VALUES(N'Other with space', ' ')
INSERT INTO dbo.People(Description,Gender) VALUES(N'Other without space', '')
INSERT INTO dbo.People (Description,Gender) VALUES(N'Other as O', 'O')

[/sql]

Query / Results

[sql]

-- Select all rows and show the LEN of the gender column
SELECT
Description,
Gender,
LEN(Gender) AS Len_Gender
FROM dbo.People

[/sql]

Select space in char(1)

[sql]

-- We get the same result with either query so the space is ignored
SELECT * FROM dbo.People WHERE Gender = ' '
SELECT * FROM dbo.People WHERE Gender = ''

[/sql]

Select char where = ' '

[sql]

-- We get the row we want when we use a value such as 'O' instead of a space
SELECT * FROM dbo.People WHERE Gender = 'O'

[/sql]

Select char where = 'O'

Workarounds

The best option is to use a different character, like an 'O' or something else suitable; anything other than a space in char(1) would work. If you weren't too concerned about the space used by this column, a longer varchar data type would allow users to enter anything they wanted. Failing that, a tinyint would allow 256 options and still only uses 1 byte.
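A minimal sketch of the tinyint approach (table and code values below are made up for illustration):

```sql
-- Gender stored as a tinyint code; still only 1 byte, 0 = not specified
CREATE TABLE dbo.People2 (Description NVARCHAR(20), GenderCode TINYINT);

INSERT INTO dbo.People2 VALUES (N'Not specified', 0);
INSERT INTO dbo.People2 VALUES (N'Male', 1);
INSERT INTO dbo.People2 VALUES (N'Female', 2);

-- Filtering on 0 is unambiguous, unlike a space in char(1)
SELECT * FROM dbo.People2 WHERE GenderCode = 0;
```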

It's 2019

Yes, I know there's a whole discussion about whether we should be storing gender and its possible values, but this is simply a post about using CHAR(1).

Further reading:

https://docs.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-2017 

https://stackoverflow.com/questions/4175878/storing-sex-gender-in-database 

Slightly tenuous link: https://johnmccormack.it/2016/02/how-does-the-sql-server-engine-process-my-select-query/

 

Filed Under: front-page, T-SQL

Deleting data from a data warehouse

9th January 2018 By John McCormack

This post about deleting data from a data warehouse is my first post in the #tsql2sday series. This month's host is Arun Sirpal. (blog)

T-SQL Tuesday #98 – Your Technical Challenges Conquered

Please write about and share with the world a time when you faced a technical challenge that you overcame.

Deleting data from a data warehouse

I was tasked with deleting a large amount of data from our data warehouse. This was because we had sold off a small part of our company (based in a different country) and, as such, all associated data had to be removed. The warehouse pooled the data from our various OLTP data sources and stored it in one place.

Challenges

  • Find out the scope of the data
  • Identify the data to be purged
  • Identify the order for the purge to take place
    • Why was this important?
  • Create a process for deleting data that will not interfere with our daily loads
  • Test, and then deploy with confidence

Find out the scope of the data & Identify the data to be purged

  1. I had to identify which tables held the data. I created a tracking table and inserted the names of all of the tables in our two databases which held customer data.
  2. Looping through all of the tables, I identified which tables had 0 rows where sourceid = 5 and marked these as completed in my tracking table.
  3. The remaining tables containing rows where sourceid = 5 would be the tables where the purge was needed.

Identify the order for the purge to take place

It was not possible or practical to just delete the data based on a list of table names where data was held. I had to consider referential integrity and identify a precedence order in which to carry out the deletes, table by table. Many of our tables had foreign key constraints, so SQL Server simply prevented deleting out of order. Some of our tables had multiple foreign key relationships and some chains went down through many levels. This made determining the order of deletions a difficult task.

Explanation:
If you try to delete a row in a primary key table, the delete will fail when the primary key value corresponds to a value in the foreign key constraint of another table. To make this change, you must do your delete of the foreign key data in the foreign key table first, and then delete from the primary key table.

I wrote a stored procedure which reviewed all of the foreign keys in the database and identified the correct order in which tables could be deleted from. Some tables had no foreign key relationships so these could be done in any order. They were the lowest precedence and were done first. The next set of tables were those foreign key tables referenced by a table with a primary key, but did not reference any other tables. These tables were processed next. This process continued on creating a hierarchy which eventually allowed me to identify the correct order in which the tables could be processed.
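The stored procedure itself isn't shown in this post, but the first level of that hierarchy can be sketched with a query against sys.foreign_keys. This finds tables that no other table references, so deleting their rows cannot break referential integrity:

```sql
-- Tables not referenced by any foreign key can be purged first
SELECT t.name AS table_name
FROM sys.tables AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM sys.foreign_keys AS fk
    WHERE fk.referenced_object_id = t.object_id
);
```

Repeating the idea level by level (only counting references from tables not yet assigned a level) builds the full precedence order.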

Create a process for deleting data that will not interfere with our daily loads

Over the years I have used SSIS for many large administration projects. It is useful where a complex workflow has to be followed and where there are many moving parts. SSIS can do so much more than just ETL. With this in mind, I created an SSIS package which would contain all of the code.

A main outline of the SSIS package

Steps to create objects only do so if the object doesn’t already exist.

  • Create and pre-populate (with table names and expected row counts) a metadata tracking table. As the package was running through, this would allow me to tell at a glance how many rows had been deleted, how many tables had been completed and how far the task had progressed overall. This information was important and also allowed me to provide regular updates to management.
  • Create a stored procedure which would determine the order in which tables should be purged.
  • Run the stored procedure and update the metadata tracking table with the precedence values.
  • Delete the rows. The deletes would be done one table at a time, in optimised batches following the precedence order set out in the metadata tracking table to prevent error. The number of rows deleted for each table was updated in the tracking table.
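The batched deletes in the final step can be sketched like this (the table name, batch size and sourceid filter are illustrative, not the production values):

```sql
-- Delete in batches to keep transactions short and limit log growth
DECLARE @BatchSize INT = 50000, @Rows INT = 1;

WHILE @Rows > 0
BEGIN
    DELETE TOP (@BatchSize) FROM dbo.SomeFactTable WHERE sourceid = 5;
    SET @Rows = @@ROWCOUNT; -- loop ends when no rows remain to delete
END
```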

With the package created, I set up a SQL Agent Job and set a schedule for times outside of the data loads. I also added in a step to ensure the loads were not still running when the job went to run.

Test, and then deploy with confidence

With a process created that worked, I had to set about testing it on a large scale to ensure a smooth process by the time my package was going to target our production database. Fortunately, we had 4 full scale dev and QA environments, each of them had a full set of data. There were slight differences in data volumes due to refresh dates, and also some newer tables existed that weren’t in production.

Having so many environments allowed me to focus on batch size to get the most efficient number of rows to delete per run. I set this as a user variable in SSIS, which allowed me to pass in the value via the agent job step. If I felt it needed adjusting, I could just change the job step without amending the package.

Overall Result

The production run completed over a couple of weeks. In that time, several billion rows of data over hundreds of tables were deleted.

Alternative options

As I was confident all related data would be deleted, I could have disabled the Foreign Key constraints and deleted in any order. I decided against this for the following reasons.

  • The full purge would take two weeks. As each of the tables also held data that should remain, and also because we would still be running daily loads, I felt this was unacceptable as it could lead to periods of time where the referential integrity was in doubt.
  • I could forget to re-enable some constraints.
  • Once re-enabled, the foreign key constraints would be untrusted. They would need to be checked to reset the is_not_trusted value to 0 in sys.foreign_keys.
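Re-trusting a re-enabled constraint looks like this (constraint and table names are hypothetical):

```sql
-- WITH CHECK re-validates existing rows so the constraint becomes trusted again
ALTER TABLE dbo.OrderLines WITH CHECK CHECK CONSTRAINT FK_OrderLines_Orders;

-- is_not_trusted should now be 0
SELECT name, is_not_trusted
FROM sys.foreign_keys
WHERE name = 'FK_OrderLines_Orders';
```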

Filed Under: front-page, Guides, T-SQL Tagged With: #tsql2sday, tsqltuesday

SQL server wait statistics query – With link to help

23rd May 2016 By John McCormack

SQL Server Wait Types Library

Following the release of SQL Server Wait Types Library by Paul Randal via SQLSkills.com, I have updated my query which finds the most prolific wait types to include a link to the waits library for that particular wait type. The handy thing about the way Paul has named the posts is that they all end with the exact wait type name so this name can be pulled out from a query and appended to a fixed URL. This means we can copy the contents of the cell in the help_url column into our browser to find the entry in the SQL Server Wait Types Library.

More about the query

As sys.dm_os_wait_stats is reset when SQL Server is restarted, it's worth knowing the start time of the instance so you can get information on averages per minute or hour (or some other time scale). This is included in the query below. Please bear in mind that the contents of this DMV can be cleared out by running DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR); so server start time is not completely reliable as a measure for working out averages.

SQL server wait statistics query

[SQL]

-- Declare variables
DECLARE
@sqlserver_start_time DATETIME,
@current_time DATETIME,
@mins INT,
@hours INT;

-- Set variables
SET @sqlserver_start_time = (SELECT sqlserver_start_time FROM sys.dm_os_sys_info); -- Needed to work out when collection in dm_os_wait_stats started
SET @current_time = (SELECT SYSDATETIME());
SET @mins = (SELECT DATEDIFF(MINUTE,@sqlserver_start_time,@current_time));
SET @hours = (SELECT @mins/60 AS Integer);

-- Uncomment PRINT statements for more info
-- PRINT @sqlserver_start_time
-- PRINT @current_time
-- PRINT @mins
-- PRINT @hours

SELECT TOP 10
wait_type,
wait_time_ms,
waiting_tasks_count,
waiting_tasks_count/@hours as waiting_tasks_count_PerHour,
waiting_tasks_count/@mins as waiting_tasks_count_PerMin,
wait_time_ms/waiting_tasks_count AS avg_wait_time_ms,
'https://www.sqlskills.com/help/waits/'+wait_type as help_url -- SQLSkills resource on waits. If web page not complete, find another source.
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC -- Overall time spent waiting by wait type. (Could change to waiting_tasks_count DESC)

[/SQL]

Filed Under: front-page, Scripts

The EXCEPT operator in T-SQL

19th January 2016 By John McCormack

When I first read about the EXCEPT operator in T-SQL and saw how simple it was to use, I knew I would be using it a lot. And I have.

I first came across it when studying for the 70-461 exam (Querying Microsoft SQL Server 2012). I keep trying to tell my colleagues about it but it seems they already know. So the only logical thing to do was write a blog post about it.

What does it do?

The EXCEPT operator compares two sets of data and performs a set difference. In other words, it returns all of the rows from query 1 except when an identical row is returned from query 2. This makes EXCEPT a quick and easy way to compare two sets of data without using 3rd party tools or more complicated JOIN syntax.
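A minimal sketch (the table names are made up; both queries must return the same number of columns with compatible types):

```sql
-- Rows returned by the first query that have no identical row in the second
SELECT ID, Name FROM dbo.TableA
EXCEPT
SELECT ID, Name FROM dbo.TableB;
```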

What is the syntax?

[Read more…]

Filed Under: front-page, T-SQL Tagged With: adventureworks example, compare sets of data, set operator, t-sql, T-SQL EXCEPT operator


John McCormack · Copyright © 2025
