John McCormack DBA

SQL Server Databases and Cloud


What is an availability group listener – An analogy

13th October 2020 By John McCormack

This post is part of the October 2020 edition of #tsql2sday. Thanks to Rob Volk (blog|twitter) for hosting and for coming up with a great topic: “Data Analogies, or: Explain Databases Like I’m Five!” I genuinely can’t wait to read some of the responses; I think it’s going to be a very educational series of posts. I’ve chosen to explain how an availability group listener works by using an analogy.

What is an availability group listener – An Analogy

Let’s say you run a business and have multiple phone numbers. You may have an office number, a mobile number, and a fancy 0800 number to give your business the impression of being much bigger, like your national competitors. You put the 0800 number on your van, your website, all of your advertising and your business cards. You’re not actually a national-scale business though, and you don’t have a receptionist to handle the calls to your 0800 number. So what happens to those calls?

They are routed through to your preferred number, usually your office number, but you can change it to your mobile number when you are out of the office. You could even set a preference for it to try your office first, then your mobile phone if the office is unavailable. Customers who have your mobile or office number can still call you directly on those, but they will not be rerouted if the phone they dialled is unavailable. If you change your mobile number, you can just update the routing to use your new number and the customer is unaware of the change.

What is an availability group listener – A slightly (but not much) deeper dive

What’s in a name

Like the analogy above, the AG listener uses an address that can route SQL Server connections to a primary or secondary replica within an availability group. It includes a DNS name which is unique within a domain, an IP address or several, and a listener port designation.
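
For illustration, creating a listener in T-SQL looks something like this (a minimal sketch; the AG name, DNS name, IP address and subnet mask are all hypothetical):

[sql]
-- Minimal sketch: add a listener to a hypothetical AG called MyAG
ALTER AVAILABILITY GROUP [MyAG]
ADD LISTENER N'MyAGListener'
(WITH IP ((N'10.0.0.50', N'255.255.255.0')), PORT = 1433);
[/sql]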

Connections to SQL Server where availability groups are in use should use the listener name instead of the server name. This means that if any replica in the AG is unavailable, connections are simply routed to the available replicas, with no loss of service to the client. A listener also lets you use your SQL Server resources more evenly, because you can direct read-intent connections to a readable secondary replica and offload backups to secondaries. All of this spreads the load across the available replicas. If you route everything through the primary, it can be overworked whilst your secondary replicas do next to nothing. Those secondaries still need to be fully licensed, so it is a shame, and a waste of money, for them to sit idly by.
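
For example, a client connection string pointing at a hypothetical listener might look like the line below; the ApplicationIntent=ReadOnly keyword is what marks a connection as a candidate for routing to a readable secondary:

Server=tcp:MyAGListener,1433;Database=Sales;Integrated Security=SSPI;ApplicationIntent=ReadOnly;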

Port

If you’re looking for simplicity, you should designate the listener port as 1433, because then no specific port declaration is required in the connection string. However, if you need to, you can designate a different port, but you must include it in your connection string.
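
To illustrate, assuming a hypothetical listener named MyAGListener, the first connection string below relies on the default port of 1433, while the second spells out a hypothetical non-default port of 50123:

Server=tcp:MyAGListener;Database=Sales;Integrated Security=SSPI;
Server=tcp:MyAGListener,50123;Database=Sales;Integrated Security=SSPI;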

Seamless

If your primary replica becomes unavailable and you have automatic failover enabled, your old secondary replica can switch seamlessly to become the new primary. New connections will not notice a thing.
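
For reference, a rough sketch of the relevant T-SQL, with hypothetical AG and replica names. Automatic failover also requires synchronous commit between the replicas:

[sql]
-- Enable automatic failover for a replica (requires SYNCHRONOUS_COMMIT)
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQL2' WITH (FAILOVER_MODE = AUTOMATIC);

-- A manual failover, run on the secondary you are promoting
ALTER AVAILABILITY GROUP [MyAG] FAILOVER;
[/sql]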

Spread the load

If you have more than one secondary, you can spread the load across a set of readable secondaries. This means that all replicas take a share of the workload. Prior to SQL Server 2016, only one preferred replica would receive all of the read-intent traffic.
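
From SQL Server 2016 onwards, you can nest replicas in parentheses in the read-only routing list to load-balance read-intent connections across them. A sketch, with hypothetical AG and replica names:

[sql]
-- Read-intent connections are load balanced across SQL2 and SQL3,
-- falling back to SQL1 if neither is available (SQL Server 2016+)
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQL1' WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = ((N'SQL2', N'SQL3'), N'SQL1')));
[/sql]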

Thanks again to Rob for coming up with an innovative topic.

John

If you liked this post, why not read:

Test read intent connections to an AG Listener

Filed Under: front-page, T-SQL Tuesday Tagged With: ag, AG Listener, always on availability group, availability group, listener, sql, SQL server, t-sql tuesday

A successful performance tuning project

5th June 2020 By John McCormack


I’m coming to the end of what has been a successful performance tuning project for SQL Server. I wanted to share some of the things that made it a success.

Corporate buy in

The company had a goal to improve the page load times of a number of key pages within our application, in order to improve the user experience for our customers. They acknowledged that the database code, both indexes and stored procedures, needed optimisation, but so too did aspects of the application code. It was good to see this acknowledged, as I’ve been in many situations where the database takes all the blame.

The company approved a considerable amount of personnel resource to work solely on this stability and optimisation project. It included senior developers, testers and a project manager, and I was brought in to look at the database performance. Whilst some business as usual (BAU) priorities did come in from time to time, a large core of the team was protected from this and allowed to get on with the work of making the system go faster, thus improving the customer experience.

Daily standups

We held daily standups where we covered what we had done since the last meeting, what we were working on, and whether anything was blocking our progress. These were kept short so as not to get in the way of the development work, but they gave everyone an overview of what the other team members were working on. Often, as a result, side conversations spun up and team members helped out others who were looking for a bit of assistance, or who simply wanted to bounce ideas around.

Collaboration

The team were willing to help each other. When pull requests (PRs) were submitted, they were swiftly approved where there were no objections, or challenged in a positive way that helped get the best overall result. When API calls were showing as slow but nothing was obvious on the SQL Server, heads were put together to use the tools at our disposal to get to the root cause. This often meant Azure App Insights, which I had not previously used, and which gave us the end-to-end transaction details. We could pull out the SQL for any areas that were slow and work on making it perform better.

Measuring improvements

The Azure Instance class for the SQL Server had previously been doubled so there was no appetite to scale it up again. The hope was that we may eventually be able to scale back down after a period of stability.

The system previously had issues with blocking, high CPU and slow durations, so I wanted to reduce page reads, CPU and duration for all of the SQL calls I was working on. I wouldn’t put a PR forward unless at least two of these metrics improved. My main focus was on reducing the duration of calls, but I didn’t want to improve one thing and make others worse as a consequence. In my own tickets, I always made sure to document the before and after metrics from my standalone testing, to give confidence that the changes could go into upcoming releases.
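
One simple way to capture page reads, CPU and duration for a single call is with SET STATISTICS; a minimal sketch, where the procedure name is a hypothetical stand-in:

[sql]
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC dbo.usp_GetOrders @CustomerId = 123; -- hypothetical procedure under test

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
[/sql]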

CPU graph showing performance over time.

We also used Apdex, which is a standardised way of measuring application performance. It ranks page views on whether the user is satisfied, tolerating or frustrated, and the score is essentially (satisfied + tolerating/2) / total samples, giving a value between 0 and 1. The more we moved users out of the frustrated and tolerating groups and into satisfied, the higher the Apdex score became. As our project moved through release cycles, we were able to see steady increases in our Apdex scores. Apdex also allowed us to identify what was hurting us most and create tickets based on that information.

Top Quality Load Test Environment

We had a top quality load test environment which used masked production backups for the databases. I set up the availability groups to match production, and the servers were all sized the same as production with the same internal settings, such as tempdb size, sp_configure settings and trace flags. We were able to replay the same tests over and over again using Gatling, and our testers made really useful reports available to help us analyse the performance of each hotfix. If it was a proven fix, it was promoted to a release branch; if it wasn’t, it was binned.

End Game

This intensity was kept up for almost two months and it was ultimately transformative for the business. Whilst there are still many further improvements that can be made, the specialised squad is being disbanded and team members are being reallocated to other squads. Performance should be a way of life now, rather than an afterthought or another performance tuning project.

We can be happy that we improved the Apdex scores, sped up a huge number of regularly used SQL transactions, and removed the large CPU peaks that dominated our core business hours.

If you enjoyed this, you may also enjoy some of these other posts.

  • https://johnmccormack.it/2020/05/how-dbatools-can-help-with-performance-tuning/
  • https://johnmccormack.it/2019/03/put-tempdb-files-on-d-drive-in-azure-iaas/

Filed Under: front-page, Guides, SQL Server Tagged With: Performance tuning, project, Scrum, sql, SQL Performance, SQL server

Efficient maintenance of SSISDB

7th August 2019 By John McCormack

Maintenance of SSISDB within SQL Server

The SSIS Server Maintenance Job is used for maintenance of SSISDB; it manages the retention of operations records in SSISDB. I noticed it had been turned off by someone last year and it hadn’t run since. As a result, SSISDB had become bloated and there was only 10MB left on the data drive, meaning the database could no longer auto grow.

Nobody was aware; why would they be? After all, nothing was failing. We didn’t have disk space monitoring enabled, so the only time we found out there was a problem was when the disk had filled up.

I made two unsuccessful attempts at running the SSIS Server Maintenance Job. After several hours of processing and still no available free space in the database, I knew the job wasn’t coping with the sheer number of rows it had to delete. The deletes all happen from the parent table (internal.operations) and then from all child tables using cascading deletes. This approach maintains referential integrity but is not great for performance.

Because of this, I needed a new approach to the maintenance of SSISDB. As we hadn’t maintained these tables for 13/14 months, I was asking too much of SQL Server to let me delete everything at once. (Truncates wouldn’t do because I had to keep the last 2 weeks of data.)

A bit of investigation showed me that these were the related tables (a quick row-count query follows the list).

  • internal.event_message_context
  • internal.event_messages
  • internal.executable_statistics
  • internal.execution_data_statistics
  • internal.execution_component_phases
  • internal.execution_data_taps
  • internal.execution_parameter_values
  • internal.execution_property_override_values
  • internal.executions
  • internal.operation_messages
  • internal.extended_operation_info
  • internal.operation_os_sys_info
  • internal.validations
  • internal.operation_permissions
  • internal.operations
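
To see where the bloat actually lives, a rough row-count query over those tables (sys.partitions is fast, if only approximate):

[sql]
-- Approximate row counts for SSISDB internal tables; run inside SSISDB
SELECT t.name AS TableName, SUM(p.rows) AS ApproxRows
FROM sys.tables t
JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0,1)
WHERE SCHEMA_NAME(t.schema_id) = 'internal'
GROUP BY t.name
ORDER BY ApproxRows DESC;
[/sql]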

My approach

  1. Amend the retention period (days) in catalog.properties to 400 (because 14 was unmanageable with > 400 days of history)
  2. Write a delete script, or find a reliable one that does this work, since SSISDB’s native stored procedures couldn’t cope
  3. Ensure SSISDB is in SIMPLE recovery model because it will reduce t-log growth
  4. Run the delete script and see how it performs and how much space is freed up, so that the size of each deletion batch can be optimised
  5. Repeat steps 1-4 (each time lowering retention period (days)) until I achieve my target retention period of 14
  6. Ensure this never happens again 😎 (because it’s no fun getting 300 failure emails an hour)

1. Reduce retention period (days) in catalog.properties to 400 (this allowed me to delete rows based on only 22,000 IDs)

To do this in T-SQL:

[sql]
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'RETENTION_WINDOW', @property_value=400
[/sql]

To do this in SSMS:

Right click SSISDB from Integration Services Catalog in SQL Server Management Studio. Then, amend Retention Period (days) to a suitable initial value – in my case 400 days.

SSISDB Catalog Properties

2. Script out or find a reliable script that does this work manually.

I struck gold with a superb script from Tim Mitchell which honours the Retention Period (days) value. I decided this was better than writing my own. Please follow the link to review the script along with other useful information, or get the latest version from Tim’s GitHub.

3. Ensure SSISDB is in SIMPLE recovery model (as it helps limit transaction log growth)

[sql]
SELECT DB_NAME(database_id) AS DBName, recovery_model_desc
FROM sys.databases
WHERE DB_NAME(database_id) = 'SSISDB'
[/sql]
If it’s not, I recommend you ALTER the database to SIMPLE recovery in order to minimise logging. It’s not an essential step, but it will save your transaction log from bloating.

[sql]
ALTER DATABASE [SSISDB] SET RECOVERY SIMPLE WITH NO_WAIT
[/sql]

4. Run script (from step 2) and see how it performs and how much space is freed up

[sql]
SELECT NAME,
size/128/1024.0 AS FileSizeGB,
size/128/1024.0 - CAST(FILEPROPERTY(NAME, 'SpaceUsed') AS INT)/128/1024.0 AS FreeSpaceGB,
(size/128/1024.0 - CAST(FILEPROPERTY(NAME, 'SpaceUsed') AS INT)/128/1024.0) / (size/128/1024.0) * 100 AS PercentFree
FROM SSISDB.sys.database_files
WHERE NAME = 'SSISdata';
[/sql]

5. Repeat steps 1-4 (each time lowering retention period (days)) until you achieve your target retention period

I started small by reducing the retention period (days) to 390. This allowed me to measure the time and impact of removing 10 days of data. I then went down to a retention period of 360 days. I found decrements of 14 days performed best, so I kept going until only 14 days of data remained. Following this, I kept the new script in place, scheduled nightly via SQL Agent. There was no need to continue using the SSISDB cleanup stored procedure internal.cleanup_server_retention_window.
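
Scheduling the nightly run is standard SQL Agent fare. A hedged sketch, where the job name and the purge command are hypothetical stand-ins for whichever script you adopt:

[sql]
USE msdb;
EXEC dbo.sp_add_job @job_name = N'SSISDB nightly purge';
EXEC dbo.sp_add_jobstep @job_name = N'SSISDB nightly purge',
    @step_name = N'Purge SSISDB history',
    @subsystem = N'TSQL',
    @database_name = N'SSISDB',
    @command = N'EXEC dbo.MyPurgeScript;'; -- hypothetical wrapper for your purge script
EXEC dbo.sp_add_jobschedule @job_name = N'SSISDB nightly purge',
    @name = N'Nightly at 2am',
    @freq_type = 4,       -- daily
    @freq_interval = 1,   -- every day
    @active_start_time = 020000;
EXEC dbo.sp_add_jobserver @job_name = N'SSISDB nightly purge';
[/sql]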

SSISDB purge job duration

6. Ensure this never happens again

They say prevention is better than cure. Here are some ideas on how to implement this and ensure that your maintenance of SSISDB is ongoing:

  • SQL Agent Job to check if the date of the oldest operation is greater than expected (and alert if not); see the sketch after this list
  • Write a check in PowerShell or, even better, consider writing a command for dbatools.
  • Professional monitoring tools may alert on this but I haven’t checked.
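
As a starting point for the first idea, a hedged sketch of such a check; the 15 day threshold assumes a 14 day retention period plus a day of slack:

[sql]
-- Alert if the oldest row in SSISDB has outlived the retention window
DECLARE @oldest DATETIMEOFFSET =
    (SELECT MIN(created_time) FROM SSISDB.internal.operations);
IF @oldest < DATEADD(DAY, -15, SYSDATETIMEOFFSET())
    RAISERROR('SSISDB cleanup appears to have stalled', 16, 1);
[/sql]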

Further reading

  • My other posts which mention SSIS
  • SSISDB best practices

 

Filed Under: front-page, Guides, SQL Server Tagged With: sql, SQL server, ssis, ssisdb

Style over substance

6th January 2016 By John McCormack

When it comes to managing emails, I can easily receive about 600 a day. This is probably far too many (although that’s a subject for another day). The challenge with receiving so many emails, especially so many system-generated errors and warnings, is that they can fail to stand out. Important information can be missed. A lack of formatting can also make it difficult to understand what the problem is, if any.

Here is an example of a boring system-generated message.

Boring SCOM alert

To get around this, I always try to use some pretty HTML in my output, just to help with the appearance and to make the message easier to read. I don’t go overboard: a header tag, something like <h4>, is usually enough (unless you want it really big), and of course I like to include my output in a table, usually with a splash of colour in the table header.
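
As a rough sketch of the idea, assuming Database Mail is configured, and with a hypothetical profile name, recipient and report content:

[sql]
DECLARE @html NVARCHAR(MAX) =
      N'<h4>Failed Jobs Report</h4>'
    + N'<table border="1"><tr style="background-color:#4472C4;color:white;">'
    + N'<th>Job</th><th>Last Run</th></tr>'
    + N'<tr><td>Nightly Backup</td><td>2016-01-05 02:00</td></tr></table>';

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = N'DBA Mail Profile',     -- hypothetical profile
    @recipients   = N'dba-team@example.com', -- hypothetical recipient
    @subject      = N'Failed Jobs Report',
    @body         = @html,
    @body_format  = 'HTML';
[/sql]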


Filed Under: front-page, Guides, T-SQL Tagged With: html email, send html email from sql server, sql, SQL server, style over substance, t-sql

An alternative to sp_msforeachdb

23rd December 2015 By John McCormack

It’s well known that the sp_msforeachdb stored procedure is undocumented, but it can be handy. The trouble with the proc is that it is unreliable. What’s more, it could be dropped from a future SQL Server release, causing all your automated scripts to fail.

This little script shows you a more reliable solution and saves you from relying on an undocumented procedure.

USE Master
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp

SELECT name,0 as completed
INTO #tmp
FROM sys.databases
WHERE name NOT IN('tempdb','Training2012') -- A list of DBs you don't wish to include

DECLARE @dbname sysname
DECLARE @cmd NVARCHAR(4000)

WHILE EXISTS(SELECT 1 FROM #tmp WHERE completed = 0)
BEGIN

SET @dbname = (SELECT TOP 1 NAME FROM #tmp WHERE completed = 0) -- You can ORDER BY name if you care about the order
SET @cmd = 'BACKUP DATABASE '+QUOTENAME(@dbname)+' TO DISK = ''D:\Backup\'+@dbname+'.Bak''' -- QUOTENAME guards against unusual database names
EXEC sp_executesql @cmd
UPDATE #tmp SET completed = 1 WHERE name = @dbname

END

Let’s break down what the script actually does:

  1. DROP temp table if it already exists
    1. [sql]IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp[/sql]
  2. Select into a temporary table with two columns. (Name and Completed). We set completed=0 for each row created.
    1. [sql]SELECT name,0 as completed INTO #tmp FROM sys.databases[/sql]
  3. The WHERE clause is optional
    1. [sql]WHERE name NOT IN('tempdb','Training2012')[/sql]
  4. Create a @dbname variable (for each DB name) and a @cmd variable (for building up the T-SQL command).
    1. [sql]DECLARE @dbname sysname
      DECLARE @cmd NVARCHAR(4000)[/sql]
  5. The LOOP (Create and execute a new statement for each DB name and exit the loop once all DBs have been handled).
    1. [sql]WHILE EXISTS (SELECT 1 FROM #tmp WHERE completed = 0)
      BEGIN
      SET @dbname = (SELECT TOP 1 NAME FROM #tmp WHERE completed = 0) -- You can ORDER BY name if you care about the order
      SET @cmd = 'BACKUP DATABASE '+QUOTENAME(@dbname)+' TO DISK = ''D:\Backup\'+@dbname+'.Bak'''
      EXEC sp_executesql @cmd
      UPDATE #tmp SET completed = 1 WHERE name = @dbname
      END[/sql]

This approach is simple and reliable, and it avoids the use of undocumented stored procedures and cursors. The example above demonstrates a native backup, but the command can be anything you like.
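
For instance, swapping the backup for an integrity check is a one-line change to @cmd (a sketch):

[sql]
SET @cmd = N'DBCC CHECKDB ('+QUOTENAME(@dbname)+N') WITH NO_INFOMSGS'
[/sql]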

 

Filed Under: front-page, T-SQL Tagged With: sp_msforeachdb, sql, SQL server, t-sql, undocumented stored procedure
