In this post, I discuss three methods (of varying complexity) for carrying out disk space monitoring with SQL Server. These can be run ad hoc, or you can build them into SQL Agent jobs to run on a schedule. The first thing I’d like to say is that this isn’t a replacement for good monitoring software; it’s more of a complement to it, a double check in case something goes wrong with your monitoring. Having some handy scripts is useful as well, because it lets you find out info on an ad-hoc basis, rather than relying on alerts or checking manually.
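As a taster, here’s a minimal sketch of the kind of ad-hoc check I mean, querying sys.dm_os_volume_stats for free space on each volume that hosts a database file (the full post covers the methods in detail):
[sql]
-- One row per volume hosting a database file, with total and free space in GB
SELECT DISTINCT
    vs.volume_mount_point,
    vs.total_bytes / 1024 / 1024 / 1024 AS TotalGB,
    vs.available_bytes / 1024 / 1024 / 1024 AS FreeGB
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;
[/sql]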
SQL server wait statistics query – With link to help
SQL Server Wait Types Library
Following the release of SQL Server Wait Types Library by Paul Randal via SQLSkills.com, I have updated my query which finds the most prolific wait types to include a link to the waits library for that particular wait type. The handy thing about the way Paul has named the posts is that they all end with the exact wait type name so this name can be pulled out from a query and appended to a fixed URL. This means we can copy the contents of the cell in the help_url column into our browser to find the entry in the SQL Server Wait Types Library.
More about the query
As sys.dm_os_wait_stats is reset when SQL Server is restarted, it’s worth knowing the start time of the instance so you can get information on averages per minute or hour (or some other time scale). This is included in the query below. Please bear in mind that the contents of this DMV can also be cleared manually, so server start time is not a completely reliable measure for working out averages.
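For reference, this is the command that resets the DMV (use it with care, as it wipes your baseline):
[sql]
-- Clears the cumulative wait statistics for the whole instance
DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);
[/sql]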
SQL server wait statistics query
[SQL]
-- Declare variables
DECLARE
@sqlserver_start_time DATETIME,
@current_time DATETIME,
@mins INT,
@hours INT;

-- Set variables
SET @sqlserver_start_time = (SELECT sqlserver_start_time FROM sys.dm_os_sys_info); -- Needed to work out when collection in dm_os_wait_stats started
SET @current_time = SYSDATETIME();
SET @mins = DATEDIFF(MINUTE,@sqlserver_start_time,@current_time);
SET @hours = @mins/60; -- Integer division

-- Uncomment PRINT statements for more info
-- PRINT @sqlserver_start_time
-- PRINT @current_time
-- PRINT @mins
-- PRINT @hours

SELECT TOP 10
wait_type,
wait_time_ms,
waiting_tasks_count,
waiting_tasks_count/NULLIF(@hours,0) AS waiting_tasks_count_PerHour, -- NULLIF avoids divide-by-zero shortly after start-up
waiting_tasks_count/NULLIF(@mins,0) AS waiting_tasks_count_PerMin,
wait_time_ms/NULLIF(waiting_tasks_count,0) AS avg_wait_time_ms,
'https://www.sqlskills.com/help/waits/'+wait_type AS help_url -- SQLSkills resource on waits. If web page not complete, find another source.
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC -- Overall time spent waiting by wait type. (Could change to waiting_tasks_count DESC)
[/SQL]
Firefighting – When your SQL Server is in big trouble
When your SQL Server is in big trouble
It’s so important to stay calm when your SQL Server is in big trouble. I’m talking about when blocking is through the roof, when CPU is pinned and people are yelling from all different directions. Staying calm isn’t just a state of mind; you need a process that you work through, one that you have practised and know inside out.
How bad is it?
In this post, I want to describe what I call a priority 2 problem. It is serious and business impacting, but the server is online. However, it needs immediate attention or it could escalate to a P1.
P1 – Business critical system(s) are unavailable
P2 – Serious impairment of functionality on a critical system
P3 – Performance is slower than usual but requests are completing
P4 – Performance is generally sub optimal and should be improved
Get a colleague on comms
One thing that is guaranteed to happen is that people who are not directly involved in fixing the issue will want updates. They have good reason to want them and are well meaning, but the constant need to reply to emails or Teams messages will seriously impact the speed at which you are working.
Back in the good old pre-Covid days of office working, this could easily be achieved by someone sitting near you and the two of you communicating verbally. With remote working more and more common now, I recommend having some kind of audio channel open so that you can speak when you need to; silence is OK too. This could be a phone call on speaker or a Teams call. The technology isn’t the main thing here; the idea is that you can give updates vocally to someone capable of communicating with the wider organisation.
Where possible, your colleague should be technically capable of answering related questions. They should open a chat in Teams or Slack or whatever software your organisation prefers and provide regular updates there. They should answer questions, keep to the update schedule and most importantly, divert people away from you.
A pre-practised routine
The scenarios can differ, but in this one I want to talk about how I would handle a SQL Server which appears to be functioning (barely) but is extremely slow. In other words, it is a priority 2 incident as defined above. Helpdesk supervisors are calling tech support to let them know, and customers are calling them because the website is unusable. On top of that, the MI team and data scientists are contacting the DBAs directly because their queries won’t run.
Have a script or checklist
In my script, I tend to use well established community stored procedures. The sort of ones that most DBAs know about and many use. If you start trying to write new queries in the middle of a slow down, that is going to cost you time. Stick with what works, what is established and what you have used before and know well.
I’ve mentioned these in previous posts but the main things I am going to run are:
- sp_whoisactive – https://github.com/amachanic/sp_whoisactive
- sp_blitzfirst – https://www.brentozar.com/first-aid/
- sp_blitzlock – https://www.brentozar.com/first-aid/
sp_whoisactive
I always run this first because it is designed specifically for showing you current database activity, and it has saved my bacon so many times in the past. Simply running this parameter-free will show you what is running on your SQL Server right now. It is ordered by query duration descending, so the long runners are at the top. It can give you blocking information, information on wait types and on open transactions as well. For advanced features, you need to use some parameters, as sketched below.
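For instance (a sketch using two of the parameters I reach for most; the full script is further down):
[sql]
-- Plain run: everything currently executing, longest duration first
EXEC sp_whoisactive;

-- Find block leaders, so you can spot the session at the head of a blocking chain
EXEC sp_whoisactive @find_block_leaders = 1;
[/sql]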
sp_blitzfirst
If you can find what you need with sp_whoisactive, you may not even need to use this stored procedure. Where it comes into its own is in telling you when certain things have recently changed or are out of the ordinary.
For example, sp_blitzfirst will tell you:
- Whether the plan cache has been recently erased
- Whether you have high CPU utilization from a program other than SQL Server
- How long each wait type has been waiting during the previous 5 seconds
If something stands out that you don’t understand, there will be data in the URL column that you can copy/paste into a web browser. The web page will give you an explanation of the problem.
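By default it samples for 5 seconds; if the problem is intermittent, you can lengthen the sample window with its @Seconds parameter (a sketch):
[sql]
-- Sample a 30-second window instead of the default 5 seconds
EXEC sp_BlitzFirst @Seconds = 30;
[/sql]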
sp_blitzlock
This proc is all about deadlocks. You might not need to run it if the first two have given you enough information to fix the issue in hand. However, if deadlocks are at the root of your problems, this will tell you which queries are involved and when. It’s also easier to read than deadlock graphs.
There are more things that I’ll run in the fullness of time but we’ll leave them for later. I only want to focus on these 3 things for now to zero in on the immediate problem.
The script
[sql]
USE dba
/*
To run this, you also need to install:
- sp_whoisactive
- sp_blitzfirst
- sp_blitzlock
*/

-- Raises an error if you run the whole script in error
RAISERROR ('Don''t run it all at once',20,-1) WITH LOG

-- sp_whoisactive with various parameters
EXEC sp_whoisactive @find_block_leaders = 1 --,@get_locks = 1

-- Tells at a glance if you have a lot of the same query running. For the F5 report-runner troublemakers
EXEC sp_whoisactive @sort_order = 'sql_text'

-- What has been hurting us in the last 5 seconds. Look for wait stats, and anything out of the ordinary, such as the plan cache having been recently erased.
EXEC dbo.sp_BlitzFirst @expertmode = 1

-- Are we experiencing deadlocks?
EXEC sp_BlitzLock

-- Deadlocks in the last hour
DECLARE @StartDateBlitz DATETIME = (SELECT DATEADD(HH,-1,GETDATE())),
        @EndDateBlitz DATETIME = (SELECT GETDATE())
EXEC sp_BlitzLock @EndDate = @EndDateBlitz, @StartDate = @StartDateBlitz
GO

/*
Some other things to consider:
- Have the usual optimisation jobs run as expected (stats/indexes etc.)?
- If one proc has regressed badly, could it help to clear only that plan from the cache, or to recompile it?

EXEC sp_blitz -- Not as handy as the others for real-time issues. Better used as a health check
EXEC sp_blitzcache -- More useful for helping you identify the resource-hungry queries
EXEC sp_readerrorlog 0,1,'memory'
*/
[/sql]
The calm after the storm
I will dedicate a full post to this. For now, it is important to say that slow or resource-intensive queries should be identified and added to a backlog for fixing. If one particular query is likely to go bad again, it should be treated as a priority to fix, so that we don’t see the same issue escalating again.
You should also do a SQL Server health check to satisfy yourself that you don’t have a sub-optimal configuration causing you issues.
Related to: When your SQL Server is in big trouble
Why I certified to become more than a SQL DBA
For this month’s T-SQL Tuesday, Andy Leonard (b|t) asks How Do You Respond When Technology Changes Under You?
It’s a great topic and inspired me to write a T-SQL Tuesday post this month. I’ve missed the last couple due to a combination of just not feeling like it, and the previous topics weren’t best suited to me. Hopefully this can inspire me to get writing again.
Hey, I thought I was the SQL DBA!
When I got my start with SQL Server, I was just that: a SQL DBA. I worked as a junior member of a big team, we had a large SQL Server estate, and I just wanted to learn and become as experienced as the senior folk. I was able to learn and be mentored and was enjoying making progress. Then I heard we were going to start using “The Cloud”. Not only that, but I was going to be responsible for the operation of our Kafka clusters, EMR and ElasticSearch.
A lot of these technologies were developer-led: they knew why they wanted to use them, and from an operations perspective it was about making sure things didn’t fail and errors were handled. I had no background in them and didn’t really understand them well. Simply learning to use PuTTY to log into a Linux VM was one of my early successes.
AWS Certifications FTW
The move to the cloud, in this case AWS, was going ahead with or without me, so I had to embrace it. Of course, I learned and collaborated with colleagues and got to an acceptable standard, but it was only when I decided to do AWS certifications that things really took off. I found that by studying for the AWS Cloud Solutions Architect certification, I gained an appreciation for all cloud services, not just the data ones like RDS. Studying for the AWS certifications improved my networking skills; I learned about object vs block storage, and of course I learned all of the different data products (often managed services, but not always).
I approached the certification process from a genuine perspective of learning the technology well, rather than just passing the exam. Simply passing the exam does not do much for you. Even if it gets you a job, you will likely be found out pretty quickly.
A further benefit to learning one cloud technology well is that when we started using Azure, I found the concepts were the same. I understood the networking, the security groups and PaaS vs IaaS. It made this transition much easier and meant that when I moved to a job that was predominantly based in Azure, I didn’t have the huge knowledge gap that I would have had, if I had not done my AWS certifications.
So for me, a structured learning approach is best. It’s not for everyone but I am certainly glad that my job is no longer just “SQL DBA”.
Efficient maintenance of SSISDB
Maintenance of SSISDB within SQL Server
The SSIS Server Maintenance Job is used for maintenance of SSISDB. It manages the retention of operations records in SSISDB. I noticed it had been turned off by someone last year and it hadn’t run since. As a result, SSISDB had become bloated, and there was only 10 MB left on the data drive, meaning the database could no longer auto-grow.
Nobody was aware; why would they be? After all, nothing was failing. We didn’t have disk space monitoring enabled, so the only time we found out there was a problem was when the disk had filled up.
I made two unsuccessful attempts at running the SSIS Server Maintenance Job. After several hours of processing and still no free space reclaimed in the database, I knew the job wasn’t coping with the sheer number of rows it had to delete. The deletes all happen from the parent table (internal.operations) and then flow to all child tables using cascading deletes. This approach maintains referential integrity but is not great for performance.
Because of this, I needed a new approach to the maintenance of SSISDB. As we hadn’t maintained these tables for 13/14 months, I was asking too much of SQL Server to let me delete everything at once. (Truncates wouldn’t do, because I had to keep the last 2 weeks of data.)
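To illustrate the idea (this is not the script I ended up using; see step 2 below), a batched delete keeps each transaction, and its cascading child-table deletes, small. The batch size and cut-off here are illustrative only:
[sql]
-- Hedged sketch: delete old operations a few at a time, letting the cascading
-- foreign keys remove the child rows, so each transaction stays small.
DECLARE @BatchSize INT = 10; -- illustrative; tune to what your server can handle

WHILE EXISTS (SELECT 1 FROM SSISDB.internal.operations
              WHERE created_time < DATEADD(DAY, -14, SYSDATETIMEOFFSET()))
BEGIN
    DELETE TOP (@BatchSize)
    FROM SSISDB.internal.operations
    WHERE created_time < DATEADD(DAY, -14, SYSDATETIMEOFFSET());
END
[/sql]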
A bit of investigation showed me that these were the related tables:
- internal.event_message_context
- internal.event_messages
- internal.executable_statistics
- internal.execution_data_statistics
- internal.execution_component_phases
- internal.execution_data_taps
- internal.execution_parameter_values
- internal.execution_property_override_values
- internal.executions
- internal.operation_messages
- internal.extended_operation_info
- internal.operation_os_sys_info
- internal.validations
- internal.operation_permissions
- internal.operations
My approach
- Amend the retention period (days) in catalog.properties to 400 (because 14 was unmanageable with > 400 days of history)
- Write a delete script or find a reliable one that does this work due to SSISDB’s native stored procedures failing to cope
- Ensure SSISDB is in SIMPLE recovery model because it will reduce t-log growth
- Run the delete script and see how it performs and how much space is freed up in order that the days for deletions can be optimised
- Repeat steps 1-4 (each time lowering retention period (days)) until I achieve my target retention period of 14
- Ensure this never happens again 😎 (because it’s no fun getting 300 failure emails an hour)
1. Reduce retention period (days) in catalog.properties to 400 (This allowed me to delete rows based on only 22,000 IDs)
To do this in T-SQL:
[sql]
EXEC [SSISDB].[catalog].[configure_catalog] @property_name=N'RETENTION_WINDOW', @property_value=400
[/sql]
To do this in SSMS:
Right click SSISDB from Integration Services Catalog in SQL Server Management Studio. Then, amend Retention Period (days) to a suitable initial value – in my case 400 days.
2. Script out or find a reliable script that does this work manually.
I struck gold with a superb script from Tim Mitchell which honours the Retention Period (days) value. I decided this was better than writing my own. Please follow the link to review this script along with other useful information (or likewise, get the latest version from Tim’s Github).
3. Ensure SSISDB is in SIMPLE recovery model (as it helps with transaction log)
[sql]
SELECT DB_NAME(database_id) AS DBName, recovery_model_desc
FROM sys.databases
WHERE DB_NAME(database_id) = 'SSISDB'
[/sql]
If it’s not, I recommend you ALTER the database to SIMPLE recovery in order to minimise logging. It’s not an essential step, but it will save bloating your transaction log. (If it was previously in FULL, remember to switch it back and take a full backup once you’re done.)
[sql]
ALTER DATABASE [SSISDB] SET RECOVERY SIMPLE WITH NO_WAIT
[/sql]
4. Run script (from step 2) and see how it performs and how much space is freed up
[sql]
SELECT NAME,
size/128/1024.0 AS FileSizeGB,
size/128/1024.0 - CAST(FILEPROPERTY(NAME, 'SpaceUsed') AS INT)/128/1024.0 AS FreeSpaceGB,
(size/128/1024.0 - CAST(FILEPROPERTY(NAME, 'SpaceUsed') AS INT)/128/1024.0) / (size/128/1024.0) * 100 AS PercentFree
FROM SSISDB.sys.database_files
WHERE NAME = 'SSISdata';
[/sql]
5. Repeat steps 1-4 (each time lowering retention period (days)) until you achieve your target retention period
I started small by reducing the retention period (days) to 390, which allowed me to measure the time and impact of removing 10 days of data. I then went down to a retention period of 360 days. I found decrements of 14 days to be the best performing, so I kept doing this until I had only 14 days of data remaining. Following this, I kept the new script in place, scheduled nightly via SQL Agent. There was no need to continue using the SSISDB cleanup stored procedure internal.cleanup_server_retention_window.
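If it helps, this is roughly how such a job can be created in T-SQL. It is a hedged sketch: the job, step and schedule names are hypothetical, and @command should point at however you deployed the cleanup script:
[sql]
USE msdb;
GO
-- Hypothetical job name; adjust to your own conventions
EXEC dbo.sp_add_job @job_name = N'SSISDB - nightly cleanup';

EXEC dbo.sp_add_jobstep
    @job_name = N'SSISDB - nightly cleanup',
    @step_name = N'Run cleanup script',
    @subsystem = N'TSQL',
    @database_name = N'SSISDB',
    @command = N'EXEC dbo.YourCleanupProc;'; -- hypothetical: your deployed cleanup proc/script

-- Daily at 01:00
EXEC dbo.sp_add_schedule
    @schedule_name = N'Nightly 1am',
    @freq_type = 4,              -- daily
    @freq_interval = 1,          -- every day
    @active_start_time = 010000; -- HHMMSS

EXEC dbo.sp_attach_schedule
    @job_name = N'SSISDB - nightly cleanup',
    @schedule_name = N'Nightly 1am';

EXEC dbo.sp_add_jobserver @job_name = N'SSISDB - nightly cleanup';
GO
[/sql]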
6. Ensure this never happens again
They say prevention is better than cure. Here are some ideas on how to implement this and ensure that your maintenance of SSISDB is ongoing:
- SQL Agent job to check whether the oldest record is older than expected (and alert if it is); see the sketch after this list
- Write a check in PowerShell or, even better, consider writing a command for DBATools.
- Professional monitoring tools may alert on this but I haven’t checked.
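For the first idea, a minimal T-SQL check might look like this (a sketch; the 20-day threshold is illustrative, giving a little headroom over a 14-day retention window). Run it from a SQL Agent job with failure notifications enabled, and the RAISERROR will fail the step and trigger the alert:
[sql]
-- Hedged sketch: fail if SSISDB retention cleanup appears to have stalled
DECLARE @MaxDays INT = 20; -- illustrative headroom over a 14-day retention window

IF EXISTS (SELECT 1 FROM SSISDB.internal.operations
           WHERE created_time < DATEADD(DAY, -@MaxDays, SYSDATETIMEOFFSET()))
    RAISERROR ('SSISDB retention cleanup appears to have stalled', 16, 1);
[/sql]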
Further reading
- My other posts which mention SSIS
- SSISDB best practices