John McCormack DBA

SQL Server Databases and Cloud


John McCormack's blogs

Running SQL Server on RDS SQLBits presentation

2nd March 2019 By John McCormack

Running SQL Server on RDS

I presented Running SQL Server on RDS at SQLBits. It was a lightning talk, so only 5 minutes long.

You can download my slides from here:

Running SQL Server on RDS

Further interesting links:

My more detailed post on running SQL Server on AWS (RDS vs EC2)

Amazon RDS for SQL Server FAQs

SQL Server Agent Information

Filed Under: AWS RDS, AWS SQL Server, front-page Tagged With: presentation, rds, SQL server, sqlbits

Outstanding batch wouldn’t clear on sys.dm_fts_outstanding_batches

16th February 2019 By John McCormack

Outstanding batch wouldn’t clear on sys.dm_fts_outstanding_batches

I had an issue where an outstanding batch wouldn’t clear on sys.dm_fts_outstanding_batches. This meant the full-text index was not working when queried using the CONTAINSTABLE syntax. I used a series of queries from Glenn Berry’s post to help me identify the problem with my full-text index.

sys.dm_fts_outstanding_batches

I restarted the Full-Text service with no success. This seemed like the natural thing to do and my colleagues said that it had worked in the past when FTS had stalled. Unfortunately, this didn’t help the situation.

I then rebuilt the Full-Text Catalog, again with no success. The rebuild took 5 minutes because it was blocked; it would normally have been much quicker.

As the full-text index was built on an indexed view, my next idea was to rebuild the clustered index of the view. To my surprise, this actually solved the problem and I no longer had any outstanding FTS batches. My query using CONTAINSTABLE returned results successfully and the developers confirmed their application was fixed. In my opinion, this is well worth a try.
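For anyone who wants to try the same sequence, here is a minimal T-SQL sketch. The catalog name (ftCatalog), view name (dbo.vSearchData) and index name (IX_vSearchData) are hypothetical placeholders; substitute your own object names.

```sql
-- Check for stuck full-text population batches
SELECT * FROM sys.dm_fts_outstanding_batches;

-- Rebuilding the full-text catalog did not help in my case,
-- but it is a reasonable earlier step (catalog name is a placeholder)
ALTER FULLTEXT CATALOG ftCatalog REBUILD;

-- Rebuilding the clustered index on the indexed view is what
-- cleared the outstanding batches (object names are placeholders)
ALTER INDEX IX_vSearchData ON dbo.vSearchData REBUILD;
```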

sys.dm_fts_outstanding_batches cleared

The point of sharing this post is the hope that it will help someone else with the same issue. It’s not my greatest ever post and I’m not certain why the index rebuild helped, but it’s been a good lesson that I need to learn more about Full-Text Indexing for SQL Server. Anything else I learn will be added to this blog post.

Any suggestions for good resources in the comments would be much appreciated, as this looks like an area that has not been covered extensively by other bloggers.

Filed Under: front-page, Guides Tagged With: FTS, Full-Text Catalog, Full-Text Index, Full-Text outstanding batches

A month in the life of a DBA contractor

8th February 2019 By John McCormack

I left what I considered to be a great permanent job to become a DBA contractor in Glasgow. It was one where I could pick some of my own projects, work with new and interesting technology, and work in a team which collaborated well, liked each other and was good at sharing knowledge.

My reasons for moving on were two-fold. First of all, I wanted some career progression and it unfortunately just wasn’t available in my permanent job. Secondly, I wanted to work more with SQL Server and Azure. My last job started out as a SQL DBA but, as a lot of people in this line of work will know, the duties have evolved considerably over time. This is mostly due to large-scale cloud adoption. In my case, my workplace started heavily using AWS along with other RDBMSs and data platforms (RDS MySQL & Aurora, Elasticsearch, Elastic MapReduce (EMR), Glue and Athena). That gives you a lot to learn. Don’t get me wrong, I threw myself into learning about cloud solutions and I loved working with AWS (hopefully I will again), but as time went on, I was worried I would start to lose some of my SQL Server knowledge.

So what has the first month been like?

I have been working exclusively on the Microsoft Data Platform. At my first client, I’ve worked with SQL Server on-premises, SQL Server running on Azure VMs, Azure SQL DB, Azure Analysis Services and Azure SQL Data Warehouse.

On-Premises

The on-prem SQL Servers mostly host the databases of legacy applications and 3rd party vendor products. Some of the versions are 'rather old', although these instances actually give the business the fewest headaches. I’ve worked on decommissioning unused databases that were still online, server-side traces and native backups.

Azure VMs running SQL Server (IaaS)

These host the databases that have the most active development work. I’ve looked at performance issues and SQL Server configurations to help improve performance. I’ve shared best practice with the team in terms of tempdb and helped to reduce the volume of unnecessary emails from servers. (You know the ones: alerts someone set up years ago that never prompt any action from the DBAs.)
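The post doesn’t include the exact tempdb queries used, but as a hedged illustration, a common starting point for a tempdb review is to confirm it has multiple, equally sized data files:

```sql
-- Review tempdb file layout: multiple equally sized data files
-- help reduce allocation contention (a common best practice).
-- size is reported in 8 KB pages, hence the conversion to MB.
SELECT name,
       type_desc,
       size * 8 / 1024 AS size_mb,
       CASE max_size
           WHEN -1 THEN 'unlimited'
           ELSE CAST(max_size * 8 / 1024 AS varchar(20))
       END AS max_size_mb
FROM tempdb.sys.database_files;
```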

Azure Analysis Services (PaaS)

This was fun. I only did some basic configuration and permissions but as I’ve never used Analysis Services much over the years, it was cool to see what was involved.

Azure SQL Data Warehouse (PaaS)

It was great to get my hands on tech like this. I spent time modernising their manual point-and-click refresh process with PowerShell. This was a bit of a learning curve but extremely rewarding, and it will make future refreshes much easier. (I’ll share the code once I’m confident it’s good; I’d still like a few more run-throughs before doing this.) Next month, I’d like to implement a self-service process for the developers that allows them to do their own refreshes, freeing up DBA time to work on more critical items.

Azure SQL DB (PaaS)

So far, I haven’t had to do too much with Azure SQL DB on this project. I’ve listed out the instances and instance types using PowerShell. My plan is to review whether these are right-sized or whether efficiencies can be made, and I’ll also make sure the alerting is set up correctly.

Plans for next month

I’d like to offer some in-depth server health reviews and work with the developers to help them make their code run faster. There is a lot of blocking due to long-running stored procedures and sub-optimal code. Improving a few of the big hitters will make the developers much happier and will ease the strain on some of our servers – at least that’s the idea.
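A quick way to see that blocking as it happens (a minimal sketch; the post doesn’t specify the exact tooling used) is to query sys.dm_exec_requests for blocked sessions:

```sql
-- Show currently blocked requests and what is blocking them
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,          -- milliseconds spent waiting so far
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```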

Further Reading

  • Azure SQL Data Warehouse
  • PowerShell cmdlets and REST APIs for SQL Data Warehouse

Filed Under: Azure, Azure SQL Analysis Services, Azure SQL Data Warehouse, Azure SQL DB, front-page, Personal

How do I run SQL Server on AWS?

6th November 2018 By John McCormack

How do I run SQL Server on AWS?

Running SQL Server on AWS can be done in two ways.

  • Relational Database Service (RDS): AWS’s managed solution, where some of the administration (maintenance, backups and patching) is handled for you.
  • EC2: Your very own virtual machine in the cloud. With EC2, you manage SQL Server just like you would on-premises. This gives you full control over your SQL instance.

Why not just use Azure?

Azure may be the right choice for you. The purpose of this post is to show that you do have options and that Azure is not the only choice. Really, you should weigh up what you need from your cloud provider and make a choice. Managing a multi-cloud environment can be challenging. If you are working for a company already heavily invested in AWS, it may make more sense to stay with AWS. Even if not, AWS’s SQL Server offering is mature and RDS’s features are improving all the time, so you should consider what you can achieve with AWS.

Differences/Similarities/Pros and Cons

Feature                                      EC2    RDS
Use SSMS                                      ✔      ✔
Use database as source or target for SSIS    ✔      ✔
Full control over the instance                ✔
Automated backups                                    ✔
Automated patching                                   ✔
Always On Availability Groups                 ✔      ✔
AWS-managed Multi-AZ deployment                      ✔
Log Shipping                                  ✔
Replication                                   ✔
Database Mail                                 ✔
Linked Servers                                ✔

A full list of included and excluded features can be viewed at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html 

RDS key facts

  • Limited control
    • The master user name you use when you create a DB instance is a SQL Server Authentication login that is a member of the processadmin, public, and setupadmin fixed server roles. It is not a member of the sysadmin fixed server role.
    • Anyone who creates a database is assigned the db_owner role on that database.
  • More administration tasks are handled automatically, e.g. backups, OS patching and SQL patching
  • Bring Your Own Licence is no longer available on RDS
  • Licence included (covers Multi-AZ)
    • Standard
    • Web (must only be used for public and internet-accessible webpages, websites, web applications, and web services)
    • Enterprise
  • Backups
    • Backups can still be pushed to S3
    • Native backup and restore using .bak files to/from S3
    • Versions: 2008 R2 – 2017
  • Limited to 30 databases per instance
  • Windows/Mixed authentication
    • http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_SQLServerWinAuth.html
  • Multi-AZ is very simple to set up
  • TDE for data at rest
  • SSL for data in transit
  • No Database Mail
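Native backup and restore on RDS is driven by stored procedures in msdb rather than the usual BACKUP/RESTORE statements. A minimal sketch, assuming a hypothetical database name and S3 bucket (your option group must have the SQLSERVER_BACKUP_RESTORE option enabled):

```sql
-- Back up a database from RDS to S3
exec msdb.dbo.rds_backup_database
    @source_db_name = 'MyDatabase',
    @s3_arn_to_backup_to = 'arn:aws:s3:::my-backup-bucket/MyDatabase.bak';

-- Check progress of the backup/restore task
exec msdb.dbo.rds_task_status @db_name = 'MyDatabase';

-- Restore a .bak file from S3 into RDS
exec msdb.dbo.rds_restore_database
    @restore_db_name = 'MyDatabase',
    @s3_arn_to_restore_from = 'arn:aws:s3:::my-backup-bucket/MyDatabase.bak';
```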

EC2 approach

  • Full control of instance
  • Bring your own licence (BYOL) options
  • Licence included (AMI)
    • Standard
    • Enterprise
  • High Availability / Disaster Recovery (HADR)
    • You can use Always On Availability Groups
      • Automatic failover between AZs
      • Synchronous commit
      • Latency of about 1 ms to 2 ms
    • Manual failover between Regions is an option for increased resiliency

Backups

On EC2, you are responsible for all of your own backups. Backups can be pushed to S3 to reduce costs; I have a full post dedicated to SQL Server backups on EC2. You could also store backups on EBS volumes attached to your instance, however this is much more expensive than storing them on S3.

RDS will handle backups for you. You can restore to a point in time in RDS within the backup retention period you configure. Transaction log backups are taken every 5 minutes, so you may experience up to 5 minutes of data loss. All databases are restored to within 1 second of each other. The downside to this is that transactions spanning multiple databases may not be recovered consistently.

Summary

Features are being added to RDS all the time, however if you need features on the excluded list, you would need to go with an EC2 installation. On the other hand, if your business doesn’t have DBAs or database specialists, RDS may be the best option as AWS takes care of the installation and maintenance for you.

Resources

  • Microsoft SQL Server on Amazon RDS
  • Importing and Exporting SQL Server Databases
  • Restoring a DB Instance to a Specified Time
  • EC2 SQL Server Backups to Amazon S3

Filed Under: AWS RDS, front-page

Glasgow Super Meetup – AWS Athena Presentation

26th October 2018 By John McCormack

The Glasgow Super Meetup was a joint event between Glasgow Azure User Group, Glasgow SQL User Group and Scottish PowerShell & DevOps User Group. I did an AWS Athena Presentation to the group.

Speaking about AWS Athena at the Glasgow Super Meetup might seem like an odd choice, since most attendees use Azure heavily or are more interested in SQL Server. However, I was pleasantly surprised by the interest people took in the subject matter. It was only a lightning talk, so there wasn’t time to answer questions during the session, but I was asked a number of questions during the break by attendees.

I showed how tables can be archived out of the database and into S3 at a fraction of the price, yet the data can still be queried if needed using Athena. I stressed that Athena isn’t intended as a replacement for an RDBMS, and as such queries will be slower than SQL Server. However, it is much cheaper to store large amounts of data in flat files in object storage (such as S3) than in the expensive block storage used with databases. So if the use case fits, such as infrequently accessed archive data, it is something to consider. I’ve uploaded my slides and also linked to a recording of the event. If you want to try the code, you’ll find it below.

Slides | Recording

Demo

Description

As a proof of concept, I wanted to export the data from the Sales.SalesOrderHeader table in AdventureWorks2012 to flat files using BCP. The data is partitioned into unique days using the OrderDate column, exported to the local file system and then uploaded to Amazon S3. The next steps are creating a table in Athena, querying it to review the data, and validating that the correct data has been uploaded.

Code

  1. Run a SELECT query with dynamic SQL to generate the PowerShell and BCP commands. (Run the query, then select/copy the full column and paste into PowerShell.)
    • SELECT DISTINCT
      OrderDate,
      'New-Item -ItemType directory -Path C:\Users\jmccorma\Documents\SQL_to_S3_Demo\Output_Files\year='+CONVERT(varchar(4), OrderDate, 102)+'\month='+CONVERT(varchar(2), OrderDate, 101)+'\day='+CONVERT(varchar(2), OrderDate, 103)+' -ErrorAction SilentlyContinue' as PoSH_command,
      'bcp "SELECT SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, Status, OnlineOrderFlag, SalesOrderNumber, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] WHERE OrderDate = '''+convert(varchar, OrderDate, 23)+'''" queryout "c:\users\jmccorma\Documents\SQL_to_S3_Demo\Output_Files\year='+CONVERT(varchar(4), OrderDate, 102)+'\month='+CONVERT(varchar(2), OrderDate, 101)+'\day='+CONVERT(varchar(2), OrderDate, 103)+'\SalesOrderHeader.tsv" -c -t\t -r\n -T -S localhost\SQLEXPRESS' as bcp_command
      FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
  2. Highlight the PoSH_command column, copy it and paste it into a PowerShell window
  3. Highlight the bcp_command column, copy it and paste it into a PowerShell or command window
  4. Upload from the local file system to AWS S3. You must have an S3 bucket created for this and you must have configured an IAM user in AWS to do this programmatically. You can upload manually using the AWS console if you prefer.
    • aws s3 sync C:\SQL_to_S3_Demo\Output_Files s3://athena-demo-usergroup/ (change to your local file location and your S3 bucket)
  5. Create database and table in Athena (copy code into AWS console) and load partitions
    • CREATE DATABASE adventureworks2012;
    • -- Athena table created by John McCormack for Glasgow User Group
      CREATE EXTERNAL TABLE `SalesOrderHeader`(
      `SalesOrderID` INT,
      `RevisionNumber` TINYINT,
      `OrderDate` TIMESTAMP,
      `DueDate` TIMESTAMP,
      `ShipDate` TIMESTAMP,
      `Status` TINYINT,
      `OnlineOrderFlag` BOOLEAN,
      `SalesOrderNumber` STRING,
      `PurchaseOrderNumber` STRING,
      `AccountNumber` STRING,
      `CustomerID` INT,
      `SalesPersonID` INT,
      `TerritoryID` INT,
      `BillToAddressID` INT,
      `ShipToAddressID` INT,
      `ShipMethodID` INT,
      `CreditCardID` INT,
      `CreditCardApprovalCode` STRING,
      `CurrencyRateID` INT,
      `SubTotal` DECIMAL(12,4),
      `TaxAmt` DECIMAL(12,4),
      `Freight` DECIMAL(12,4),
      `TotalDue` DECIMAL(12,4),
      `Comment` STRING,
      `rowguid` STRING,
      `ModifiedDate` TIMESTAMP
      )
      PARTITIONED BY (
      `year` string,
      `month` string,
      `day` string)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      ESCAPED BY '\\'
      LINES TERMINATED BY '\n'
      LOCATION
      's3://athena-demo-usergroup/'
      TBLPROPERTIES (
      'has_encrypted_data'='false',
      'skip.header.line.count'='0')
    • MSCK REPAIR TABLE salesorderheader;
  6. Run these queries in SSMS and Athena to verify the data is the same
    • This performs a row count and checks the sum of one particular column (territoryid). This is a fairly rudimentary check and not guaranteed to catch every difference, but it is a simple way of gaining a degree of confidence in the exported data.
    • -- Validate Athena data is correct
      -- Athena
      SELECT COUNT(*) as row_count, SUM(territoryid) as column_sum FROM "adventureworks2012"."salesorderheader"
      WHERE year='2014'
      AND month = '01'
      AND day = '23';

      -- SQL Server
      SELECT COUNT(*) as row_count, SUM(territoryid) as column_sum FROM adventureworks2012.sales.salesorderheader
      WHERE OrderDate = '2014-01-23 00:00:00.000'
  7. Now that it is uploaded, you can query it any way you like in Athena. It is worth noting that partitioning improves the performance of the query and makes it cheaper, because less data is scanned. If you partition the data, you should use the partition keys in your query, otherwise it will scan all of the data. Note the difference between the two queries below.
    • -- Not using partition (12 seconds - scanned 7.53MB)
      SELECT * FROM "adventureworks2012"."salesorderheader"
      WHERE OrderDate = CAST('2014-01-23 00:00:00.000' as TIMESTAMP);

      -- Using partition (1.8 seconds - scanned 15.55KB - 1/6 of the duration and 1/495 of the cost)
      SELECT * FROM "adventureworks2012"."salesorderheader"
      WHERE year='2014'
      AND month = '01'
      AND day = '23';

Further resources:

  • https://aws.amazon.com/athena/ 
  • https://aws.amazon.com/s3/

Filed Under: AWS, AWS Athena, front-page


John McCormack · Copyright © 2025

 
