My Data Scotland 2019 presentation: You can run SQL Server on AWS
I presented on running SQL Server on AWS at DataGrillen in Germany. Specifically, I covered the differences between RDS and EC2, migration options and the AWS CLI. Although it was a daunting prospect, I enjoyed speaking at a major conference for the first time.
I received a few questions and managed to answer most of them. For the one I could not answer on the spot, I promised to follow up with an answer.
I was asked who provides the SSL certificates when using SSL connections to your RDS instance. The answer: when you create a SQL Server DB instance, Amazon RDS creates an SSL certificate for it. The SSL certificate includes the DB instance endpoint as the Common Name (CN) to guard against spoofing attacks. Read more.
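If you want to confirm that your own session to the instance is actually encrypted, a quick check like the one below works. This is a minimal sketch; it assumes you have already connected with encryption enabled in your client (for example, Encrypt=True in the connection string).

-- Check whether the current connection is encrypted.
-- Assumes encryption was requested in your client connection settings.
SELECT c.session_id,
       c.encrypt_option  -- 'TRUE' when the connection is using SSL/TLS
FROM sys.dm_exec_connections AS c
WHERE c.session_id = @@SPID;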
Here is a link to my slides; please feel free to view or download them.
DataGrillen is a free-to-attend community Data Platform event which draws speakers and attendees from across the world. It is so inclusive and welcoming that it's easy to see why it's a can't-miss event for so many people.
Links to my other AWS content
I presented on running SQL Server on RDS at SQLBits. It was a lightning talk, so it was only 5 minutes long.
You can download my slides from here:
How do I run SQL Server on AWS?
Running SQL Server on AWS can be done in two ways: Amazon RDS, a managed database service, or a self-managed installation on Amazon EC2.
Azure may be the right choice for you. The purpose of this post is to show that you do have options and that Azure is not the only choice. Really, you should weigh up what you need from your cloud provider and make a choice; managing a multi-cloud environment can be challenging. If you are working for a company already heavily invested in AWS, it may make more sense to stay with AWS. Even if not, AWS's SQL Server offering is mature and RDS's features are improving all the time, so you should consider what you can achieve with AWS.
| Feature | RDS | EC2 |
|---|---|---|
| Use database as source or target for SSIS | ✔ | ✔ |
| Full control over the instance | | ✔ |
| Always On Availability Groups | ✔ | ✔ |
| AWS-managed Multi-AZ deployment | ✔ | |
A full list of included and excluded features can be viewed at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_SQLServer.html
On EC2, you are responsible for all of your own backups. Backups can be pushed to S3 to reduce costs; I have a full post dedicated to SQL Server backups on EC2. You could also store backups on EBS volumes attached to your instance, however this is much more expensive than storing them on S3.
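As a rough illustration of that workflow, the sketch below backs up a database to an attached volume and then copies the file to S3 with the AWS CLI. The drive letter, path and bucket name are placeholders, not values from my environment.

-- Back up the database to an attached EBS volume first
-- (drive letter, path and bucket below are illustrative only)
BACKUP DATABASE [AdventureWorks2012]
TO DISK = N'D:\Backups\AdventureWorks2012.bak'
WITH COMPRESSION, CHECKSUM;
-- Then, from a command prompt on the EC2 instance:
-- aws s3 cp D:\Backups\AdventureWorks2012.bak s3://my-backup-bucket/AdventureWorks2012.bak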
RDS will handle backups for you. You can restore to a point in time in RDS within the backup retention period you configure. Transaction log backups are taken every 5 minutes, so you may experience up to 5 minutes of data loss. All databases are restored to within 1 second of each other; the downside to this is that transactions spanning multiple databases may not be recovered consistently.
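If you also want your own copies of RDS databases in S3, RDS for SQL Server supports native backup and restore through stored procedures. A minimal sketch, assuming the SQLSERVER_BACKUP_RESTORE option is attached to your instance's option group; the bucket ARN is a placeholder.

-- Take a native backup of an RDS database straight to S3
exec msdb.dbo.rds_backup_database
     @source_db_name = 'AdventureWorks2012',
     @s3_arn_to_backup_to = 'arn:aws:s3:::my-backup-bucket/AdventureWorks2012.bak';

-- Check the progress of the backup task
exec msdb.dbo.rds_task_status @db_name = 'AdventureWorks2012';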
Features are being added to RDS all the time, however if you need features on the excluded list, you would need to go with an EC2 installation. On the other hand, if your business size doesn't allow for DBAs or database specialists, RDS may be the best option as AWS takes care of all the installation and maintenance for you.
The Glasgow Super Meetup was a joint event between Glasgow Azure User Group, Glasgow SQL User Group and Scottish PowerShell & DevOps User Group. I gave a presentation on AWS Athena to the group.
Speaking about AWS Athena at the Glasgow Super Meetup might seem like an odd choice, since most attendees use Azure heavily or are more interested in SQL Server; however, I was pleasantly surprised by the interest people took in the subject matter. It was only a lightning talk, so there wasn't time to answer questions on stage, but a number of attendees asked me questions during the break.
I showed how tables can be archived out of the database and into S3 at a fraction of the price, yet the data can still be queried if needed using Athena. I stressed that Athena isn't intended as a replacement for an RDBMS, and as such queries will be slower than SQL Server. However, it is much cheaper to store large amounts of data in flat files in object storage (such as S3) than in the expensive block storage used with databases. So if the use case fits, such as infrequently accessed archive data, it is something to consider. I've uploaded my slides and also linked to a recording of the event. If you want to try the code, you'll find it below.
As a proof of concept, I want to export the data from the Sales.SalesOrderHeader table in AdventureWorks2012 to flat files using BCP. The data is partitioned into unique days using the OrderDate column, exported to the local file system and then uploaded to Amazon S3. The next steps are creating a table in Athena, querying it to review the data and validating that the correct data has been uploaded.
-- 1. Query the table and generate PoSH and BCP commands
SELECT DISTINCT
    OrderDate,
    'New-Item -ItemType directory -Path C:\Users\jmccorma\Documents\SQL_to_S3_Demo\Output_Files\year='
        + CONVERT(varchar(4), OrderDate, 102)
        + '\month=' + CONVERT(varchar(2), OrderDate, 101)
        + '\day=' + CONVERT(varchar(2), OrderDate, 103)
        + ' -ErrorAction SilentlyContinue' AS PoSH_command,
    'bcp "SELECT SalesOrderID, RevisionNumber, OrderDate, DueDate, ShipDate, Status, OnlineOrderFlag, SalesOrderNumber, PurchaseOrderNumber, AccountNumber, CustomerID, SalesPersonID, TerritoryID, BillToAddressID, ShipToAddressID, ShipMethodID, CreditCardID, CreditCardApprovalCode, CurrencyRateID, SubTotal, TaxAmt, Freight, TotalDue, Comment, rowguid, ModifiedDate FROM [AdventureWorks2012].[Sales].[SalesOrderHeader] WHERE OrderDate = '''
        + CONVERT(varchar, OrderDate, 23)
        + '''" queryout "c:\users\jmccorma\Documents\SQL_to_S3_Demo\Output_Files\year='
        + CONVERT(varchar(4), OrderDate, 102)
        + '\month=' + CONVERT(varchar(2), OrderDate, 101)
        + '\day=' + CONVERT(varchar(2), OrderDate, 103)
        + '\SalesOrderHeader.tsv" -c -t\t -r\n -T -S localhost\SQLEXPRESS' AS bcp_command
FROM [AdventureWorks2012].[Sales].[SalesOrderHeader]
# 2. Sync the exported files to S3 (change to your local file location and your S3 bucket)
aws s3 sync C:\SQL_to_S3_Demo\Output_Files s3://athena-demo-usergroup/
CREATE DATABASE adventureworks2012;
-- Athena table created by John McCormack for Glasgow User Group
CREATE EXTERNAL TABLE `SalesOrderHeader`(
  `SalesOrderID` INT,
  `RevisionNumber` TINYINT,
  `OrderDate` TIMESTAMP,
  `DueDate` TIMESTAMP,
  `ShipDate` TIMESTAMP,
  `Status` TINYINT,
  `OnlineOrderFlag` BOOLEAN,
  `SalesOrderNumber` STRING,
  `PurchaseOrderNumber` STRING,
  `AccountNumber` STRING,
  `CustomerID` INT,
  `SalesPersonID` INT,
  `TerritoryID` INT,
  `BillToAddressID` INT,
  `ShipToAddressID` INT,
  `ShipMethodID` INT,
  `CreditCardID` INT,
  `CreditCardApprovalCode` STRING,
  `CurrencyRateID` INT,
  `SubTotal` DECIMAL(12,4),
  `TaxAmt` DECIMAL(12,4),
  `Freight` DECIMAL(12,4),
  `TotalDue` DECIMAL(12,4),
  `Comment` STRING,
  `rowguid` STRING,
  `ModifiedDate` TIMESTAMP
)
PARTITIONED BY (
  `year` STRING,
  `month` STRING,
  `day` STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
LOCATION 's3://athena-demo-usergroup/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'skip.header.line.count'='0')
MSCK REPAIR TABLE salesorderheader;
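MSCK REPAIR TABLE scans the S3 location and registers any Hive-style partitions (year=/month=/day=) it finds. If you only needed to add a single partition, you could register it explicitly instead; a sketch, with the location path assumed to match the folder layout created above:

-- Alternative: register one partition explicitly rather than scanning the bucket
ALTER TABLE salesorderheader ADD IF NOT EXISTS
  PARTITION (year='2014', month='01', day='23')
  LOCATION 's3://athena-demo-usergroup/year=2014/month=01/day=23/';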
-- Validate Athena data is correct
-- Athena
SELECT COUNT(*) AS row_count, SUM(territoryid) AS column_sum
FROM "adventureworks2012"."salesorderheader"
WHERE year = '2014' AND month = '01' AND day = '23';

-- SQL Server
SELECT COUNT(*) AS row_count, SUM(territoryid) AS column_sum
FROM adventureworks2012.sales.salesorderheader
WHERE OrderDate = '2014-01-23 00:00:00.000'
-- Not using partition (12 seconds - scanned 7.53MB)
SELECT * FROM "adventureworks2012"."salesorderheader"
WHERE OrderDate = CAST('2014-01-23 00:00:00.000' AS TIMESTAMP);

-- Using partition (1.8 seconds - scanned 15.55KB - 1/6 of the duration and 1/495 of the cost)
SELECT * FROM "adventureworks2012"."salesorderheader"
WHERE year = '2014' AND month = '01' AND day = '23';