Guides Archives - John McCormack DBA

Using ChatGPT for the first time

7th December 2022 By John McCormack Leave a Comment

I played with ChatGPT for the first time today. For those who don’t know what ChatGPT is, I asked it and this was the response.

ChatGPT is an open-source chatbot framework that enables developers to quickly create natural language understanding chatbots using the GPT-3 language model.
https://beta.openai.com/playground

If that still doesn’t make sense, I asked it to explain in a less technical style.

ChatGPT is a tool that makes it easy for developers to create chatbots that can understand and respond to natural language conversations.
https://beta.openai.com/playground

[Read more…]

What is AWS Athena and why is it awesome?

27th January 2021 By John McCormack 3 Comments

AWS Athena

This post answers “What is AWS Athena” and gives an overview of what AWS Athena is and some potential use cases. I discuss in simple terms how to optimize your AWS Athena configuration for cost effectiveness and performance efficiency, both of which are pillars of the AWS Well Architected Framework. My Slides | AWS White Paper.

This post was originally published on March 2018, and has subsequently been updated.

AWS’s own documentation is the best place for full details on the Athena offering, this post hopes to serve as further explanation and also act as an anchor to some more detailed information. As it is a managed service, Athena requires no administration, maintenance or patching. It’s not designed for regular querying of tables in a way that you would with an RDBMS. Performance is geared around querying large data sets which may include structured data or semi-structured data. There are no licensing costs like you may have with some Relational Database Management Systems (RDBMS) such as SQL Server and costs are kept low, as you only pay when you run queries in AWS Athena.

More info on AWS Athena

Athena is a serverless interactive query service provided by AWS to query flat files in S3. It allows users to query static files, such as CSVs (which are stored in AWS S3) using SQL Syntax. The queries are made using ANSI SQL so many existing users of database technologies such as SQL Server or MySQL can adapt quickly to using ANSI. New users can learn the commands easily.

How does it save me money?

“Object based storage” like Amazon S3 is a lot cheaper than “block based storage” such as EBS. This means you can store large data sets as CSV files on Amazon S3 at a fraction of the price it would cost to store the data using EBS or in a relational database. You are then charged for each query (currently $5 per 5TB scanned). Clever use of compression and partitioning can reduce the amount of data scanned, meaning queries will be cheaper. AWS Athena is described as serverless which means the end user doesn’t need to manage or administer any servers, this is all done by AWS.

Save more using compression, partitioning and columnar data formats

If you notice from the previous paragraph that the query cost is $5 per 5TB scanned so the pricing is quite straightforward. Athena uses per megabyte charging, with a 10MB minimum. You can save by compressing, partitioning and/or converting data to a columnar format. The less data that needs to be scanned, the cheaper the query.

Compression
- As Athena natively reads compressed files, the same query that works against a CSV file will also work against data compressed into one of the following formats:
  - Snappy (.snappy)
  - Zlib (.bz2)
  - LZO
  - GZIP (.gz)
- As less data is scanned, the overall cost is lower
Partitioning
- Tables can be partitioned on any key. e.g. OrderDate
- If the query can use the key, there is no need to scan all the other partitions, only the relevant partition needs to be scanned.
- Compression and partitioning can be used together to further reduce the amount of scanned data.
Converting to columnar
- Columnar formats such as ORC and Parquet are supported
- Converting may add complexity to your workload
- However it will save money on querying due to the columnar format, data scanned is reduced and speed is improved
- Here’s a tutorial, it will require an intermediate knowledge of EMR

Use Cases

Apache Web Logs
AWS CloudWatch logs
System error logs
Huge, infrequently accessed data sets which were extracted to a flat file format in S3
Ad hoc querying of CSV files

Why is AWS Athena Awesome?

There is no infrastructure to configure
You only pay for what you scan
If you compress, partition and convert your data into columnar formats, you can save up to 90%
ANSI SQL Language is easy to learn or adapt from a dialect such as T-SQL
Athena integrates with Glue to automate your ETL

60 minute azure cost optimization review

22nd January 2021 By John McCormack Leave a Comment

If you have not had an initial consultation with me, please book a free 15 minute call with me. 60 minute azure cost optimization is only currently available for Microsoft Azure customers.

Free Advice

You can follow my blog series on cost optimization to review your own Azure subscription free of charge. My posts are heavily aimed towards data professionals, however the principles can be applied to almost all products in Azure.

Hourglass representing a 60 minute azure cost optimization session.

Delete unused instances to save money in Azure

22nd January 2021 By John McCormack Leave a Comment

My final tip in my series 7 ways for data teams to save money in Azure is to delete unused instances (and orphaned storage) to save money in Azure.

Delete unused instances

It’s easy to spin up instances in Azure, especially for developers if you haven’t implemented a tight permissions policy. Pet projects and Proof of Concepts (POCs) get created regularly and the person involved needs to remember to delete all resources or else you will be charged indefinitely.

Even services which can be stopped and deallocated such as Virtual Machines will cost you money if they are not deleted. Why? If you don’t delete unused instances in Azure, you still pay for all of the associated costs including the storage which can be significant. Moreover, you need to specifically delete your storage when your delete those VMs or you’ll keep paying for that too. That is known as orphaned storage or unattached storage, and it is not cheap.

A script to find unattached storage

Microsoft published a PowerShell script to find unattached storage in azure which you can run to list out anywhere it occurs in your subscriptions.

Use resource groups for POCs

I would encourage developers and engineers to use a resource group for POCs. This way, when it is done, the whole resource group can be obliterated in one go, leaving nothing behind.

Azure policies and tagging

Once you have got rid of all your old unused instances and orphaned storage, you may want to prevent team members from doing the same thing again. This is where a combination of Azure policies and tagging comes in handy. You can set policies to prevent new services being spun up without the appropriate tags. This Microsoft tutorial explains how to manage tag governance with Azure Policy.

Save with reserved instances – even up to 73%

21st January 2021 By John McCormack Leave a Comment

This post on how to save money with Azure Reserved instances is part of a 7 part series on cost optimisation with Azure.

Save with reserved instances

Did you say “Save 73% with reserved instances”? I did. In fact, Azure even advertise you can save up to 80% if you combine reserved instances (RI) with Azure Hybrid Benefit (AHB). So what do we mean by reserved instances?

What is a reserved instance?

A reserved instance is a way of reserving capacity in Azure. You are offered a discount by agreeing to a 1 or 3 year commitment. You can choose to pay an element up from to reduce your monthly bill, or you can just pay monthly. Paying monthly does not cost more than paying a portion of the cost upfront so it comes down to a preference for you or your company. Savings are made against compute costs, but not licence costs such as SQL Server Enterprise.

When would I reserve

If you know that you would need an instance for 12 months or longer, it is usually worth taking a reserved instance. A typical scenario for this would be a production SQL Server running in Azure VM or a production SQL Database. If you have a stable workload that you don’t see changing in the near future, reservations are a good bet.

Save with Reserved Instances (Examples)

For this example, you are running SQL Server enterprise on E8s 4s VMs in UK West. You have a cluster of 3 to support your availability groups.

Reservation	Compute cost	Windows licence cost	SQL licence cost	Total monthly cost
Pay as you go	£995.66	£600.66	£4,896.68	£6,493.04
1 yr reserved	£587.31	£600.66	£4,896.68	£6,084.65
3 yr reserved	£378.25	£600.66	£4,896.68	£5,875.59

_{^{3 x E8s v4 (8 vCPUs, 64 GB RAM); Windows – SQL Server; 3 year reserved; 0 managed OS disks – S4, 100 transaction units; Inter-region transfer type, 5 GB outbound data transfer from UK West to East Asia}}

Alternatively, look at the potential savings with Azure SQL DB. In this example, we are looking at a SQL DB, 8 vCore provisioned instance in UK West on General Purpose tier.

Reservation	Compute cost	SQL licence cost	Total monthly cost
Pay as you go	£828.17	£435.11	£1,263.29
1 yr reserved	£538.11	£435.11	£973.22
3 yr reserved	£372.67	£435.11	£807.78

^{1 x Single Database, vCore, RA-GRS backup storage, General Purpose, Provisioned, Gen 5, Local redundancy, 1 8 vCore instance(s) x 730 Hours, 32 GB Storage, 0 GB backup storage}

I hesitated to put prices on here because they will change more frequently than I can update the blog but it gives you a general idea. For up to date pricing, please always refer to the Azure Pricing Calculator.

Reserved Instance Allocation

Thankfully, you don’t need to specify which instance uses the allocation, it is just available at an hourly grain level. If you have a live instance that matches your reservation, the reservation credits are used for billing. If you drop an instance and you have another eligible instance type, it will automatically uses the allocation. The allocation is on a use it or lose it basis. If you have a reservation but no suitable resource to use the allocation, you lose it. If you have more instances than reservations, you pay on PAYG terms for anything above your reservation levels.

What type of resources can I reserve

As this post is directed at Azure SQL Data professionals; it is worth pointing out Virtual machines, SQL Database, Azure Synapse Analytics and Azure Blob Storage. It’s also encouraging to see discounts available for MySQL, MariaDB and PostgreSQL.

Azure services eligible to be reserved instances

What if my needs change?

Despite being a long term commitment, Reserved Instances are fairly flexible as Microsoft acknowledges that customers’ needs will change from time to time. You can exchange or cancel reservations as your needs change up to limits imposed by Microsoft.