This is a short post which aims to answer the question “How does the SQL Server engine process my select query”?
T-SQL is the dialect of SQL used with SQL Server and is a declarative language. A declarative language takes the user’s request (or query) and works out the best way how to return the data. This differs from an imperative language in which the programmer tells the program how it should process the statement. With this is mind, it should be fairly easy with some simple syntax just to tell SQL Server what we want and get the results back.
An example in English could be “Bring me all my black shoes from the cupboard”. In SQL (and with a bit of stretch of the imagination), this would be written as:
SELECT item FROM dbo.Cupboard WHERE item = 'shoes' AND colour = 'black';
Despite the way we write the query, the SQL Server engine has other ideas about the processing order. The table below shows the order in English versus the order in SQL.
So conversely, if we were to say this in English, it would be something like “From the cupboard, find all the black shoes and bring them to me”.
Why is it done this way?
First, I should point out we will be discussing 6 main clauses (but there are more). They are listed below in logical order with a description. If you read all 6 descriptions, it should be fairly clear why SQL Server processes the select query in this order. e.g. You cannot select something if the place it supposed to be (a table) does not exist. So we go to the table first.
|THE MAIN CLAUSES
(In logical order)
|FROM||The FROM clause is executed first. This also includes JOIN which is a way of combining data from multiple tables into one result set. Thinking about this logically, the table(s) being queried must be identified first or we would have nothing to SELECT from.|
|WHERE||A clause such as
|GROUP BY||You can retrieve a set of all distinct row values based on what is included in the SELECT. e.g.
SELECT Item, Colour FROM [TestDB].[dbo].[Cupboard] GROUP BY Item,ColourThis can also be aggregated. In this example, I show how many times these combinations of values are repeated throughout the table by using COUNT(*).
SELECT Item, Colour, Count(*) AS [Count] FROM [TestDB].[dbo].[Cupboard] GROUP BY Item,Colour
|HAVING||The HAVING clause is similar to WHERE in that it filters the data based on a predicate. The main difference is that it is evaluated AFTER the data has been grouped (and so is evaluated per group) unlike a WHERE clause which you can see is evaluated before a GROUP BY.
If we enhance the example above to include a HAVING clause (say for groups with a count > 1), the syntax would be:
SELECT Item, Colour, COUNT(*) as [Count] FROM [TestDB].[dbo].[Cupboard] GROUP BY Item,Colour HAVING COUNT(*) > 1
|SELECT||Whilst it is written at the beginning of the T-SQL query, the SELECT clause is evaluated near the end. Remember to include a DISTINCT after your SELECT clause if you want to ensure you return no duplicates.
At this stage, you are returning a relational result. You have not guaranteed the order in which you want to see the result set. This means the same query could return the same results in a different order time and time again.
|ORDER BY||The ORDER BY clause allows you to present the data in an order that you deem suitable. e.g.
Notice here that because this is evaluated last and unlike the WHERE clause, you can refer to column Aliases because they have now been created.