logical query processing Archives

Logical query processing

Logical query processing describes the way in which the SQL Server processes a select query. This post used to be called “How does the SQL Server engine process my select query” but no-one was really searching for that. It was originally posted in Feb 2016, and was updated on September 2020.

Declarative Language

T-SQL is the dialect of SQL used with SQL Server and is a declarative language. A declarative language takes the user’s request (or query) and works out the best way how to return the data. This differs from an imperative language in which the programmer tells the program how it should process the statement. With this is mind, it should be fairly easy with some simple syntax just to tell SQL Server what we want and get the results back.

An example in English could be “Bring me all my black shoes from the cupboard”. In SQL (and with a bit of stretch of the imagination), this would be written as:

SELECT item FROM dbo.Cupboard
WHERE item = 'shoes'
AND colour = 'black';

Despite the way we write the query, the SQL Server engine has other ideas about the processing order. The table below shows the order in English versus the order in SQL.

ENGLISH	SQL SERVER
SELECT	FROM
FROM	WHERE
WHERE	SELECT

So conversely, if we were to say this in English, it would be something like “From the cupboard, find all the black shoes and bring them to me”.

Why is it done this way?

SQL Server uses Logical query processing. First, I should point out we will be discussing 6 main clauses (but there are more). They are listed below in logical order with a description. If you read all 6 descriptions, it should be fairly clear why SQL Server processes the select query in this order. e.g. You cannot select something if the place it supposed to be (a table) does not exist. So we go to the table first.

THE MAIN CLAUSES (In logical order)	DESCRIPTION
FROM	The FROM clause is executed first. This also includes JOIN which is a way of combining data from multiple tables into one result set. Thinking about this logically, the table(s) being queried must be identified first or we would have nothing to SELECT from.
WHERE	A clause such as `WHERE WHERE item = 'shoes'`means that rows are filtered. Rows are not returned if the item does not equal shoes. Remember you must refer to the column names rather than an alias you have given as the WHERE clause is evaluated before the SELECT.
GROUP BY	You can retrieve a set of all distinct row values based on what is included in the SELECT. e.g. `SELECT Item, Colour FROM [TestDB].[dbo].[Cupboard] GROUP BY Item,Colour` This can also be aggregated. In this example, I show how many times these combinations of values are repeated throughout the table by using COUNT(). `SELECT Item, Colour, Count() AS [Count] FROM [TestDB].[dbo].Cupboard] GROUP BY Item,Colour`
HAVING	The HAVING clause is similar to WHERE in that it filters the data based on a predicate. The main difference is that it is evaluated AFTER the data has been grouped (and so is evaluated per group) unlike a WHERE clause which you can see is evaluated before a GROUP BY. If we enhance the example above to include a HAVING clause (say for groups with a count > 1), the syntax would be: `SELECT Item, Colour, COUNT() as [Count] FROM [TestDB].[dbo].[Cupboard] GROUP BY Item,Colour HAVING COUNT() > 1`
SELECT	Whilst it is written at the beginning of the T-SQL query, the SELECT clause is evaluated near the end. Remember to include a DISTINCT after your SELECT clause if you want to ensure you return no duplicates. At this stage, you are returning a relational result. You have not guaranteed the order in which you want to see the result set. This means the same query could return the same results in a different order time and time again.
ORDER BY	The ORDER BY clause allows you to present the data in a deterministic way. e.g.[sql]ORDER BY [Count] Desc[/sql] Without an order by, there is no guarantee that the same query will return the rows in same order if executed more than once. Notice here that because this is evaluated last and unlike the WHERE clause, you can refer to column Aliases because they have now been created.