Recently I encountered a case when I knew much more about the data
than the optimizer. Originally the performance was horrible, this is
why I had to have a look at the query in the first place. When I was
able to share my knowledge with the optimizer, it produced a better
plan, and the query ran dramatically faster.
The slow query
The following tables store one-munite commercials for every minut
for one year, and customer calls, one call per minute, for the same
year. The scripts that populate tables with test data are provided at
the end of this post. Here are the tables:
CREATE TABLE dbo.Commercials(
StartedAt DATETIME NOT NULL
CONSTRAINT PK_Commercials PRIMARY KEY,
EndedAt DATETIME NOT NULL,
CommercialName VARCHAR(30) NOT NULL);
GO
CREATE TABLE dbo.Calls(CallID INT
CONSTRAINT PK_Calls NOT NULL PRIMARY KEY,
AirTime DATETIME NOT NULL,
SomeInfo CHAR(300));
GO
CREATE UNIQUE INDEX Calls_AirTime
ON dbo.Calls(AirTime) INCLUDE(SomeInfo);
GO
Every commercial in my table lasts for at most one minute, and they
do not overlap. I can easily enforce both conditions with constraints (
Storing intervals of time with no overlaps"
), which are omitted in this post just to keep it simple.
The following query retrieves only 181 rows, and it runs very slowly:
SELECT s.StartedAt, s.EndedAt, c.AirTime
FROM dbo.Commercials s JOIN dbo.Calls c
ON c.AirTime >= s.StartedAt AND c.AirTime < s.EndedAt
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00'
Table 'Calls'. Scan count 1, logical reads 11, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
Table 'Worktable'. Scan count 2,
logical reads 3338264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table
'Commercials'. Scan count 2, logical reads 7166, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads
0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob
physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 71704 ms, elapsed time = 36316 ms.
Why is it so slow? I haven't mastered the fine art of adding images
to my posts yet, so I have to explain verbally. For every call the DB
engine scans all the commercials which begin before the time of the
call, which is expensive. The reason is simple: the optimizer does not
know that the commercials are short, and that the commercials do not
overlap, so it must scan all the potential matches, which are all the
commercials which begin before the time of the call.
Using CROSS APPLY to tell the optimizer that commercials do not overlap.
Because commercials do not overlap, we need at most one match.
Translating this information into plain SQL is easy, and the query runs
dramatically faster:
SELECT s.StartedAt, s.EndedAt, c.AirTime
FROM dbo.Calls c CROSS APPLY(
SELECT TOP 1 s.StartedAt, s.EndedAt FROM dbo.Commercials s
WHERE c.AirTime >= s.StartedAt AND c.AirTime < s.EndedAt
ORDER BY s.StartedAt DESC) AS s
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00'
Table 'Commercials'. Scan count 181,
logical reads 1327, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table
'Calls'. Scan count 1, logical reads 11, physical reads 0, read-ahead
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 31 ms.
Note: if you needed only one column from Commercials
table, you could easily use just a subquery. Because more than one
column is needed, CROSS APPLY is a better, a more performant choice choice without redundant code.
Note: If you are using the assumption that the commercials do not overlap, you have to
enforce that business rule in the database. Also to make sure that you don't forget that your query relies on that assumption, use a unit test do document it.
Also let me put it differently: If you are using the
assumption that the commercials do not overlap, use a unit test do
document it, so that that you don't forget that your query relies on
that assumption. Also you have to
enforce that business rule in the database.
Using another range condition to tell the optimizer that commercials are short.
Because commercials are short, there is no need to scan the
commercials that start more than maximum commercial's length before the
call. Again, translating this information into plain SQL is quite easy
too, and again the query runs much faster, even faster than the
previous one:
SELECT s.StartedAt, s.EndedAt, c.AirTime
FROM dbo.Commercials s JOIN dbo.Calls c
ON c.AirTime >= s.StartedAt AND c.AirTime < s.EndedAt
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00'
AND s.StartedAt BETWEEN '20080630 23:45' AND '20080701 03:00'
Table 'Worktable'. Scan count 1,
logical reads 753, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table
'Calls'. Scan count 1, logical reads 11, physical reads 0, read-ahead
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
Table 'Commercials'. Scan count 1, logical reads 4,
physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical
reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 24 ms.
Note: If you are using the assumption that the commercials are short, you have to enforce that business rule in the database.
Also to make sure that you don't forget that your query relies on that
assumption, use a unit test do document it.
Also let me put it differently: If you are using the assumption that the commercials are short, use a unit test do document that.
Also you have to enforce that business rule in the database.
Setting up tables and test data
CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY)
GO
DECLARE @i INT;
SET @i = 1;
INSERT INTO dbo.Numbers(n) SELECT 1;
WHILE @i<1024000 BEGIN
INSERT INTO dbo.Numbers(n)
SELECT n + @i FROM dbo.Numbers;
SET @i = @i * 2;
END;
GO
INSERT INTO dbo.Commercials(StartedAt, EndedAt, CommercialName)
SELECT DATEADD(minute, n - 1, '20080101')
,DATEADD(minute, n, '20080101')
,'Show #'+CAST(n AS VARCHAR(6))
FROM dbo.Numbers
WHERE n<=24*365*60;
GO
INSERT INTO dbo.Calls(CallID,
AirTime,
SomeInfo)
SELECT n
,DATEADD(minute, n - 1, '20080101')
,'Call during Commercial #'+CAST(n AS VARCHAR(6))
FROM dbo.Numbers
WHERE n<=24*365*60;
GO
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/07/07/using-cross-apply-to-optimize-joins-on-between-conditions.aspx