Monday, January 14, 2013

Using CROSS APPLY to optimize joins on BETWEEN conditions


Recently I encountered a case when I knew much more about the data than the optimizer. Originally the performance was horrible, this is why I had to have a look at the query in the first place. When I was able to share my knowledge with the optimizer, it produced a better plan, and the query ran dramatically faster.

The slow query

The following tables store one-munite commercials for every minut for one year, and customer calls, one call per minute, for the same year. The scripts that populate tables with test data are provided at the end of this post. Here are the tables:

CREATE TABLE dbo.Commercials(
  
StartedAt DATETIME NOT NULL 
   
CONSTRAINT PK_Commercials PRIMARY KEY,
  
EndedAt DATETIME NOT NULL,
  
CommercialName VARCHAR(30) NOT NULL); GO
CREATE TABLE dbo.Calls(CallID INT 
  CONSTRAINT 
PK_Calls NOT NULL PRIMARY KEY,
  
AirTime DATETIME NOT NULL,
  
SomeInfo CHAR(300)); GO
CREATE UNIQUE INDEX Calls_AirTime
  ON dbo.Calls(AirTimeINCLUDE(SomeInfo); GO

Every commercial in my table lasts for at most one minute, and they do not overlap. I can easily enforce both conditions with constraints (
Storing intervals of time with no overlaps" ), which are omitted in this post just to keep it simple.
The following query retrieves only 181 rows, and it runs very slowly:
 SELECT s.StartedAts.EndedAtc.AirTime
FROM dbo.Commercials s JOIN dbo.Calls c 
  ON c.AirTime >= s.StartedAt AND c.AirTime s.EndedAt
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00'
  
 Table 'Calls'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 2, logical reads 3338264, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Commercials'. Scan count 2, logical reads 7166, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 71704 ms,  elapsed time = 36316 ms.
Why is it so slow? I haven't mastered the fine art of adding images to my posts yet, so I have to explain verbally. For every call the DB engine scans all the commercials which begin before the time of the call, which is expensive. The reason is simple: the optimizer does not know that the commercials are short, and that the commercials do not overlap, so it must scan all the potential matches, which are all the commercials which begin before the time of the call.

Using CROSS APPLY  to tell the optimizer that commercials do not overlap.

 Because commercials do not overlap, we need at most one match. Translating this information into plain SQL is easy, and the query runs dramatically faster:

SELECT s.StartedAts.EndedAtc.AirTime
FROM dbo.Calls c CROSS APPLY(
  
SELECT TOP 1 s.StartedAts.EndedAt FROM dbo.Commercials s 
  WHERE c.AirTime >= s.StartedAt AND c.AirTime s.EndedAt
  ORDER BY s.StartedAt DESCAS s
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00'


Table 'Commercials'. Scan count 181, logical reads 1327, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Calls'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 31 ms.

Note: if you needed only one column from Commercials table, you could easily use just a subquery. Because more than one column is needed, CROSS APPLY is a better, a more performant choice choice without redundant code.
Note: If you are using the assumption that the commercials do not overlap, you have to enforce that business rule in the database. Also to make sure that you don't forget that your query relies on that assumption, use a unit test do document it.
Also let me put it differently: If you are using the assumption that the commercials do not overlap, use a unit test do document it, so that that you don't forget that your query relies on that assumption. Also you have to enforce that business rule in the database.

Using another range condition to tell the optimizer that commercials are short.

 Because commercials are short, there is no need to scan the commercials that start more than maximum commercial's length before the call. Again, translating this information into plain SQL is quite easy too, and again the query runs much faster, even faster than the previous one:

SELECT s.StartedAts.EndedAtc.AirTime
FROM dbo.Commercials s JOIN dbo.Calls c 
  ON c.AirTime >= s.StartedAt AND c.AirTime s.EndedAt
WHERE c.AirTime BETWEEN '20080701' AND '20080701 03:00' AND s.StartedAt BETWEEN '20080630 23:45' AND '20080701 03:00'
 

Table 'Worktable'. Scan count 1, logical reads 753, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Calls'. Scan count 1, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Commercials'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 24 ms.

Note: If you are using the assumption that the commercials are short, you have to enforce that business rule in the database. Also to make sure that you don't forget that your query relies on that assumption, use a unit test do document it.
Also let me put it differently: If you are using the assumption that the commercials are short, use a unit test do document that. Also you have to enforce that business rule in the database. 


Setting up tables and test data

CREATE TABLE dbo.Numbers(INT NOT NULL PRIMARY KEY) GO
DECLARE @i INT; SET @i 1; INSERT INTO dbo.Numbers(nSELECT 1; WHILE @i<1024000 BEGIN
  INSERT INTO 
dbo.Numbers(n)
    
SELECT @i FROM dbo.Numbers;
  
SET @i @i 2; END; GO
INSERT INTO dbo.Commercials(StartedAtEndedAtCommercialName) SELECT DATEADD(minute1'20080101')
   ,
DATEADD(minuten'20080101')
   ,
'Show #'+CAST(AS VARCHAR(6))
  
FROM dbo.Numbers
  WHERE n<=24*365*60; GO
INSERT INTO dbo.Calls(CallID,
  
AirTime,
  
SomeInfo) SELECT 
   ,DATEADD(minute1'20080101')
   ,
'Call during Commercial #'+CAST(AS VARCHAR(6))
  
FROM dbo.Numbers
  WHERE n<=24*365*60; GO
                 
             

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/07/07/using-cross-apply-to-optimize-joins-on-between-conditions.aspx

No comments:

Post a Comment