Introduction to SQL Window Functions / 2
As we have seen in the previous part, window functions are an usefull tool to solve a lot of SQL problems with great performances. Let’s see some more tricks.
LAG and LEAD revised Link to heading
We have already discussed the window functions LAG
and LEAD
. As we have said, they are used to get respectively the previous or next value in the window. What we haven’t mentioned is that you can not only take the previous, but also the record before (or after) it and so on. The syntax is more generally LAG(expression, n)
, where using LAG(expression)
means n = 1
. So we can write queries like these:
SELECT
article_id,
year,
quantity,
LAG(quantity) OVER (PARTITION BY article_id
ORDER BY year) AS prevYear,
LAG(quantity, 2) OVER (PARTITION BY article_id
ORDER BY year) AS twoYearsAgo
FROM Sales s1
ORDER BY year, article_id
Try doing that without window functions!
ROW NUMBER Link to heading
The window function ROW_NUMBER
assigns an unique number to each record in the window. You might have already used this function without a window definition, but it actually is a window function. Notice that the number assigned to records with the same position in the window is random.
Let’s take a look at this query:
SELECT
article_id,
year,
quantity,
ROW_NUMBER() OVER (PARTITION BY article_id ORDER BY quantity)
FROM Sales s1
ORDER BY article_id, year
The data returned by this query is:
ARTICLE_ID YEAR QUANTITY ROW_NUMBER
Jeans 2008 31 1
Jeans 2009 33 3
Jeans 2010 36 4
Jeans 2011 33 2
Jeans 2012 38 5
Jeans 2013 41 7
Jeans 2014 40 6
Shirts 2008 63 1
Shirts 2009 66 2
Shirts 2010 68 3
Shirts 2011 72 6
Shirts 2012 69 4
Shirts 2013 71 5
Shirts 2014 74 7
It isn’t predictable which values Jeans 2009 and Jeans 2011 will have, since they both have the same quantity
.
RANK and DENSE RANK Link to heading
The window function RANK
will give the logical position of the record in the window. The difference between it and ROW_NUMBER
is records that order in the same position will have the same rank.
RANK
will skip missing values, meaning that you might have a sequence like 1, 2, 2, 4. DENSE_RANK
is almost identical, except that it won’t skip position, so it would return 1, 2, 2, 3. Because of this, RANK
might perform slightly faster in some situations. Let’s see the following query:
SELECT
article_id,
year,
quantity,
RANK() OVER (PARTITION BY article_id ORDER BY quantity),
DENSE_RANK() OVER (PARTITION BY article_id ORDER BY quantity)
FROM Sales s1
ORDER BY article_id, quantity
The results are:
ARTICLE_ID YEAR QUANTITY RANK DENSE_RANK
Jeans 2008 31 1 1
Jeans 2011 33 2 2
Jeans 2009 33 2 2
Jeans 2010 36 4 3
Jeans 2012 38 5 4
Jeans 2014 40 6 5
Jeans 2013 41 7 6
Shirts 2008 63 1 1
Shirts 2009 66 2 2
Shirts 2010 68 3 3
Shirts 2012 69 4 4
Shirts 2013 71 5 5
Shirts 2011 72 6 6
Shirts 2014 74 7 7
FIRST VALUE and LAST VALUE Link to heading
The functions FIRST_VALUE
and LAST_VALUE
will return, well, the first and last value of an expression over the window. The difference between them and MIN
or MAX
is that you can select the first value of an expression, but apply an ORDER BY
another expression! Sample time:
SELECT
article_id,
year,
quantity,
FIRST_VALUE(quantity) OVER (PARTITION bY article_id
ORDER BY year)
AS firstYear
FROM Sales s1
ORDER BY article_id, quantity
ARTICLE_ID YEAR QUANTITY FIRSTYEAR
Jeans 2008 31 31
Jeans 2009 33 31
Jeans 2011 33 31
Jeans 2010 36 31
Jeans 2012 38 31
Jeans 2014 40 31
Jeans 2013 41 31
Shirts 2008 63 63
Shirts 2009 66 63
Shirts 2010 68 63
Shirts 2012 69 63
Shirts 2013 71 63
Shirts 2011 72 63
Shirts 2014 74 63