I recently came across a problem that required having to perform a calculation in a query that involved a value in the current row and a value in the previous row. The problem is that SQL queries perform operations on a row-by-row basis; accessing data on different rows at the same time requires the query to do some extra work.
In SQL Server versions prior to 2012, you need to perform a join using a row enumerator to match up rows with previous or next rows. In 2012 and higher, there are two functions, Lag() and Lead(), that greatly simplify the process.
I’ll describe the problem I ran into and show you how to implement a solution using both SQL Server versions 2005 – 2008 R2 and SQL Server 2012.
Set Up the Data
Before I get to the problem, here’s a script you can run to set up an environment for this example in case you want to follow along by doing. The script creates a database called UserManagement and then it creates and populates a table called UserActivity. When you’re done with this article, you can simply drop the UserManagement database to clean things up.
--Create the UserManagement database
CREATE DATABASE UserManagement
--Select the database
--Create the UserActivity in the dbo schema
CREATE TABLE dbo.UserActivity (
[User] int NOT NULL,
Activity varchar(6) NOT NULL,
ActivityTime datetime NOT NULL
--Insert data into the UserActivity table
INSERT dbo.UserActivity ([User], Activity, ActivityTime)
VALUES (1, 'LogOn', CAST('2013-01-01 10:30' AS datetime))
, (1, 'LogOff', CAST('2013-01-01 10:45' AS datetime))
, (1, 'LogOn', CAST('2013-01-01 13:15' AS datetime))
, (1, 'LogOff', CAST('2013-01-01 15:30' AS datetime))
, (1, 'LogOn', CAST('2013-01-01 16:15' AS datetime))
, (1, 'LogOff', CAST('2013-01-01 17:00' AS datetime))
, (1, 'LogOn', CAST('2013-01-02 08:00' AS datetime))
, (1, 'LogOff', CAST('2013-01-02 10:45' AS datetime))
, (1, 'LogOn', CAST('2013-01-02 13:00' AS datetime))
, (1, 'LogOff', CAST('2013-01-02 16:45' AS datetime))
, (2, 'LogOn', CAST('2013-01-01 8:25' AS datetime))
, (2, 'LogOff', CAST('2013-01-01 11:30' AS datetime))
, (2, 'LogOn', CAST('2013-01-01 12:35' AS datetime))
, (2, 'LogOff', CAST('2013-01-01 14:45' AS datetime))
To understand the problem, let’s look at the UserActivity table after running the script above. Notice that each row is either a logon or logoff activity and that, when shown in order by user and activity time, it’s logon, logoff, logon, logoff, and so on.
ORDER BY [user], ActivityTime
From this table we need to calculate the idle time per user per day—the time between a logon and the previous logoff for the same user on the same day. Take a moment to verify that, for user 1 on 1/1/2013 (the first 6 rows of the UserActivity table), there are two blocks of idle time; one is from 10:45 to 1:15 (2 ½ hr., or 150 min.), and the other is from 3:30 to 4:15 (45 min). Note that blocks of idle time that span two days will not be used.
To calculate the idle time, we’ll need to find the difference between the time in a logon row and the time in the previous row. That’s the obstacle to be surmounted, here. Remember, SQL queries perform operations on a row-by-row basis. This business of accessing previous or next rows requires some minor acrobatics in SQL Server 2008 R2 and previous versions. Let’s see how to do it.
A Solution Using SQLServer 2005, 2008 or 2008 R2
We’ll start by creating two derived tables—one holding only the logon rows in UserActivity and the other holding only the logoff rows. We’ll name them LogOns and LogOffs.
Next, we’ll need to join them so that a logon row is paired up with its logoff row. To match up a row with its previous row, we can enumerate the rows of the UserActivity table and include that enumeration in the two derived tables (rows in LogOn will all have odd row numbers and rows in LogOff will all have even row numbers). We can then join row x in LogOn to row x-1 in LogOff.
To make sure matched rows are for the same user and activity date, we will also include those two fields in the join condition when we write the query.
Here’s how it works:
- Find a CTE called UserActivityWRowNum in the query. This is just the UserActivity table with an additional Row_Number() column—the enumeration we need in order by User and ActivityTime. The Row_Number() function enumerates rows in the order specified in the ORDER BY clause in the OVER() clause.Here’s the data produced by the CTE with its RowNumber column:
- The CTE is used in the main query to create the two derived tables, LogOns and LogOffs.
- The join condition matches a logon time with the time of the previous logoff as long as the user and activity date are the same.
When we run the query, we can see the same two blocks of idle time for user 1 on 1/1/2013 that we saw at the very start:
That’s how you can do it in SQL Server 2005, 2008 or 2008 R2. If you’re using SQL Server 2012 or higher, it’s a little easier.
A Solution Using SQLServer 2012 or Higher
SQL Server 2012 introduced the new Lag() and Lead() functions that encapsulate all the logic of “reaching” back or forward respectively by any number of rows. We just need to tell it what value to retrieve from a previous row (either a column or an expression that references one or more columns), how many rows to go either back or forward, and the order of the rows or the “scale” over which it knows what a “previous” or “next” row is. We use parameters to tell it the first two. The order of the rows is specified using the OVER() clause with ORDER BY.
In the query below, the Lag() function is used to determine the idle time. It is the second argument of the DATEDIFF() function. We’re telling the Lag() function to retrieve the activity time from 1 row “back” using the same order of User and ActivityTime (the Lead() function is similar but returns values in rows that follow rather than precede the current row). Note the same blocks of idle time of 150 minutes and 45 minutes for user 1 on 1/1/2013.
WHEN 'Logon' THEN
DATEDIFF(minute, Lag(ActivityTime, 1) OVER(ORDER BY [User], ActivityTime), ActivityTime)
We can tweak this to get the same results we got in the join solution we saw earlier. One way of doing that is shown in the query below where we create a CTE that puts corresponding values on the same row and then filters the data and calculates the idle time in the main query.
; WITH UserActivityWPairedTimes
, ActivityDate = CAST(ActivityTime As DATE)
, LogOnUser = [User]
, LogOffUser = Lag([User], 1) OVER(ORDER BY [User], ActivityTime)
, LogOnActivityTime = ActivityTime
, LogOffActivityTime = Lag(ActivityTime, 1) OVER(ORDER BY [User], ActivityTime)
, LogOnActivityDate = CAST(ActivityTime AS DATE)
, LogOffActivityDate = CAST(Lag(ActivityTime, 1) OVER(ORDER BY [User], ActivityTime) AS DATE)
SELECT LogOnUser [User]
, LogOffActivityTime LogOff
, LogOnActivityTime LogOn
, DATEDIFF(minute, LogOffActivityTime, LogOnActivityTime) IdleTime
WHERE Activity = 'LogOn'
AND LogOffActivityTime Is Not Null
AND LogOnUser = LogOffUser
AND LogOnActivityDate = LogOffActivityDate
Have fun writing queries that access data in previous or next rows!