Calcite | Level 5

## Calculate difference between two dates in the same column conditional to other columns

I have stacked user actions in a table and would like to calculate the number of seconds between each action for each user within that day. The issue is multiple users can access at once, so their actions are stacked, and I would like to only calculate that difference if the actions occurred on the same day.

example:

data have:

 id Timestamp User date 1390267 2/19/2018 20:01:33 Rick 2/19/2018 1390273 2/19/2018 20:02:36 Mike 2/19/2018 1390276 2/19/2018 20:03:15 Rick 2/19/2018 1390763 2/19/2018 22:33:49 Fred 2/19/2018 1390979 2/19/2018 22:54:18 Suzi 2/19/2018 1391119 2/19/2018 23:12:39 Fred 2/19/2018 1391124 2/19/2018 23:13:21 Fred 2/19/2018 1392279 2/20/2018 22:54:00 Suzi 2/20/2018 1391339 2/20/2018 23:12:00 Fred 2/20/2018

data want:

 id Timestamp User date Dif 1390267 2/19/2018 20:01:33 Rick 2/19/2018 0 1390273 2/19/2018 20:02:36 Mike 2/19/2018 0 1390276 2/19/2018 20:03:15 Rick 2/19/2018 87 1390763 2/19/2018 22:33:49 Fred 2/19/2018 0 1390979 2/19/2018 22:54:18 Suzi 2/19/2018 0 1391119 2/19/2018 23:12:39 Fred 2/19/2018 2171 1391124 2/19/2018 23:13:21 Fred 2/19/2018 42 1392279 2/20/2018 22:54:00 Suzi 2/20/2018 0 1391339 2/20/2018 23:12:00 Fred 2/20/2018 0
1 ACCEPTED SOLUTION

Accepted Solutions
Meteorite | Level 14

## Re: Calculate difference between two dates in the same column conditional to other columns

Here is another alternate way by using LAG and INTCK functions

``````proc sort data=have1;
by user TimeStamp;
run;

data want(drop=LAG);
format LAG DATETIME24.;
set have1;
LAG=LAG(TimeStamp);
by user;
if first.user then dif=0;
else dif=intCK("Second",LAG,TimeStamp);
run;``````
Thanks,
Suryakiran
6 REPLIES 6
Super User

## Re: Calculate difference between two dates in the same column conditional to other columns

Please indicate which rows are being compared to get the values of 87 and 2171. The only one I can duplicate is the 42.

The following possible solution assumes that the timestamp variable is an actual SAS datetime valued variable.

```proc sort data=have;
by user timestamp;
run;

data want;
set have;
by user timestamp;
difstamp = dif(timestamp);
if first.user then do;
dif=0;
end;
else dif=difstamp;
drop difstamp;
run;

proc sort data=want;
by timestamp;
run;```
Meteorite | Level 14

## Re: Calculate difference between two dates in the same column conditional to other columns

Here is another alternate way by using LAG and INTCK functions

``````proc sort data=have1;
by user TimeStamp;
run;

data want(drop=LAG);
format LAG DATETIME24.;
set have1;
LAG=LAG(TimeStamp);
by user;
if first.user then dif=0;
else dif=intCK("Second",LAG,TimeStamp);
run;``````
Thanks,
Suryakiran
Calcite | Level 5

## Re: Calculate difference between two dates in the same column conditional to other columns

thanks, though this doesn't reset at day I used it and used on there:

``````data want;
format LAG DATETIME24.;
set have;
LAG=LAG(timestamp);
by user date;
if first.date then dif=0;
else dif=intCK("Second",LAG,timestamp);
run;``````
Fluorite | Level 6

## Re: Calculate difference between two dates in the same column conditional to other columns

For a newbie it is a bit difficult to understand LAG=LAG(TimeStamp) why not pick var names that a different from SAS function names at these forums? Add a simple _ before the name at the very least.
Opal | Level 21

## Re: Calculate difference between two dates in the same column conditional to other columns

I think the proposed solutions have the right idea, but fail to account for changes in the day.  A slightly improved version:

proc sort data=have;

by user date timestamp;

run;

data want;

set have;

by user date;

dif = dif(timestamp);

if first.date them dif = 0;

run;

If desired, you can always re-sort the data set by TIMESTAMP at that point.

## Re: Calculate difference between two dates in the same column conditional to other columns

Your data appear to already by ordered by timestamp.  While the "proc sort; by user timestamp;" solution will work, it will be an expensive solution if the data set is big, especially if you want to re-sort to original order.  That a lot of disk input/output activity and disk storage.

To avoid that inefficiency, you can use a hash object, keyed on USER, holding the most recent timestamp and most recent date (in variables last_ts and last_dt).  Then you can retrieve from the hash object, compare DATE to LAST_DT and, if appropriate subtract TIMESTAMP minus LAST_TS:

``````data have;
input id dummy :mmddyy10. Timestamp :time8.0 User :\$4. date :mmddyy10.;
timestamp=dummy*24*60*60+timestamp;
drop dummy;
format timestamp datetime19.0 date date9.;
datalines;
1390267 2/19/2018 20:01:33 Rick 2/19/2018
1390273 2/19/2018 20:02:36 Mike 2/19/2018
1390276 2/19/2018 20:03:15 Rick 2/19/2018
1390763 2/19/2018 22:33:49 Fred 2/19/2018
1390979 2/19/2018 22:54:18 Suzi 2/19/2018
1391119 2/19/2018 23:12:39 Fred 2/19/2018
1391124 2/19/2018 23:13:21 Fred 2/19/2018
1392279 2/20/2018 22:54:00 Suzi 2/20/2018
1391339 2/20/2018 23:12:00 Fred 2/20/2018
run;

data want;
set have  ;
if _n_=1 then do;
if 0 then set have (keep=date timestamp rename=(date=last_dt timestamp=last_ts));
declare hash h (dataset:'have (keep=user date timestamp
rename=(date=last_dt timestamp=last_ts)');
h.definekey('user');
h.definedata('last_dt','last_ts');
h.definedone();
end;
rc=h.find();
if rc=0 and date=last_dt then dif=timestamp-last_ts;
else dif=0;
h.replace(key:user,data:date,data:timestamp);
drop last_: rc;
run;``````

The advantage is that no data sorting is required.  The disadvantage is the need to develop and understanding of the hash object, which was not intuitively obvious to me when I first came across it.

By the way, what if a user has 2 simultaneous records?  Then DIF is a true zero, as opposed to a zero representing "not appropriate".  In that case you might drop the "else dif=0;" statement, which will leave DIF as a missing value.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Discussion stats
• 6 replies
• 10187 views
• 0 likes
• 6 in conversation