I have stacked user actions in a table and would like to calculate the number of seconds between each action for each user within that day. The issue is multiple users can access at once, so their actions are stacked, and I would like to only calculate that difference if the actions occurred on the same day.
example:
data have:
id | Timestamp | User | date |
1390267 | 2/19/2018 20:01:33 | Rick | 2/19/2018 |
1390273 | 2/19/2018 20:02:36 | Mike | 2/19/2018 |
1390276 | 2/19/2018 20:03:15 | Rick | 2/19/2018 |
1390763 | 2/19/2018 22:33:49 | Fred | 2/19/2018 |
1390979 | 2/19/2018 22:54:18 | Suzi | 2/19/2018 |
1391119 | 2/19/2018 23:12:39 | Fred | 2/19/2018 |
1391124 | 2/19/2018 23:13:21 | Fred | 2/19/2018 |
1392279 | 2/20/2018 22:54:00 | Suzi | 2/20/2018 |
1391339 | 2/20/2018 23:12:00 | Fred | 2/20/2018 |
data want:
id | Timestamp | User | date | Dif |
1390267 | 2/19/2018 20:01:33 | Rick | 2/19/2018 | 0 |
1390273 | 2/19/2018 20:02:36 | Mike | 2/19/2018 | 0 |
1390276 | 2/19/2018 20:03:15 | Rick | 2/19/2018 | 87 |
1390763 | 2/19/2018 22:33:49 | Fred | 2/19/2018 | 0 |
1390979 | 2/19/2018 22:54:18 | Suzi | 2/19/2018 | 0 |
1391119 | 2/19/2018 23:12:39 | Fred | 2/19/2018 | 2171 |
1391124 | 2/19/2018 23:13:21 | Fred | 2/19/2018 | 42 |
1392279 | 2/20/2018 22:54:00 | Suzi | 2/20/2018 | 0 |
1391339 | 2/20/2018 23:12:00 | Fred | 2/20/2018 | 0 |
Here is another alternate way by using LAG and INTCK functions
proc sort data=have1;
by user TimeStamp;
run;
data want(drop=LAG);
format LAG DATETIME24.;
set have1;
LAG=LAG(TimeStamp);
by user;
if first.user then dif=0;
else dif=intCK("Second",LAG,TimeStamp);
run;
Please indicate which rows are being compared to get the values of 87 and 2171. The only one I can duplicate is the 42.
The following possible solution assumes that the timestamp variable is an actual SAS datetime valued variable.
proc sort data=have; by user timestamp; run; data want; set have; by user timestamp; difstamp = dif(timestamp); if first.user then do; dif=0; end; else dif=difstamp; drop difstamp; run; proc sort data=want; by timestamp; run;
Here is another alternate way by using LAG and INTCK functions
proc sort data=have1;
by user TimeStamp;
run;
data want(drop=LAG);
format LAG DATETIME24.;
set have1;
LAG=LAG(TimeStamp);
by user;
if first.user then dif=0;
else dif=intCK("Second",LAG,TimeStamp);
run;
thanks, though this doesn't reset at day I used it and used Astounding's if statement on there:
data want;
format LAG DATETIME24.;
set have;
LAG=LAG(timestamp);
by user date;
if first.date then dif=0;
else dif=intCK("Second",LAG,timestamp);
run;
I think the proposed solutions have the right idea, but fail to account for changes in the day. A slightly improved version:
proc sort data=have;
by user date timestamp;
run;
data want;
set have;
by user date;
dif = dif(timestamp);
if first.date them dif = 0;
run;
If desired, you can always re-sort the data set by TIMESTAMP at that point.
Your data appear to already by ordered by timestamp. While the "proc sort; by user timestamp;" solution will work, it will be an expensive solution if the data set is big, especially if you want to re-sort to original order. That a lot of disk input/output activity and disk storage.
To avoid that inefficiency, you can use a hash object, keyed on USER, holding the most recent timestamp and most recent date (in variables last_ts and last_dt). Then you can retrieve from the hash object, compare DATE to LAST_DT and, if appropriate subtract TIMESTAMP minus LAST_TS:
data have;
input id dummy :mmddyy10. Timestamp :time8.0 User :$4. date :mmddyy10.;
timestamp=dummy*24*60*60+timestamp;
drop dummy;
format timestamp datetime19.0 date date9.;
datalines;
1390267 2/19/2018 20:01:33 Rick 2/19/2018
1390273 2/19/2018 20:02:36 Mike 2/19/2018
1390276 2/19/2018 20:03:15 Rick 2/19/2018
1390763 2/19/2018 22:33:49 Fred 2/19/2018
1390979 2/19/2018 22:54:18 Suzi 2/19/2018
1391119 2/19/2018 23:12:39 Fred 2/19/2018
1391124 2/19/2018 23:13:21 Fred 2/19/2018
1392279 2/20/2018 22:54:00 Suzi 2/20/2018
1391339 2/20/2018 23:12:00 Fred 2/20/2018
run;
data want;
set have ;
if _n_=1 then do;
if 0 then set have (keep=date timestamp rename=(date=last_dt timestamp=last_ts));
declare hash h (dataset:'have (keep=user date timestamp
rename=(date=last_dt timestamp=last_ts)');
h.definekey('user');
h.definedata('last_dt','last_ts');
h.definedone();
end;
rc=h.find();
if rc=0 and date=last_dt then dif=timestamp-last_ts;
else dif=0;
h.replace(key:user,data:date,data:timestamp);
drop last_: rc;
run;
The advantage is that no data sorting is required. The disadvantage is the need to develop and understanding of the hash object, which was not intuitively obvious to me when I first came across it.
By the way, what if a user has 2 simultaneous records? Then DIF is a true zero, as opposed to a zero representing "not appropriate". In that case you might drop the "else dif=0;" statement, which will leave DIF as a missing value.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.