Greetings,
So what I'm looking to do is create a count as a number of events for each key_id based on unique dates (with a caveat). My data looks like this:
key_id | event_dt | event_ts
1142 | 10JUL2019 | 10JUL2019:18:45:12
1142 | 10JUL2019 | 10JUL2019:18:00:39
1142 | 09JUL2019 | 09JUL2019:16:32:51
1142 | 15JUL2019 | 15JUL2019:12:30:34
3462 | 01OCT2019 | 01OCT2019:14:29:43
3462 | 01OCT2019 | 01OCT2019:14:00:12
3462 | 01OCT2019 | 01OCT2019:14:45:01
3462 | 12NOV2019 | 12NOV2019:11:17:21
4444 | 04AUG2019 | 04AUG2019:11:20:35
4444 | 02MAR2019 | 02MAR2019:10:55:13
4444 | 14DEC2019 | 14DEC2019:14:32:17
4444 | 21OCT2019 | 21OCT2019:14:12:42
event_dt format is: DATE9.
event_ts format is: DATETIME19.
The caveat is that, I want to create a count (by key_id) of events with unique dates, however, if an event falls on the same day and is within 60 mins of the other, I don't want to count that as a second event. So, for the above data, I would desire something like:
key_id | count_event
1142 | 3
3462 | 2
4444 | 4
Thank you
Questions: Does your 60- minute filter cross date boundaries, i.e. is 11JUL2019:00:15:00 within the same event as 10JUL2019:23:45:00?
If yes, then
data have;
infile datalines dlm='|';
input key_id event_dt :date9. event_ts datetime18. ;
format event_dt date9. event_ts datetime18.;
datalines;
1142 | 10JUL2019 | 10JUL2019:18:45:12
1142 | 10JUL2019 | 10JUL2019:18:00:39
1142 | 09JUL2019 | 09JUL2019:16:32:51
1142 | 15JUL2019 | 15JUL2019:12:30:34
3462 | 01OCT2019 | 01OCT2019:14:29:43
3462 | 01OCT2019 | 01OCT2019:14:00:12
3462 | 01OCT2019 | 01OCT2019:14:45:01
3462 | 12NOV2019 | 12NOV2019:11:17:21
4444 | 04AUG2019 | 04AUG2019:11:20:35
4444 | 02MAR2019 | 02MAR2019:10:55:13
4444 | 14DEC2019 | 14DEC2019:14:32:17
4444 | 21OCT2019 | 21OCT2019:14:12:42
run;
proc sort data=have;
by key_id event_ts;
run;
data want (keep=key_id n_events;
set have;
by key_id;
if event_ts > intnx('minute',lag(event_ts),60) then n_events+1;
if first.key_id then n_events=1;
if last.key_id;
run;
But if your time stamps were to the second, rather than to the whole minute, you could have an error for the time span from, say, 13:00:32 to 14:00:58 which is more the 60 minutes (by an exce3ss of 36 seconds), but would not be detected because the INTNX function is using a granularity of 'minute'. You could address this via:
if event_ts > intnx('second',lag(event_ts),3600) then n_events+1;
Hey @mkeintz , good question - it does NOT actually cross boundaries. If it's a new day (but within 60 minutes from a previous day) it's a new event no matter what
How would I modify this to address the fact that no matter what if it's a new day, it's a new event, even if it's within 60 mins across the boundary?
HI @Time_Looper47 Go with genie mark's solution. I am bored, so my share of fun
data have;
input key_id @8 event_dt date9. @20 event_ts datetime20.;
format event_dt date9. event_ts datetime20.;
cards;
1142 | 10JUL2019 | 10JUL2019:18:45:12
1142 | 10JUL2019 | 10JUL2019:18:00:39
1142 | 09JUL2019 | 09JUL2019:16:32:51
1142 | 15JUL2019 | 15JUL2019:12:30:34
3462 | 01OCT2019 | 01OCT2019:14:29:43
3462 | 01OCT2019 | 01OCT2019:14:00:12
3462 | 01OCT2019 | 01OCT2019:14:45:01
3462 | 12NOV2019 | 12NOV2019:11:17:21
4444 | 04AUG2019 | 04AUG2019:11:20:35
4444 | 02MAR2019 | 02MAR2019:10:55:13
4444 | 14DEC2019 | 14DEC2019:14:32:17
4444 | 21OCT2019 | 21OCT2019:14:12:42
;
data want;
do until(last.key_id);
do until(last.event_dt);
set have;
by key_id event_dt notsorted;
if not first.event_dt and abs(dif(minute(event_ts)))<=60 then continue;
count_event=sum(count_event,1);
end;
end;
run;
What if you have dates
01OCT2019:14:00:12
01OCT2019:14:15:12
01OCT2019:15:10:12
The first two are within 60 mins, so are the last two, but not the first and last. Does that count as one or two distinct events? What's the rule for such cases?
@PGStats Good question - so the way these cases would be handled is, you go 60 mins from the FIRST event. So in your example it would count as 2 events (the first two get grouped into one, and since the last one is greater than 60 mins from the first, it would be a separate event.
Also note that: if an event falls on a separate day but is within 60 mins of the first event (lets say the events were 01OCT2019:11:30:12 and 02OCT2019:00:10:10) it counts as two separate events.
Hi @Time_Looper47 Can you try this?
proc sort data=have out=_have;
by key_id event_dt event_ts;
run;
data want;
set _have;
by key_id event_dt ;
retain t;
if first.key_id then do; count=1;t=event_ts;end;
else if first.event_dt then do; count+1;t=event_ts;end;
else if intck('minute',t,event_ts)>60 then do;
t=event_ts;
count+1;
end;
if last.key_id;
keep key_id count;
run;
This would work:
data have;
infile datalines dlm='|';
input key_id event_dt :date9. event_ts datetime18. ;
format event_dt date9. event_ts datetime18.;
datalines;
1142 | 10JUL2019 | 10JUL2019:18:45:12
1142 | 10JUL2019 | 10JUL2019:18:00:39
1142 | 09JUL2019 | 09JUL2019:16:32:51
1142 | 15JUL2019 | 15JUL2019:12:30:34
3462 | 01OCT2019 | 01OCT2019:14:29:43
3462 | 01OCT2019 | 01OCT2019:14:00:12
3462 | 01OCT2019 | 01OCT2019:14:45:01
3462 | 12NOV2019 | 12NOV2019:11:17:21
4444 | 04AUG2019 | 04AUG2019:11:20:35
4444 | 02MAR2019 | 02MAR2019:10:55:13
4444 | 14DEC2019 | 14DEC2019:14:32:17
4444 | 21OCT2019 | 21OCT2019:14:12:42
4444 | 21OCT2019 | 21OCT2019:23:12:42
4444 | 22OCT2019 | 22OCT2019:00:02:42 New day
4444 | 22OCT2019 | 22OCT2019:00:52:42 Same event
4444 | 22OCT2019 | 22OCT2019:01:12:42 New event
run;
proc sort data=have;
by key_id event_ts;
run;
data temp (keep=key_id event);
retain first_event_ts;
set have; by key_id;
if first.key_id or
event_dt > lag(event_dt) or
intnx("minute", event_ts, -60) > first_event_ts then do;
event + 1;
first_event_ts = event_ts;
end;
run;
proc sql;
create table want as
select
key_id ,
count (distinct event) as count_event
from temp
group by key_id;
select * from want;
quit;
Note: I added data to test your latest requirements.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.