BookmarkSubscribeRSS Feed
Time_Looper47
Obsidian | Level 7

Greetings,

 

So what I'm looking to do is create a count as a number of events for each key_id based on unique dates (with a caveat). My data looks like this:

 

key_id | event_dt | event_ts

1142 | 10JUL2019 | 10JUL2019:18:45:12

1142 | 10JUL2019 | 10JUL2019:18:00:39

1142 | 09JUL2019 | 09JUL2019:16:32:51

1142 | 15JUL2019 | 15JUL2019:12:30:34

3462 | 01OCT2019 | 01OCT2019:14:29:43

3462 | 01OCT2019 | 01OCT2019:14:00:12

3462 | 01OCT2019 | 01OCT2019:14:45:01

3462 | 12NOV2019 | 12NOV2019:11:17:21

4444 | 04AUG2019 | 04AUG2019:11:20:35

4444 | 02MAR2019 | 02MAR2019:10:55:13

4444 | 14DEC2019 | 14DEC2019:14:32:17

4444 | 21OCT2019 | 21OCT2019:14:12:42

 

event_dt format is: DATE9.

event_ts format is: DATETIME19.

 

The caveat is that, I want to create a count (by key_id) of events with unique dates, however, if an event falls on the same day and is within 60 mins of the other, I don't want to count that as a second event. So, for the above data, I would desire something like:

 

key_id | count_event

1142 | 3

3462 | 2

4444 | 4

 

 

Thank you

8 REPLIES 8
mkeintz
PROC Star

Questions: Does your 60- minute filter cross date boundaries,  i.e. is 11JUL2019:00:15:00 within the same event as 10JUL2019:23:45:00?

 

If yes, then 

 

data have;
  infile datalines dlm='|';
  input key_id  event_dt :date9.   event_ts  datetime18. ;
  format event_dt date9.  event_ts datetime18.;
datalines;
1142 | 10JUL2019 | 10JUL2019:18:45:12
1142 | 10JUL2019 | 10JUL2019:18:00:39
1142 | 09JUL2019 | 09JUL2019:16:32:51
1142 | 15JUL2019 | 15JUL2019:12:30:34
3462 | 01OCT2019 | 01OCT2019:14:29:43
3462 | 01OCT2019 | 01OCT2019:14:00:12
3462 | 01OCT2019 | 01OCT2019:14:45:01
3462 | 12NOV2019 | 12NOV2019:11:17:21
4444 | 04AUG2019 | 04AUG2019:11:20:35
4444 | 02MAR2019 | 02MAR2019:10:55:13
4444 | 14DEC2019 | 14DEC2019:14:32:17
4444 | 21OCT2019 | 21OCT2019:14:12:42
run;

proc sort data=have;
   by key_id event_ts;
run;

data want (keep=key_id n_events;
  set have;
  by key_id;

  if event_ts > intnx('minute',lag(event_ts),60) then n_events+1;
  if first.key_id then n_events=1;
  if last.key_id;
run;

But if your time stamps were to the second, rather than to the whole minute, you could have an error for the time span from, say, 13:00:32 to 14:00:58 which is more the 60 minutes (by an exce3ss of 36 seconds), but would not be detected because the INTNX function is using a granularity of 'minute'.  You could address this via:

 

  if event_ts > intnx('second',lag(event_ts),3600) then n_events+1;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Time_Looper47
Obsidian | Level 7

Hey @mkeintz , good question - it does NOT actually cross boundaries. If it's a new day (but within 60 minutes from a previous day) it's a new event no matter what

Time_Looper47
Obsidian | Level 7

@mkeintz @novinosrin 

 

How would I modify this to address the fact that no matter what if it's a new day, it's a new event, even if it's within 60 mins across the boundary?

novinosrin
Tourmaline | Level 20

HI @Time_Looper47  Go with genie mark's solution. I am bored, so my share of fun 

 


data have;
input key_id  @8 event_dt date9. @20 event_ts datetime20.;
format event_dt date9. event_ts datetime20.;
cards;
1142 | 10JUL2019 | 10JUL2019:18:45:12
1142 | 10JUL2019 | 10JUL2019:18:00:39
1142 | 09JUL2019 | 09JUL2019:16:32:51
1142 | 15JUL2019 | 15JUL2019:12:30:34
3462 | 01OCT2019 | 01OCT2019:14:29:43
3462 | 01OCT2019 | 01OCT2019:14:00:12
3462 | 01OCT2019 | 01OCT2019:14:45:01
3462 | 12NOV2019 | 12NOV2019:11:17:21
4444 | 04AUG2019 | 04AUG2019:11:20:35
4444 | 02MAR2019 | 02MAR2019:10:55:13
4444 | 14DEC2019 | 14DEC2019:14:32:17
4444 | 21OCT2019 | 21OCT2019:14:12:42
;

data want;
 do until(last.key_id);
  do until(last.event_dt);
   set have;
   by key_id event_dt notsorted;
   if not first.event_dt and abs(dif(minute(event_ts)))<=60 then continue;
   count_event=sum(count_event,1);
  end;
 end;
run;
PGStats
Opal | Level 21

What if you have dates

 

01OCT2019:14:00:12
01OCT2019:14:15:12
01OCT2019:15:10:12

 

The first two are within 60 mins, so are the last two, but not the first and last. Does that count as one or two distinct events? What's the rule for such cases?

PG
Time_Looper47
Obsidian | Level 7

@PGStats Good question - so the way these cases would be handled is, you go 60 mins from the FIRST event. So in your example it would count as 2 events (the first two get grouped into one, and since the last one is greater than 60 mins from the first, it would be a separate event.

 

Also note that: if an event falls on a separate day but is within 60 mins of the first event (lets say the events were 01OCT2019:11:30:12 and 02OCT2019:00:10:10) it counts as two separate events.

novinosrin
Tourmaline | Level 20

Hi @Time_Looper47  Can you try this?

 

proc sort data=have out=_have;
by key_id event_dt event_ts;
run;
data want;
 set _have;
 by key_id event_dt ;
 retain t;
 if first.key_id  then do; count=1;t=event_ts;end;
 else if first.event_dt then do; count+1;t=event_ts;end;
 else if intck('minute',t,event_ts)>60 then do;
  t=event_ts;
  count+1;
 end;
 if last.key_id;
 keep key_id count;
run;
PGStats
Opal | Level 21

This would work:

 

data have;
  infile datalines dlm='|';
  input key_id  event_dt :date9.   event_ts  datetime18. ;
  format event_dt date9.  event_ts datetime18.;
datalines;
1142 | 10JUL2019 | 10JUL2019:18:45:12
1142 | 10JUL2019 | 10JUL2019:18:00:39
1142 | 09JUL2019 | 09JUL2019:16:32:51
1142 | 15JUL2019 | 15JUL2019:12:30:34
3462 | 01OCT2019 | 01OCT2019:14:29:43
3462 | 01OCT2019 | 01OCT2019:14:00:12
3462 | 01OCT2019 | 01OCT2019:14:45:01
3462 | 12NOV2019 | 12NOV2019:11:17:21
4444 | 04AUG2019 | 04AUG2019:11:20:35
4444 | 02MAR2019 | 02MAR2019:10:55:13
4444 | 14DEC2019 | 14DEC2019:14:32:17
4444 | 21OCT2019 | 21OCT2019:14:12:42
4444 | 21OCT2019 | 21OCT2019:23:12:42  
4444 | 22OCT2019 | 22OCT2019:00:02:42  New day
4444 | 22OCT2019 | 22OCT2019:00:52:42  Same event
4444 | 22OCT2019 | 22OCT2019:01:12:42  New event
run;

proc sort data=have;
   by key_id event_ts;
run;

data temp (keep=key_id event);
retain first_event_ts;
set have; by key_id;
if 	first.key_id or 
	event_dt > lag(event_dt) or 
	intnx("minute", event_ts, -60) > first_event_ts then do;
		event + 1;
		first_event_ts = event_ts;
		end;	
run;

proc sql;
create table want as
select 
	key_id ,
	count (distinct event) as count_event
from temp
group by key_id;
select * from want;
quit;

Note: I added data to test your latest requirements.

PG

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 651 views
  • 4 likes
  • 4 in conversation