BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
cobba
Obsidian | Level 7

I have the following data where there are clients with different events and a specific condition. If a client returns within 7 days with the same condition then it should actually be considered as the same event. If a client returns within 7 days with a different condition then it should be considered a separate event.

 

 

data have;
input client:$ eventid:$ condition:$ start:date9. finish:date9. ;
format start DATE9. finish DATE9. ;
datalines ; 
CL01 EV01 CN1 01AUG2019 07AUG2019
CL01 EV02 CN1 10AUG2019 18AUG2019
CL01 EV03 CN1 27AUG2019 30AUG2019
CL02 EV04 CN1 01AUG2019 07AUG2019
CL02 EV05 CN2 10AUG2019 18AUG2019
CL02 EV06 CN2 27AUG2019 30AUG2019
;
run;

 

 

I do not want to change the eventid already in the data. I'd rather create another field to indicate that the events should be grouped.

 

 

data want;
input client:$ eventid:$ condition:$ start:date9. finish:date9. client_condition_grp:$10.;
format start DATE9. finish DATE9. ;
datalines ; 
CL01 EV01 CN1 01AUG2019 07AUG2019 CL01CN1_1
CL01 EV02 CN1 10AUG2019 18AUG2019 CL01CN1_1
CL01 EV03 CN1 27AUG2019 30AUG2019 CL01CN1_2
CL02 EV04 CN1 01AUG2019 07AUG2019 CL02CN1_1
CL02 EV05 CN2 10AUG2019 18AUG2019 CL02CN2_1
CL02 EV06 CN2 27AUG2019 30AUG2019 CL02CN2_2
;
run;

 

 

I'm unsure how to tell SAS about the next or previous value in a table.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

For each ID, this program maintains a hash object (i.e. a dynamic lookup table) with an entry for each condition encountered.  Each entry will have the most recent FINISH value and the corresponding sequence number.

 

As new observations are read in, you can see if there is a prior hash dataitem  (i.e. hash "row") for the same condition.  If not set the sequence to 1.  But if yes, compare the current start to the most recent finish for that condition to determine whether to increment the sequence.  I.e. the sequence is always up to date for each condition.

 

Then just build the new client_condition_grp from the client id, condition, and sequence:

 

data have;
input client:$ eventid:$ condition:$ start:date9. finish:date9. ;
format start DATE9. finish DATE9. ;
datalines ; 
CL01 EV01 CN1 01AUG2019 07AUG2019
CL01 EV02 CN1 10AUG2019 18AUG2019
CL01 EV03 CN1 27AUG2019 30AUG2019
CL02 EV04 CN1 01AUG2019 07AUG2019
CL02 EV05 CN2 10AUG2019 18AUG2019
CL02 EV06 CN2 27AUG2019 30AUG2019
run;

data want (drop=_:);
  set have;
  by client;
  if _n_=1 then do;
    call missing(_prior_finish,_sequence);
    declare hash h ();
      h.definekey('condition');
      h.definedata('condition','_prior_finish','_sequence');
      h.definedone();
  end;
  if first.client then h.clear()     /*Blank slate at the start of each client */;

  if h.find()^=0 then _sequence=1    /*If this condition not yet in the hash */;
  else if start-7 > _prior_finish then _sequence=_sequence+1;
  _prior_finish=finish;
  h.replace()  /*Update the dataitem in the hash object */;

  length client_condition_group $9 ;
  client_condition_group=cats(client,condition,'_',_sequence);
run;

This program assumes data are sorted by START within CLIENT.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

For a given client, is it possible to have condition CN1 on day 1, CN2 on day 3, and another CN1 on day 6?

That's a five day separation between two CN1 events, but they are separated by a CN2 event.  Do the two CN1 events get the same value for the new eventid?

 

Also, what if you have three consecutive CN1 events on days 1, 6, and 11?  Are they all assigned the same new eventid?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
cobba
Obsidian | Level 7

For your first question, then yes you'd ignore the CN2 and join the 2 on either side.

 

And yes, for your second question. Just as long as the previous is less than 7 days then it chains along.

mkeintz
PROC Star

For each ID, this program maintains a hash object (i.e. a dynamic lookup table) with an entry for each condition encountered.  Each entry will have the most recent FINISH value and the corresponding sequence number.

 

As new observations are read in, you can see if there is a prior hash dataitem  (i.e. hash "row") for the same condition.  If not set the sequence to 1.  But if yes, compare the current start to the most recent finish for that condition to determine whether to increment the sequence.  I.e. the sequence is always up to date for each condition.

 

Then just build the new client_condition_grp from the client id, condition, and sequence:

 

data have;
input client:$ eventid:$ condition:$ start:date9. finish:date9. ;
format start DATE9. finish DATE9. ;
datalines ; 
CL01 EV01 CN1 01AUG2019 07AUG2019
CL01 EV02 CN1 10AUG2019 18AUG2019
CL01 EV03 CN1 27AUG2019 30AUG2019
CL02 EV04 CN1 01AUG2019 07AUG2019
CL02 EV05 CN2 10AUG2019 18AUG2019
CL02 EV06 CN2 27AUG2019 30AUG2019
run;

data want (drop=_:);
  set have;
  by client;
  if _n_=1 then do;
    call missing(_prior_finish,_sequence);
    declare hash h ();
      h.definekey('condition');
      h.definedata('condition','_prior_finish','_sequence');
      h.definedone();
  end;
  if first.client then h.clear()     /*Blank slate at the start of each client */;

  if h.find()^=0 then _sequence=1    /*If this condition not yet in the hash */;
  else if start-7 > _prior_finish then _sequence=_sequence+1;
  _prior_finish=finish;
  h.replace()  /*Update the dataitem in the hash object */;

  length client_condition_group $9 ;
  client_condition_group=cats(client,condition,'_',_sequence);
run;

This program assumes data are sorted by START within CLIENT.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 482 views
  • 2 likes
  • 2 in conversation