BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twerwath
Obsidian | Level 7

I am working on a situation in which a specific event has occurred for an individual on a given date, and that person has multiple events and multiple personnel records. I want to keep only one record per event, the one closest to the event date (either before or after). I have an idea of how to do this, but am looking for the most efficient possible way since I am working with very large datasets.

 

I am also not sure what to do when there are two records that are equally close to the event date. In this case, either one could be used, but only one.

 

An example of what I have:

 

data have;
input id event_date record_date

datalines;
1 28JAN2019 24JAN2019. 
1 28JAN2019 26JAN2019. 

1 28JAN2019 3FEB2019. 

1 28JAN2019 4FEB2019. 

1 3FEB2019 24JAN2019. 
1 3FEB2019 26JAN2019. 

1 3FEB2019 3FEB2019. 

1 3FEB2019 4FEB2019. 

2 31JAN2019 28JAN2019. 
2 31JAN2019 1FEB2019. 

2 31JAN2019 3FEB2019. 

2 31JAN2019 4FEB2019. 

2 5FEB2019 28JAN2019. 
2 5FEB2019 1FEB2019. 

2 5FEB2019 3FEB2019. 

2 5FEB2019 4FEB2019. 

;

run;

 

An example of what I want:

 

data want;
input id event_date record_date

datalines;
1 28JAN2019 26JAN2019. 

1 3FEB2019 3FEB2019. 

2 31JAN2019 1FEB2019. 

2 5FEB2019 4FEB2019. 

;

run;

 

Thanks in advance for any help with this!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

Well if your version support CUROBS option in set statement, you could get a little aggressive rather than defensive -

data have;
input id (event_date record_date) (:date9.);
format event_date record_date date9.;
datalines;
1 28JAN2019 24JAN2019. 
1 28JAN2019 26JAN2019. 
1 28JAN2019 3FEB2019. 
1 28JAN2019 4FEB2019. 
1 3FEB2019 24JAN2019. 
1 3FEB2019 26JAN2019. 
1 3FEB2019 3FEB2019. 
1 3FEB2019 4FEB2019. 
2 31JAN2019 28JAN2019. 
2 31JAN2019 1FEB2019. 
2 31JAN2019 3FEB2019. 
2 31JAN2019 4FEB2019. 
2 5FEB2019 28JAN2019. 
2 5FEB2019 1FEB2019. 
2 5FEB2019 3FEB2019. 
2 5FEB2019 4FEB2019.
2 5FEB2019 6FEB2019.  /*added one more to include same diff*/ 
;
run;

data want;
 do until(last.event_date);
  set have curobs=_n_;
  by id event_date;
  diff=abs(event_date-record_date);
  if min>. and diff>=min then continue;
  min=diff;
  k=_n_;
 end;
 set have point=k;
 drop k min diff;
run;

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20
data have;
input id (event_date record_date) (:date9.);
format event_date record_date date9.;
datalines;
1 28JAN2019 24JAN2019. 
1 28JAN2019 26JAN2019. 
1 28JAN2019 3FEB2019. 
1 28JAN2019 4FEB2019. 
1 3FEB2019 24JAN2019. 
1 3FEB2019 26JAN2019. 
1 3FEB2019 3FEB2019. 
1 3FEB2019 4FEB2019. 
2 31JAN2019 28JAN2019. 
2 31JAN2019 1FEB2019. 
2 31JAN2019 3FEB2019. 
2 31JAN2019 4FEB2019. 
2 5FEB2019 28JAN2019. 
2 5FEB2019 1FEB2019. 
2 5FEB2019 6FEB2019.  /*added one more to include same diff*/
2 5FEB2019 3FEB2019. 
2 5FEB2019 4FEB2019. 
;
run;


proc sql;
 create table want(drop=diff) as
 select distinct *
 from
 (select *, abs(event_date-record_date) as diff
 from have
 group by id,event_date
 having min(diff)=diff)
 group by id,event_date,diff
 having min(record_date)=record_date;
quit;

 


novinosrin
Tourmaline | Level 20
data have;
input id (event_date record_date) (:date9.);
format event_date record_date date9.;
datalines;
1 28JAN2019 24JAN2019. 
1 28JAN2019 26JAN2019. 
1 28JAN2019 3FEB2019. 
1 28JAN2019 4FEB2019. 
1 3FEB2019 24JAN2019. 
1 3FEB2019 26JAN2019. 
1 3FEB2019 3FEB2019. 
1 3FEB2019 4FEB2019. 
2 31JAN2019 28JAN2019. 
2 31JAN2019 1FEB2019. 
2 31JAN2019 3FEB2019. 
2 31JAN2019 4FEB2019. 
2 5FEB2019 28JAN2019. 
2 5FEB2019 1FEB2019. 
2 5FEB2019 3FEB2019. 
2 5FEB2019 4FEB2019.
2 5FEB2019 6FEB2019.  /*added one more to include same diff*/ 
;
run;

data want;
 do _n_=1 by 1 until(last.event_date);
  set have;
  by id event_date;
  diff=abs(event_date-record_date);
  if min>. and diff>=min then continue;
  min=diff;
  k=_n_;
 end;
 do _n_=1 to _n_;
  set have;
  if _n_=k then output;
 end;
 drop k min diff;
run;
novinosrin
Tourmaline | Level 20

Well if your version support CUROBS option in set statement, you could get a little aggressive rather than defensive -

data have;
input id (event_date record_date) (:date9.);
format event_date record_date date9.;
datalines;
1 28JAN2019 24JAN2019. 
1 28JAN2019 26JAN2019. 
1 28JAN2019 3FEB2019. 
1 28JAN2019 4FEB2019. 
1 3FEB2019 24JAN2019. 
1 3FEB2019 26JAN2019. 
1 3FEB2019 3FEB2019. 
1 3FEB2019 4FEB2019. 
2 31JAN2019 28JAN2019. 
2 31JAN2019 1FEB2019. 
2 31JAN2019 3FEB2019. 
2 31JAN2019 4FEB2019. 
2 5FEB2019 28JAN2019. 
2 5FEB2019 1FEB2019. 
2 5FEB2019 3FEB2019. 
2 5FEB2019 4FEB2019.
2 5FEB2019 6FEB2019.  /*added one more to include same diff*/ 
;
run;

data want;
 do until(last.event_date);
  set have curobs=_n_;
  by id event_date;
  diff=abs(event_date-record_date);
  if min>. and diff>=min then continue;
  min=diff;
  k=_n_;
 end;
 set have point=k;
 drop k min diff;
run;
novinosrin
Tourmaline | Level 20

You could even play with Hash contents for fun and to familiarize ways to park and pick object notations of hash methods-

 

data have;
input id (event_date record_date) (:date9.);
format event_date record_date date9.;
datalines;
1 28JAN2019 24JAN2019. 
1 28JAN2019 26JAN2019. 
1 28JAN2019 3FEB2019. 
1 28JAN2019 4FEB2019. 
1 3FEB2019 24JAN2019. 
1 3FEB2019 26JAN2019. 
1 3FEB2019 3FEB2019. 
1 3FEB2019 4FEB2019. 
2 31JAN2019 28JAN2019. 
2 31JAN2019 1FEB2019. 
2 31JAN2019 3FEB2019. 
2 31JAN2019 4FEB2019. 
2 5FEB2019 28JAN2019. 
2 5FEB2019 1FEB2019. 
2 5FEB2019 3FEB2019. 
2 5FEB2019 4FEB2019.
2 5FEB2019 6FEB2019.  /*added one more to include same diff*/ 
;
run;

data want;
 if _n_=1 then do;
  dcl hash h(dataset:'have(obs=0)');
  h.definekey('id');
  h.definedata(all:'y');
  h.definedone();
 end;
 do _n_=h.clear() by 0 until(last.event_date);
  set have;
  by id event_date;
  diff=abs(event_date-record_date);
  if min>. and diff>=min then continue;
  min=diff;
  h.replace();
 end;
 h.find();
 drop min diff;
run;
Kurt_Bremser
Super User

If the dataset is sorted (or at least grouped) by id and event_date, do this:

data want;
dist = 99999;
do until (last.event_date);
  set have;
  by id event_date;
  if abs(event_date - record_date) lt dist
  then do;
    dist = abs(event_date - record_date);
    rd = record_date;
  end;
end;
record_date = rd;
drop rd;
run;
twerwath
Obsidian | Level 7

Thank you for your solutions! The issue has been solved, this has been very helpful.

mkeintz
PROC Star

This program assume data are sorted by ID/EVENT_DATE/RECORD_DATE:

 

Once the data step has reached either then end of an ID/EVENT_DATE group, or has a RECORD_DATE>EVENT_DATE, ;you merely have to decide whether the current record is closer than the prior record.  If the prior record is close, re-read it.

 

This allows a way to read in ALL the variables that are associated with the closest record_date.

 

data want (drop=_:);
  set have ;
  by id event_date;

  _prior_date=ifn(first.event_date=1,'15oct1582'd,lag(record_date));

  /* Find obs after which no future obs can qualify ... */
  if ((_prior_date<event_date) and 
      (event_date<=record_date or last.event_date))
     or
     (first.event_date=1 and event_date<=record_date);

  /* Decide whether to retrieve prior record */
  if (event_date-_prior_date)<abs(record_date-event_date) then do;
    p=_n_-1;
    set have point=p;
  end;
run;


The _prior_date var is set to 15oct1582  (beginning of Gregorian calendar) every time a new event_date is encountered.  This will prevent the program from reaching back to the prior event.  

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 906 views
  • 1 like
  • 4 in conversation