
Identify and Concatenate rows with same ID and consecutive coverage of a range of time into one row

Regular Learner
Posts: 1


I am working with health care enrollment data and have run into the following problem.

The raw data looks like this:

ID   start_date   end_date
P1   A            B
P1   B+1          C
P1   C+180        D
P2   E            F
P2   F+1          G
P3   H            I

The expected output is:

ID   start_date   end_date
P1   A            C
P1   C+180        D
P2   E            G
P3   H            I

The criterion for consecutive coverage should be a parameter. In the example above, if the next start date is only 1 day after the previous end date (B to B+1), we treat the periods as consecutive and merge them; if the gap is large enough (180 days, C to C+180), we do not merge them (see P1).
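To make the rule concrete: SAS stores dates as day counts, so the gap between two periods is just the next start date minus the previous end date. A minimal sketch, assuming a hypothetical macro variable `max_gap` as the parameter and made-up dates standing in for the B to B+1 and C to C+180 cases:

```sas
/* Hypothetical parameter: largest gap in days that still counts as consecutive */
%let max_gap = 1;

/* SAS dates are day counts, so the gap is a plain subtraction */
data gaps;
 input prev_end :date9. next_start :date9.;
 format prev_end next_start date9.;
 gap = next_start - prev_end;        /* 1 for the B -> B+1 row, 180 for C -> C+180 */
 consecutive = (gap <= &max_gap);    /* 1 = merge the rows, 0 = keep them apart */
 datalines;
01jun2012 02jun2012
01jun2013 28nov2013
;
run;
```

Changing `max_gap` (for example to 30) widens what counts as continuous coverage without touching the rest of the logic.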

 

In addition, to make the ID a unique key, I am thinking about changing it to "ID+start date":

ID          start_date   end_date
P1+A        A            C
P1+C+180    C+180        D
P2+E        E            G
P3+H        H            I

 

Thanks in advance

Super User
Posts: 9,681

Re: Identify and Concatenate rows with same ID and consecutive coverage of a range of time into one

It would help if you posted real date values rather than placeholder letters. Here is a version with real dates:

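data have;
 input ID $ start_date : date9. end_date : date9.;
 format start_date end_date date9.;
 cards;
P1 01jan2012 01jun2012
P1 02jun2012 01jun2013
P1 01jun2014 01dec2014
P2 01jun2012 01dec2012
P2 02dec2012 01jun2013
P2 01jun2014 01jun2015
;
run;

/* Largest gap (in days) between one end date and the next start date
   that still counts as consecutive coverage; 1 = starts the next day */
%let max_gap = 1;

/* Stack the start and end dates into one column; N numbers the original rows */
data temp;
 set have;
 n + 1;
 temp = start_date; output;
 temp = end_date;   output;
 drop start_date end_date;
 format temp date9.;
run;

/* New group at each new ID, or when a period starts more than &max_gap days
   after the previous period ended. LAG is called on every row, outside the IF,
   so PREV always holds the previous row's date. */
data temp;
 set temp;
 by id n;
 prev = lag(temp);
 if first.id or (first.n and temp - prev > &max_gap) then group + 1;
 drop prev;
run;

/* Keep only the first (start) and last (end) date of each group */
data x;
 set temp;
 by group;
 if first.group or last.group;
run;

/* One row per group; rename COL1/COL2 to meaningful names */
proc transpose data=x out=want(drop=_name_ rename=(col1=start_date col2=end_date));
 by id group;
 var temp;
run;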
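An alternative sketch (a different technique from the reply above, not the poster's code): the same collapse can be done in a single DATA step with RETAIN, carrying the current span forward and writing it out when the gap gets too large or the ID ends. The dataset names `enroll` and `spans` and the macro variable `max_gap` are hypothetical, and the input is assumed to be sorted by ID and start date.

```sas
/* Hypothetical parameter: largest gap (days) still treated as continuous coverage */
%let max_gap = 1;

/* Hypothetical input, assumed sorted by ID and START_DATE */
data enroll;
 input ID $ start_date :date9. end_date :date9.;
 format start_date end_date date9.;
 datalines;
P1 01jan2012 01jun2012
P1 02jun2012 01jun2013
P1 01jun2014 01dec2014
P2 01jun2012 01dec2012
P2 02dec2012 01jun2013
P2 01jun2014 01jun2015
;
run;

data spans(keep=id span_start span_end);
 set enroll;
 by id;
 retain span_start span_end;
 format span_start span_end date9.;
 if first.id then do;
   span_start = start_date;          /* open the first span for this ID */
   span_end   = end_date;
 end;
 else if start_date - span_end > &max_gap then do;
   output;                           /* gap too large: write out the finished span */
   span_start = start_date;          /* and open a new one at this row */
   span_end   = end_date;
 end;
 else span_end = max(span_end, end_date);   /* extend the current span */
 if last.id then output;             /* write out the last open span for this ID */
run;
```

Using `max(span_end, end_date)` also keeps the span correct if a period overlaps or sits inside the previous one, which the two-column stacking approach does not check for.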
PROC Star
Posts: 1,562

Re: Identify and Concatenate rows with same ID and consecutive coverage of a range of time into one

(ID + start date) is the unique key.
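In other words, the two columns together already identify each row, so you can simply sort or join on `by id start_date`. If some downstream step really needs a single-column key, a minimal sketch is to build it with CATX; the dataset names `collapsed`/`keyed` and the variable `key` are hypothetical:

```sas
/* Hypothetical collapsed spans, one row per continuous coverage period */
data collapsed;
 input ID $ start_date :date9. end_date :date9.;
 format start_date end_date date9.;
 datalines;
P1 01jan2012 01jun2013
P1 01jun2014 01dec2014
P2 01jun2012 01jun2013
P2 01jun2014 01jun2015
;
run;

/* Single-column key such as P1+01JAN2012 */
data keyed;
 set collapsed;
 length key $ 20;
 key = catx('+', id, put(start_date, date9.));
run;

/* Check that the key is unique: the two counts should match */
proc sql;
 select count(*) as n_rows, count(distinct key) as n_keys
 from keyed;
quit;
```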
