Hi SAS community,
I would appreciate your help in below problem:
I have a scenario in which a data-set sorted by ID and begdate and Enddate.
I have tried the code below to achieve my results and have used SAS 9.4 version:
data want;
set have;
IF ID = lag(ID) AND DIFF = 1 or DIFF = 0 then prev = ENDDATE;
DIFF = BEGDATE- lag(ENDDATE)
lag_id = lag(id);
lag_begdate = lag(BEGDATE);
lag_enddate = lag(ENDDATE);
by id;
format BEGDATE date9. ENDDATE date9. prev date9. lag_enddate lag_begdate date9.;
run;
Some IDs have multiple rows some have single rows with begdate and enddate. I want to check if begdate in row2 is equal to row1 enddate or if begdate in row2 = row1 endate +1.
I basically want to see if the records can be clubbed to have a continuous period.
For Example:
If the row1 of ID #200 has an enddate - 2/20/2012 and row2 of ID #200 has begdate- 5/11/2012 then that row should stay the same in the output dataset.
for id #200 Row 2 begdate was compared to row 1 enddate and difference is 81 days. NE 0 or 1 day - So this record should stay same
for id #920 Row 2 begdate was compared to row 1 enddate and difference is 1 day EQ 0 or 1 day - so this should set my begdate for id #920 as row1 begdate and enddate as row2 enddate, thus indicating that it is a continuous period.
Similarly, for id #920 Row 3 begdate was compared to row 2 enddate and difference is 1 day EQ 0 or 1 day.- so this should set my begdate for id #920 as row1 begdate and enddate as row3 enddate, thus indicating that it is a continuous period.
Overall, I want the code to run through my data and loop by ID to compare and give me a record with a continuous period of the patient if there is no difference between consecutive records enddate and beginning date.
Here is a sample of dataset and my expected results:
ID | begdate | enddate |
200 | 4/11/2011 | 2/20/2012 |
200 | 5/11/2012 | 10/31/2013 |
920 | 1/23/2003 | 1/11/2016 |
920 | 1/12/2016 | 3/29/2016 |
920 | 3/30/2016 | 7/21/2017 |
OUTPUT | ||
200 | 4/11/2011 | 2/20/2012 |
200 | 5/11/2012 | 10/31/2013 |
920 | 1/23/2003 | 7/21/2017 |
Any help is appreciated. Thank you for your time and effort. Please feel free to comment or ask questions if my question seems unclear.
Use DO UNTIL() with by processing for each ID group:
data have;
input ID (begdate enddate) (:mmddyy.);
format begdate enddate yymmdd10.;
datalines;
200 4/11/2011 2/20/2012
200 5/11/2012 10/31/2013
920 1/23/2003 1/11/2016
920 1/12/2016 3/29/2016
920 3/30/2016 7/21/2017
;
data want;
format startDate lastDate yymmdd10.;
lastDate = "01jan1930"d;
do until(last.id);
set have; by id;
if intck("day", lastDate, begdate) > 1 then do;
if not missing(startDate) then output;
startDate = begDate;
end;
lastDate = endDate;
end;
if not missing(startDate) then output;
drop begDate endDate;
rename startDate=begDate lastDate=endDate;
run;
proc print data=want; run;
Obs. begDate endDate ID 1 2011-04-11 2012-02-20 200 2 2012-05-11 2013-10-31 200 3 2003-01-23 2017-07-21 920
Use DO UNTIL() with by processing for each ID group:
data have;
input ID (begdate enddate) (:mmddyy.);
format begdate enddate yymmdd10.;
datalines;
200 4/11/2011 2/20/2012
200 5/11/2012 10/31/2013
920 1/23/2003 1/11/2016
920 1/12/2016 3/29/2016
920 3/30/2016 7/21/2017
;
data want;
format startDate lastDate yymmdd10.;
lastDate = "01jan1930"d;
do until(last.id);
set have; by id;
if intck("day", lastDate, begdate) > 1 then do;
if not missing(startDate) then output;
startDate = begDate;
end;
lastDate = endDate;
end;
if not missing(startDate) then output;
drop begDate endDate;
rename startDate=begDate lastDate=endDate;
run;
proc print data=want; run;
Obs. begDate endDate ID 1 2011-04-11 2012-02-20 200 2 2012-05-11 2013-10-31 200 3 2003-01-23 2017-07-21 920
Using a "look-ahead":
data have;
infile datalines dlm='09'x dsd truncover;
input id $ (begdate enddate) (:mmddyy10.);
format begdate enddate yymmddd10.;
datalines;
200 4/11/2011 2/20/2012
200 5/11/2012 10/31/2013
920 1/23/2003 1/11/2016
920 1/12/2016 3/29/2016
920 3/30/2016 7/21/2017
;
data want;
merge
have
have (firstobs=2 rename=(id=_id begdate=_begdate) drop=enddate)
;
retain _beg;
if id = _id and _begdate le (enddate + 1)
then do;
if _beg = . then _beg = begdate;
delete;
end;
if _beg ne . then begdate = _beg;
if id ne _id or _begdate gt (enddate + 1) then _beg = .;
drop _:;
run;
proc print data=want noobs;
run;
Result:
id begdate enddate 200 2011-04-11 2012-02-20 200 2012-05-11 2013-10-31 920 2003-01-23 2017-07-21
data have;
infile datalines expandtabs truncover;
input id $ (begdate enddate) (:mmddyy10.);
format begdate enddate yymmddd10.;
datalines;
200 4/11/2011 2/20/2012
200 5/11/2012 10/31/2013
920 1/23/2003 1/11/2016
920 1/12/2016 3/29/2016
920 3/30/2016 7/21/2017
;
data temp;
set have;
do date=begdate to enddate;
output;
end;
keep id date;
run;
proc sort data=temp nodupkey;
by id date;
run;
data temp;
set temp;
by id;
if first.id or dif(date) ne 1 then group+1;
run;
proc sql;
create table want as
select group,id,min(date) as begdate format=mmddyy10.,
max(date) as enddate format=mmddyy10.
from temp
group by group,id;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.