I have found some help but not quite able to resolve the issues so looking for a help to get it done correctly. Here is the example of sample data-set and result data-set that is desired.
Data have:
ID | START_DATE | End_Date |
1 | 23-Jan-04 | 27-Mar-08 |
1 | 27-Mar-10 | 27-Mar-13 |
1 | 27-Mar-10 | 27-Aug-18 |
1 | 27-Mar-13 | 27-Mar-15 |
1 | 25-Sep-13 | 25-May-16 |
2 | 16-Dec-06 | 12-Dec-18 |
2 | 16-Jan-09 | 16-Jan-14 |
2 | 16-Jan-14 | 16-Jan-17 |
3 | 23-Jan-04 | 27-Mar-08 |
3 | 23-Jan-09 | 27-Mar-12 |
3 | 1-May-14 | 1-May-18 |
4 | 25-Feb-08 | 25-Feb-15 |
4 | 7-Mar-08 | 7-Mar-15 |
4 | 28-Feb-19 | 28-Feb-20 |
Data want:
ID | START_DATE | End_Date | Stay | Gap | Net_Stay |
1 | 23-Jan-04 | 27-Aug-18 | 5330 | 730 | 4600 |
2 | 16-Dec-06 | 12-Dec-18 | 4379 | 0 | 4379 |
3 | 23-Jan-04 | 1-May-18 | 5212 | 1067 | 4145 |
4 | 25-Feb-08 | 28-Feb-20 | 4386 | 1454 | 2932 |
What is actually needed:
START_DATE - minimum for a given ID
End_Date - maximum for a given ID
Stay - Total Stayed
Gap - Any gap during the stay
NetStay - Total Stay - Gap
Started thinking about going this route but appears with this logic - not able to salvage min(Start_Date) and Max(End_Date).
data diff;
merge have
HAVE(firstobs=2 keep=ID START_DATE end_date rename=(ID=nID START_DATE=nSTART_DATE end_date=nend_date));
diff_date=end_date-start_date;
if ID=NID then do;
diff_btwn_date=nSTART_DATE-end_date;
end;
run;
Note: i am using SAS9.4
So, GAP is calculated as nStart Date - End_date.
Example: For ID1 - There is no coverage between 3/27/2008 and 3/27/2010 so the GAP = 730 days. If there are multiple gaps then it has to be summed up to get the total gap days (for ID3 = 302 + 765 = 1067 days)
Thank you in advance for your help!
Hello @APU_007 can you please review and confirm for ID1
Stay:
23-Jan-04 | 27-Aug-18 | 5330 |
isn't this 27aug2018-23jan2004+1 meaning an inclusive of start_date and end_date making it 5331 rather than 5330?
data have;
infile cards;
input id start_date:anydtdte. end_date:anydtdte.;
format start_date end_date date9.;
cards;
1 23-Jan-04 27-Mar-08
1 27-Mar-10 27-Mar-13
1 27-Mar-10 27-Aug-18
1 27-Mar-13 27-Mar-15
1 25-Sep-13 25-May-16
2 16-Dec-06 12-Dec-18
2 16-Jan-09 16-Jan-14
2 16-Jan-14 16-Jan-17
3 23-Jan-04 27-Mar-08
3 23-Jan-09 27-Mar-12
3 1-May-14 1-May-18
4 25-Feb-08 25-Feb-15
4 7-Mar-08 7-Mar-15
4 28-Feb-19 28-Feb-20
;
/*Get the number of dimensions in the array*/
proc sql;
select min(start_date), max(end_date) into :s trimmed, :e trimmed
from have;
quit;
/*Key Indexing method*/
data want;
do until(last.id);
set have;
by id;
array t(&s:&e) _temporary_;
do _n_=start_date to end_date;
t(_n_)=1;
end;
_min=min(_min,start_date);
_max=max(_max,end_date);
end;
stay=sum(of t(*));
do _n_=_min to _max;
gap=sum(gap,nmiss(t(_n_)));
end;
totay_stay=_max-_min+1;
call missing(of t(*));
rename _min=start_date _max=end_date;
drop start_date end_date;
format _: date9.;
run;
So there is not logik determining Gap? Should this just be selected randomly or?
This is a bit of a "brute force" way to program it, but it should work
data have;
infile cards;
input id start_date:anydtdte. end_date:anydtdte.;
cards;
1 23-Jan-04 27-Mar-08
1 27-Mar-10 27-Mar-13
1 27-Mar-10 27-Aug-18
1 27-Mar-13 27-Mar-15
1 25-Sep-13 25-May-16
2 16-Dec-06 12-Dec-18
2 16-Jan-09 16-Jan-14
2 16-Jan-14 16-Jan-17
3 23-Jan-04 27-Mar-08
3 23-Jan-09 27-Mar-12
3 1-May-14 1-May-18
4 25-Feb-08 25-Feb-15
4 7-Mar-08 7-Mar-15
4 28-Feb-19 28-Feb-20
;
/* Compute max and min dates of all stays */
proc summary data=have nway;
class id;
var start_date end_date;
output out=stay min(start_date)=min_start_date max(end_date)=max_end_date;
run;
/* Compute if each date between min_start_date and max_start_date is "stay" or "gap" */
data stay2;
merge have stay;
by id;
do thisdate=min_start_date to max_end_date;
if start_date<=thisdate<=end_date then gap=0;
else gap=1;
output;
end;
run;
/* Find days that are "gap" */
proc summary data=stay2 nway;
class id thisdate;
var gap;
output out=gap min=min_gap;
run;
/* Add up number of gap days */
proc summary data=gap nway;
class id;
var min_gap;
output out=want sum=gap;
run;
/* Now that you have total length of stay and number of gap days, you do the rest */
Sir @PaigeMiller Neat work with right set of comments to follow. Very nice!
Hello @APU_007 can you please review and confirm for ID1
Stay:
23-Jan-04 | 27-Aug-18 | 5330 |
isn't this 27aug2018-23jan2004+1 meaning an inclusive of start_date and end_date making it 5331 rather than 5330?
data have;
infile cards;
input id start_date:anydtdte. end_date:anydtdte.;
format start_date end_date date9.;
cards;
1 23-Jan-04 27-Mar-08
1 27-Mar-10 27-Mar-13
1 27-Mar-10 27-Aug-18
1 27-Mar-13 27-Mar-15
1 25-Sep-13 25-May-16
2 16-Dec-06 12-Dec-18
2 16-Jan-09 16-Jan-14
2 16-Jan-14 16-Jan-17
3 23-Jan-04 27-Mar-08
3 23-Jan-09 27-Mar-12
3 1-May-14 1-May-18
4 25-Feb-08 25-Feb-15
4 7-Mar-08 7-Mar-15
4 28-Feb-19 28-Feb-20
;
/*Get the number of dimensions in the array*/
proc sql;
select min(start_date), max(end_date) into :s trimmed, :e trimmed
from have;
quit;
/*Key Indexing method*/
data want;
do until(last.id);
set have;
by id;
array t(&s:&e) _temporary_;
do _n_=start_date to end_date;
t(_n_)=1;
end;
_min=min(_min,start_date);
_max=max(_max,end_date);
end;
stay=sum(of t(*));
do _n_=_min to _max;
gap=sum(gap,nmiss(t(_n_)));
end;
totay_stay=_max-_min+1;
call missing(of t(*));
rename _min=start_date _max=end_date;
drop start_date end_date;
format _: date9.;
run;
Hello @novinosrin,
This is what I was exactly looking for. Thank you so much for this piece of code. Regarding, adding +1, I was excluding the start date for a final total stay. It can be included/excluded as this would be a rule for every observation!
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.