I have found some help but not quite able to resolve the issues so looking for a help to get it done correctly. Here is the example of sample data-set and result data-set that is desired.
Data have:
ID | START_DATE | End_Date |
1 | 23-Jan-04 | 27-Mar-08 |
1 | 27-Mar-10 | 27-Mar-13 |
1 | 27-Mar-10 | 27-Aug-18 |
1 | 27-Mar-13 | 27-Mar-15 |
1 | 25-Sep-13 | 25-May-16 |
2 | 16-Dec-06 | 12-Dec-18 |
2 | 16-Jan-09 | 16-Jan-14 |
2 | 16-Jan-14 | 16-Jan-17 |
3 | 23-Jan-04 | 27-Mar-08 |
3 | 23-Jan-09 | 27-Mar-12 |
3 | 1-May-14 | 1-May-18 |
4 | 25-Feb-08 | 25-Feb-15 |
4 | 7-Mar-08 | 7-Mar-15 |
4 | 28-Feb-19 | 28-Feb-20 |
Data want:
ID | START_DATE | End_Date | Stay | Gap | Net_Stay |
1 | 23-Jan-04 | 27-Aug-18 | 5330 | 730 | 4600 |
2 | 16-Dec-06 | 12-Dec-18 | 4379 | 0 | 4379 |
3 | 23-Jan-04 | 1-May-18 | 5212 | 1067 | 4145 |
4 | 25-Feb-08 | 28-Feb-20 | 4386 | 1454 | 2932 |
What is actually needed:
START_DATE - minimum for a given ID
End_Date - maximum for a given ID
Stay - Total Stayed
Gap - Any gap during the stay
NetStay - Total Stay - Gap
Started thinking about going this route but appears with this logic - not able to salvage min(Start_Date) and Max(End_Date).
data diff;
merge have
HAVE(firstobs=2 keep=ID START_DATE end_date rename=(ID=nID START_DATE=nSTART_DATE end_date=nend_date));
diff_date=end_date-start_date;
if ID=NID then do;
diff_btwn_date=nSTART_DATE-end_date;
end;
run;
Note: i am using SAS9.4
So, GAP is calculated as nStart Date - End_date.
Example: For ID1 - There is no coverage between 3/27/2008 and 3/27/2010 so the GAP = 730 days. If there are multiple gaps then it has to be summed up to get the total gap days (for ID3 = 302 + 765 = 1067 days)
Thank you in advance for your help!
Hello @APU_007 can you please review and confirm for ID1
Stay:
23-Jan-04 | 27-Aug-18 | 5330 |
isn't this 27aug2018-23jan2004+1 meaning an inclusive of start_date and end_date making it 5331 rather than 5330?
data have;
infile cards;
input id start_date:anydtdte. end_date:anydtdte.;
format start_date end_date date9.;
cards;
1 23-Jan-04 27-Mar-08
1 27-Mar-10 27-Mar-13
1 27-Mar-10 27-Aug-18
1 27-Mar-13 27-Mar-15
1 25-Sep-13 25-May-16
2 16-Dec-06 12-Dec-18
2 16-Jan-09 16-Jan-14
2 16-Jan-14 16-Jan-17
3 23-Jan-04 27-Mar-08
3 23-Jan-09 27-Mar-12
3 1-May-14 1-May-18
4 25-Feb-08 25-Feb-15
4 7-Mar-08 7-Mar-15
4 28-Feb-19 28-Feb-20
;
/*Get the number of dimensions in the array*/
proc sql;
select min(start_date), max(end_date) into :s trimmed, :e trimmed
from have;
quit;
/*Key Indexing method*/
data want;
do until(last.id);
set have;
by id;
array t(&s:&e) _temporary_;
do _n_=start_date to end_date;
t(_n_)=1;
end;
_min=min(_min,start_date);
_max=max(_max,end_date);
end;
stay=sum(of t(*));
do _n_=_min to _max;
gap=sum(gap,nmiss(t(_n_)));
end;
totay_stay=_max-_min+1;
call missing(of t(*));
rename _min=start_date _max=end_date;
drop start_date end_date;
format _: date9.;
run;
So there is not logik determining Gap? Should this just be selected randomly or?
This is a bit of a "brute force" way to program it, but it should work
data have;
infile cards;
input id start_date:anydtdte. end_date:anydtdte.;
cards;
1 23-Jan-04 27-Mar-08
1 27-Mar-10 27-Mar-13
1 27-Mar-10 27-Aug-18
1 27-Mar-13 27-Mar-15
1 25-Sep-13 25-May-16
2 16-Dec-06 12-Dec-18
2 16-Jan-09 16-Jan-14
2 16-Jan-14 16-Jan-17
3 23-Jan-04 27-Mar-08
3 23-Jan-09 27-Mar-12
3 1-May-14 1-May-18
4 25-Feb-08 25-Feb-15
4 7-Mar-08 7-Mar-15
4 28-Feb-19 28-Feb-20
;
/* Compute max and min dates of all stays */
proc summary data=have nway;
class id;
var start_date end_date;
output out=stay min(start_date)=min_start_date max(end_date)=max_end_date;
run;
/* Compute if each date between min_start_date and max_start_date is "stay" or "gap" */
data stay2;
merge have stay;
by id;
do thisdate=min_start_date to max_end_date;
if start_date<=thisdate<=end_date then gap=0;
else gap=1;
output;
end;
run;
/* Find days that are "gap" */
proc summary data=stay2 nway;
class id thisdate;
var gap;
output out=gap min=min_gap;
run;
/* Add up number of gap days */
proc summary data=gap nway;
class id;
var min_gap;
output out=want sum=gap;
run;
/* Now that you have total length of stay and number of gap days, you do the rest */
Sir @PaigeMiller Neat work with right set of comments to follow. Very nice!
Hello @APU_007 can you please review and confirm for ID1
Stay:
23-Jan-04 | 27-Aug-18 | 5330 |
isn't this 27aug2018-23jan2004+1 meaning an inclusive of start_date and end_date making it 5331 rather than 5330?
data have;
infile cards;
input id start_date:anydtdte. end_date:anydtdte.;
format start_date end_date date9.;
cards;
1 23-Jan-04 27-Mar-08
1 27-Mar-10 27-Mar-13
1 27-Mar-10 27-Aug-18
1 27-Mar-13 27-Mar-15
1 25-Sep-13 25-May-16
2 16-Dec-06 12-Dec-18
2 16-Jan-09 16-Jan-14
2 16-Jan-14 16-Jan-17
3 23-Jan-04 27-Mar-08
3 23-Jan-09 27-Mar-12
3 1-May-14 1-May-18
4 25-Feb-08 25-Feb-15
4 7-Mar-08 7-Mar-15
4 28-Feb-19 28-Feb-20
;
/*Get the number of dimensions in the array*/
proc sql;
select min(start_date), max(end_date) into :s trimmed, :e trimmed
from have;
quit;
/*Key Indexing method*/
data want;
do until(last.id);
set have;
by id;
array t(&s:&e) _temporary_;
do _n_=start_date to end_date;
t(_n_)=1;
end;
_min=min(_min,start_date);
_max=max(_max,end_date);
end;
stay=sum(of t(*));
do _n_=_min to _max;
gap=sum(gap,nmiss(t(_n_)));
end;
totay_stay=_max-_min+1;
call missing(of t(*));
rename _min=start_date _max=end_date;
drop start_date end_date;
format _: date9.;
run;
Hello @novinosrin,
This is what I was exactly looking for. Thank you so much for this piece of code. Regarding, adding +1, I was excluding the start date for a final total stay. It can be included/excluded as this would be a rule for every observation!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.