I've a data step question. Any help is appreciated.
Basically I've a dataset with ID, entry date and exit date which can last as short as several days up to several years, and it can cut across two year points. What I like to do is allocate the days into yearly periods (say 5 days in year 1, 365 in year 2 and 100 days in year 3, for a total of 470 days). For the yearly period, I'd like to try April 1-March 31 as the cycle - I call it fiscal year.
Say I've days from May 1 to May 9, 2008, then it's 8 days for 2008. If I've March 1 to April 15 for 2008 (total 45 days), then I've 30 days for 2007 (31-1) and 15 for 2008 (I can't use 15 minus 1 because it's be short of 1 day. It's fine if it's 31 days for 2007 and 14 days for 2008 as long as the total is still 45 days).
I tried the stupidest way as follows doing it by single year. But I ran into problems.
1. I tried to use macro but it didn't take yr1 and yr2 in the date condition (e.g. "31mar&yr2." "01apr&yr1.".
2. Most of all, the END-Start+1 for each year will give an extra day when I sum up all day variables for an ID.
To get around it I could tell SAS to find the variable that has the first days value and subtract 1 from it. But how do I do it?
Someone gave me an idea about subtracting the days for partial year (the beginning or ending of the year or both) and leave everything in between as 365 or 366 days, etc. But it doesn't seem feasible.
I thought I could use DO startyear TO endyear (along with function like year(intnx("YEAR.4",startdate,0)) to get around with the fiscal year). But how do I tell SAS to break the days by specific dates (say March 31 of each year). (If it's by every 365 days it's be earlier)
Here are the data and my clumsy SAS code. The desired output is at the end. BTW, if the exit date is missing, I set to Mar 31 of 2012 as its the end of the study period.
Many thanks.
data test;
format id best. entrydate exitdate yymmdd10.;
input id @3 entrydate yymmdd10. @14 exitdate yymmdd10.;
datalines;
1 2010-09-22
2 2008-09-22 2008-10-03
3 2007-03-19 2011-12-14
;
run;
proc print; run;
data want;
keep id entrydate exitdate start end days:;
set test ;
format start end yymmdd10.;
If entrydate < '1Apr2007'd and (exitdate GE '01Apr2006'd or exitdate = .) then do;
If exitdate = . then exitdate2 = '31Mar2007'd;
start = MAX('01Apr2006'd, entrydate);
end = MIN(exitdate, '31Mar2007'd);
days2006=end-start+1;
end;
If entrydate < '1Apr2008'd and (exitdate GE '01Apr2007'd or exitdate = .) then do;
If exitdate = . then exitdate2 = '31Mar2008'd;
start = MAX('01Apr2007'd, entrydate);
end = MIN(exitdate, '31Mar2008'd);
days2007=end-start+1;
end;
If entrydate < '1Apr2009'd and (exitdate GE '01Apr2008'd or exitdate = .) then do;
If exitdate = . then exitdate2 = '31Mar2009'd;
start = MAX('01Apr2008'd, entrydate);
end = MIN(exitdate, '31Mar2009'd);
days2008=end-start+1;
end;
If entrydate < '1Apr2010'd and (exitdate GE '01Apr2009'd or exitdate = .) then do;
If exitdate = . then exitdate2 = '31Mar2010'd;
start = MAX('01Apr2009'd, entrydate);
end = MIN(exitdate, '31Mar2010'd);
days2009=end-start+1;
end;
If entrydate < '1Apr2011'd and (exitdate GE '01Apr2010'd or exitdate = .) then do;
If exitdate = . then exitdate2 = '31Mar2011'd;
start = MAX('01Apr2010'd, entrydate);
end = MIN(exitdate, '31Mar2011'd);
days2010=end-start+1;
end;
If entrydate < '1Apr2012'd and (exitdate GE '01Apr2011'd or exitdate = .) then do;
If exitdate = . then exitdate2 = '31Mar2012'd;
start = MAX('01Apr2011'd, entrydate);
end = MIN(exitdate, '31Mar2012'd);
days2011=end-start+1;
end;
run;
proc print; run;
/* Actual output
ID Entrydate Exitdate FY2006 Fy2007 FY2008 FY2009 FY2010 FY2011
1 2010-09-22 191 366
2 2008-09-22 2008-10-03 12
3 2007-03-19 2011-12-14 13 366 365 365 365 258
*/
/* Desired output - I throw in LOS, just for reference and if exit date is missing, I set at Mar 31 2012:
ID Entrydate Exitdate LOS FY2006 Fy2007 FY2008 FY2009 FY2010 FY2011
1 2010-09-22 566 190 366
2 2008-09-22 2008-10-03 11
3 2007-03-19 2011-12-14 1731 12 366 365 365 365 258
*/
Will your years always range between 2006 and 2011?
No. The entry and exit date can be long before or after these years. I'm actually interested in 2002-2011 (that is Apr 1, 2002 to Mar 31, 2011).
Well...particularly inelegant as well but flexible....
data test;
format id best. entrydate exitdate yymmdd10.;
input id @3 entrydate yymmdd10. @14 exitdate yymmdd10.;
datalines;
1 2010-09-22
2 2008-09-22 2008-10-03
3 2007-03-19 2011-12-14
;
run;
data expand;
set test;
if exitdate =. then exitdate='31mar2012'd;
do date=entrydate to exitdate;
*calculate fiscal year;
fiscal_year=ifn(qtr(date)<1, year(date)+1, year(date));
output;
end;
run;
proc sql;
create table part1 as
select id, entrydate, exitdate, fiscal_year, count(*) as num_days
from expand
group by id, entrydate, exitdate, fiscal_year;
quit;
proc transpose data=part1 out=part2 prefix=FY;
by id entrydate exitdate;
id fiscal_year;
var num_days;
run;
Thanks a lot. It's one clever to do it, except the sum seem to mess up. The output numbers aren't quite right.
id | entrydate | exitdate | _NAME_ | FY2007 | FY2008 | FY2009 | FY2010 | FY2011 | FY2012 |
1 | 22/09/2010 | 31/03/2012 | num_days | . | . | . | 101 | 365 | 91 |
2 | 22/09/2008 | 03/10/2008 | num_days | . | 12 | . | . | . | . |
3 | 19/03/2007 | 14/12/2011 | num_days | 288 | 366 | 365 | 365 | 348 | . |
Need to be
id | entrydate | exitdate | _NAME_ | FY2006 | FY2007 | FY2008 | FY2009 | FY2010 | FY2011 |
1 | 22/09/2010 | 31/03/2012 | num_days | . | . | . | . | 190 | 366 |
2 | 22/09/2008 | 03/10/2008 | num_days | . | . | 11 | . | . | . |
3 | 19/03/2007 | 14/12/2011 | num_days | 12 | 366 | 365 | 365 | 365 | 258 |
probably due to fiscal_year=ifn(qtr(date)<1, year(date)+1, year(date))
I tried the following; it's fine except for the first year it is over counting by 1 day.
fiscal_year=year(intnx("year.4",date,0));
(The other problem with this is if I've a huge dataset like over 500,000 cases I might run into problems. But it's better than nothing if the overcounting issue could be fixed).
ifn(qtr(date)=1, year(date), year(date)+1)
Should be =1 to one actually, i.e. anything in the first quarter gets the previous fiscal year.
Basically if its the first 3 months then the fiscal year would be the previous year, else it would be the next year.
Actually there is a mathmatics problem in your logic .
2008/04/15 - 2008/03/01 = 45 days . but in reality, there is 46 days you should add 2008/03/01 too.
data test; format id best. entrydate exitdate yymmdd10.; input id @3 entrydate yymmdd10. @14 exitdate yymmdd10.; datalines; 1 2010-09-22 2 2008-09-22 2008-10-03 3 2007-03-19 2011-12-14 ; run; data expand; set test; if exitdate =. then exitdate='31mar2012'd; run; data temp; set expand; do temp=entrydate+1 to exitdate; year=ifn(month(temp) lt 4 ,year(temp)-1,year(temp)); output; end; keep id year; run; proc sql noprint; create table x as select *,count(*) as count from temp group by id,year; select distinct cats('x(where=(year=',year,') rename=(count=F',year,'))') into : list separated by ' ' from x; quit; data want; merge expand &list ; by id; drop year; run;
Ksharp
Thanks so much for all your replies. I just modified the code and it worked beautifully, simple and elegant.
As to the mathematical calculation, Ksharp you are right. It's generally calculated as 46 days for 4/15 - 3/01. But in our particular setting, we count check in today and check out tomorrow as staying for 1 day, not 2 days. That's why the total days are just date2 - date1 without adding an extra day. Again many thanks to both of you.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.