BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

I've a data step question. Any help is appreciated.

Basically I've a dataset with ID, entry date and exit date which can last as short as several days up to several years, and it can cut across two year points. What I like to do is allocate the days into yearly periods (say 5 days in year 1, 365 in year 2 and 100 days in year 3, for a total of 470 days). For the yearly period, I'd like to try April 1-March 31 as the cycle - I call it fiscal year.

Say I've days from May 1 to May 9, 2008, then it's 8 days for 2008. If I've March 1 to April 15 for 2008 (total 45 days), then  I've 30 days for 2007 (31-1) and 15 for 2008 (I can't use 15 minus 1 because it's be short of 1 day. It's fine if it's 31 days for 2007 and 14 days for 2008 as long as the total is still 45 days).

I tried the stupidest way as follows doing it by single year. But I ran into problems.

1. I tried to use macro but it didn't take yr1 and yr2 in the date condition (e.g. "31mar&yr2." "01apr&yr1.".

2. Most of all, the END-Start+1 for each year will give an extra day when I sum up all day variables for an ID.

    To get around it I could tell SAS to find the variable that has the first days value and subtract 1 from it. But how do I do it?

Someone gave me an idea about subtracting the days for partial year (the beginning or ending of the year or both) and leave everything in between as 365 or 366 days, etc. But it doesn't seem feasible.

I thought I could use DO startyear TO endyear (along with function like year(intnx("YEAR.4",startdate,0)) to get around with the fiscal year). But how do I tell SAS to break the days by specific dates (say March 31 of each year). (If it's by every 365 days it's be earlier)

Here are the data and my clumsy SAS code. The desired output is at the end. BTW, if the exit date is missing, I set to Mar 31 of 2012 as its the end of the study period.

Many thanks.

data test;

format id best. entrydate exitdate yymmdd10.;

input id @3 entrydate yymmdd10. @14  exitdate yymmdd10.;

datalines;

1 2010-09-22   

2 2008-09-22 2008-10-03

3 2007-03-19 2011-12-14

;

run;

proc print; run;

data want;

    keep id entrydate exitdate start end days:;

    set test ;

    format start end yymmdd10.;

    If entrydate < '1Apr2007'd and (exitdate GE '01Apr2006'd or exitdate = .) then do; 

        If exitdate = . then exitdate2 = '31Mar2007'd;

        start = MAX('01Apr2006'd, entrydate);

        end = MIN(exitdate, '31Mar2007'd);

        days2006=end-start+1;

     end;

    If entrydate < '1Apr2008'd and (exitdate GE '01Apr2007'd or exitdate = .) then do; 

        If exitdate = . then exitdate2 = '31Mar2008'd;

        start = MAX('01Apr2007'd, entrydate);

        end = MIN(exitdate, '31Mar2008'd);

        days2007=end-start+1;

     end;

    If entrydate < '1Apr2009'd and (exitdate GE '01Apr2008'd or exitdate = .) then do; 

        If exitdate = . then exitdate2 = '31Mar2009'd;

        start = MAX('01Apr2008'd, entrydate);

        end = MIN(exitdate, '31Mar2009'd);

        days2008=end-start+1;

     end;

    If entrydate < '1Apr2010'd and (exitdate GE '01Apr2009'd or exitdate = .) then do; 

        If exitdate = . then exitdate2 = '31Mar2010'd;

        start = MAX('01Apr2009'd, entrydate);

        end = MIN(exitdate, '31Mar2010'd);

        days2009=end-start+1;

     end;

    If entrydate < '1Apr2011'd and (exitdate GE '01Apr2010'd or exitdate = .) then do; 

        If exitdate = . then exitdate2 = '31Mar2011'd;

        start = MAX('01Apr2010'd, entrydate);

        end = MIN(exitdate, '31Mar2011'd);

        days2010=end-start+1;

     end;

    If entrydate < '1Apr2012'd and (exitdate GE '01Apr2011'd or exitdate = .) then do; 

        If exitdate = . then exitdate2 = '31Mar2012'd;

        start = MAX('01Apr2011'd, entrydate);

        end = MIN(exitdate, '31Mar2012'd);

        days2011=end-start+1;

     end;

run;

proc print; run;

/* Actual output

ID    Entrydate    Exitdate                 FY2006  Fy2007  FY2008  FY2009  FY2010 FY2011

1    2010-09-22                                                                                         191     366

2    2008-09-22    2008-10-03                                              12           

3    2007-03-19    2011-12-14                13         366           365     365        365      258

*/

/* Desired output - I throw in LOS, just for reference and if exit date is missing, I set at Mar 31 2012:

ID    Entrydate    Exitdate         LOS  FY2006  Fy2007  FY2008  FY2009  FY2010 FY2011

1    2010-09-22                         566                                                            190     366

2    2008-09-22    2008-10-03                                              11           

3    2007-03-19    2011-12-14    1731     12         366           365      365         365     258

*/

7 REPLIES 7
Reeza
Super User

Will your years always range between 2006 and 2011?

Solph
Pyrite | Level 9

No. The entry and exit date can be long before or after these years. I'm actually interested in 2002-2011 (that is Apr 1, 2002 to Mar 31, 2011).

Reeza
Super User

Well...particularly inelegant as well but flexible....

data test;

format id best. entrydate exitdate yymmdd10.;

input id @3 entrydate yymmdd10. @14  exitdate yymmdd10.;

datalines;

1 2010-09-22  

2 2008-09-22 2008-10-03

3 2007-03-19 2011-12-14

;

run;

data expand;

    set test;

    if exitdate =. then exitdate='31mar2012'd;

    do date=entrydate to exitdate;

        *calculate fiscal year;

        fiscal_year=ifn(qtr(date)<1, year(date)+1, year(date));

        output;

    end;

run;

proc sql;

    create table part1 as

    select id, entrydate, exitdate, fiscal_year, count(*) as num_days

    from expand

    group by id, entrydate, exitdate, fiscal_year;

quit;

proc transpose data=part1 out=part2 prefix=FY;

    by id entrydate exitdate;

    id fiscal_year;

    var num_days;

run;

Solph
Pyrite | Level 9

Thanks a lot. It's one clever to do it, except the sum seem to mess up. The output numbers aren't quite right.

identrydateexitdate_NAME_FY2007FY2008FY2009FY2010FY2011FY2012
122/09/201031/03/2012num_days...10136591
222/09/200803/10/2008num_days.12....
319/03/200714/12/2011num_days288366365365348.

Need to be

identrydateexitdate_NAME_FY2006FY2007FY2008FY2009FY2010FY2011
122/09/201031/03/2012num_days....190366
222/09/200803/10/2008num_days..11...
319/03/200714/12/2011num_days12366365365365258

probably due to fiscal_year=ifn(qtr(date)<1, year(date)+1, year(date))

I tried the following; it's fine except for the first year it is over counting by 1 day.

fiscal_year=year(intnx("year.4",date,0));

(The other problem with this is if I've a huge dataset like over 500,000 cases I might run into problems. But it's better than nothing if the overcounting issue could be fixed).

Reeza
Super User

ifn(qtr(date)=1, year(date), year(date)+1)


Should be =1 to one actually, i.e. anything in the first quarter gets the previous fiscal year.

Basically if its the first 3 months then the fiscal year would be the previous year, else it would be the next year. 



Ksharp
Super User

Actually there is a mathmatics problem in your logic .

2008/04/15 - 2008/03/01 = 45 days . but in reality, there is 46 days you should add 2008/03/01 too.

data test;
format id best. entrydate exitdate yymmdd10.;
input id @3 entrydate yymmdd10. @14  exitdate yymmdd10.;
datalines;
1 2010-09-22  
2 2008-09-22 2008-10-03
3 2007-03-19 2011-12-14
;
run;


data expand;
    set test;
    if exitdate =. then exitdate='31mar2012'd;
run;
data temp;
 set expand;
 do temp=entrydate+1 to exitdate;
  year=ifn(month(temp) lt 4 ,year(temp)-1,year(temp));
  output;
 end;
 keep id year;
run;

proc sql noprint;
create table x as 
 select *,count(*) as count from temp group by id,year;
select distinct cats('x(where=(year=',year,') rename=(count=F',year,'))') into : list separated by ' ' from x;
quit;
data want;
 merge expand  &list ;
 by id;
 drop year;
run;



Ksharp

Solph
Pyrite | Level 9

Thanks so much for all your replies. I just modified the code and it worked beautifully, simple and elegant.

As to the mathematical calculation, Ksharp you are right. It's generally calculated as 46 days for 4/15 - 3/01. But in our particular setting, we count check in today and check out tomorrow as staying for 1 day, not 2 days. That's why the total days are just date2 - date1 without adding an extra day. Again many thanks to both of you.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1046 views
  • 0 likes
  • 3 in conversation