Hello,
I'd like compare two dates (say start date and end date for program admission and discharge, moving in and out of a city) with another two dates (usually by calendar year from Jan 1 to Dec 31, or fiscal year from April 1 of one year to March 31 of the next year). I'd like to output records that meet the criteria for each year so that I can know if they are active in a given year.
Here is an example and in this particular cases, the comparison range is April 1 to March 31. So a person's record should be outputted for each year he or she is active between April 1 of and March 31. The end date can be missing (though I could set the missing as the current date) and a person could have multiple entries and exits.
I've the following code that is doing the job, but I am hoping to simplify the code by using macro or write different code. How do I start? I learned from this forum to use the function year(intnx("YEAR.4",sdate,0)) to select fiscal year. I know there are numerous date functions but I couldn't find examples similar to what I wanted to do. Many thanks.
data test;
format id best. startdate enddate yymmdd10.;
input id @3 startdate yymmdd10. @14 enddate yymmdd10. @25 amount;
datalines;
1 2003-04-01 2005-01-01 3
1 2007-06-01 2009-01-01 4
2 2004-01-01 2009-12-01 2
3 2003-01-01 2005-01-02 9
3 2007-01-01 4
4 2006-06-01 2009-01-01 1
5 2007-07-01 2009-01-01 0
;
run;
data yr2002 yr2003 yr2004 yr2005 yr2006 yr2007 yr2008 yr2009 yr2010; set test;
if startdate <= '31Mar2003'd and (enddate >= '1Apr2002'd or enddate = .) then output yr2002;
if startdate <= '31Mar2004'd and (enddate >= '1Apr2003'd or enddate = .) then output yr2003;
if startdate <= '31Mar2005'd and (enddate >= '1Apr2004'd or enddate = .) then output yr2004;
if startdate <= '31Mar2006'd and (enddate >= '1Apr2005'd or enddate = .) then output yr2005;
if startdate <= '31Mar2006'd and (enddate >= '1Apr2005'd or enddate = .) then output yr2005;
if startdate <= '31Mar2007'd and (enddate >= '1Apr2006'd or enddate = .) then output yr2006;
if startdate <= '31Mar2008'd and (enddate >= '1Apr2007'd or enddate = .) then output yr2007;
run;
Results for selected years are as follows:
yr2002
id startdate enddate amount
3 2003-01-01 2005-01-02 9
yr2003
id startdate enddate amount
1 2003-04-01 2005-01-01 3
2 2004-01-01 2009-12-01 2
3 2003-01-01 2005-01-02 9
yr2007
id startdate enddate amount
1 2007-06-01 2009-01-01 4
2 2004-01-01 2009-12-01 2
3 2007-01-01 . 4
4 2006-06-01 2009-01-01 1
5 2007-07-01 2009-01-01 0
Here's one way.
One change I do make with this type of data is to add an enddate for those that are missing. Either the max current value or a value that's riduculous, 2099 for example depending on what I want to do.
In this case I made it today() so it would reflect current data.
Then I basically expand the data set so there's a record for each year of interest and output that to a new dataset. That dataset can then by used with by variables if necessary later on rather than have multiple datasets for each year.
data test;
format id best. startdate enddate yymmdd10.;
input id @3 startdate yymmdd10. @14 enddate yymmdd10. @25 amount;
datalines;
1 2003-04-01 2005-01-01 3
1 2007-06-01 2009-01-01 4
2 2004-01-01 2009-12-01 2
3 2003-01-01 2005-01-02 9
3 2007-01-01 4
4 2006-06-01 2009-01-01 1
5 2007-07-01 2009-01-01 0
;
run;
data want;
set test;
if enddate=. then enddate=today();
date=startdate;
do while (date < enddate);
*find fiscal year;
year=year(intnx("Year.4", date, 0));
date=intnx('Year', date, 1, 'same');
output;
end;
run;
proc print data=want;
where year=2002;
run;
proc print data=want;
where year=2003;
run;
proc print data=want;
where year=2007;
run;
Your question is not really easy.
data test; format id best. startdate enddate yymmdd10.; input id @3 startdate yymmdd10. @14 enddate yymmdd10. @25 amount; datalines; 1 2003-04-01 2005-01-01 3 1 2007-06-01 2009-01-01 4 2 2004-01-01 2009-12-01 2 3 2003-01-01 2005-01-02 9 3 2007-01-01 . 4 4 2006-06-01 2009-01-01 1 5 2007-07-01 2009-01-01 0 ; run; data want(drop=_year i); set test; if enddate=. then enddate=today(); do i=startdate to enddate ; if year(intnx('year.4',i,0)) ne _year then do; year=year(intnx('year.4',i,0)); output; end; _year=year(intnx('year.4',i,0)); end; run; proc sort data=want;by year; run; data _null_; if _n_ eq 1 then do; if 0 then set want; declare hash ha(multidata:'Y'); ha.definekey('year'); ha.definedata('id','startdate','enddate','amount'); ha.definedone(); end; do until(last.year); set want; by year; ha.add(); end; ha.output(dataset:cats('yr',year)); ha.clear(); run;
Ksharp
Hi,
I have withdrawn my previous post for not reading through OP's request. Here is my new try, for the dynamic approach:
/*raw data stolen from Fareeza above*/
data test;
format id best. startdate enddate yymmdd10.;
input id @3 startdate yymmdd10. @14 enddate yymmdd10. @25 amount;
datalines;
1 2003-04-01 2005-01-01 3
1 2007-06-01 2009-01-01 4
2 2004-01-01 2009-12-01 2
3 2003-01-01 2005-01-02 9
3 2007-01-01 4
4 2006-06-01 2009-01-01 1
5 2007-07-01 2009-01-01 0
;
run;
/*to get the end year*/
proc sql;
select max(year(intnx('year.4',enddate,0,'b'))) into :max_year from test;
quit;
data want;
set test;
_s=intnx('year.4', startdate,0,'b');
_e=ifn(missing(enddate),&max_year,year(intnx('year.4',enddate,0,'b')));
do year=year(_s) to _e;
output;
end;
drop _:;
run;
proc sort data=want;
by year;
run;
/*dynamically output data set by year without hard coding their name*/
data _null_;
declare hash h(multidata:'y');
h.definekey('year');
h.definedata('id', 'startdate', 'enddate', 'amount');
h.definedone();
do until (last.year);
set want;
by year;
rc=h.add();
end;
h.output(dataset:'yr'||put(year,z4.));
run;
Haikuo
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.