Hello,
I’ve a dataset with an ID variable and two time points that can be used to identify if an individual is active between two time points. I’d like to create a dataset that can record the person’s ID and if the person is active in each year. I'd like to do fiscal year (say from April 1 to March 31), though if one program can do calendar year (i.e. from Jan 1 to Dec 31), that's be terrific too. If the active period goes cross several years’ span, this person will have multiple records.
Here is the data I’ve:
data aa;
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
The output data should look like the following. Cyear is a bonus ONLY if it can easily done and does not over-complicate the program:
ID Fyear Cyear
1   2006  2006
2   2005 
2   2006  2006
3   2006  2006
3   2007  2007
3   2008  2008
4   2005  
4   2006  2006
4   2007  2007
4 2008 2008
4            2009
Many thanks.
I tried the following doing fiscal year, but it sort of worked and didn't work.
#1 It seems to be working - it's creating multiple year records. But the yr2 parameter is read in as say "2005+1" or "2006+1", so it isn't creating the right records - For one, year 2005 records aren't outputted.
#2 The code seems not very efficient as it is reading each record 6 times. If I've a large dataset or have a large time span (say from 1950 to 2011), the program is not very efficient.
Hopefully people have solutions to at least #1 above. Thanks.
%macro try;
data bb;
set aa;
format year best.;
%do i = 2005 %to 2011;
%let yr1 = &i;
%let yr2= &i.+1;
if START <= "31Mar&yr2."d and (END >="01Apr&yr1."d or END=.) then do; *need END=. to accommodate those who have open end date;
year=&yr1;
output;
end;
%end;
run;
%mend try;
%try;
p.s. I thought the following would work but it didn't work (year values aren't outputted).
| %do i = 2005 %to 2011 and j=2006 %to 2012; | 
%let yr1 = &i;
%let yr2= &j;
It can be done this way :
data aa; 
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
proc sort data=aa; by id; run;
data fiscalYears(keep=id year);
set aa;
do year = year(intnx("YEAR.4",start,0)) to year(intnx("YEAR.4",end,0));
 output;
 end;
run;
data calendarYears(keep=id year);
set aa;
do year = year(start) to year(end);
 output;
 end;
run;
data bothYears(keep=id fYear cYear);
merge fiscalYears (in=inf) calendarYears (in=inc) ;
by id year;
if inf then fYear = year;
if inc then cYear = year;
run;
proc print; run;
PG
If I understand what you mean .
data aa; format id best. start end yymmdd10.; input id start yymmdd10. end yymmdd10.; datalines; 1 2006-05-05 2006-06-30 2 2006-02-05 2006-05-30 3 2006-09-01 2008-11-30 4 2006-03-31 2009-2-28 ; run; data want(drop=i s e); set aa; s=mdy(4,1,year(start)); e=mdy(4,1,year(end)); if start lt s then do; fyear=year(start)-1; output;end; do i=year(start) to year(end)-1; fyear=i;cyear=i;output; end; cyear=year(end); if end lt e then call missing(fyear); else fyear=cyear; output; run;
Ksharp
data fy cy;
format id best. start end yymmdd10.;
input id start :yymmdd10. end :yymmdd10.;
nfy=intck('month12.4',start,end);
ncy=intck('year',start,end);
do i=0 to nfy;
fy=year(intnx('month12.4',start,i));
output fy;
end;
fy=.;
do i=0 to ncy;
cy=year(intnx('year',start,i));
output cy;
end;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
run;
proc sql;
create table want as
select coalesce(fy.id,cy.id) as id, fy.fy,cy.cy from
fy (keep=id fy) full join cy (keep=id cy)
on fy.id=cy.id and fy.fy=cy.cy;
quit;
Solph,
The more I look at your desired outcome, the more I question if that is what you should aim for. For example, why should calendar year always equal fiscal year (or be missing). There are some periods when they would be different. Consider the time period for ID=year and FYear=CYear=2007. Shouldn't that really be broken down into multiple time periods? One block of months would have FYear=2007 and CYear=2007, but another block of months would have FYear=2006 and CYear=2007. At any rate, here is a program that works slightly differently and breaks the data down into more blocks. (Thanks to AUTigers for looking up the syntax for intnx using month12.4.) Think about whether this result might better suit your needs:
data test;
   input id start : yymmdd10. end : yymmdd10.;
   n_months = intck('month', start, end);
   fyear = year( intnx('month12.4', start, 0) );
   cyear = year(start);
   output;
   do _n_=1 to n_months;
      test_date = intnx('month', start, _n_);
      test_fyear = year( intnx('month12.4', test_date, 0) );
      test_cyear = year(test_date);
      if (test_fyear ne fyear) or (test_cyear ne cyear) then do;
         fyear = test_fyear;
         cyear = test_cyear;
         output;
      end;
   end;
  * put cyear= fyear=;
  * put n_months=;
  keep id fyear cyear start end;
  format start end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-02-28
;
proc print; 
run;
Obs    id         start           end    fyear    cyear
  1     1    2006-05-05    2006-06-30     2006     2006
  2     2    2006-02-05    2006-05-30     2005     2006
  3     2    2006-02-05    2006-05-30     2006     2006
  4     3    2006-09-01    2008-11-30     2006     2006
  5     3    2006-09-01    2008-11-30     2006     2007
  6     3    2006-09-01    2008-11-30     2007     2007
  7     3    2006-09-01    2008-11-30     2007     2008
  8     3    2006-09-01    2008-11-30     2008     2008
  9     4    2006-03-31    2009-02-28     2005     2006
 10     4    2006-03-31    2009-02-28     2006     2006
 11     4    2006-03-31    2009-02-28     2006     2007
 12     4    2006-03-31    2009-02-28     2007     2007
 13     4    2006-03-31    2009-02-28     2007     2008
 14     4    2006-03-31    2009-02-28     2008     2008
 15     4    2006-03-31    2009-02-28     2008     2009
Thanks for everyone. It took me a while to digest your codes as I was not familiar with the INTNX function. And seeing how you (PGState, AUTigers and Astounding) used the function (similarly and differently) I can see how useful it is and it took me a while to get my head around it. Even it took me a while to go through Ksharp's neat code. All helped and worked perfectly. As to Astounding's question why I didn't consider the block, I can see the usefulness of it. It's just that often times we analyze data (say counting people) by either by fiscal year or by calendar year, and rarely both. I'm just hoping to consider both in one dataset so that I've the flexibility of using one or other (but not both).
It's such a great community to seek help. Thanks everyone.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
