Hello,
I’ve a dataset with an ID variable and two time points that can be used to identify if an individual is active between two time points. I’d like to create a dataset that can record the person’s ID and if the person is active in each year. I'd like to do fiscal year (say from April 1 to March 31), though if one program can do calendar year (i.e. from Jan 1 to Dec 31), that's be terrific too. If the active period goes cross several years’ span, this person will have multiple records.
Here is the data I’ve:
data aa;
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
The output data should look like the following. Cyear is a bonus ONLY if it can easily done and does not over-complicate the program:
ID Fyear Cyear
1 2006 2006
2 2005
2 2006 2006
3 2006 2006
3 2007 2007
3 2008 2008
4 2005
4 2006 2006
4 2007 2007
4 2008 2008
4 2009
Many thanks.
I tried the following doing fiscal year, but it sort of worked and didn't work.
#1 It seems to be working - it's creating multiple year records. But the yr2 parameter is read in as say "2005+1" or "2006+1", so it isn't creating the right records - For one, year 2005 records aren't outputted.
#2 The code seems not very efficient as it is reading each record 6 times. If I've a large dataset or have a large time span (say from 1950 to 2011), the program is not very efficient.
Hopefully people have solutions to at least #1 above. Thanks.
%macro try;
data bb;
set aa;
format year best.;
%do i = 2005 %to 2011;
%let yr1 = &i;
%let yr2= &i.+1;
if START <= "31Mar&yr2."d and (END >="01Apr&yr1."d or END=.) then do; *need END=. to accommodate those who have open end date;
year=&yr1;
output;
end;
%end;
run;
%mend try;
%try;
p.s. I thought the following would work but it didn't work (year values aren't outputted).
%do i = 2005 %to 2011 and j=2006 %to 2012; |
%let yr1 = &i;
%let yr2= &j;
It can be done this way :
data aa;
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
proc sort data=aa; by id; run;
data fiscalYears(keep=id year);
set aa;
do year = year(intnx("YEAR.4",start,0)) to year(intnx("YEAR.4",end,0));
output;
end;
run;
data calendarYears(keep=id year);
set aa;
do year = year(start) to year(end);
output;
end;
run;
data bothYears(keep=id fYear cYear);
merge fiscalYears (in=inf) calendarYears (in=inc) ;
by id year;
if inf then fYear = year;
if inc then cYear = year;
run;
proc print; run;
PG
If I understand what you mean .
data aa; format id best. start end yymmdd10.; input id start yymmdd10. end yymmdd10.; datalines; 1 2006-05-05 2006-06-30 2 2006-02-05 2006-05-30 3 2006-09-01 2008-11-30 4 2006-03-31 2009-2-28 ; run; data want(drop=i s e); set aa; s=mdy(4,1,year(start)); e=mdy(4,1,year(end)); if start lt s then do; fyear=year(start)-1; output;end; do i=year(start) to year(end)-1; fyear=i;cyear=i;output; end; cyear=year(end); if end lt e then call missing(fyear); else fyear=cyear; output; run;
Ksharp
data fy cy;
format id best. start end yymmdd10.;
input id start :yymmdd10. end :yymmdd10.;
nfy=intck('month12.4',start,end);
ncy=intck('year',start,end);
do i=0 to nfy;
fy=year(intnx('month12.4',start,i));
output fy;
end;
fy=.;
do i=0 to ncy;
cy=year(intnx('year',start,i));
output cy;
end;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
run;
proc sql;
create table want as
select coalesce(fy.id,cy.id) as id, fy.fy,cy.cy from
fy (keep=id fy) full join cy (keep=id cy)
on fy.id=cy.id and fy.fy=cy.cy;
quit;
Solph,
The more I look at your desired outcome, the more I question if that is what you should aim for. For example, why should calendar year always equal fiscal year (or be missing). There are some periods when they would be different. Consider the time period for ID=year and FYear=CYear=2007. Shouldn't that really be broken down into multiple time periods? One block of months would have FYear=2007 and CYear=2007, but another block of months would have FYear=2006 and CYear=2007. At any rate, here is a program that works slightly differently and breaks the data down into more blocks. (Thanks to AUTigers for looking up the syntax for intnx using month12.4.) Think about whether this result might better suit your needs:
data test;
input id start : yymmdd10. end : yymmdd10.;
n_months = intck('month', start, end);
fyear = year( intnx('month12.4', start, 0) );
cyear = year(start);
output;
do _n_=1 to n_months;
test_date = intnx('month', start, _n_);
test_fyear = year( intnx('month12.4', test_date, 0) );
test_cyear = year(test_date);
if (test_fyear ne fyear) or (test_cyear ne cyear) then do;
fyear = test_fyear;
cyear = test_cyear;
output;
end;
end;
* put cyear= fyear=;
* put n_months=;
keep id fyear cyear start end;
format start end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-02-28
;
proc print;
run;
Obs id start end fyear cyear
1 1 2006-05-05 2006-06-30 2006 2006
2 2 2006-02-05 2006-05-30 2005 2006
3 2 2006-02-05 2006-05-30 2006 2006
4 3 2006-09-01 2008-11-30 2006 2006
5 3 2006-09-01 2008-11-30 2006 2007
6 3 2006-09-01 2008-11-30 2007 2007
7 3 2006-09-01 2008-11-30 2007 2008
8 3 2006-09-01 2008-11-30 2008 2008
9 4 2006-03-31 2009-02-28 2005 2006
10 4 2006-03-31 2009-02-28 2006 2006
11 4 2006-03-31 2009-02-28 2006 2007
12 4 2006-03-31 2009-02-28 2007 2007
13 4 2006-03-31 2009-02-28 2007 2008
14 4 2006-03-31 2009-02-28 2008 2008
15 4 2006-03-31 2009-02-28 2008 2009
Thanks for everyone. It took me a while to digest your codes as I was not familiar with the INTNX function. And seeing how you (PGState, AUTigers and Astounding) used the function (similarly and differently) I can see how useful it is and it took me a while to get my head around it. Even it took me a while to go through Ksharp's neat code. All helped and worked perfectly. As to Astounding's question why I didn't consider the block, I can see the usefulness of it. It's just that often times we analyze data (say counting people) by either by fiscal year or by calendar year, and rarely both. I'm just hoping to consider both in one dataset so that I've the flexibility of using one or other (but not both).
It's such a great community to seek help. Thanks everyone.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.