BookmarkSubscribeRSS Feed
Solph
Pyrite | Level 9

Hello,

I’ve a dataset with an ID variable and two time points that can be used to identify if an individual is active between two time points. I’d like to create a dataset that can record the person’s ID and if the person is active in each year. I'd like to do fiscal year (say from April 1 to March 31), though if one program can do calendar year (i.e. from Jan 1 to Dec 31), that's be terrific too. If the active period goes cross several years’ span, this person will have multiple records.

Here is the data I’ve:

data aa;

format id best. start end yymmdd10.;

input id start yymmdd10. end yymmdd10.;

datalines;

1 2006-05-05 2006-06-30

2 2006-02-05 2006-05-30

3 2006-09-01 2008-11-30

4 2006-03-31 2009-2-28

;

The output data should look like the following. Cyear is a bonus ONLY if it can easily done and does not over-complicate the program:

ID Fyear Cyear

1   2006  2006

2   2005

2   2006  2006

3   2006  2006

3   2007  2007

3   2008  2008

4   2005 

4   2006  2006

4   2007  2007

4   2008  2008

4            2009

Many thanks.

6 REPLIES 6
Solph
Pyrite | Level 9

I tried the following doing fiscal year, but it sort of worked and didn't work.

#1 It seems to be working - it's creating multiple year records. But the yr2 parameter is read in as say "2005+1" or "2006+1", so it isn't creating the right records - For one, year 2005 records aren't outputted.

#2 The code seems not very efficient as it is reading each record 6 times. If I've a large dataset or have a large time span (say from 1950 to 2011), the program is not very efficient.

Hopefully people have solutions to at least #1 above. Thanks.

%macro try;

data bb;

    set aa;

    format year best.;

    %do i = 2005 %to 2011;

    %let yr1 = &i;

    %let yr2= &i.+1;

        if START <= "31Mar&yr2."d and (END >="01Apr&yr1."d or END=.) then do;  *need END=. to accommodate those who have open end date;

        year=&yr1;

        output;

        end;

    %end;

    run;

%mend try;

%try;

p.s. I thought the following would work but it didn't work (year values aren't outputted).

%do i = 2005 %to 2011 and j=2006 %to 2012;

    %let yr1 = &i;

    %let yr2= &j;

PGStats
Opal | Level 21

It can be done this way :

data aa;
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;

proc sort data=aa; by id; run;

data fiscalYears(keep=id year);
set aa;
do year = year(intnx("YEAR.4",start,0)) to year(intnx("YEAR.4",end,0));
output;
end;
run;

data calendarYears(keep=id year);
set aa;
do year = year(start) to year(end);
output;
end;
run;

data bothYears(keep=id fYear cYear);
merge fiscalYears (in=inf) calendarYears (in=inc) ;
by id year;
if inf then fYear = year;
if inc then cYear = year;
run;

proc print; run;

PG

PG
Ksharp
Super User

If I understand what you mean .

data aa;
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
run;
data want(drop=i s e);
 set aa;
 s=mdy(4,1,year(start));
 e=mdy(4,1,year(end));
 if start lt s then do; fyear=year(start)-1; output;end;
 do i=year(start) to year(end)-1;
  fyear=i;cyear=i;output;
 end;
 cyear=year(end);
 if end lt e then call missing(fyear);
  else fyear=cyear;
 output;
run;

Ksharp

AUTigers
Calcite | Level 5

data fy cy;

format id best. start end yymmdd10.;

input id start :yymmdd10. end :yymmdd10.;

nfy=intck('month12.4',start,end);

ncy=intck('year',start,end);

do i=0 to nfy;

fy=year(intnx('month12.4',start,i));

output fy;

end;

fy=.;

do i=0 to ncy;

cy=year(intnx('year',start,i));

output cy;

end;

datalines;

1 2006-05-05 2006-06-30

2 2006-02-05 2006-05-30

3 2006-09-01 2008-11-30

4 2006-03-31 2009-2-28

;

run;

proc sql;

create table want as

  select coalesce(fy.id,cy.id) as id, fy.fy,cy.cy from

    fy (keep=id fy) full join cy (keep=id cy)

      on fy.id=cy.id and fy.fy=cy.cy;

  quit;

Astounding
PROC Star

Solph,

The more I look at your desired outcome, the more I question if that is what you should aim for.  For example, why should calendar year always equal fiscal year (or be missing).  There are some periods when they would be different.  Consider the time period for ID=year and FYear=CYear=2007.  Shouldn't that really be broken down into multiple time periods?  One block of months would have FYear=2007 and CYear=2007, but another block of months would have FYear=2006 and CYear=2007.  At any rate, here is a program that works slightly differently and breaks the data down into more blocks.  (Thanks to AUTigers for looking up the syntax for intnx using month12.4.)  Think about whether this result might better suit your needs:

data test;
   input id start : yymmdd10. end : yymmdd10.;
   n_months = intck('month', start, end);
   fyear = year( intnx('month12.4', start, 0) );
   cyear = year(start);
   output;
   do _n_=1 to n_months;
      test_date = intnx('month', start, _n_);
      test_fyear = year( intnx('month12.4', test_date, 0) );
      test_cyear = year(test_date);
      if (test_fyear ne fyear) or (test_cyear ne cyear) then do;
         fyear = test_fyear;
         cyear = test_cyear;
         output;
      end;
   end;
  * put cyear= fyear=;
  * put n_months=;
  keep id fyear cyear start end;
  format start end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-02-28
;

proc print;
run;


Obs    id         start           end    fyear    cyear

  1     1    2006-05-05    2006-06-30     2006     2006
  2     2    2006-02-05    2006-05-30     2005     2006
  3     2    2006-02-05    2006-05-30     2006     2006
  4     3    2006-09-01    2008-11-30     2006     2006
  5     3    2006-09-01    2008-11-30     2006     2007
  6     3    2006-09-01    2008-11-30     2007     2007
  7     3    2006-09-01    2008-11-30     2007     2008
  8     3    2006-09-01    2008-11-30     2008     2008
  9     4    2006-03-31    2009-02-28     2005     2006
10     4    2006-03-31    2009-02-28     2006     2006
11     4    2006-03-31    2009-02-28     2006     2007
12     4    2006-03-31    2009-02-28     2007     2007
13     4    2006-03-31    2009-02-28     2007     2008
14     4    2006-03-31    2009-02-28     2008     2008
15     4    2006-03-31    2009-02-28     2008     2009

Solph
Pyrite | Level 9

Thanks for everyone. It took me a while to digest your codes as I was not familiar with the INTNX function. And seeing how you (PGState, AUTigers and Astounding) used the function (similarly and differently) I can see how useful it is and it took me a while to get my head around it. Even it took me a while to go through Ksharp's neat code. All helped and worked perfectly. As to Astounding's question why I didn't consider the block, I can see the usefulness of it. It's just that often times we analyze data (say counting people) by either by fiscal year or by calendar year, and rarely both. I'm just hoping to consider both in one dataset so that I've the flexibility of using one or other (but not both).

It's such a great community to seek help. Thanks everyone.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1144 views
  • 0 likes
  • 5 in conversation