Help using Base SAS procedures

Do loop to find a person's status over two time points?

Reply
Frequent Contributor
Posts: 107

Do loop to find a person's status over two time points?

Hello,

I’ve a dataset with an ID variable and two time points that can be used to identify if an individual is active between two time points. I’d like to create a dataset that can record the person’s ID and if the person is active in each year. I'd like to do fiscal year (say from April 1 to March 31), though if one program can do calendar year (i.e. from Jan 1 to Dec 31), that's be terrific too. If the active period goes cross several years’ span, this person will have multiple records.

Here is the data I’ve:

data aa;

format id best. start end yymmdd10.;

input id start yymmdd10. end yymmdd10.;

datalines;

1 2006-05-05 2006-06-30

2 2006-02-05 2006-05-30

3 2006-09-01 2008-11-30

4 2006-03-31 2009-2-28

;

The output data should look like the following. Cyear is a bonus ONLY if it can easily done and does not over-complicate the program:

ID Fyear Cyear

1   2006  2006

2   2005

2   2006  2006

3   2006  2006

3   2007  2007

3   2008  2008

4   2005 

4   2006  2006

4   2007  2007

4   2008  2008

4            2009

Many thanks.

Frequent Contributor
Posts: 107

Re: Do loop to find a person's status over two time points?

I tried the following doing fiscal year, but it sort of worked and didn't work.

#1 It seems to be working - it's creating multiple year records. But the yr2 parameter is read in as say "2005+1" or "2006+1", so it isn't creating the right records - For one, year 2005 records aren't outputted.

#2 The code seems not very efficient as it is reading each record 6 times. If I've a large dataset or have a large time span (say from 1950 to 2011), the program is not very efficient.

Hopefully people have solutions to at least #1 above. Thanks.

%macro try;

data bb;

    set aa;

    format year best.;

    %do i = 2005 %to 2011;

    %let yr1 = &i;

    %let yr2= &i.+1;

        if START <= "31Mar&yr2."d and (END >="01Apr&yr1."d or END=.) then do;  *need END=. to accommodate those who have open end date;

        year=&yr1;

        output;

        end;

    %end;

    run;

%mend try;

%try;

p.s. I thought the following would work but it didn't work (year values aren't outputted).

%do i = 2005 %to 2011 and j=2006 %to 2012;

    %let yr1 = &i;

    %let yr2= &j;

Respected Advisor
Posts: 4,919

Re: Do loop to find a person's status over two time points?

It can be done this way :

data aa;
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;

proc sort data=aa; by id; run;

data fiscalYears(keep=id year);
set aa;
do year = year(intnx("YEAR.4",start,0)) to year(intnx("YEAR.4",end,0));
output;
end;
run;

data calendarYears(keep=id year);
set aa;
do year = year(start) to year(end);
output;
end;
run;

data bothYears(keep=id fYear cYear);
merge fiscalYears (in=inf) calendarYears (in=inc) ;
by id year;
if inf then fYear = year;
if inc then cYear = year;
run;

proc print; run;

PG

PG
Super User
Posts: 10,018

Re: Do loop to find a person's status over two time points?

If I understand what you mean .

data aa;
format id best. start end yymmdd10.;
input id start yymmdd10. end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-2-28
;
run;
data want(drop=i s e);
 set aa;
 s=mdy(4,1,year(start));
 e=mdy(4,1,year(end));
 if start lt s then do; fyear=year(start)-1; output;end;
 do i=year(start) to year(end)-1;
  fyear=i;cyear=i;output;
 end;
 cyear=year(end);
 if end lt e then call missing(fyear);
  else fyear=cyear;
 output;
run;

Ksharp

Contributor
Posts: 21

Re: Do loop to find a person's status over two time points?

data fy cy;

format id best. start end yymmdd10.;

input id start :yymmdd10. end :yymmdd10.;

nfy=intck('month12.4',start,end);

ncy=intck('year',start,end);

do i=0 to nfy;

fy=year(intnx('month12.4',start,i));

output fy;

end;

fy=.;

do i=0 to ncy;

cy=year(intnx('year',start,i));

output cy;

end;

datalines;

1 2006-05-05 2006-06-30

2 2006-02-05 2006-05-30

3 2006-09-01 2008-11-30

4 2006-03-31 2009-2-28

;

run;

proc sql;

create table want as

  select coalesce(fy.id,cy.id) as id, fy.fy,cy.cy from

    fy (keep=id fy) full join cy (keep=id cy)

      on fy.id=cy.id and fy.fy=cy.cy;

  quit;

Super User
Posts: 5,497

Re: Do loop to find a person's status over two time points?

Solph,

The more I look at your desired outcome, the more I question if that is what you should aim for.  For example, why should calendar year always equal fiscal year (or be missing).  There are some periods when they would be different.  Consider the time period for ID=year and FYear=CYear=2007.  Shouldn't that really be broken down into multiple time periods?  One block of months would have FYear=2007 and CYear=2007, but another block of months would have FYear=2006 and CYear=2007.  At any rate, here is a program that works slightly differently and breaks the data down into more blocks.  (Thanks to AUTigers for looking up the syntax for intnx using month12.4.)  Think about whether this result might better suit your needs:

data test;
   input id start : yymmdd10. end : yymmdd10.;
   n_months = intck('month', start, end);
   fyear = year( intnx('month12.4', start, 0) );
   cyear = year(start);
   output;
   do _n_=1 to n_months;
      test_date = intnx('month', start, _n_);
      test_fyear = year( intnx('month12.4', test_date, 0) );
      test_cyear = year(test_date);
      if (test_fyear ne fyear) or (test_cyear ne cyear) then do;
         fyear = test_fyear;
         cyear = test_cyear;
         output;
      end;
   end;
  * put cyear= fyear=;
  * put n_months=;
  keep id fyear cyear start end;
  format start end yymmdd10.;
datalines;
1 2006-05-05 2006-06-30
2 2006-02-05 2006-05-30
3 2006-09-01 2008-11-30
4 2006-03-31 2009-02-28
;

proc print;
run;


Obs    id         start           end    fyear    cyear

  1     1    2006-05-05    2006-06-30     2006     2006
  2     2    2006-02-05    2006-05-30     2005     2006
  3     2    2006-02-05    2006-05-30     2006     2006
  4     3    2006-09-01    2008-11-30     2006     2006
  5     3    2006-09-01    2008-11-30     2006     2007
  6     3    2006-09-01    2008-11-30     2007     2007
  7     3    2006-09-01    2008-11-30     2007     2008
  8     3    2006-09-01    2008-11-30     2008     2008
  9     4    2006-03-31    2009-02-28     2005     2006
10     4    2006-03-31    2009-02-28     2006     2006
11     4    2006-03-31    2009-02-28     2006     2007
12     4    2006-03-31    2009-02-28     2007     2007
13     4    2006-03-31    2009-02-28     2007     2008
14     4    2006-03-31    2009-02-28     2008     2008
15     4    2006-03-31    2009-02-28     2008     2009

Frequent Contributor
Posts: 107

Re: Do loop to find a person's status over two time points?

Posted in reply to Astounding

Thanks for everyone. It took me a while to digest your codes as I was not familiar with the INTNX function. And seeing how you (PGState, AUTigers and Astounding) used the function (similarly and differently) I can see how useful it is and it took me a while to get my head around it. Even it took me a while to go through Ksharp's neat code. All helped and worked perfectly. As to Astounding's question why I didn't consider the block, I can see the usefulness of it. It's just that often times we analyze data (say counting people) by either by fiscal year or by calendar year, and rarely both. I'm just hoping to consider both in one dataset so that I've the flexibility of using one or other (but not both).

It's such a great community to seek help. Thanks everyone.

Ask a Question
Discussion stats
  • 6 replies
  • 349 views
  • 0 likes
  • 5 in conversation