DATA Step, Macro, Functions and more

To ascertain/test "consecutiveness" of variable values by ids

Reply
Occasional Contributor
Posts: 19

To ascertain/test "consecutiveness" of variable values by ids

My data looks like this

id     yr

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

I want SAS to tell me if the given "id" had at least 6 consecutive years (of say, some test done.) In this case, id 1 has year 1995 missing but id 2 fits our bill.

I tried subtracting the min of the year from the max of the year for a given id but that might give erroneous answers if total years are 7 and even 1 year is missing. The subtraction will still be 6.

Thanks.

Super User
Posts: 19,804

Re: To ascertain/test "consecutiveness" of variable values by ids

Here's a long way to do it....



data have;

input id     yr;

cards;

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

;

run;

proc sort data=have; by id yr;

data want1;

set have;

by id yr;

retain count;

lag_yr=lag(yr);

if yr=lag_yr+1 then count+1;

else if yr=lag_yr then count=count;

else if yr ne lag_yr then count=1;

if first.id then do;

  lag_yr=.;

  count=1;

end;

run;

proc sql;

  select distinct(id) from want1

  where count>=6;

quit;

Super Contributor
Posts: 297

Re: To ascertain/test "consecutiveness" of variable values by ids

The following is using a HASH OBJECT.  By default the HASH OBJECT does not allow for duplicate key values.  As a result you process and deduplicate in a single step.

DATA _NULL_;

DECLARE HASH HH(ORDERED:'YES');

HH.DEFINEKEY('ID');

HH.DEFINEDATA('ID','CONSEC_YRS');

HH.DEFINEDONE();

DO I = 1 BY 1 UNTIL(DONE);

  SET HAVE END=DONE;

  BY ID YR;

  RETAIN CONSEC_YRS;

  IF FIRST.ID THEN

  DO;

  YR1 = 0;

  CONSEC_YRS = 1;

  END;

  YR1 = LAG(YR);

  YR_INCREMENT = SUM(YR,-YR1);

  IF YR_INCREMENT = 1 THEN CONSEC_YRS+1;

  IF YR_INCREMENT > 1 THEN CONSEC_YRS = 1;

  IF CONSEC_YRS >= 6 THEN DO;

  RC = HH.CHECK();

  IF RC ~= 0 THEN RC1 = HH.ADD(); 

  ELSE IF RC = 0 THEN RC2 = HH.REPLACE();

  END;

END;

HH.OUTPUT(DATASET:'WANT');

RUN;

Respected Advisor
Posts: 3,156

Re: To ascertain/test "consecutiveness" of variable values by ids

Here is a Proc SQL way:

data have;

input id     yr;

cards;

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

;

proc sql;

  create table want (drop=_yr) as

    select *, count(distinct yr) as _yr from have

         group by id

having _yr >=6

                   and max(yr)-min(yr)=_yr-1

                     ;

quit;


Haikuo

Respected Advisor
Posts: 4,923

Re: To ascertain/test "consecutiveness" of variable values by ids

This doesn't work when a subrange of 6 years is present, as in

data have;

input id yr;

cards;

2     1992

2     1994

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

;

PG

PG
Respected Advisor
Posts: 3,156

Re: To ascertain/test "consecutiveness" of variable values by ids

Yep, you are correct. I don't feel right myself, just don't know where and why. Thanks, PG.

Respected Advisor
Posts: 4,923

Re: To ascertain/test "consecutiveness" of variable values by ids

A SQL solution:

proc sql;

create table want as

select id, max(ok) as consec

from (

     select y0.id, count(distinct y.yr)=5 as ok

     from

           have as y0 left join

           have as y on y0.id=y.id and y.yr between y0.yr+1 and y0.yr+5

     group by y0.id, y0.yr)

group by id;

   

select * from want;

quit;

PG

PG
Respected Advisor
Posts: 3,156

Re: To ascertain/test "consecutiveness" of variable values by ids

Hopefully this data step solution will be more universal. Given that the data is presorted:


data have;

input id     yr;

cards;

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

3     1996

3     1997

3     1997

3     1998

3     1999

3     2000

3     2001

3     2001

;

data want;

do until (last.id);

  set have;

    by id yr;

        if first.id then do;_yr=yr;_ct=0;end;

dif=yr-_yr;

         if dif >1 then _ct=0;

             else if dif=1 then _ct+1;

if _ct >=5 then _flag=1;

_yr=yr;

end;

do until (last.id);

   set have;

     by id yr;

         if _flag=1 then output;

end;

drop _:;

run;

Occasional Contributor
Posts: 19

Re: To ascertain/test "consecutiveness" of variable values by ids

Thank you Reeza, Scott_Mitchell, Hai.kuo, PGStats.

I tried all the suggestions and it turns out I had use for each one of them. My problem was a little more complex than this since I also had desirable lab results intermingled with the consecutiveness (consecutive AND desired lab result for => 8 years.)

I found out all the "desirable" years first by tagging the good lab results and then counted those years only if they were consecutive.

Occasional Contributor
Posts: 19

Re: To ascertain/test "consecutiveness" of variable values by ids

Paper 89-28

Identifying continuity in Longitudinal Data

This paper describes a simple solution to tasks similar to one discussed above.

It identifies the first date at which the desired value is seen and then counts until the value stays desirable. It then tells us if the value stayed desirable for a pre defined period of time.

Thank you all.


Ask a Question
Discussion stats
  • 9 replies
  • 374 views
  • 3 likes
  • 5 in conversation