BookmarkSubscribeRSS Feed
saurabhc
Fluorite | Level 6

My data looks like this

id     yr

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

I want SAS to tell me if the given "id" had at least 6 consecutive years (of say, some test done.) In this case, id 1 has year 1995 missing but id 2 fits our bill.

I tried subtracting the min of the year from the max of the year for a given id but that might give erroneous answers if total years are 7 and even 1 year is missing. The subtraction will still be 6.

Thanks.

9 REPLIES 9
Reeza
Super User

Here's a long way to do it....



data have;

input id     yr;

cards;

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

;

run;

proc sort data=have; by id yr;

data want1;

set have;

by id yr;

retain count;

lag_yr=lag(yr);

if yr=lag_yr+1 then count+1;

else if yr=lag_yr then count=count;

else if yr ne lag_yr then count=1;

if first.id then do;

  lag_yr=.;

  count=1;

end;

run;

proc sql;

  select distinct(id) from want1

  where count>=6;

quit;

Scott_Mitchell
Quartz | Level 8

The following is using a HASH OBJECT.  By default the HASH OBJECT does not allow for duplicate key values.  As a result you process and deduplicate in a single step.

DATA _NULL_;

DECLARE HASH HH(ORDERED:'YES');

HH.DEFINEKEY('ID');

HH.DEFINEDATA('ID','CONSEC_YRS');

HH.DEFINEDONE();

DO I = 1 BY 1 UNTIL(DONE);

  SET HAVE END=DONE;

  BY ID YR;

  RETAIN CONSEC_YRS;

  IF FIRST.ID THEN

  DO;

  YR1 = 0;

  CONSEC_YRS = 1;

  END;

  YR1 = LAG(YR);

  YR_INCREMENT = SUM(YR,-YR1);

  IF YR_INCREMENT = 1 THEN CONSEC_YRS+1;

  IF YR_INCREMENT > 1 THEN CONSEC_YRS = 1;

  IF CONSEC_YRS >= 6 THEN DO;

  RC = HH.CHECK();

  IF RC ~= 0 THEN RC1 = HH.ADD(); 

  ELSE IF RC = 0 THEN RC2 = HH.REPLACE();

  END;

END;

HH.OUTPUT(DATASET:'WANT');

RUN;

Haikuo
Onyx | Level 15

Here is a Proc SQL way:

data have;

input id     yr;

cards;

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

;

proc sql;

  create table want (drop=_yr) as

    select *, count(distinct yr) as _yr from have

         group by id

having _yr >=6

                   and max(yr)-min(yr)=_yr-1

                     ;

quit;


Haikuo

PGStats
Opal | Level 21

This doesn't work when a subrange of 6 years is present, as in

data have;

input id yr;

cards;

2     1992

2     1994

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

;

PG

PG
Haikuo
Onyx | Level 15

Yep, you are correct. I don't feel right myself, just don't know where and why. Thanks, PG.

PGStats
Opal | Level 21

A SQL solution:

proc sql;

create table want as

select id, max(ok) as consec

from (

     select y0.id, count(distinct y.yr)=5 as ok

     from

           have as y0 left join

           have as y on y0.id=y.id and y.yr between y0.yr+1 and y0.yr+5

     group by y0.id, y0.yr)

group by id;

   

select * from want;

quit;

PG

PG
Haikuo
Onyx | Level 15

Hopefully this data step solution will be more universal. Given that the data is presorted:


data have;

input id     yr;

cards;

1     1990

1     1990

1     1991

1     1991

1     1992

1     1993

1     1993

1     1994

1     1996

2     1996

2     1997

2     1997

2     1998

2     1999

2     2000

2     2001

2     2001

3     1996

3     1997

3     1997

3     1998

3     1999

3     2000

3     2001

3     2001

;

data want;

do until (last.id);

  set have;

    by id yr;

        if first.id then do;_yr=yr;_ct=0;end;

dif=yr-_yr;

         if dif >1 then _ct=0;

             else if dif=1 then _ct+1;

if _ct >=5 then _flag=1;

_yr=yr;

end;

do until (last.id);

   set have;

     by id yr;

         if _flag=1 then output;

end;

drop _:;

run;

saurabhc
Fluorite | Level 6

Thank you Reeza, Scott_Mitchell, Hai.kuo, PGStats.

I tried all the suggestions and it turns out I had use for each one of them. My problem was a little more complex than this since I also had desirable lab results intermingled with the consecutiveness (consecutive AND desired lab result for => 8 years.)

I found out all the "desirable" years first by tagging the good lab results and then counted those years only if they were consecutive.

saurabhc
Fluorite | Level 6

Paper 89-28

Identifying continuity in Longitudinal Data

This paper describes a simple solution to tasks similar to one discussed above.

It identifies the first date at which the desired value is seen and then counts until the value stays desirable. It then tells us if the value stayed desirable for a pre defined period of time.

Thank you all.


sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1103 views
  • 3 likes
  • 5 in conversation