My data looks like this
id yr
1 1990
1 1990
1 1991
1 1991
1 1992
1 1993
1 1993
1 1994
1 1996
2 1996
2 1997
2 1997
2 1998
2 1999
2 2000
2 2001
2 2001
I want SAS to tell me if the given "id" had at least 6 consecutive years (of say, some test done.) In this case, id 1 has year 1995 missing but id 2 fits our bill.
I tried subtracting the min of the year from the max of the year for a given id but that might give erroneous answers if total years are 7 and even 1 year is missing. The subtraction will still be 6.
Thanks.
Here's a long way to do it....
data have;
input id yr;
cards;
1 1990
1 1990
1 1991
1 1991
1 1992
1 1993
1 1993
1 1994
1 1996
2 1996
2 1997
2 1997
2 1998
2 1999
2 2000
2 2001
2 2001
;
run;
proc sort data=have; by id yr;
data want1;
set have;
by id yr;
retain count;
lag_yr=lag(yr);
if yr=lag_yr+1 then count+1;
else if yr=lag_yr then count=count;
else if yr ne lag_yr then count=1;
if first.id then do;
lag_yr=.;
count=1;
end;
run;
proc sql;
select distinct(id) from want1
where count>=6;
quit;
The following is using a HASH OBJECT. By default the HASH OBJECT does not allow for duplicate key values. As a result you process and deduplicate in a single step.
DATA _NULL_;
DECLARE HASH HH(ORDERED:'YES');
HH.DEFINEKEY('ID');
HH.DEFINEDATA('ID','CONSEC_YRS');
HH.DEFINEDONE();
DO I = 1 BY 1 UNTIL(DONE);
SET HAVE END=DONE;
BY ID YR;
RETAIN CONSEC_YRS;
IF FIRST.ID THEN
DO;
YR1 = 0;
CONSEC_YRS = 1;
END;
YR1 = LAG(YR);
YR_INCREMENT = SUM(YR,-YR1);
IF YR_INCREMENT = 1 THEN CONSEC_YRS+1;
IF YR_INCREMENT > 1 THEN CONSEC_YRS = 1;
IF CONSEC_YRS >= 6 THEN DO;
RC = HH.CHECK();
IF RC ~= 0 THEN RC1 = HH.ADD();
ELSE IF RC = 0 THEN RC2 = HH.REPLACE();
END;
END;
HH.OUTPUT(DATASET:'WANT');
RUN;
Here is a Proc SQL way:
data have;
input id yr;
cards;
1 1990
1 1990
1 1991
1 1991
1 1992
1 1993
1 1993
1 1994
1 1996
2 1996
2 1997
2 1997
2 1998
2 1999
2 2000
2 2001
2 2001
;
proc sql;
create table want (drop=_yr) as
select *, count(distinct yr) as _yr from have
group by id
having _yr >=6
and max(yr)-min(yr)=_yr-1
;
quit;
Haikuo
Yep, you are correct. I don't feel right myself, just don't know where and why. Thanks, PG.
A SQL solution:
proc sql;
create table want as
select id, max(ok) as consec
from (
select y0.id, count(distinct y.yr)=5 as ok
from
have as y0 left join
have as y on y0.id=y.id and y.yr between y0.yr+1 and y0.yr+5
group by y0.id, y0.yr)
group by id;
select * from want;
quit;
PG
Hopefully this data step solution will be more universal. Given that the data is presorted:
data have;
input id yr;
cards;
1 1990
1 1990
1 1991
1 1991
1 1992
1 1993
1 1993
1 1994
1 1996
2 1996
2 1997
2 1997
2 1998
2 1999
2 2000
2 2001
2 2001
3 1996
3 1997
3 1997
3 1998
3 1999
3 2000
3 2001
3 2001
;
data want;
do until (last.id);
set have;
by id yr;
if first.id then do;_yr=yr;_ct=0;end;
dif=yr-_yr;
if dif >1 then _ct=0;
else if dif=1 then _ct+1;
if _ct >=5 then _flag=1;
_yr=yr;
end;
do until (last.id);
set have;
by id yr;
if _flag=1 then output;
end;
drop _:;
run;
Thank you Reeza, Scott_Mitchell, Hai.kuo, PGStats.
I tried all the suggestions and it turns out I had use for each one of them. My problem was a little more complex than this since I also had desirable lab results intermingled with the consecutiveness (consecutive AND desired lab result for => 8 years.)
I found out all the "desirable" years first by tagging the good lab results and then counted those years only if they were consecutive.
Paper 89-28
Identifying continuity in Longitudinal Data
This paper describes a simple solution to tasks similar to one discussed above.
It identifies the first date at which the desired value is seen and then counts until the value stays desirable. It then tells us if the value stayed desirable for a pre defined period of time.
Thank you all.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.