I am trying to pull in longitudinal data, and keep those with greater than 3 observations. I am using a first. last. statement to keep IDs and create an observation counter to ensure that I am keeping only those with greater than 3 observations. When I do this, the last observation is kept, and when I try to merge this back into the data set, the first observation is overwritten by the last observation being pulled from the first. last. How do I keep the first observation or at least avoid overwriting? Here is the code I am using:
DATA UniqueID;
SET Dementia;
BY ID Age;
IF First.ID = 1 THEN OBSCNT = 0;
OBSCNT + 1;
IF Last.ID = 1 AND OBSCNT GE 3;
RUN;
Hello @marin1hk and welcome to the SAS Support Communities!
For this type of task ("select all observations from BY groups satisfying a condition which can be checked only at the end of the BY group [edit: or at least not at the beginning of the group]") many people use either PROC SQL or a double DOW loop. The latter comes closer to your existing approach, but requires only one DATA step (i.e. no intermediate dataset UniqueID):
data want(drop=obscnt);
do until(last.id);
set dementia;
by id;
obscnt=sum(obscnt,1);
end;
do until(last.id);
set dementia;
by id;
if obscnt ge 3 then output;
end;
run;
Edit: Here's what the PROC SQL solution might look like:
proc sql;
create table want as
select *
from dementia
group by id
having count(*) ge 3
order by id, age;
quit;
Unlike the DOW loop, PROC SQL doesn't require the input dataset to be sorted or indexed by the grouping variable.
You have data sorted by ID, and you want all records for any ID with 3 or more records.
Although it is a common technique, you don't really have to count records for an ID. The counting strategy means you have to read every ID group twice. The first time to generate a count, and the second time to re-read the id group and output records for ID's with counts of 3 or more. That's one of the solutions offered by @FreelanceReinh.
Assuming you don't actually need to output a counter variable, you can just merge a record with the id of the second-following record. Whenever the record-in-hand is the beginning of an ID, and the 2nd trailing record has a matching ID, you know you are at the start of an ID group that meets you condition. In that case, just set variable _KEEPFLAG to "Y", (and let _KEEPFLAG retain its value throughout the ID group), and output only those records with _KEEPFLAG='Y'.
data want (drop=_:);
set have (keep=id);
by id;
merge have
have (firstobs=3 keep=id rename=(id=_nxt_id2));
retain _keepflag;
if first.id=1 then do;
if _nxt_id2=id then _keepflag='Y';
else _keepflag='N';
end;
if _keepflag='Y';
run;
BTW, you don't really need the extra "set have; by id;" statements. You can also detect the start of an ID group using a test on LAG(ID) vs current ID, as in:
data want (drop=_:);
merge have
have (firstobs=3 keep=id rename=(id=_nxt_id2));
retain _keepflag;
if id ^= lag(id) then do;
if _nxt_id2=id then _keepflag='Y';
else _keepflag='N';
end;
if _keepflag='Y';
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.