BookmarkSubscribeRSS Feed
marin1hk
Calcite | Level 5

I am trying to pull in longitudinal data, and keep those with greater than 3 observations. I am using a first. last. statement to keep IDs and create an observation counter to ensure that I am keeping only those with greater than 3 observations. When I do this, the last observation is kept, and when I try to merge this back into the data set, the first observation is overwritten by the last observation being pulled from the first. last. How do I keep the first observation or at least avoid overwriting? Here is the code I am using:

 

DATA UniqueID;
SET Dementia;
BY ID Age;
IF First.ID = 1 THEN OBSCNT = 0;
OBSCNT + 1;
IF Last.ID = 1 AND OBSCNT GE 3;
RUN;

2 REPLIES 2
FreelanceReinh
Jade | Level 19

Hello @marin1hk and welcome to the SAS Support Communities!

 

For this type of task ("select all observations from BY groups satisfying a condition which can be checked only at the end of the BY group [edit: or at least not at the beginning of the group]") many people use either PROC SQL or a double DOW loop. The latter comes closer to your existing approach, but requires only one DATA step (i.e. no intermediate dataset UniqueID):

data want(drop=obscnt);
do until(last.id);
  set dementia;
  by id;
  obscnt=sum(obscnt,1);
end;
do until(last.id);
  set dementia;
  by id;
  if obscnt ge 3 then output;
end;
run;

 

Edit: Here's what the PROC SQL solution might look like:

proc sql;
create table want as
select *
from dementia
group by id
having count(*) ge 3
order by id, age;
quit;

Unlike the DOW loop, PROC SQL doesn't require the input dataset to be sorted or indexed by the grouping variable.

mkeintz
PROC Star

You have data sorted by ID, and you want all records for any ID with 3 or more records.

 

Although it is a common technique, you don't really have to count records for an ID.  The counting strategy means you have to read every ID group twice.  The first time to generate a count, and the second time to re-read the id group and output records for ID's with counts of 3 or more.  That's one of the solutions offered by @FreelanceReinh.

 

Assuming you don't actually need to output a counter variable, you can just merge a record with the id of the second-following record.  Whenever the record-in-hand is the beginning of an ID, and the 2nd trailing record has a matching ID, you know you are at the start of an ID group that meets you condition.  In that case, just set variable _KEEPFLAG to "Y", (and let _KEEPFLAG retain its value throughout the ID group), and output only those records with _KEEPFLAG='Y'.

 

 

data want (drop=_:);
  set have (keep=id);
  by id;
  merge have
        have (firstobs=3 keep=id rename=(id=_nxt_id2));

  retain _keepflag;
  if first.id=1 then do;
    if _nxt_id2=id then _keepflag='Y';
    else _keepflag='N';
  end;
  if _keepflag='Y';
run;

 

BTW, you don't really need the extra "set have; by id;" statements.  You can also detect the start of an ID group using a test on LAG(ID) vs current ID, as in:

 

data want (drop=_:);
  merge have
        have (firstobs=3 keep=id rename=(id=_nxt_id2));

  retain _keepflag;
  if id ^= lag(id) then do;
    if _nxt_id2=id then _keepflag='Y';
    else _keepflag='N';
  end;
  if _keepflag='Y';
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 2092 views
  • 1 like
  • 3 in conversation