topic How to save first observation from first. last. in SAS Data Management

How to save first observation from first. last.

marin1hk — Wed, 06 Nov 2019 17:50:08 GMT

I am trying to pull in longitudinal data, and keep those with greater than 3 observations. I am using a first. last. statement to keep IDs and create an observation counter to ensure that I am keeping only those with greater than 3 observations. When I do this, the last observation is kept, and when I try to merge this back into the data set, the first observation is overwritten by the last observation being pulled from the first. last. How do I keep the first observation or at least avoid overwriting? Here is the code I am using:

DATA UniqueID;
SET Dementia;
BY ID Age;
IF First.ID = 1 THEN OBSCNT = 0;
OBSCNT + 1;
IF Last.ID = 1 AND OBSCNT GE 3;
RUN;

Re: How to save first observation from first. last.

FreelanceReinh — Wed, 06 Nov 2019 19:05:24 GMT

Hello @marin1hk and welcome to the SAS Support Communities!

For this type of task ("select all observations from BY groups satisfying a condition which can be checked only at the end of the BY group [edit: or at least not at the beginning of the group]") many people use either PROC SQL or a double DOW loop. The latter comes closer to your existing approach, but requires only one DATA step (i.e. no intermediate dataset UniqueID):

data want(drop=obscnt);
do until(last.id);
  set dementia;
  by id;
  obscnt=sum(obscnt,1);
end;
do until(last.id);
  set dementia;
  by id;
  if obscnt ge 3 then output;
end;
run;

Edit: Here's what the PROC SQL solution might look like:

proc sql;
create table want as
select *
from dementia
group by id
having count(*) ge 3
order by id, age;
quit;

Unlike the DOW loop, PROC SQL doesn't require the input dataset to be sorted or indexed by the grouping variable.

Re: How to save first observation from first. last.

mkeintz — Thu, 07 Nov 2019 22:14:34 GMT

You have data sorted by ID, and you want all records for any ID with 3 or more records.

Although it is a common technique, you don't really have to count records for an ID. The counting strategy means you have to read every ID group twice. The first time to generate a count, and the second time to re-read the id group and output records for ID's with counts of 3 or more. That's one of the solutions offered by @FreelanceReinh.

Assuming you don't actually need to output a counter variable, you can just merge a record with the id of the second-following record. Whenever the record-in-hand is the beginning of an ID, and the 2nd trailing record has a matching ID, you know you are at the start of an ID group that meets you condition. In that case, just set variable _KEEPFLAG to "Y", (and let _KEEPFLAG retain its value throughout the ID group), and output only those records with _KEEPFLAG='Y'.

data want (drop=_:);
  set have (keep=id);
  by id;
  merge have
        have (firstobs=3 keep=id rename=(id=_nxt_id2));

  retain _keepflag;
  if first.id=1 then do;
    if _nxt_id2=id then _keepflag='Y';
    else _keepflag='N';
  end;
  if _keepflag='Y';
run;

BTW, you don't really need the extra "set have; by id;" statements. You can also detect the start of an ID group using a test on LAG(ID) vs current ID, as in:

data want (drop=_:);
  merge have
        have (firstobs=3 keep=id rename=(id=_nxt_id2));

  retain _keepflag;
  if id ^= lag(id) then do;
    if _nxt_id2=id then _keepflag='Y';
    else _keepflag='N';
  end;
  if _keepflag='Y';
run;