How to take the first observation in a grouping

Jamerkin · Posted 08-07-2024 09:21 AM

Hello experts,

I would greatly appreciate your help with an issue I am having. I need to keep the first occurrence of the most recent code change in my dataset. I have created a sample of the data I am working with below. For ID 001 we want SEQ 3. This is the first time they were changed to their current code, which you can see they are still on as of their most recent "review" in observation 1, dated 05/04/2014. For 002 we would need to keep SEQ 2.

I have tried to sort and de-duplicate to drop the observations not in the first group of 'ST', but the issue I have is that if they were on that code earlier in their history, the older dates get mixed in, and those are two separate occurrences. I have no data point to separate the occurrences other than a different status happening in between, such as we see with 'GP' on observation 4. I also tried to count the number of ST's in the group and stop when it reaches GP, but I had no such luck because it needs a BY statement and the sort has the issues mentioned above.

In short, I need a piece of code that can identify the first occurrence of their most recent Code. Here it is ST, but I also have SP and DD. For ID 001, observation line 3 is the one to keep, and for 002, observation line 7 is the one to keep.

Thanks for any help I can receive.

P.s. Do note the format of the date is ddmmyy.

data want;
input ID SEQ Code $ date :ddmmyy10.;
format date ddmmyy10.;
datalines;
001 1 ST 05/04/2014
001 2 ST 05/01/2014
001 3 ST 04/01/2014
001 4 GP 02/05/2014
001 5 ST 02/01/2014
002 1 ST 05/01/2014
002 2 ST 03/02/2014
002 3 GP 02/01/2014
002 4 GP 02/01/2014
;
run;

PeterClemmensen · Posted 08-07-2024 09:26 AM

I don't see your mentioned sample data?

The DATA to DATA Step Macro
Blog: SASnrd

Jamerkin · Posted 08-07-2024 09:32 AM

Apologies. The insert code button was not working. I updated my post by just pasting my sample data at the bottom.

PeterClemmensen · Posted 08-07-2024 09:40 AM

So your desired result here is a data set with 2 observations, correct?


Jamerkin · Posted 08-07-2024 09:45 AM

Correct, I should end with observations 3 and 7. Mind you, the full dataset is over 500,000 individuals, so doing this manually was not an option.

Thanks!

mkeintz · Posted 08-07-2024 11:08 PM

If you use the NOTSORTED option (to accommodate "CODE" values that might either ascend or descend), then:

data have;
input ID SEQ Code $ date :ddmmyy10.;
format date ddmmyy10.;
datalines;
001 1 ST 05/04/2014
001 2 ST 05/01/2014
001 3 ST 04/01/2014
001 4 GP 02/05/2014
001 5 ST 02/01/2014
002 1 ST 05/01/2014
002 2 ST 03/02/2014
002 3 GP 02/01/2014
002 4 GP 02/01/2014
;
run;

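data want (drop=_:) ;
  set have;
  by id code notsorted;
  /* restart the run counter at the first row of each ID */
  if first.id=1 then _code_group=0;
  /* count each run of identical CODE values as it ends */
  _code_group + (last.code=1);
  /* keep only the last row of the first run, i.e. the oldest row of the most recent code */
  if last.code and _code_group=1;
run;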
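
To see why this works, it may help to look at the BY flags row by row. The step below is only a diagnostic sketch, assuming HAVE is ordered within each ID with the most recent record first (as in the sample). The names TRACE, CODE_GROUP, FIRST_CODE, LAST_CODE and KEEP_ROW are made up for illustration. FIRST.CODE and LAST.CODE are automatic variables that are not written to the output data set, so they are copied into ordinary variables.

/* Diagnostic sketch: same BY logic as above, but with the flags kept visible */
data trace;
  set have;
  by id code notsorted;
  if first.id then code_group = 0;   /* reset the run counter for each ID */
  code_group + last.code;            /* number of CODE runs completed so far */
  first_code = first.code;           /* copy the automatic flags so they are kept */
  last_code  = last.code;
  keep_row   = (last.code and code_group = 1);  /* 1 on the row WANT would keep */
run;

proc print data=trace noobs;
run;

With the sample data, KEEP_ROW should be 1 only for ID 001 SEQ 3 and ID 002 SEQ 2, which are the two rows that end the first run of CODE within each ID.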
