BookmarkSubscribeRSS Feed
Jamerkin
Calcite | Level 5

Hello experts,

 

I would greatly appreciate your help with this issue I am having. I need to keep the first occurrence of code change that happens in my dataset.  I have created a sample of the data I am working with below. For ID 001 we want the SEQ 3 for it. This is the first time they were changed to their current code which you can see they are still on with their recent "review" from observation 1 and date of 05/04/2014. For 002 we would need to keep SEQ 2 for them.

 

I have tried to sort and duplicate out the observation not in the first group of 'ST' but the issue I have is if in their history they were on that code before the older dates get mixed in and those are two separate occurrences. I have no data point to separate the occurrence other than a different status happening in between, such as we see with 'GP' on observation 4. I also tried to count the number of ST's in the group and stop when it reaches GP but I had no such look due to needing a by statement and the sort has the issues mentioned above. 

 

In short, I need a piece of code that can identify the first occurrence of their most recent Code, here it is ST but I also have SP and DD. For ID 001 observation line 3 is the one needed to keep and for 002 observation line 7 is the one to keep.

 

Thanks for any help I can receive. 

 

P.s. Do note the format of the date is ddmmyy.

 

 

data want;
input ID SEQ Code $ date :ddmmyy10.;
format date ddmmyy10.;
datalines;
001 1 ST 05/04/2014
001 2 ST 05/01/2014
001 3 ST 04/01/2014
001 4 GP 02/05/2014
001 5 ST 02/01/2014
002 1 ST 05/01/2014
002 2 ST 03/02/2014
002 3 GP 02/01/2014
002 4 GP 02/01/2014
;
run;

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

I don't see your mentioned sample data?

Jamerkin
Calcite | Level 5

Apologies. The insert code button was not working. I updated my post by just pasting my sample data at the bottom. 

PeterClemmensen
Tourmaline | Level 20

So your desired result here is a data set with 2 observations, correct?

Jamerkin
Calcite | Level 5

Correct I should end with observation 3 and 7. Mind you the full dataset is over 500,000 individuals so doing this manually was not an option.

 

Thanks!

mkeintz
PROC Star

If you use the NOTSORTED option (to accomodate "CODE" value that might either ascend or descend, then 

 

data have;
input ID SEQ Code $ date :ddmmyy10.;
format date ddmmyy10.;
datalines;
001 1 ST 05/04/2014
001 2 ST 05/01/2014
001 3 ST 04/01/2014
001 4 GP 02/05/2014
001 5 ST 02/01/2014
002 1 ST 05/01/2014
002 2 ST 03/02/2014
002 3 GP 02/01/2014
002 4 GP 02/01/2014
run;

data want (drop=_:) ;
  set have;
  by id code notsorted;
  if first.id=1 then _code_group=0;
  _code_group + (last.code=1);
  if last.code and _code_group=1;
  
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1183 views
  • 0 likes
  • 3 in conversation