BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

Hi,

I have the following table.

Id is the person

enc is the different times the person came to see the physician

in this table we have duplicates of ENC because they were pulled from different sources and clubbed together

WE WANT ALL THE DIFFERENT CODES FOR A PATIENT (the latest encounter ) with the respective encounter dates...

ID    ENC   ENC_DATE  code1    code2    code3    code4

500  233   2009-03-16   250.83        .           .           .

500  251   2009-03-17   250.00        .           .           .

500  130   2009-04-15   250.01        .           .           .

500  092   2009-06-22   250.00        .           .           .

500  848   2009-09-25   249.40       .            .          .

500  894   2010-04-09   250.83        .            .           .

500  894   2010-04-09   250.00        .            .          .

500  273   2010-06-09   250.83  249.00        .           .

500  273   2010-06-09   249.01         .           .           .

500  957   2010-09-08   249.00       .           .             .

500  957   2010-09-08   250.80        .           .             .

500  225   2011-01-31   250.83  249.00  249.00        .

500  225   2011-01-31   250.00    .                .            .

500  836   2011-07-08   250.83  249.00  249.00         .

500  836   2011-07-08   249.00    .                 .           .

500  202   2011-10-04   249.00  249.00  250.83  249.00

500  202   2011-10-04   250.00       .            .              .

500  443   2011-10-27   249.00  249.00  249.00         .

500  443   2011-10-27   249.00      .              .             .

500  932   2012-04-16   250.83  249.00  249.00       .

500  932   2012-04-16   250.00          .        .              .

500  830   2012-08-03   250.83  249.00  249.00        .

500  830   2012-08-03   249.00          .        .              .

500  598   2012-11-01   250.83  249.00  249.00        .

500  598   2012-11-01   249.00         .        .              .

500  350   2013-02-04   249.00          .        .              .

500  350   2013-02-04   249.00          .         .             .

500  297   2013-02-04   249.00          .         .             .

WANT

500  297   2013-02-04   249.00-------------latest enc with this code

500  598   2012-11-01   250.83-------------latest enc with this code

500  932   2012-04-16   250.00-------------latest enc with this code

500  957   2010-09-08   250.80-------------latest enc with this code

500  273   2010-06-09   249.01-------------latest enc with this code

500  848   2009-09-25   249.40-------------latest enc with this code

500  130   2009-04-15   250.01-------------latest enc with this code

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

1) Transpose your CODE1-CODE4 into a single column.  You can use PROC TRANSPOSE or a DATA step for that.

2) Sort by ID CODE ENC_DATE

3) Take the last per ID CODE.

data have ;

  length ID ENC ENC_DATE 8 code1-code4 $6 ;

  informat enc_date yymmdd10. ;

  format enc_date yymmdd10. ;

  input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;


data tall ;

  set have ;

  array c code1-code4 ;

  do over c;

   code=c;

   if code ne ' ' then output;

  end;

  drop code1-code4 ;

run;


proc sort ;

  by id code enc_date ;

run;

data want ;

  set tall ;

  by id code ;

  if last.code;

run;

Obs     ID    ENC      ENC_DATE     code

1     500    297    2013-02-04    249.00

2     500    273    2010-06-09    249.01

3     500    848    2009-09-25    249.40

4     500    932    2012-04-16    250.00

5     500    130    2009-04-15    250.01

6     500    957    2010-09-08    250.80

7     500    598    2012-11-01    250.83

View solution in original post

12 REPLIES 12
art297
Opal | Level 21

A couple of questions:

1. Why do you drop all of the other ENCs?

2. When there is a tie for the data what criteria do you choose to select the record?

3. Why do you drop codes 2, 3 and 4?

robertrao
Quartz | Level 8

Art,

Thanks for the reply.

1) we drop all other encounters because we wanted a particular code with the latest encounter.....if he has the same code in the prior encounters it means he has the same disease...so we don't go for it in the prior encounters.

2)I dint understand the tie criteria you asked !!could you elaborate...possibly with an example??

3)we are not dropping codes 2,3 and 4 from nalysis..its only that in my example a new code with the latest encounter was not present in the 2nd 3rd or 4th position

Thanks

art297
Opal | Level 21

1. you keep:  500  297   2013-02-04   249.00          .         .             .

but you drop: 500  233   2009-03-16   250.83     

I don't understand the rules that would explain the above


2.  given:

500  598   2012-11-01   250.83  249.00  249.00        .

500  598   2012-11-01   249.00         .        .              .

you keep: 500  598   2012-11-01   250.83

why do you select one over the other

3.  Given the example I provided in 2, above, you kept the first record's value for code1, but dropped the other codes.  Please explain

          .

robertrao
Quartz | Level 8

1)  I kept 500  297   2013-02-04   249.00          .         .            

.because this is the latest in the list provided when he was diagnosed with 249.00(by the date). Any other 249.00 prior to this date will be ignored since he is the same patient and only the encounters are differing .


I  dropped: 500  233   2009-03-16   250.83      because I already captured  500  598   2012-11-01   250.83(this being the latest encounter with this code)

2)  there are only two unique codes here 249.00 and 250.83.

249.00 was already captured in the 500  297   2013-02-04   249.00( latest for that code across any record). so we will ignore the ( 500  598   2012-11-01   249.00) completely

so we scan the next lower date (500  598   2012-11-01   250.83  249.00  249.00) and have to take only the 250.83 because the 249.00 was already captured for this patient

Regards

Tom
Super User Tom
Super User

1) Transpose your CODE1-CODE4 into a single column.  You can use PROC TRANSPOSE or a DATA step for that.

2) Sort by ID CODE ENC_DATE

3) Take the last per ID CODE.

data have ;

  length ID ENC ENC_DATE 8 code1-code4 $6 ;

  informat enc_date yymmdd10. ;

  format enc_date yymmdd10. ;

  input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;


data tall ;

  set have ;

  array c code1-code4 ;

  do over c;

   code=c;

   if code ne ' ' then output;

  end;

  drop code1-code4 ;

run;


proc sort ;

  by id code enc_date ;

run;

data want ;

  set tall ;

  by id code ;

  if last.code;

run;

Obs     ID    ENC      ENC_DATE     code

1     500    297    2013-02-04    249.00

2     500    273    2010-06-09    249.01

3     500    848    2009-09-25    249.40

4     500    932    2012-04-16    250.00

5     500    130    2009-04-15    250.01

6     500    957    2010-09-08    250.80

7     500    598    2012-11-01    250.83

robertrao
Quartz | Level 8

Tom,

Your code seems pleasing for the example I posted

But if you consider this below example separately for ID 501

501 830 2012-08-03 249.00 . . .

501 598 2012-11-01 250.83 263.00 271.00 .

501 598 2012-11-01 249.00 . . .

If you take the last of the ID and CODE U WILL MISS THE CODES 250.83 AND 263.00

hOW WOULD YOU ACCOUNT FOR THIS???

THANKS

Tom
Super User Tom
Super User

Did you try it? It worked for me.  Note that it is finding the last date AFTER it has transposed the data.

Obs     ID    ENC      ENC_DATE     code

  1    500    297    2013-02-04    249.00

  2    500    273    2010-06-09    249.01

  3    500    848    2009-09-25    249.40

  4    500    932    2012-04-16    250.00

  5    500    130    2009-04-15    250.01

  6    500    957    2010-09-08    250.80

  7    500    598    2012-11-01    250.83

  8    501    598    2012-11-01    249.00

  9    501    598    2012-11-01    250.83

10    501    598    2012-11-01    263.00

11    501    598    2012-11-01    271.00

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

hey Tom

how would you apply proc transpose in this problem?

i used this below but the codes still appear in 2 columns Smiley Happy

proc sort;by id enc enc_date ;

proc transpose;

var code1-code4;

by id enc enc_date ;

Tom
Super User Tom
Super User

You are getting more than one column because you have duplicate rows for the same set of BY values.

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

robertrao
Quartz | Level 8

Does it not have to be:

"Note that it is finding the last code AFTER it has transposed the data."

instead of "Note that it is finding the last date AFTER it has transposed the data."

Thanks

Tom
Super User Tom
Super User

I think you are talking semantics there.  My understanding of the need was to find the maximum (last) date that any particular diagnosis code was encountered for each subject.  So when you sort by subject/code/date you will want the record where LAST.CODE is True as this will have the maximum date for that particular code.

Haikuo
Onyx | Level 15

Basically the same idea with Tom, only to do the transpose and filtering using Hash:

data have ;

  length ID ENC ENC_DATE 8 code1-code4 $6 ;

  informat enc_date yymmdd10. ;

  format enc_date yymmdd10. ;

  input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;

proc sort data=have;

by id enc_date;

run;

data _null_;

  if _n_=1 then do;

    declare hash h(ordered:'d');

  h.definekey('id', 'code');

  h.definedata('id','enc_date','enc','code');

  h.definedone();

  end;

length code $6;

  set have end=last;

   array c code1-code4;

   do over c;

     if not missing(c) then do; code=c;rc=h.replace();end;

  end;

  if last then do; rc=h.output(dataset:'want');stop;end;

run;

Haikuo

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1283 views
  • 5 likes
  • 5 in conversation