Help using Base SAS procedures

SAS Question

Accepted Solution Solved
Reply
Super Contributor
Posts: 1,041
Accepted Solution

SAS Question

Hi,

I have the following table.

Id is the person

enc is the different times the person came to see the physician

in this table we have duplicates of ENC because they were pulled from different sources and clubbed together

WE WANT ALL THE DIFFERENT CODES FOR A PATIENT (the latest encounter ) with the respective encounter dates...

ID    ENC   ENC_DATE  code1    code2    code3    code4

500  233   2009-03-16   250.83        .           .           .

500  251   2009-03-17   250.00        .           .           .

500  130   2009-04-15   250.01        .           .           .

500  092   2009-06-22   250.00        .           .           .

500  848   2009-09-25   249.40       .            .          .

500  894   2010-04-09   250.83        .            .           .

500  894   2010-04-09   250.00        .            .          .

500  273   2010-06-09   250.83  249.00        .           .

500  273   2010-06-09   249.01         .           .           .

500  957   2010-09-08   249.00       .           .             .

500  957   2010-09-08   250.80        .           .             .

500  225   2011-01-31   250.83  249.00  249.00        .

500  225   2011-01-31   250.00    .                .            .

500  836   2011-07-08   250.83  249.00  249.00         .

500  836   2011-07-08   249.00    .                 .           .

500  202   2011-10-04   249.00  249.00  250.83  249.00

500  202   2011-10-04   250.00       .            .              .

500  443   2011-10-27   249.00  249.00  249.00         .

500  443   2011-10-27   249.00      .              .             .

500  932   2012-04-16   250.83  249.00  249.00       .

500  932   2012-04-16   250.00          .        .              .

500  830   2012-08-03   250.83  249.00  249.00        .

500  830   2012-08-03   249.00          .        .              .

500  598   2012-11-01   250.83  249.00  249.00        .

500  598   2012-11-01   249.00         .        .              .

500  350   2013-02-04   249.00          .        .              .

500  350   2013-02-04   249.00          .         .             .

500  297   2013-02-04   249.00          .         .             .

WANT

500  297   2013-02-04   249.00-------------latest enc with this code

500  598   2012-11-01   250.83-------------latest enc with this code

500  932   2012-04-16   250.00-------------latest enc with this code

500  957   2010-09-08   250.80-------------latest enc with this code

500  273   2010-06-09   249.01-------------latest enc with this code

500  848   2009-09-25   249.40-------------latest enc with this code

500  130   2009-04-15   250.01-------------latest enc with this code

Thanks


Accepted Solutions
Solution
‎11-09-2013 05:30 PM
Super User
Super User
Posts: 7,039

Re: SAS Question

Posted in reply to robertrao

1) Transpose your CODE1-CODE4 into a single column.  You can use PROC TRANSPOSE or a DATA step for that.

2) Sort by ID CODE ENC_DATE

3) Take the last per ID CODE.

data have ;

  length ID ENC ENC_DATE 8 code1-code4 $6 ;

  informat enc_date yymmdd10. ;

  format enc_date yymmdd10. ;

  input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;


data tall ;

  set have ;

  array c code1-code4 ;

  do over c;

   code=c;

   if code ne ' ' then output;

  end;

  drop code1-code4 ;

run;


proc sort ;

  by id code enc_date ;

run;

data want ;

  set tall ;

  by id code ;

  if last.code;

run;

Obs     ID    ENC      ENC_DATE     code

1     500    297    2013-02-04    249.00

2     500    273    2010-06-09    249.01

3     500    848    2009-09-25    249.40

4     500    932    2012-04-16    250.00

5     500    130    2009-04-15    250.01

6     500    957    2010-09-08    250.80

7     500    598    2012-11-01    250.83

View solution in original post


All Replies
PROC Star
Posts: 7,468

Re: SAS Question

Posted in reply to robertrao

A couple of questions:

1. Why do you drop all of the other ENCs?

2. When there is a tie for the data what criteria do you choose to select the record?

3. Why do you drop codes 2, 3 and 4?

Super Contributor
Posts: 1,041

Re: SAS Question

Art,

Thanks for the reply.

1) we drop all other encounters because we wanted a particular code with the latest encounter.....if he has the same code in the prior encounters it means he has the same disease...so we don't go for it in the prior encounters.

2)I dint understand the tie criteria you asked !!could you elaborate...possibly with an example??

3)we are not dropping codes 2,3 and 4 from nalysis..its only that in my example a new code with the latest encounter was not present in the 2nd 3rd or 4th position

Thanks

PROC Star
Posts: 7,468

Re: SAS Question

Posted in reply to robertrao

1. you keep:  500  297   2013-02-04   249.00          .         .             .

but you drop: 500  233   2009-03-16   250.83     

I don't understand the rules that would explain the above


2.  given:

500  598   2012-11-01   250.83  249.00  249.00        .

500  598   2012-11-01   249.00         .        .              .

you keep: 500  598   2012-11-01   250.83

why do you select one over the other

3.  Given the example I provided in 2, above, you kept the first record's value for code1, but dropped the other codes.  Please explain

          .

Super Contributor
Posts: 1,041

Re: SAS Question

1)  I kept 500  297   2013-02-04   249.00          .         .            

.because this is the latest in the list provided when he was diagnosed with 249.00(by the date). Any other 249.00 prior to this date will be ignored since he is the same patient and only the encounters are differing .


I  dropped: 500  233   2009-03-16   250.83      because I already captured  500  598   2012-11-01   250.83(this being the latest encounter with this code)

2)  there are only two unique codes here 249.00 and 250.83.

249.00 was already captured in the 500  297   2013-02-04   249.00( latest for that code across any record). so we will ignore the ( 500  598   2012-11-01   249.00) completely

so we scan the next lower date (500  598   2012-11-01   250.83  249.00  249.00) and have to take only the 250.83 because the 249.00 was already captured for this patient

Regards

Solution
‎11-09-2013 05:30 PM
Super User
Super User
Posts: 7,039

Re: SAS Question

Posted in reply to robertrao

1) Transpose your CODE1-CODE4 into a single column.  You can use PROC TRANSPOSE or a DATA step for that.

2) Sort by ID CODE ENC_DATE

3) Take the last per ID CODE.

data have ;

  length ID ENC ENC_DATE 8 code1-code4 $6 ;

  informat enc_date yymmdd10. ;

  format enc_date yymmdd10. ;

  input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;


data tall ;

  set have ;

  array c code1-code4 ;

  do over c;

   code=c;

   if code ne ' ' then output;

  end;

  drop code1-code4 ;

run;


proc sort ;

  by id code enc_date ;

run;

data want ;

  set tall ;

  by id code ;

  if last.code;

run;

Obs     ID    ENC      ENC_DATE     code

1     500    297    2013-02-04    249.00

2     500    273    2010-06-09    249.01

3     500    848    2009-09-25    249.40

4     500    932    2012-04-16    250.00

5     500    130    2009-04-15    250.01

6     500    957    2010-09-08    250.80

7     500    598    2012-11-01    250.83

Super Contributor
Posts: 1,041

Re: SAS Question

Tom,

Your code seems pleasing for the example I posted

But if you consider this below example separately for ID 501

501 830 2012-08-03 249.00 . . .

501 598 2012-11-01 250.83 263.00 271.00 .

501 598 2012-11-01 249.00 . . .

If you take the last of the ID and CODE U WILL MISS THE CODES 250.83 AND 263.00

hOW WOULD YOU ACCOUNT FOR THIS???

THANKS

Super User
Super User
Posts: 7,039

Re: SAS Question

Posted in reply to robertrao

Did you try it? It worked for me.  Note that it is finding the last date AFTER it has transposed the data.

Obs     ID    ENC      ENC_DATE     code

  1    500    297    2013-02-04    249.00

  2    500    273    2010-06-09    249.01

  3    500    848    2009-09-25    249.40

  4    500    932    2012-04-16    250.00

  5    500    130    2009-04-15    250.01

  6    500    957    2010-09-08    250.80

  7    500    598    2012-11-01    250.83

  8    501    598    2012-11-01    249.00

  9    501    598    2012-11-01    250.83

10    501    598    2012-11-01    263.00

11    501    598    2012-11-01    271.00

Super Contributor
Super Contributor
Posts: 444

Re: SAS Question

hey Tom

how would you apply proc transpose in this problem?

i used this below but the codes still appear in 2 columns Smiley Happy

proc sort;by id enc enc_date ;

proc transpose;

var code1-code4;

by id enc enc_date ;

Super User
Super User
Posts: 7,039

Re: SAS Question

You are getting more than one column because you have duplicate rows for the same set of BY values.

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

Super Contributor
Posts: 1,041

Re: SAS Question

Does it not have to be:

"Note that it is finding the last code AFTER it has transposed the data."

instead of "Note that it is finding the last date AFTER it has transposed the data."

Thanks

Super User
Super User
Posts: 7,039

Re: SAS Question

Posted in reply to robertrao

I think you are talking semantics there.  My understanding of the need was to find the maximum (last) date that any particular diagnosis code was encountered for each subject.  So when you sort by subject/code/date you will want the record where LAST.CODE is True as this will have the maximum date for that particular code.

Respected Advisor
Posts: 3,156

Re: SAS Question

Basically the same idea with Tom, only to do the transpose and filtering using Hash:

data have ;

  length ID ENC ENC_DATE 8 code1-code4 $6 ;

  informat enc_date yymmdd10. ;

  format enc_date yymmdd10. ;

  input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;

proc sort data=have;

by id enc_date;

run;

data _null_;

  if _n_=1 then do;

    declare hash h(ordered:'d');

  h.definekey('id', 'code');

  h.definedata('id','enc_date','enc','code');

  h.definedone();

  end;

length code $6;

  set have end=last;

   array c code1-code4;

   do over c;

     if not missing(c) then do; code=c;rc=h.replace();end;

  end;

  if last then do; rc=h.output(dataset:'want');stop;end;

run;

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 426 views
  • 5 likes
  • 5 in conversation