Solved: Re: SAS Question

robertrao · Posted 11-09-2013 01:28 PM

Hi,

I have the following table.

Id is the person

enc is the different times the person came to see the physician

in this table we have duplicates of ENC because they were pulled from different sources and clubbed together

WE WANT ALL THE DIFFERENT CODES FOR A PATIENT (the latest encounter ) with the respective encounter dates...

ID ENC ENC_DATE code1 code2 code3 code4

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

WANT

500 297 2013-02-04 249.00-------------latest enc with this code

500 598 2012-11-01 250.83-------------latest enc with this code

500 932 2012-04-16 250.00-------------latest enc with this code

500 957 2010-09-08 250.80-------------latest enc with this code

500 273 2010-06-09 249.01-------------latest enc with this code

500 848 2009-09-25 249.40-------------latest enc with this code

500 130 2009-04-15 250.01-------------latest enc with this code

Thanks

Tom · Posted 11-09-2013 05:30 PM

1) Transpose your CODE1-CODE4 into a single column. You can use PROC TRANSPOSE or a DATA step for that.

2) Sort by ID CODE ENC_DATE

3) Take the last per ID CODE.

data have ;

length ID ENC ENC_DATE 8 code1-code4 $6 ;

informat enc_date yymmdd10. ;

format enc_date yymmdd10. ;

input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;

data tall ;

set have ;

array c code1-code4 ;

do over c;

code=c;

if code ne ' ' then output;

end;

drop code1-code4 ;

run;

proc sort ;

by id code enc_date ;

run;

data want ;

set tall ;

by id code ;

if last.code;

run;

Obs ID ENC ENC_DATE code

1 500 297 2013-02-04 249.00

2 500 273 2010-06-09 249.01

3 500 848 2009-09-25 249.40

4 500 932 2012-04-16 250.00

5 500 130 2009-04-15 250.01

6 500 957 2010-09-08 250.80

7 500 598 2012-11-01 250.83

View solution in original post

art297 · Posted 11-09-2013 03:04 PM

A couple of questions:

1. Why do you drop all of the other ENCs?

2. When there is a tie for the data what criteria do you choose to select the record?

3. Why do you drop codes 2, 3 and 4?

robertrao · Posted 11-09-2013 03:27 PM

Art,

Thanks for the reply.

1) we drop all other encounters because we wanted a particular code with the latest encounter.....if he has the same code in the prior encounters it means he has the same disease...so we don't go for it in the prior encounters.

2)I dint understand the tie criteria you asked !!could you elaborate...possibly with an example??

3)we are not dropping codes 2,3 and 4 from nalysis..its only that in my example a new code with the latest encounter was not present in the 2nd 3rd or 4th position

Thanks

art297 · Posted 11-09-2013 05:09 PM

1. you keep: 500 297 2013-02-04 249.00 . . .

but you drop: 500 233 2009-03-16 250.83

I don't understand the rules that would explain the above

2. given:

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

you keep: 500 598 2012-11-01 250.83

why do you select one over the other

3. Given the example I provided in 2, above, you kept the first record's value for code1, but dropped the other codes. Please explain

.

robertrao · Posted 11-09-2013 05:43 PM

1) I kept 500 297 2013-02-04 249.00 . .

.because this is the latest in the list provided when he was diagnosed with 249.00(by the date). Any other 249.00 prior to this date will be ignored since he is the same patient and only the encounters are differing .

I dropped: 500 233 2009-03-16 250.83 because I already captured 500 598 2012-11-01 250.83(this being the latest encounter with this code)

2) there are only two unique codes here 249.00 and 250.83.

249.00 was already captured in the 500 297 2013-02-04 249.00( latest for that code across any record). so we will ignore the ( 500 598 2012-11-01 249.00) completely

so we scan the next lower date (500 598 2012-11-01 250.83 249.00 249.00) and have to take only the 250.83 because the 249.00 was already captured for this patient

Regards

Tom · Posted 11-09-2013 05:30 PM

1) Transpose your CODE1-CODE4 into a single column. You can use PROC TRANSPOSE or a DATA step for that.

2) Sort by ID CODE ENC_DATE

3) Take the last per ID CODE.

data have ;

length ID ENC ENC_DATE 8 code1-code4 $6 ;

informat enc_date yymmdd10. ;

format enc_date yymmdd10. ;

input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;

data tall ;

set have ;

array c code1-code4 ;

do over c;

code=c;

if code ne ' ' then output;

end;

drop code1-code4 ;

run;

proc sort ;

by id code enc_date ;

run;

data want ;

set tall ;

by id code ;

if last.code;

run;

Obs ID ENC ENC_DATE code

1 500 297 2013-02-04 249.00

2 500 273 2010-06-09 249.01

3 500 848 2009-09-25 249.40

4 500 932 2012-04-16 250.00

5 500 130 2009-04-15 250.01

6 500 957 2010-09-08 250.80

7 500 598 2012-11-01 250.83

robertrao · Posted 11-09-2013 06:08 PM

Tom,

Your code seems pleasing for the example I posted

But if you consider this below example separately for ID 501

501 830 2012-08-03 249.00 . . .

501 598 2012-11-01 250.83 263.00 271.00 .

501 598 2012-11-01 249.00 . . .

If you take the last of the ID and CODE U WILL MISS THE CODES 250.83 AND 263.00

hOW WOULD YOU ACCOUNT FOR THIS???

THANKS

Tom · Posted 11-09-2013 06:11 PM

Did you try it? It worked for me. Note that it is finding the last date AFTER it has transposed the data.

Obs ID ENC ENC_DATE code

1 500 297 2013-02-04 249.00

2 500 273 2010-06-09 249.01

3 500 848 2009-09-25 249.40

4 500 932 2012-04-16 250.00

5 500 130 2009-04-15 250.01

6 500 957 2010-09-08 250.80

7 500 598 2012-11-01 250.83

8 501 598 2012-11-01 249.00

9 501 598 2012-11-01 250.83

10 501 598 2012-11-01 263.00

11 501 598 2012-11-01 271.00

Tal · Posted 11-09-2013 11:41 PM

hey Tom

how would you apply proc transpose in this problem?

i used this below but the codes still appear in 2 columns

proc sort;by id enc enc_date ;

proc transpose;

var code1-code4;

by id enc enc_date ;

Tom · Posted 11-10-2013 08:21 AM

You are getting more than one column because you have duplicate rows for the same set of BY values.

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

robertrao · Posted 11-11-2013 04:40 PM

Does it not have to be:

"Note that it is finding the last code AFTER it has transposed the data."

instead of "Note that it is finding the last date AFTER it has transposed the data."

Thanks

Tom · Posted 11-11-2013 04:50 PM

I think you are talking semantics there. My understanding of the need was to find the maximum (last) date that any particular diagnosis code was encountered for each subject. So when you sort by subject/code/date you will want the record where LAST.CODE is True as this will have the maximum date for that particular code.

Haikuo · Posted 11-10-2013 12:25 AM

Basically the same idea with Tom, only to do the transpose and filtering using Hash:

data have ;

length ID ENC ENC_DATE 8 code1-code4 $6 ;

informat enc_date yymmdd10. ;

format enc_date yymmdd10. ;

input ID -- code4 ;

cards;

500 233 2009-03-16 250.83 . . .

500 251 2009-03-17 250.00 . . .

500 130 2009-04-15 250.01 . . .

500 092 2009-06-22 250.00 . . .

500 848 2009-09-25 249.40 . . .

500 894 2010-04-09 250.83 . . .

500 894 2010-04-09 250.00 . . .

500 273 2010-06-09 250.83 249.00 . .

500 273 2010-06-09 249.01 . . .

500 957 2010-09-08 249.00 . . .

500 957 2010-09-08 250.80 . . .

500 225 2011-01-31 250.83 249.00 249.00 .

500 225 2011-01-31 250.00 . . .

500 836 2011-07-08 250.83 249.00 249.00 .

500 836 2011-07-08 249.00 . . .

500 202 2011-10-04 249.00 249.00 250.83 249.00

500 202 2011-10-04 250.00 . . .

500 443 2011-10-27 249.00 249.00 249.00 .

500 443 2011-10-27 249.00 . . .

500 932 2012-04-16 250.83 249.00 249.00 .

500 932 2012-04-16 250.00 . . .

500 830 2012-08-03 250.83 249.00 249.00 .

500 830 2012-08-03 249.00 . . .

500 598 2012-11-01 250.83 249.00 249.00 .

500 598 2012-11-01 249.00 . . .

500 350 2013-02-04 249.00 . . .

500 297 2013-02-04 249.00 . . .

run;

proc sort data=have;

by id enc_date;

run;

data _null_;

if _n_=1 then do;

declare hash h(ordered:'d');

h.definekey('id', 'code');

h.definedata('id','enc_date','enc','code');

h.definedone();

end;

length code $6;

set have end=last;

array c code1-code4;

do over c;

if not missing(c) then do; code=c;rc=h.replace();end;

end;

if last then do; rc=h.output(dataset:'want');stop;end;

run;

Haikuo

Registration is open

SAS Training: Just a Click Away