Hi,
I have the following table.
Id is the person
enc is the different times the person came to see the physician
in this table we have duplicates of ENC because they were pulled from different sources and clubbed together
WE WANT ALL THE DIFFERENT CODES FOR A PATIENT (the latest encounter ) with the respective encounter dates...
ID ENC ENC_DATE code1 code2 code3 code4
500 233 2009-03-16 250.83 . . .
500 251 2009-03-17 250.00 . . .
500 130 2009-04-15 250.01 . . .
500 092 2009-06-22 250.00 . . .
500 848 2009-09-25 249.40 . . .
500 894 2010-04-09 250.83 . . .
500 894 2010-04-09 250.00 . . .
500 273 2010-06-09 250.83 249.00 . .
500 273 2010-06-09 249.01 . . .
500 957 2010-09-08 249.00 . . .
500 957 2010-09-08 250.80 . . .
500 225 2011-01-31 250.83 249.00 249.00 .
500 225 2011-01-31 250.00 . . .
500 836 2011-07-08 250.83 249.00 249.00 .
500 836 2011-07-08 249.00 . . .
500 202 2011-10-04 249.00 249.00 250.83 249.00
500 202 2011-10-04 250.00 . . .
500 443 2011-10-27 249.00 249.00 249.00 .
500 443 2011-10-27 249.00 . . .
500 932 2012-04-16 250.83 249.00 249.00 .
500 932 2012-04-16 250.00 . . .
500 830 2012-08-03 250.83 249.00 249.00 .
500 830 2012-08-03 249.00 . . .
500 598 2012-11-01 250.83 249.00 249.00 .
500 598 2012-11-01 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 297 2013-02-04 249.00 . . .
WANT
500 297 2013-02-04 249.00-------------latest enc with this code
500 598 2012-11-01 250.83-------------latest enc with this code
500 932 2012-04-16 250.00-------------latest enc with this code
500 957 2010-09-08 250.80-------------latest enc with this code
500 273 2010-06-09 249.01-------------latest enc with this code
500 848 2009-09-25 249.40-------------latest enc with this code
500 130 2009-04-15 250.01-------------latest enc with this code
Thanks
1) Transpose your CODE1-CODE4 into a single column. You can use PROC TRANSPOSE or a DATA step for that.
2) Sort by ID CODE ENC_DATE
3) Take the last per ID CODE.
data have ;
length ID ENC ENC_DATE 8 code1-code4 $6 ;
informat enc_date yymmdd10. ;
format enc_date yymmdd10. ;
input ID -- code4 ;
cards;
500 233 2009-03-16 250.83 . . .
500 251 2009-03-17 250.00 . . .
500 130 2009-04-15 250.01 . . .
500 092 2009-06-22 250.00 . . .
500 848 2009-09-25 249.40 . . .
500 894 2010-04-09 250.83 . . .
500 894 2010-04-09 250.00 . . .
500 273 2010-06-09 250.83 249.00 . .
500 273 2010-06-09 249.01 . . .
500 957 2010-09-08 249.00 . . .
500 957 2010-09-08 250.80 . . .
500 225 2011-01-31 250.83 249.00 249.00 .
500 225 2011-01-31 250.00 . . .
500 836 2011-07-08 250.83 249.00 249.00 .
500 836 2011-07-08 249.00 . . .
500 202 2011-10-04 249.00 249.00 250.83 249.00
500 202 2011-10-04 250.00 . . .
500 443 2011-10-27 249.00 249.00 249.00 .
500 443 2011-10-27 249.00 . . .
500 932 2012-04-16 250.83 249.00 249.00 .
500 932 2012-04-16 250.00 . . .
500 830 2012-08-03 250.83 249.00 249.00 .
500 830 2012-08-03 249.00 . . .
500 598 2012-11-01 250.83 249.00 249.00 .
500 598 2012-11-01 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 297 2013-02-04 249.00 . . .
run;
data tall ;
set have ;
array c code1-code4 ;
do over c;
code=c;
if code ne ' ' then output;
end;
drop code1-code4 ;
run;
proc sort ;
by id code enc_date ;
run;
data want ;
set tall ;
by id code ;
if last.code;
run;
Obs ID ENC ENC_DATE code
1 500 297 2013-02-04 249.00
2 500 273 2010-06-09 249.01
3 500 848 2009-09-25 249.40
4 500 932 2012-04-16 250.00
5 500 130 2009-04-15 250.01
6 500 957 2010-09-08 250.80
7 500 598 2012-11-01 250.83
A couple of questions:
1. Why do you drop all of the other ENCs?
2. When there is a tie for the data what criteria do you choose to select the record?
3. Why do you drop codes 2, 3 and 4?
Art,
Thanks for the reply.
1) we drop all other encounters because we wanted a particular code with the latest encounter.....if he has the same code in the prior encounters it means he has the same disease...so we don't go for it in the prior encounters.
2)I dint understand the tie criteria you asked !!could you elaborate...possibly with an example??
3)we are not dropping codes 2,3 and 4 from nalysis..its only that in my example a new code with the latest encounter was not present in the 2nd 3rd or 4th position
Thanks
1. you keep: 500 297 2013-02-04 249.00 . . .
but you drop: 500 233 2009-03-16 250.83
I don't understand the rules that would explain the above
2. given:
500 598 2012-11-01 250.83 249.00 249.00 .
500 598 2012-11-01 249.00 . . .
you keep: 500 598 2012-11-01 250.83
why do you select one over the other
3. Given the example I provided in 2, above, you kept the first record's value for code1, but dropped the other codes. Please explain
.
1) I kept 500 297 2013-02-04 249.00 . .
.because this is the latest in the list provided when he was diagnosed with 249.00(by the date). Any other 249.00 prior to this date will be ignored since he is the same patient and only the encounters are differing .
I dropped: 500 233 2009-03-16 250.83 because I already captured 500 598 2012-11-01 250.83(this being the latest encounter with this code)
2) there are only two unique codes here 249.00 and 250.83.
249.00 was already captured in the 500 297 2013-02-04 249.00( latest for that code across any record). so we will ignore the ( 500 598 2012-11-01 249.00) completely
so we scan the next lower date (500 598 2012-11-01 250.83 249.00 249.00) and have to take only the 250.83 because the 249.00 was already captured for this patient
Regards
1) Transpose your CODE1-CODE4 into a single column. You can use PROC TRANSPOSE or a DATA step for that.
2) Sort by ID CODE ENC_DATE
3) Take the last per ID CODE.
data have ;
length ID ENC ENC_DATE 8 code1-code4 $6 ;
informat enc_date yymmdd10. ;
format enc_date yymmdd10. ;
input ID -- code4 ;
cards;
500 233 2009-03-16 250.83 . . .
500 251 2009-03-17 250.00 . . .
500 130 2009-04-15 250.01 . . .
500 092 2009-06-22 250.00 . . .
500 848 2009-09-25 249.40 . . .
500 894 2010-04-09 250.83 . . .
500 894 2010-04-09 250.00 . . .
500 273 2010-06-09 250.83 249.00 . .
500 273 2010-06-09 249.01 . . .
500 957 2010-09-08 249.00 . . .
500 957 2010-09-08 250.80 . . .
500 225 2011-01-31 250.83 249.00 249.00 .
500 225 2011-01-31 250.00 . . .
500 836 2011-07-08 250.83 249.00 249.00 .
500 836 2011-07-08 249.00 . . .
500 202 2011-10-04 249.00 249.00 250.83 249.00
500 202 2011-10-04 250.00 . . .
500 443 2011-10-27 249.00 249.00 249.00 .
500 443 2011-10-27 249.00 . . .
500 932 2012-04-16 250.83 249.00 249.00 .
500 932 2012-04-16 250.00 . . .
500 830 2012-08-03 250.83 249.00 249.00 .
500 830 2012-08-03 249.00 . . .
500 598 2012-11-01 250.83 249.00 249.00 .
500 598 2012-11-01 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 297 2013-02-04 249.00 . . .
run;
data tall ;
set have ;
array c code1-code4 ;
do over c;
code=c;
if code ne ' ' then output;
end;
drop code1-code4 ;
run;
proc sort ;
by id code enc_date ;
run;
data want ;
set tall ;
by id code ;
if last.code;
run;
Obs ID ENC ENC_DATE code
1 500 297 2013-02-04 249.00
2 500 273 2010-06-09 249.01
3 500 848 2009-09-25 249.40
4 500 932 2012-04-16 250.00
5 500 130 2009-04-15 250.01
6 500 957 2010-09-08 250.80
7 500 598 2012-11-01 250.83
Tom,
Your code seems pleasing for the example I posted
But if you consider this below example separately for ID 501
501 830 2012-08-03 249.00 . . .
501 598 2012-11-01 250.83 263.00 271.00 .
501 598 2012-11-01 249.00 . . .
If you take the last of the ID and CODE U WILL MISS THE CODES 250.83 AND 263.00
hOW WOULD YOU ACCOUNT FOR THIS???
THANKS
Did you try it? It worked for me. Note that it is finding the last date AFTER it has transposed the data.
Obs ID ENC ENC_DATE code
1 500 297 2013-02-04 249.00
2 500 273 2010-06-09 249.01
3 500 848 2009-09-25 249.40
4 500 932 2012-04-16 250.00
5 500 130 2009-04-15 250.01
6 500 957 2010-09-08 250.80
7 500 598 2012-11-01 250.83
8 501 598 2012-11-01 249.00
9 501 598 2012-11-01 250.83
10 501 598 2012-11-01 263.00
11 501 598 2012-11-01 271.00
hey Tom
how would you apply proc transpose in this problem?
i used this below but the codes still appear in 2 columns
proc sort;by id enc enc_date ;
proc transpose;
var code1-code4;
by id enc enc_date ;
You are getting more than one column because you have duplicate rows for the same set of BY values.
500 273 2010-06-09 250.83 249.00 . .
500 273 2010-06-09 249.01 . . .
Does it not have to be:
"Note that it is finding the last code AFTER it has transposed the data."
instead of "Note that it is finding the last date AFTER it has transposed the data."
Thanks
I think you are talking semantics there. My understanding of the need was to find the maximum (last) date that any particular diagnosis code was encountered for each subject. So when you sort by subject/code/date you will want the record where LAST.CODE is True as this will have the maximum date for that particular code.
Basically the same idea with Tom, only to do the transpose and filtering using Hash:
data have ;
length ID ENC ENC_DATE 8 code1-code4 $6 ;
informat enc_date yymmdd10. ;
format enc_date yymmdd10. ;
input ID -- code4 ;
cards;
500 233 2009-03-16 250.83 . . .
500 251 2009-03-17 250.00 . . .
500 130 2009-04-15 250.01 . . .
500 092 2009-06-22 250.00 . . .
500 848 2009-09-25 249.40 . . .
500 894 2010-04-09 250.83 . . .
500 894 2010-04-09 250.00 . . .
500 273 2010-06-09 250.83 249.00 . .
500 273 2010-06-09 249.01 . . .
500 957 2010-09-08 249.00 . . .
500 957 2010-09-08 250.80 . . .
500 225 2011-01-31 250.83 249.00 249.00 .
500 225 2011-01-31 250.00 . . .
500 836 2011-07-08 250.83 249.00 249.00 .
500 836 2011-07-08 249.00 . . .
500 202 2011-10-04 249.00 249.00 250.83 249.00
500 202 2011-10-04 250.00 . . .
500 443 2011-10-27 249.00 249.00 249.00 .
500 443 2011-10-27 249.00 . . .
500 932 2012-04-16 250.83 249.00 249.00 .
500 932 2012-04-16 250.00 . . .
500 830 2012-08-03 250.83 249.00 249.00 .
500 830 2012-08-03 249.00 . . .
500 598 2012-11-01 250.83 249.00 249.00 .
500 598 2012-11-01 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 350 2013-02-04 249.00 . . .
500 297 2013-02-04 249.00 . . .
run;
proc sort data=have;
by id enc_date;
run;
data _null_;
if _n_=1 then do;
declare hash h(ordered:'d');
h.definekey('id', 'code');
h.definedata('id','enc_date','enc','code');
h.definedone();
end;
length code $6;
set have end=last;
array c code1-code4;
do over c;
if not missing(c) then do; code=c;rc=h.replace();end;
end;
if last then do; rc=h.output(dataset:'want');stop;end;
run;
Haikuo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.