Hello guys,
My data looks like this
| Patient ID | Date | ICD_1 | ICD_2 | ICD_3 | ICD_4 | ICD_5 |
| 1 | 1/1/2012 | A | B | C | ||
| 1 | 2/12/2012 | A | D | E | F | |
| 2 | 12/12/2012 | B | C |
I want to get the following:
| ICD | No.of Patients |
| A | 1 |
| B | 2 |
| C | 2 |
| D | 1 |
| E | 1 |
| F | 1 |
Basically, my data contains several patient records with their diagnosis codes (ICD). I want to take all the diagnosis codes in the data and count number of unique patients, so that I can understand most common diseases.
Thanks
Hey ladies,
You can transpose your data to a long format using PROC TRANSPOSE and then you can run a standard PROC FREQ to output the data.
If it's possible to have a patient with multiple ICD codes, you may need to remove duplicates as well.
You need transpose your data, sort nodupkey and finaly run proc freq.
Using a datastep:
data transposed;
set have;
keep patiant_id ICD;
array ICX $ icd_1 - icd_5;
di i=1 to dim(icx);
if icx(i) ne ' ' then do;
ICD = icx(i);
output;
end;
run;
proc sort data=transposed out=test nodupkey;
by ICD patient_id;
run;
proc freq data=test;
table ICD;
run;
another demo:
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;
proc freq data=__have;
tables col1;
run;
@novinosrin, wouldn't your code count twice patient_id=1 with ICD_1='A' as it exists in two observation,
different days ?
Thank you @Shmuel , love your attention to detail.
Revised:
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;
proc sort data=__have out=__have1 nodupkey;
by _name_ col1;
run;
proc freq data=__have1;
tables col1;
run;
Fancy hashes ? for fun-->
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
data _null_;
if _N_ = 1 then do;
length icd $8 _icd $8 count 8;
declare hash h(ordered:'y');
rc = h.defineKey('icd');
rc = h.defineData('icd','_icd','count');
rc = h.defineDone();
call missing(_icd,count);
end;
set have end=last;
array t(*) ICD_1-ICD_5;
count=1;
do _n_=1 to dim(t);
icd=t(_n_);
_icd=vname(t(_n_));
if h.check() ne 0 then h.add();
else if h.find()=0 then do;
if _icd ne vname(t(_n_)) then do;count+1; h.replace(); end;
end;
end;
if last then h.output(dataset:'want(drop=_icd where=(not missing(icd))');
run;
data have;
infile datalines truncover expandtabs;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
proc iml;
use have(keep=ICD_:);
read all var _char_ into x;
close;
call tabulate(level,freq,x);
level=t(level);
freq=t(freq);
create want var {level freq};
append;
close;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.