Hello guys,
My data looks like this
Patient ID | Date | ICD_1 | ICD_2 | ICD_3 | ICD_4 | ICD_5 |
1 | 1/1/2012 | A | B | C | ||
1 | 2/12/2012 | A | D | E | F | |
2 | 12/12/2012 | B | C |
I want to get the following:
ICD | No.of Patients |
A | 1 |
B | 2 |
C | 2 |
D | 1 |
E | 1 |
F | 1 |
Basically, my data contains several patient records with their diagnosis codes (ICD). I want to take all the diagnosis codes in the data and count number of unique patients, so that I can understand most common diseases.
Thanks
Hey ladies,
You can transpose your data to a long format using PROC TRANSPOSE and then you can run a standard PROC FREQ to output the data.
If it's possible to have a patient with multiple ICD codes, you may need to remove duplicates as well.
You need transpose your data, sort nodupkey and finaly run proc freq.
Using a datastep:
data transposed;
set have;
keep patiant_id ICD;
array ICX $ icd_1 - icd_5;
di i=1 to dim(icx);
if icx(i) ne ' ' then do;
ICD = icx(i);
output;
end;
run;
proc sort data=transposed out=test nodupkey;
by ICD patient_id;
run;
proc freq data=test;
table ICD;
run;
another demo:
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;
proc freq data=__have;
tables col1;
run;
@novinosrin, wouldn't your code count twice patient_id=1 with ICD_1='A' as it exists in two observation,
different days ?
Thank you @Shmuel , love your attention to detail.
Revised:
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;
proc sort data=__have out=__have1 nodupkey;
by _name_ col1;
run;
proc freq data=__have1;
tables col1;
run;
Fancy hashes ? for fun-->
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
data _null_;
if _N_ = 1 then do;
length icd $8 _icd $8 count 8;
declare hash h(ordered:'y');
rc = h.defineKey('icd');
rc = h.defineData('icd','_icd','count');
rc = h.defineDone();
call missing(_icd,count);
end;
set have end=last;
array t(*) ICD_1-ICD_5;
count=1;
do _n_=1 to dim(t);
icd=t(_n_);
_icd=vname(t(_n_));
if h.check() ne 0 then h.add();
else if h.find()=0 then do;
if _icd ne vname(t(_n_)) then do;count+1; h.replace(); end;
end;
end;
if last then h.output(dataset:'want(drop=_icd where=(not missing(icd))');
run;
data have; infile datalines truncover expandtabs; input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.); datalines; 1 1/1/2012 A B C 1 2/12/2012 A D E F 2 12/12/2012 B C ; proc iml; use have(keep=ICD_:); read all var _char_ into x; close; call tabulate(level,freq,x); level=t(level); freq=t(freq); create want var {level freq}; append; close; quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.