Hello guys,
My data looks like this
Patient ID | Date | ICD_1 | ICD_2 | ICD_3 | ICD_4 | ICD_5 |
1 | 1/1/2012 | A | B | C | ||
1 | 2/12/2012 | A | D | E | F | |
2 | 12/12/2012 | B | C |
I want to get the following:
ICD | No.of Patients |
A | 1 |
B | 2 |
C | 2 |
D | 1 |
E | 1 |
F | 1 |
Basically, my data contains several patient records with their diagnosis codes (ICD). I want to take all the diagnosis codes in the data and count number of unique patients, so that I can understand most common diseases.
Thanks
Hey ladies,
You can transpose your data to a long format using PROC TRANSPOSE and then you can run a standard PROC FREQ to output the data.
If it's possible to have a patient with multiple ICD codes, you may need to remove duplicates as well.
You need transpose your data, sort nodupkey and finaly run proc freq.
Using a datastep:
data transposed;
set have;
keep patiant_id ICD;
array ICX $ icd_1 - icd_5;
di i=1 to dim(icx);
if icx(i) ne ' ' then do;
ICD = icx(i);
output;
end;
run;
proc sort data=transposed out=test nodupkey;
by ICD patient_id;
run;
proc freq data=test;
table ICD;
run;
another demo:
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;
proc freq data=__have;
tables col1;
run;
@novinosrin, wouldn't your code count twice patient_id=1 with ICD_1='A' as it exists in two observation,
different days ?
Thank you @Shmuel , love your attention to detail.
Revised:
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;
proc sort data=__have out=__have1 nodupkey;
by _name_ col1;
run;
proc freq data=__have1;
tables col1;
run;
Fancy hashes ? for fun-->
data have;
infile datalines truncover;
input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.);
datalines;
1 1/1/2012 A B C
1 2/12/2012 A D E F
2 12/12/2012 B C
;
data _null_;
if _N_ = 1 then do;
length icd $8 _icd $8 count 8;
declare hash h(ordered:'y');
rc = h.defineKey('icd');
rc = h.defineData('icd','_icd','count');
rc = h.defineDone();
call missing(_icd,count);
end;
set have end=last;
array t(*) ICD_1-ICD_5;
count=1;
do _n_=1 to dim(t);
icd=t(_n_);
_icd=vname(t(_n_));
if h.check() ne 0 then h.add();
else if h.find()=0 then do;
if _icd ne vname(t(_n_)) then do;count+1; h.replace(); end;
end;
end;
if last then h.output(dataset:'want(drop=_icd where=(not missing(icd))');
run;
data have; infile datalines truncover expandtabs; input (PatientID Date ICD_1 ICD_2 ICD_3 ICD_4 ICD_5) (:$10.); datalines; 1 1/1/2012 A B C 1 2/12/2012 A D E F 2 12/12/2012 B C ; proc iml; use have(keep=ICD_:); read all var _char_ into x; close; call tabulate(level,freq,x); level=t(level); freq=t(freq); create want var {level freq}; append; close; quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.