Re: Grouping using multiple columns

Sujithpeta · Posted 01-25-2018 11:34 AM

Hello guys,

My data looks like this

Patient ID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5
1	1/1/2012	A	B	C
1	2/12/2012	A	D	E	F
2	12/12/2012	B	C

I want to get the following:

ICD	No.of Patients
A	1
B	2
C	2
D	1
E	1
F	1

Basically, my data contains several patient records with their diagnosis codes (ICD). I want to take all the diagnosis codes in the data and count number of unique patients, so that I can understand most common diseases.

Thanks

Reeza · Posted 01-25-2018 11:42 AM

Hey ladies,

You can transpose your data to a long format using PROC TRANSPOSE and then you can run a standard PROC FREQ to output the data.

If it's possible to have a patient with multiple ICD codes, you may need to remove duplicates as well.

Shmuel · Posted 01-25-2018 11:51 AM

You need transpose your data, sort nodupkey and finaly run proc freq.

Using a datastep:

data transposed;

set have;

keep patiant_id ICD;

array ICX $ icd_1 - icd_5;

di i=1 to dim(icx);

if icx(i) ne ' ' then do;

ICD = icx(i);

output;

end;

run;

proc sort data=transposed out=test nodupkey;

by ICD patient_id;

run;

proc freq data=test;

table ICD;

run;

novinosrin · Posted 01-25-2018 11:54 AM

another demo:

data have;
infile datalines truncover;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;


proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;


proc freq data=__have;
tables col1;
run;

Shmuel · Posted 01-25-2018 12:04 PM

@novinosrin, wouldn't your code count twice patient_id=1 with ICD_1='A' as it exists in two observation,

different days ?

novinosrin · Posted 01-25-2018 12:25 PM

Thank you @Shmuel , love your attention to detail.

Revised:

data have;
infile datalines truncover;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;


proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;

proc sort data=__have out=__have1 nodupkey;
by _name_ col1;
run;


proc freq data=__have1;
tables col1;
run;

novinosrin · Posted 01-25-2018 01:12 PM

Fancy hashes ? for fun-->

data have;
infile datalines truncover;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;
data _null_;
if _N_ = 1 then do;
	length icd $8 _icd $8 count 8;
   declare hash h(ordered:'y');
   rc = h.defineKey('icd');
   rc = h.defineData('icd','_icd','count');
   rc = h.defineDone();
   call missing(_icd,count);
end;
set have end=last;
array t(*) ICD_1-ICD_5;
count=1;
do _n_=1 to dim(t);
	icd=t(_n_);
	_icd=vname(t(_n_));
	if h.check() ne 0 then h.add();
	else if  h.find()=0 then do;
	if _icd ne vname(t(_n_)) then do;count+1; h.replace(); end;
	end;
end;
if last then h.output(dataset:'want(drop=_icd where=(not missing(icd))');
run;

Ksharp · Posted 01-26-2018 10:00 AM



data have;
infile datalines truncover expandtabs;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;

proc iml;
use have(keep=ICD_:);
read all var _char_ into x;
close;
call tabulate(level,freq,x);
level=t(level);
freq=t(freq);
create want var {level freq};
append;
close;
quit;

SAS Training: Just a Click Away