DATA Step, Macro, Functions and more

Grouping using multiple columns

Reply
Contributor
Posts: 58

Grouping using multiple columns

Hello guys,

 

My data looks like this

Patient IDDateICD_1ICD_2ICD_3ICD_4ICD_5
11/1/2012ABC  
12/12/2012ADEF 
212/12/2012BC   

 

I want to get the following:

ICDNo.of Patients
A1
B2
C2
D1
E1
F1

 

Basically, my data contains several patient records with their diagnosis codes (ICD). I want to take all the diagnosis codes in the data and count number of unique patients, so that I can understand most common diseases.

 

Thanks

Super User
Posts: 23,773

Re: Grouping using multiple columns

Posted in reply to Sujithpeta

Hey ladies,

 

You can transpose your data to a long format using PROC TRANSPOSE and then you can run a standard PROC FREQ to output the data. 

 

If it's possible to have a patient with multiple ICD codes, you may need to remove duplicates as well. 

Trusted Advisor
Posts: 1,837

Re: Grouping using multiple columns

Posted in reply to Sujithpeta

You need transpose your data, sort nodupkey and finaly run proc freq.

 

Using a datastep:

 

data transposed;

  set have;

        keep patiant_id  ICD;

        array ICX $ icd_1 - icd_5;

        di i=1 to dim(icx);

            if icx(i) ne ' ' then do;

               ICD = icx(i);

               output;

            end;

run;

proc sort data=transposed out=test nodupkey;

     by ICD patient_id;

run;

proc freq data=test;

      table ICD;

run;

PROC Star
Posts: 1,834

Re: Grouping using multiple columns

Posted in reply to Sujithpeta

another demo:

data have;
infile datalines truncover;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;


proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;


proc freq data=__have;
tables col1;
run;
Trusted Advisor
Posts: 1,837

Re: Grouping using multiple columns

Posted in reply to novinosrin

@novinosrin, wouldn't your code count twice patient_id=1 with ICD_1='A'  as it exists in two observation,

different days ?

PROC Star
Posts: 1,834

Re: Grouping using multiple columns

Thank you @Shmuel , love your attention to detail. 

Revised: 

data have;
infile datalines truncover;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;


proc transpose data= have out=__have;
by date notsorted;
var ICD_1-ICD_5;
run;

proc sort data=__have out=__have1 nodupkey;
by _name_ col1;
run;


proc freq data=__have1;
tables col1;
run;
PROC Star
Posts: 1,834

Re: Grouping using multiple columns

Posted in reply to Sujithpeta

Fancy hashes ? for fun-->

 

data have;
infile datalines truncover;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;
data _null_;
if _N_ = 1 then do;
	length icd $8 _icd $8 count 8;
   declare hash h(ordered:'y');
   rc = h.defineKey('icd');
   rc = h.defineData('icd','_icd','count');
   rc = h.defineDone();
   call missing(_icd,count);
end;
set have end=last;
array t(*) ICD_1-ICD_5;
count=1;
do _n_=1 to dim(t);
	icd=t(_n_);
	_icd=vname(t(_n_));
	if h.check() ne 0 then h.add();
	else if  h.find()=0 then do;
	if _icd ne vname(t(_n_)) then do;count+1; h.replace(); end;
	end;
end;
if last then h.output(dataset:'want(drop=_icd where=(not missing(icd))');
run;
Super User
Posts: 10,787

Re: Grouping using multiple columns

Posted in reply to Sujithpeta


data have;
infile datalines truncover expandtabs;
input (PatientID	Date	ICD_1	ICD_2	ICD_3	ICD_4	ICD_5) (:$10.);
datalines;
1	1/1/2012	A	B	C	 	 
1	2/12/2012	A	D	E	F	 
2	12/12/2012	B	C	 	 	 
;

proc iml;
use have(keep=ICD_:);
read all var _char_ into x;
close;
call tabulate(level,freq,x);
level=t(level);
freq=t(freq);
create want var {level freq};
append;
close;
quit;

Ask a Question
Discussion stats
  • 7 replies
  • 166 views
  • 1 like
  • 5 in conversation