Solved: summarize diagnosis multiple columns and multiple per patients

lillymaginta · Posted 04-04-2016 01:27 PM

Hi I have the following dataset: I wan to create a table that summarizes dx1-dx4 by the diagnosis regardless of the "date" where each dx per patient can only appear one. or example. patient3 has 3 diagnosis of 250, I want this to be counted as only 1 "250"

patientid    dx1 dx2 dx3 dx4      date 
1               250 223 224 444    5/5/2009 
1               555 666 120 250    5/6/2008 
2               120  666                 1/2/2007 
2               120  666                 1/1/2007 
3               250                        2/2/2004
3               240 250                      3/3/2004 
3               250                          1/1/2004

The summary table should look like this:

dx freq

223 1

224 1

120 1

250 2

and so on.

Thanks

Katie · Posted 04-04-2016 02:02 PM

So I believe I created what you are looking for. I turned it into a macro just in case if you have more diagnoses spots than you originally specified. Let me know if you have any questions.

DATA HAVE;
INPUT patientid 1-2 dx1 3-5 dx2 7-9 dx3 11-13 dx4 15-17 date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
;
run;

PROC SORT DATA = HAVE;
BY PatientID;
RUN;

%MACRO FREQ (END = );
%DO i = 1 %TO &END;
PROC FREQ DATA = HAVE NOPRINT;
TABLES PATIENTID*DX&i / OUT = FREQ_DX&i;
RUN;
DATA FREQ_DX&i;
SET FREQ_DX&i(RENAME = (COUNT = COUNT&i DX&i = DX));
DROP PERCENT;
RUN;
PROC SORT DATA = FREQ_DX&i;
BY PatientID DX;
RUN;
%END;
DATA ALMOST;
MERGE FREQ_DX1-FREQ_DX&END;
BY PatientID DX;
RUN;
PROC FREQ DATA = ALMOST;
TABLES DX / OUT = WANT;
RUN;
%MEND;

%FREQ(END=4);

View solution in original post

ballardw · Posted 04-04-2016 01:59 PM

Assuming your example result miscounted Dx 120 (patient 1 and patient 2) this works for your example data.

I'm treating the Dx codes as numeric to simplify reading since you didn't provide a data step to generate data and ignoring dates as there was no apparent need for date in the requested summary.

data have;
   input patientid    dx1 dx2 dx3 dx4   ;
datalines;
1               250 223 224 444    5/5/2009 
1               555 666 120 250    5/6/2008 
2               120  666  .  .     1/2/2007 
2               120  666  .  .     1/1/2007 
3               250   .   .  .     2/2/2004 
3               240 250   .  .     3/3/2004 
3               250  .    .  .     1/1/2004 
;
run;

data trans (keep=patientid Dx);
   set have;
   array d Dx1-dx4;
   do i=1 to dim (d);
      Dx = d[i];
      if not missing(Dx) then output;
   end;
run;

Proc sql;
   select dx, count(dx) as freq
   from (select distinct patientid, dx from trans)
   group by dx;
quit;

lillymaginta · Posted 04-05-2016 11:35 AM

Thank you. This worked well. Would it be possible to have the output in order descending or ascending?

Katie · Posted 04-04-2016 02:02 PM

So I believe I created what you are looking for. I turned it into a macro just in case if you have more diagnoses spots than you originally specified. Let me know if you have any questions.

DATA HAVE;
INPUT patientid 1-2 dx1 3-5 dx2 7-9 dx3 11-13 dx4 15-17 date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
;
run;

PROC SORT DATA = HAVE;
BY PatientID;
RUN;

%MACRO FREQ (END = );
%DO i = 1 %TO &END;
PROC FREQ DATA = HAVE NOPRINT;
TABLES PATIENTID*DX&i / OUT = FREQ_DX&i;
RUN;
DATA FREQ_DX&i;
SET FREQ_DX&i(RENAME = (COUNT = COUNT&i DX&i = DX));
DROP PERCENT;
RUN;
PROC SORT DATA = FREQ_DX&i;
BY PatientID DX;
RUN;
%END;
DATA ALMOST;
MERGE FREQ_DX1-FREQ_DX&END;
BY PatientID DX;
RUN;
PROC FREQ DATA = ALMOST;
TABLES DX / OUT = WANT;
RUN;
%MEND;

%FREQ(END=4);

lillymaginta · Posted 04-05-2016 11:40 AM

Thank you for taking the time to write the macro. Both methods gave me the same answer, I really apprecaite it!

summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Re: summarize diagnosis multiple columns and multiple per patients

Registration is open

SAS Training: Just a Click Away