Hi I have the following dataset: I wan to create a table that summarizes dx1-dx4 by the diagnosis regardless of the "date" where each dx per patient can only appear one. or example. patient3 has 3 diagnosis of 250, I want this to be counted as only 1 "250"
patientid dx1 dx2 dx3 dx4 date
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
The summary table should look like this:
dx freq
223 1
224 1
120 1
250 2
and so on.
Thanks
So I believe I created what you are looking for. I turned it into a macro just in case if you have more diagnoses spots than you originally specified. Let me know if you have any questions.
DATA HAVE;
INPUT patientid 1-2 dx1 3-5 dx2 7-9 dx3 11-13 dx4 15-17 date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
;
run;
PROC SORT DATA = HAVE;
BY PatientID;
RUN;
%MACRO FREQ (END = );
%DO i = 1 %TO &END;
PROC FREQ DATA = HAVE NOPRINT;
TABLES PATIENTID*DX&i / OUT = FREQ_DX&i;
RUN;
DATA FREQ_DX&i;
SET FREQ_DX&i(RENAME = (COUNT = COUNT&i DX&i = DX));
DROP PERCENT;
RUN;
PROC SORT DATA = FREQ_DX&i;
BY PatientID DX;
RUN;
%END;
DATA ALMOST;
MERGE FREQ_DX1-FREQ_DX&END;
BY PatientID DX;
RUN;
PROC FREQ DATA = ALMOST;
TABLES DX / OUT = WANT;
RUN;
%MEND;
%FREQ(END=4);
Assuming your example result miscounted Dx 120 (patient 1 and patient 2) this works for your example data.
I'm treating the Dx codes as numeric to simplify reading since you didn't provide a data step to generate data and ignoring dates as there was no apparent need for date in the requested summary.
data have;
input patientid dx1 dx2 dx3 dx4 ;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 240 250 . . 3/3/2004
3 250 . . . 1/1/2004
;
run;
data trans (keep=patientid Dx);
set have;
array d Dx1-dx4;
do i=1 to dim (d);
Dx = d[i];
if not missing(Dx) then output;
end;
run;
Proc sql;
select dx, count(dx) as freq
from (select distinct patientid, dx from trans)
group by dx;
quit;
Thank you. This worked well. Would it be possible to have the output in order descending or ascending?
So I believe I created what you are looking for. I turned it into a macro just in case if you have more diagnoses spots than you originally specified. Let me know if you have any questions.
DATA HAVE;
INPUT patientid 1-2 dx1 3-5 dx2 7-9 dx3 11-13 dx4 15-17 date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
;
run;
PROC SORT DATA = HAVE;
BY PatientID;
RUN;
%MACRO FREQ (END = );
%DO i = 1 %TO &END;
PROC FREQ DATA = HAVE NOPRINT;
TABLES PATIENTID*DX&i / OUT = FREQ_DX&i;
RUN;
DATA FREQ_DX&i;
SET FREQ_DX&i(RENAME = (COUNT = COUNT&i DX&i = DX));
DROP PERCENT;
RUN;
PROC SORT DATA = FREQ_DX&i;
BY PatientID DX;
RUN;
%END;
DATA ALMOST;
MERGE FREQ_DX1-FREQ_DX&END;
BY PatientID DX;
RUN;
PROC FREQ DATA = ALMOST;
TABLES DX / OUT = WANT;
RUN;
%MEND;
%FREQ(END=4);
Thank you for taking the time to write the macro. Both methods gave me the same answer, I really apprecaite it!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.