Hi I have the following dataset: I wan to create a table that summarizes dx1-dx4 by the diagnosis regardless of the "date" where each dx per patient can only appear one. or example. patient3 has 3 diagnosis of 250, I want this to be counted as only 1 "250"
patientid dx1 dx2 dx3 dx4 date
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
The summary table should look like this:
dx freq
223 1
224 1
120 1
250 2
and so on.
Thanks
So I believe I created what you are looking for. I turned it into a macro just in case if you have more diagnoses spots than you originally specified. Let me know if you have any questions.
DATA HAVE;
INPUT patientid 1-2 dx1 3-5 dx2 7-9 dx3 11-13 dx4 15-17 date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
;
run;
PROC SORT DATA = HAVE;
BY PatientID;
RUN;
%MACRO FREQ (END = );
%DO i = 1 %TO &END;
PROC FREQ DATA = HAVE NOPRINT;
TABLES PATIENTID*DX&i / OUT = FREQ_DX&i;
RUN;
DATA FREQ_DX&i;
SET FREQ_DX&i(RENAME = (COUNT = COUNT&i DX&i = DX));
DROP PERCENT;
RUN;
PROC SORT DATA = FREQ_DX&i;
BY PatientID DX;
RUN;
%END;
DATA ALMOST;
MERGE FREQ_DX1-FREQ_DX&END;
BY PatientID DX;
RUN;
PROC FREQ DATA = ALMOST;
TABLES DX / OUT = WANT;
RUN;
%MEND;
%FREQ(END=4);
Assuming your example result miscounted Dx 120 (patient 1 and patient 2) this works for your example data.
I'm treating the Dx codes as numeric to simplify reading since you didn't provide a data step to generate data and ignoring dates as there was no apparent need for date in the requested summary.
data have;
input patientid dx1 dx2 dx3 dx4 ;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 . . 1/2/2007
2 120 666 . . 1/1/2007
3 250 . . . 2/2/2004
3 240 250 . . 3/3/2004
3 250 . . . 1/1/2004
;
run;
data trans (keep=patientid Dx);
set have;
array d Dx1-dx4;
do i=1 to dim (d);
Dx = d[i];
if not missing(Dx) then output;
end;
run;
Proc sql;
select dx, count(dx) as freq
from (select distinct patientid, dx from trans)
group by dx;
quit;
Thank you. This worked well. Would it be possible to have the output in order descending or ascending?
So I believe I created what you are looking for. I turned it into a macro just in case if you have more diagnoses spots than you originally specified. Let me know if you have any questions.
DATA HAVE;
INPUT patientid 1-2 dx1 3-5 dx2 7-9 dx3 11-13 dx4 15-17 date mmddyy10.;
datalines;
1 250 223 224 444 5/5/2009
1 555 666 120 250 5/6/2008
2 120 666 1/2/2007
2 120 666 1/1/2007
3 250 2/2/2004
3 240 250 3/3/2004
3 250 1/1/2004
;
run;
PROC SORT DATA = HAVE;
BY PatientID;
RUN;
%MACRO FREQ (END = );
%DO i = 1 %TO &END;
PROC FREQ DATA = HAVE NOPRINT;
TABLES PATIENTID*DX&i / OUT = FREQ_DX&i;
RUN;
DATA FREQ_DX&i;
SET FREQ_DX&i(RENAME = (COUNT = COUNT&i DX&i = DX));
DROP PERCENT;
RUN;
PROC SORT DATA = FREQ_DX&i;
BY PatientID DX;
RUN;
%END;
DATA ALMOST;
MERGE FREQ_DX1-FREQ_DX&END;
BY PatientID DX;
RUN;
PROC FREQ DATA = ALMOST;
TABLES DX / OUT = WANT;
RUN;
%MEND;
%FREQ(END=4);
Thank you for taking the time to write the macro. Both methods gave me the same answer, I really apprecaite it!
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.