If you've seen any of my recent posts, you know that I've been struggling with this for a while now. My supervisor keeps telling me it's easy and it should have only taken a day. What I'm finding is that it's fairly easy to generate the numbers, but very time-consuming and tedious to rearrange them into a summary table. Proc Report seems to help a lot, but there is still a lot of data prep to get to a dataset that can be fed into Proc Report.
Anyway, as the title says, I'm trying to create a summary table. Below is sample data, and the code that I have so far. PUFFWGT is the person weight, while PUFF001 is a replicate weight to account for effects of the sampling/survey design. There are actually 100 replicate weights, but I only included 2 as an example.
Also below is an example table that shows the desired layout. This table was created with the TableN macro, which works great, but unfortunately doesn't work with weights. I'm not too concerned with labels right now, I just need the correct numbers in the correct place. There are a few differences between the Proc Report output and the desired table:
--There is no column for missing.
--There is no column for Total. In each row, this should be the sum of the Missing, no ADRD, and ADRD groups. The Total row has the correct n, but not the correct percentage.
--Once the Total column is created, I can drop the Frequency columns.
--The percentages in () should be column totals for each variable. So the percentage of DEM_AGE=1 + DEM_AGE=2 should equal 100%
--Need to add a column for the p-values from the chisq file.
data have;
infile datalines dsd dlm=',' truncover;
input TM_group ADRD_group DEM_AGE DEM_SEX PUFFWGT PUFF001-PUFF002;
datalines;
0,0,1,1,2110.5357,676.0632354,3108.917787
0,1,1,1,918.4676314,1106.627216,11398.01208
0,0,1,1,2758.096955,4735.215718,1418.690422
0,1,1,1,2456.208457,918.6245816,4548.556261
0,1,1,1,616.4292264,1033.661611,1213.613946
0,0,1,1,3673.275956,6316.504889,1704.047563
0,1,1,1,1102.850296,304.9154776,904.2769054
0,0,1,2,2714.610724,877.3159168,993.7824266
0,1,2,2,20090.52479,6085.947656,7590.092351
1,1,1,2,9098.780523,3498.802425,1335.317433
1,1,1,2,7196.92957,11977.46854,1240.066816
1,1,1,1,842.3146743,1255.501335,4314.886534
1,0,2,2,13516.51008,4775.164303,1106.604846
1,0,1,2,4460.828162,1083.99043,943.9937287
1,0,1,2,4818.845124,9286.932256,8197.376118
1,0,1,1,2848.023551,4739.321046,1113.329766
1,.,1,2,3050.820193,805.9607077,761.0495075
1,.,1,1,4256.565135,1176.233283,1775.665732
1,1,1,1,5211.416525,1570.093954,291.3352611
1,1,1,2,4181.422692,8242.459467,2573.543707
1,1,1,1,3179.943422,4688.70721,6518.281019
.,.,1,1,3451.708714,4256.625072,1167.477312
.,.,1,1,3018.745468,4729.150346,1565.75873
.,.,1,2,4098.775,1486.742512,1283.25
.,.,1,1,3483.708151,1098.980881,1349.598
;RUN;
/*Sort analysis file on BY variables, as required by PROC SURVEYFREQ*/
PROC SORT data=have;
BY TM_group ADRD_group; RUN;
/*** SECTION 1 -- Examine weighted freqs for each group ***/
/*Section 1.1 -- Weighted freqs - both person and sample weights*/
/*Section 1.1.1 -- Use ODS to save results into a file*/
ods output crosstabs = temp.freqs_raw_t1
chisq = temp.chisq_raw_t1;
/*Section 1.1.2 -- Generate weighted freqs*/
title1 "Section 1.1.2 -- Weighted freqs - both person and sample weights V&version.";
PROC SURVEYFREQ data=have VARMETHOD=BRR missing;
WEIGHT PUFFWGT;
REPWEIGHTS PUFF001-PUFF002;
BY TM_group;
TABLES DEM_AGE*ADRD_group DEM_SEX*ADRD_group / rschisq;
RUN;
/*** SECTION 2 -- Reorganize ODS output tables ***/
/*Section 2.1 -- freqs table*/
/*Keep needed variables and use macro to populate Value_Labels with proper values*/
DATA temp.freqs_t1_t; SET temp.freqs_raw_t1;
DROP WgtFreq /*Frequency*/ StdDev StdErr _SkipLine;
/*Create Value_Labels column and populate it with labels from each variable*/
Value_Labels = " ";
%macro val_labs (val_labs);
IF Table = "Table &val_labs. * ADRD_group" THEN Value_Labels = F_&val_labs.;
DROP F_&val_labs. &val_labs.;
%mend;
%val_labs(DEM_AGE)
%val_labs(DEM_SEX)
/*Reformat and combine weighted count and percent columns*/
PERCENT_RD = ROUND(Percent,.01);
Freq_percent = " ";
Freq_percent = COMPRESS(Frequency||"("||put(PERCENT_RD,8.2)||"%"||")");
/*IF Frequency <10 THEN Freq_percent = "<10"*/;
DROP /*Frequency*/ Percent PERCENT_RD; /*Retain Frequency because it is needed for the Total column*/
RUN;
DATA temp.chisq_2nd_t1; SET temp.chisq_raw_t1;
IF Label1 ^= "Pr > ChiSq" THEN delete; RUN;
/*REVISE THIS SECTION TO INCLUDE MISSING COLUMN*/
/*Section 2.1.1 -- Put columns in correct order*/
PROC REPORT data=temp.freqs_t1_t;
COLUMNS table Value_Labels TM_group, ADRD_group,(Freq_percent Frequency);
DEFINE table / group;
DEFINE value_labels / group;
DEFINE TM_group / across;
DEFINE ADRD_group /across;
RUN;
/*Section 2.2 -- chisq table*/
/*Drop unneeded variables, delete results for missing cases, and only keep Rao-Scott chisq p-value rows*/
DATA temp.chisq_t1_t; SET temp.chisq_raw_t1;
DROP Name1 nValue1;
IF TM_group = . THEN delete;
/*IF ADRD_group = . THEN delete;*/
IF Label1 ^= "Pr > ChiSq" THEN delete;
RUN;
MA_group
TM_group
Missing
no ADRD
ADRD
Total
P-value
Missing
no ADRD
ADRD
Total
P-value
(N=14)
(N=21836)
(N=1232)
(N=23082)
(N=17)
(N=29642)
(N=1475)
(N=31134)
Age group, n (%)
<.0001 1
<.0001 1
2:Age Group [65,75)
4 (28.6%)
8701 (39.8%)
179 (14.5%)
8884 (38.5%)
5 (29.4%)
12504 (42.2%)
215 (14.6%)
12724 (40.9%)
3:Age Group >=75
10 (71.4%)
13135 (60.2%)
1053 (85.5%)
14198 (61.5%)
12 (70.6%)
17138 (57.8%)
1260 (85.4%)
18410 (59.1%)
Gender, n (%)
<.0001 1
0.00751
1:Male
6 (42.9%)
9479 (43.4%)
453 (36.8%)
9938 (43.1%)
6 (35.3%)
13623 (46.0%)
619 (42.0%)
14248 (45.8%)
2:Female
8 (57.1%)
12357 (56.6%)
779 (63.2%)
13144 (56.9%)
11 (64.7%)
16019 (54.0%)
856 (58.0%)
16886 (54.2%)
... View more