Re: Creating summary table of Proc Surveyfreq with subgroups, totals, ...

Wolverine · Posted 05-29-2024 12:03 PM

If you've seen any of my recent posts, you know that I've been struggling with this for a while now. My supervisor keeps telling me it's easy and it should have only taken a day. What I'm finding is that it's fairly easy to generate the numbers, but very time-consuming and tedious to rearrange them into a summary table. Proc Report seems to help a lot, but there is still a lot of data prep to get to a dataset that can be fed into Proc Report.

Anyway, as the title says, I'm trying to create a summary table. Below is sample data, and the code that I have so far. PUFFWGT is the person weight, while PUFF001 is a replicate weight to account for effects of the sampling/survey design. There are actually 100 replicate weights, but I only included 2 as an example.

Also below is an example table that shows the desired layout. This table was created with the TableN macro, which works great, but unfortunately doesn't work with weights. I'm not too concerned with labels right now, I just need the correct numbers in the correct place. There are a few differences between the Proc Report output and the desired table:

--There is no column for missing.

--There is no column for Total. In each row, this should be the sum of the Missing, no ADRD, and ADRD groups. The Total row has the correct n, but not the correct percentage.

--Once the Total column is created, I can drop the Frequency columns.

--The percentages in () should be column totals for each variable. So the percentage of DEM_AGE=1 + DEM_AGE=2 should equal 100%

--Need to add a column for the p-values from the chisq file.

data have;
infile datalines dsd dlm=',' truncover;
input TM_group ADRD_group DEM_AGE DEM_SEX PUFFWGT PUFF001-PUFF002;
datalines;
0,0,1,1,2110.5357,676.0632354,3108.917787
0,1,1,1,918.4676314,1106.627216,11398.01208
0,0,1,1,2758.096955,4735.215718,1418.690422
0,1,1,1,2456.208457,918.6245816,4548.556261
0,1,1,1,616.4292264,1033.661611,1213.613946
0,0,1,1,3673.275956,6316.504889,1704.047563
0,1,1,1,1102.850296,304.9154776,904.2769054
0,0,1,2,2714.610724,877.3159168,993.7824266
0,1,2,2,20090.52479,6085.947656,7590.092351
1,1,1,2,9098.780523,3498.802425,1335.317433
1,1,1,2,7196.92957,11977.46854,1240.066816
1,1,1,1,842.3146743,1255.501335,4314.886534
1,0,2,2,13516.51008,4775.164303,1106.604846
1,0,1,2,4460.828162,1083.99043,943.9937287
1,0,1,2,4818.845124,9286.932256,8197.376118
1,0,1,1,2848.023551,4739.321046,1113.329766
1,.,1,2,3050.820193,805.9607077,761.0495075
1,.,1,1,4256.565135,1176.233283,1775.665732
1,1,1,1,5211.416525,1570.093954,291.3352611
1,1,1,2,4181.422692,8242.459467,2573.543707
1,1,1,1,3179.943422,4688.70721,6518.281019
.,.,1,1,3451.708714,4256.625072,1167.477312
.,.,1,1,3018.745468,4729.150346,1565.75873
.,.,1,2,4098.775,1486.742512,1283.25
.,.,1,1,3483.708151,1098.980881,1349.598
;RUN;

/*Sort analysis file on BY variables, as required by PROC SURVEYFREQ*/
PROC SORT data=have;
      BY TM_group ADRD_group; RUN;

/***  SECTION 1 -- Examine weighted freqs for each group    ***/
/*Section 1.1 -- Weighted freqs - both person and sample weights*/
 
/*Section 1.1.1 -- Use ODS to save results into a file*/
ods output crosstabs = temp.freqs_raw_t1  
         chisq = temp.chisq_raw_t1;       

/*Section 1.1.2 -- Generate weighted freqs*/
title1 "Section 1.1.2 -- Weighted freqs - both person and sample weights V&version.";
PROC SURVEYFREQ data=have VARMETHOD=BRR missing;
      WEIGHT PUFFWGT;
      REPWEIGHTS PUFF001-PUFF002;
      BY TM_group;
      TABLES DEM_AGE*ADRD_group DEM_SEX*ADRD_group  / rschisq;
RUN;
 
/***  SECTION 2 -- Reorganize ODS output tables ***/
/*Section 2.1 -- freqs table*/
/*Keep needed variables and use macro to populate Value_Labels with proper values*/
DATA temp.freqs_t1_t; SET temp.freqs_raw_t1;
DROP WgtFreq /*Frequency*/ StdDev StdErr _SkipLine;
 
 
 /*Create Value_Labels column and populate it with labels from each variable*/
Value_Labels = "                                                                    ";
%macro val_labs (val_labs);
IF Table = "Table &val_labs. * ADRD_group" THEN Value_Labels = F_&val_labs.;
DROP F_&val_labs. &val_labs.;

%mend;
%val_labs(DEM_AGE)
%val_labs(DEM_SEX)
 

/*Reformat and combine weighted count and percent columns*/
PERCENT_RD = ROUND(Percent,.01);
Freq_percent = "                     ";
Freq_percent = COMPRESS(Frequency||"("||put(PERCENT_RD,8.2)||"%"||")");

/*IF Frequency <10 THEN Freq_percent = "<10"*/;
DROP /*Frequency*/ Percent PERCENT_RD; /*Retain Frequency because it is needed for the Total column*/
RUN;

DATA temp.chisq_2nd_t1; SET temp.chisq_raw_t1;

IF Label1 ^= "Pr > ChiSq" THEN delete; RUN;

/*REVISE THIS SECTION TO INCLUDE MISSING COLUMN*/
/*Section 2.1.1 -- Put columns in correct order*/

PROC REPORT data=temp.freqs_t1_t;

      COLUMNS table Value_Labels TM_group, ADRD_group,(Freq_percent Frequency);
        DEFINE table / group;
        DEFINE value_labels / group;
        DEFINE TM_group / across;
        DEFINE ADRD_group /across;
RUN;

/*Section 2.2 -- chisq table*/

/*Drop unneeded variables, delete results for missing cases, and only keep Rao-Scott chisq p-value rows*/
DATA temp.chisq_t1_t; SET temp.chisq_raw_t1;
DROP Name1 nValue1;
      IF TM_group = . THEN delete;
      /*IF ADRD_group = . THEN delete;*/
      IF Label1 ^= "Pr > ChiSq" THEN delete;
 RUN;

	MA_group					TM_group
	Missing	no ADRD	ADRD	Total	P-value	Missing	no ADRD	ADRD	Total	P-value
	(N=14)	(N=21836)	(N=1232)	(N=23082)	P-value	(N=17)	(N=29642)	(N=1475)	(N=31134)	P-value
Age group, n (%)					<.0001¹					<.0001¹
2:Age Group [65,75)	4 (28.6%)	8701 (39.8%)	179 (14.5%)	8884 (38.5%)		5 (29.4%)	12504 (42.2%)	215 (14.6%)	12724 (40.9%)
3:Age Group >=75	10 (71.4%)	13135 (60.2%)	1053 (85.5%)	14198 (61.5%)		12 (70.6%)	17138 (57.8%)	1260 (85.4%)	18410 (59.1%)

Gender, n (%)					<.0001¹					0.00751
1:Male	6 (42.9%)	9479 (43.4%)	453 (36.8%)	9938 (43.1%)		6 (35.3%)	13623 (46.0%)	619 (42.0%)	14248 (45.8%)
2:Female	8 (57.1%)	12357 (56.6%)	779 (63.2%)	13144 (56.9%)		11 (64.7%)	16019 (54.0%)	856 (58.0%)	16886 (54.2%)

ballardw · Posted 05-29-2024 01:01 PM

First is you have to redo this code:

PROC SURVEYFREQ data=have VARMETHOD=BRR missing;
      WEIGHT PUFFWGT;
      REPWEIGHTS PUFF001-PUFF002;
      BY TM_group;
      TABLES DEM_AGE*ADRD_group DEM_SEX*ADRD_group  / rschisq;
RUN;

From the Proc Surveyfreq documentation of the BY statement: (emphasis added)

Create an index on the BY variables by using the DATASETS procedure (in Base SAS software).

Using a BY statement provides completely separate analyses of the BY groups. It does not provide a statistically valid domain (subpopulation) analysis, where the total number of units in the subpopulation is not known with certainty. You should include the domain variable(s) in your TABLES request to obtain domain analysis. For more information, see the section Domain Analysis.

Which means that you want TM_Group in your Tables statement(s).

I usually spend a little time with Surveyfreq playing with the order of the variables on the tables statement to get groups that are "easy" to pull for reporting. Some times this means multiple tables statements.

Defining macros in the middle of other procedure code is bad practice. if you really need that %val_labs macro then place the macro code before the data step or procedure that want to use it.

Wolverine · Posted 05-29-2024 01:31 PM

@ballardw wrote:

Which means that you want TM_Group in your Tables statement(s).

Defining macros in the middle of other procedure code is bad practice. if you really need that %val_labs macro then place the macro code before the data step or procedure that want to use it.

So the Tables statement should be "TABLES DEM_AGE*TM_group*ADRD_group DEM_SEX*TM_group *ADRD_group / rschisq;" , and I should drop the BY statement altogether?

The purpose of the macro is to rearrange the freqs_raw_t1 file, which has 2 separate columns for each variable (one for values and one for value labels). What I need is to have all of the value labels in a single column. It's inside the DATA step so the output file will have that single column.

Wolverine · Posted 05-29-2024 06:42 PM

@Wolverine wrote:

@ballardw wrote:

Which means that you want TM_Group in your Tables statement(s).

Defining macros in the middle of other procedure code is bad practice. if you really need that %val_labs macro then place the macro code before the data step or procedure that want to use it.

So the Tables statement should be "TABLES DEM_AGE*TM_group*ADRD_group DEM_SEX*TM_group *ADRD_group / rschisq;" , and I should drop the BY statement altogether?

I thought about it some more and I think it is correct to have TM_group as a BY variable. The significant difference I'm looking for is whether there is a difference between the ADRD and non-ADRD groups within the MA group (which is equivalent to TM_group = 0), or between ADRD/non-ADRD within the TM group (TM_group=1). The BY groups are TM_group = 0 and TM_group = 1, and I am NOT comparing them in this significance test.

Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

Re: Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

Re: Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

Re: Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

Re: Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

Re: Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

Re: Creating summary table of Proc Surveyfreq with subgroups, totals, missing, and significance

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!