Re: Lower and upper bounds for proportions

Demographer · Posted 07-02-2021 02:37 AM

Hi,

The variable lab_pred is a binary variable indicating whether or not the person works (1=work).

Edu, sex, and agegr are categorical variables for education (4 categories: 0, 1, 2, 3), sex (2 categories: 0, 1) and age group (from 15 to 70 in 5-year steps). For each combination of sex*edu*agegr, I want to calculate the proportion that works, along with the lower and upper confidence bounds.

To produce this output, I use the following code, but it produces a separate table for each combination of sex*edu*agegr.

proc sort data=work.predict2; by sex edu agegr; run;
proc freq data=work.predict2;
table lab_pred /nocol nofreq nopercent binomial (CL=Wald(correct));
by sex edu agegr;
run;

How can I generate an output that will look like:

Sex	Agegr	Edu	% lab_pred=1	Lower CI	Upper CI
0	15	0
0	15	1
0	15	2
0	15	3
…	…	…
1	70	3

FreelanceReinh · Posted 07-02-2021 03:29 AM

Hi @Demographer,

You can use the OUTPUT statement to obtain the desired output dataset, possibly with a KEEP= dataset option to restrict it to the statistics of interest (e.g., ... out=want(keep=sex edu agegr _bin_ l_bin u_bin)).

proc freq data=work.predict2 noprint;
table lab_pred /nocol nofreq nopercent binomial (level='1' CL=Wald(correct));
by sex edu agegr;
output out=want binomial;
run;

The NOPRINT option of the PROC FREQ statement is optional, just to suppress the printed output.

Edit: The LEVEL='1' option is important if the other level of LAB_PRED is 0 because in this case 0 (the first level) would be the default in the calculation of statistics.
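
As a rough sketch of the KEEP= suggestion above, the two steps below write only the three statistics to the output dataset and then print one row per combination. The dataset name want, the labels, and the PERCENT8.1 format are illustrative choices, and the code assumes work.predict2 is already sorted by sex, edu and agegr. It is worth checking for one BY group that L_BIN and U_BIN match the limits in the printed PROC FREQ output before relying on them.

```sas
/* Assumes work.predict2 is already sorted by sex edu agegr */
proc freq data=work.predict2 noprint;
  by sex edu agegr;
  tables lab_pred / binomial(level='1' cl=wald(correct));
  /* Keep only the BY variables, the proportion and its limits */
  output out=want(keep=sex edu agegr _bin_ l_bin u_bin) binomial;
run;

/* One row per sex*edu*agegr combination; labels are illustrative */
proc print data=want noobs label;
  var sex agegr edu _bin_ l_bin u_bin;
  format _bin_ l_bin u_bin percent8.1;
  label _bin_ = '% lab_pred=1'
        l_bin = 'Lower CI'
        u_bin = 'Upper CI';
run;
```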

StatDave · Posted 07-02-2021 10:41 AM

To use all of the data in computing each proportion, a modeling approach is needed. PROC FREQ with a BY statement uses only the data in a single BY group when computing that group's proportion. The following fits a logistic model to predict the probability of work in each combination. The possible downside of this approach is convergence problems if the data are sparse, particularly if any cell of the four-way table formed from all of the variables has a zero count. If the model fits without such problems, the LSMEANS statement with the ILINK option gives the proportions and the CL option gives the confidence limits.

proc logistic;
class edu sex agegr / param=glm;
model lab_pred(event="1")=edu|sex|agegr;
lsmeans edu*sex*agegr / ilink cl;
run;
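
To get these estimates into a dataset shaped like the requested table, one option is to capture the LSMEANS table with ODS OUTPUT. This is only a sketch: the dataset name lsm is made up, data=work.predict2 is assumed from the original question, and the column names Mu, LowerMu and UpperMu are what I would expect for the inverse-linked estimate and its limits when ILINK and CL are both specified. Running PROC CONTENTS on lsm will confirm the actual names.

```sas
/* Capture the LSMEANS table; lsm is a hypothetical dataset name */
ods output LSMeans=lsm;

proc logistic data=work.predict2;  /* input dataset assumed from the question */
  class edu sex agegr / param=glm;
  model lab_pred(event="1") = edu|sex|agegr;
  lsmeans edu*sex*agegr / ilink cl;
run;

/* Mu, LowerMu, UpperMu: expected names for the probability-scale results */
proc print data=lsm noobs;
  var sex agegr edu Mu LowerMu UpperMu;
run;
```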
