Demographer
Pyrite | Level 9

Hi,

The variable lab_pred is a binary variable indicating whether or not the person works (1=work).

 

Edu, sex, and agegr are categorical variables for education (4 categories: 0, 1, 2, 3), sex (two categories: 0, 1) and age group (5-year groups from 15 to 70). For each combination of sex*edu*agegr, I want to calculate the proportion that works and its lower and upper confidence bounds.

 

To produce this output, I use the following code, but it produces a separate table for each combination of sex*edu*agegr.

 

 

proc sort data=work.predict2; by sex edu agegr; run;
proc freq data=work.predict2;
table lab_pred /nocol nofreq nopercent binomial (CL=Wald(correct));
by sex edu agegr;
run;

How can I generate an output that will look like:

 

Sex  Agegr  Edu  % lab_pred=1  Lower CI  Upper CI
0    15     0
0    15     1
0    15     2
0    15     3
...
1    70     3

 

 

 

 

FreelanceReinh
Jade | Level 19

Hi @Demographer,

 

You can use the OUTPUT statement to obtain the desired output dataset, possibly with a KEEP= dataset option to restrict it to the statistics of interest (e.g., ... out=want(keep=sex edu agegr _bin_ l_bin u_bin)).

proc freq data=work.predict2 noprint;
table lab_pred /nocol nofreq nopercent binomial (level='1' CL=Wald(correct));
by sex edu agegr;
output out=want binomial;
run;

The NOPRINT option of the PROC FREQ statement is optional, just to suppress the printed output.

 

Edit: The LEVEL='1' option is important if the other level of LAB_PRED is 0 because in this case 0 (the first level) would be the default in the calculation of statistics.
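As a possible follow-up, the dataset WANT created above could be sorted and printed in the layout requested in the question. This is only a sketch: the dataset name want_sorted, the column labels and the PERCENT8.1 format are illustrative choices, and the variable names _BIN_, L_BIN and U_BIN are the ones mentioned in the KEEP= example.

```sas
/* Sketch: reorder the rows as Sex, Agegr, Edu to match the requested table. */
proc sort data=want out=want_sorted;   /* want_sorted is a hypothetical name */
  by sex agegr edu;
run;

/* Print one row per sex*agegr*edu combination with readable column headers. */
proc print data=want_sorted noobs label;
  var sex agegr edu _bin_ l_bin u_bin;
  label _bin_ = '% lab_pred=1'
        l_bin = 'Lower CI'
        u_bin = 'Upper CI';
  format _bin_ l_bin u_bin percent8.1;  /* show proportions as percentages */
run;
```

Dropping the FORMAT statement leaves the values as proportions between 0 and 1.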

StatDave
SAS Super FREQ

To use all of the data in the computation of each proportion, a modeling approach is needed. PROC FREQ with a BY statement uses only the data in one BY group when computing that group's proportion. The following fits a logistic model to predict the probability of work in each combination. The possible downside to this approach is convergence problems if the data are sparse, particularly if there is a zero count in any cell of the four-way table formed by all of the variables. If there are no such problems, the LSMEANS statement with the ILINK option gives the proportions and the CL option gives the confidence limits.

proc logistic data=work.predict2;
class edu sex agegr / param=glm;
model lab_pred(event="1")=edu|sex|agegr;
lsmeans edu*sex*agegr / ilink cl;
run;
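To get these estimates into a dataset rather than only printed output, an ODS OUTPUT statement might be placed before the step, roughly as sketched here. The ODS table name LSMeans and the column names Mu, LowerMu and UpperMu (the inverse-linked estimate and its limits) are assumptions that should be confirmed with ODS TRACE ON, and lsm_want is a hypothetical dataset name. Note that these limits come from the model on the logit scale, so they will not match the continuity-corrected Wald limits from PROC FREQ.

```sas
/* Sketch: capture the LSMEANS table in a dataset.                        */
/* Assumption: the ODS table is named LSMeans and, with ILINK, contains   */
/* Mu, LowerMu and UpperMu; verify with ODS TRACE ON before relying on it. */
ods output LSMeans=lsm_want;           /* lsm_want is a hypothetical name */
proc logistic data=work.predict2;
  class edu sex agegr / param=glm;
  model lab_pred(event="1") = edu|sex|agegr;
  lsmeans edu*sex*agegr / ilink cl;
run;

/* One row per edu*sex*agegr combination: proportion and confidence limits. */
proc print data=lsm_want noobs;
  var edu sex agegr Mu LowerMu UpperMu;
run;
```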
