TL;DR: I want to run a balancing test (a joint F-test) to see whether six control variables (foed_kom_num, foed_mmyy, spacing_mon, mor_alder, far_alder, and andel_mor_foed) jointly predict the variable of interest ('brother'). THANK YOU!
Dear all, this is my first post in this community, and I hope someone has the time to help. I use SAS 9.4 (English with DBCS).
I'm working on my thesis and replicating parts of the article "Brothers Increase Women’s Gender Conformity" by Anne Brenøe (2021), using slightly different data and variables. In Table 1, Panel B, she reports a balancing test: a joint F-test of whether the seven control variables in her OLS model jointly predict the variable of interest. Her model (simplified) is y = alpha_0 + alpha_1 * brother + X * delta + error term, where 'brother' is the variable of interest and X is a vector of the seven control variables. The balancing test result is reported as:
joint F-statistic: 0.92
prob > F: 0.92
Please note that she reports only one joint result (F = 0.92), not a separate test for each control variable.
My model has the same structure: y = alpha_0 + alpha_1 * brother + X * delta + error term, where 'brother' is again the variable of interest. However, my vector X contains only six control variables: foed_kom_num, foed_mmyy, spacing_mon, mor_alder, far_alder, and andel_mor_foed (the names are in Danish so that I can follow your answers; I hope this is fine even though they are not in English).
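As I understand the balancing test, the roles are reversed: 'brother' becomes the dependent variable, the six controls are the regressors, and the joint F-test asks whether all six slope coefficients are zero. A sketch in my own notation (gamma, u, SSE_R, SSE_U and n are symbols introduced here, not taken from the article), where SSE_R is the error sum of squares of the intercept-only model and SSE_U that of the model with all six controls, both fitted on the same n observations:

```latex
\begin{aligned}
\text{brother}_i &= \gamma_0 + \gamma_1\,\text{foed\_kom\_num}_i + \gamma_2\,\text{foed\_mmyy}_i + \gamma_3\,\text{spacing\_mon}_i \\
&\quad + \gamma_4\,\text{mor\_alder}_i + \gamma_5\,\text{far\_alder}_i + \gamma_6\,\text{andel\_mor\_foed}_i + u_i \\
H_0 &:\ \gamma_1 = \gamma_2 = \dots = \gamma_6 = 0 \\
F &= \frac{(SSE_R - SSE_U)/6}{SSE_U/(n - 7)} \sim F_{6,\,n-7} \ \text{under } H_0
\end{aligned}
```

A large p-value (as with Brenøe's 0.92) is read as "the controls do not jointly predict brother", i.e. the sample looks balanced.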
Variable formats:
brother: 1 if boy, 2 if girl
foed_kom_num: numeric best12.
foed_mmyy: numeric best12.
spacing_mon: numeric best12.
mor_alder: numeric best12.
far_alder: numeric best12.
andel_mor_foed: numeric best12.
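Because 'brother' is coded 1/2 rather than 0/1, a recode to a 0/1 indicator may make the OLS (linear probability) coefficients easier to read. This is optional for the balancing test: rescaling the dependent variable as a*y + b multiplies every sum of squares by the same factor a², so the joint F-statistic and its p-value do not change. A minimal sketch; the data set name your_data2 and the variable brother01 are hypothetical:

```sas
/* Optional: recode brother (1 = boy, 2 = girl) to a 0/1 indicator.      */
/* your_data2 and brother01 are hypothetical names used for illustration. */
data your_data2;
  set your_data;
  /* brother01 stays missing when brother is missing (not retained) */
  if not missing(brother) then brother01 = (brother = 1);
run;
```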
I've spent a long time searching and using ChatGPT, but I can't seem to get just one joint result. The closest I've come is with the following code. (Note: I would like to use an OLS-based approach, as my supervisor is not fond of logistic approaches. I also believe Brenøe uses an OLS-based test.)
Hope someone can help. Thanks a lot in advance. Best, Jo
/* Step 1: Run the full model (brother regressed on the six controls).      */
/* PROC GLM drops any row with a missing value, so Step 2 must use the same rows. */
proc glm data=your_data outstat=full_stats noprint;
  model brother = foed_kom_num foed_mmyy spacing_mon mor_alder far_alder andel_mor_foed;
run;
quit;
/* Step 2: Run the reduced model (intercept only) on the same complete cases */
proc glm data=your_data outstat=reduced_stats noprint;
  where nmiss(foed_kom_num, foed_mmyy, spacing_mon, mor_alder, far_alder, andel_mor_foed) = 0;
  model brother = ;
run;
quit;
/* Step 3: Extract the error SS and DF of each model and compute the F-statistic */
data f_test;
  merge full_stats    (where=(_TYPE_='ERROR') rename=(SS=SSE_full DF=DF_full))
        reduced_stats (where=(_TYPE_='ERROR') rename=(SS=SSE_reduced DF=DF_reduced));
  p = 6;                 /* number of predictors tested */
  n = DF_reduced + 1;    /* observations used (informational only) */
  df_num = p;
  df_den = DF_full;      /* equals n - p - 1 */
  /* F = ((SSE_reduced - SSE_full) / p) / (SSE_full / (n - p - 1)) */
  F_stat = ((SSE_reduced - SSE_full) / df_num) / (SSE_full / df_den);
  /* Upper-tail p-value */
  p_value = 1 - probf(F_stat, df_num, df_den);
run;
/* Step 4: Print the F-statistic and p-value */
proc print data=f_test;
  var F_stat p_value;
run;
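A hand-computed F like the one above is only valid if the full and intercept-only models are fitted on the same observations. PROC GLM uses only rows with no missing values among the model variables, so an intercept-only model that is not restricted to complete cases can use more rows than the full model. A minimal sketch for checking this, assuming the placeholder data set name your_data from the code above:

```sas
/* Count the rows available to each model; the counts are written to the log. */
data _null_;
  set your_data end=last;
  /* rows with a non-missing dependent variable (all an unrestricted intercept-only model needs) */
  n_brother + (not missing(brother));
  /* rows with no missing value in brother or any control (what the full model uses) */
  n_complete + (nmiss(brother, foed_kom_num, foed_mmyy, spacing_mon,
                      mor_alder, far_alder, andel_mor_foed) = 0);
  if last then put n_brother= n_complete=;
run;
```

If n_brother and n_complete differ, restrict the intercept-only model to complete cases before comparing the two error sums of squares.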
Hello,
What's wrong with the F-test that PROC GLM provides by default?
It's a joint test of whether all model coefficients except the intercept are zero, i.e. whether the model has any explanatory value.
If you want additional F-tests, you can use the TEST statement in PROC GLM.
Here's the documentation (The GLM Procedure, TEST Statement):
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_glm_syntax24.htm
BR, Koen
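For an explicit joint test of named coefficients, one option is PROC REG rather than PROC GLM; this is a swapped-in procedure, not the GLM TEST statement linked above (GLM's TEST statement is mainly for specifying hypothesis and error terms). In PROC REG, a TEST statement listing variables separated by commas tests that all of those coefficients are zero simultaneously. Because all six regressors are tested here, the result should match the overall model F in the ANOVA table. A sketch, assuming your_data and the ODS table name TestANOVA as I recall it (balance_test and the label Balance are hypothetical names):

```sas
proc reg data=your_data;
  model brother = foed_kom_num foed_mmyy spacing_mon mor_alder far_alder andel_mor_foed;
  /* Joint H0: all six slope coefficients are zero */
  Balance: test foed_kom_num, foed_mmyy, spacing_mon, mor_alder, far_alder, andel_mor_foed;
  ods output TestANOVA=balance_test;   /* save the test result table */
run;
quit;

proc print data=balance_test;
run;
```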
To add to @sbxkoenk's comments: as I understand your problem, the default F test from PROC GLM computes the same test as your code. For example, here is the output from one of the examples in the PROC GLM documentation:
The test you want is in the columns labeled "F Value" and "Pr > F" in the upper right of the first table (the Model row of the overall ANOVA table).
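To get just that one joint result into a data set instead of reading it off the printed table, the overall ANOVA table can be captured with ODS OUTPUT. A sketch, assuming the placeholder data set your_data; OverallANOVA is the PROC GLM table name as I recall it, and the column names FValue and ProbF should be confirmed with PROC CONTENTS (overall_anova and joint_f are hypothetical names):

```sas
/* Fit the balancing regression; do not use NOPRINT, or ODS OUTPUT gets nothing */
proc glm data=your_data;
  model brother = foed_kom_num foed_mmyy spacing_mon mor_alder far_alder andel_mor_foed;
  ods output OverallANOVA=overall_anova;
run;
quit;

/* Keep only the joint F-statistic and its p-value from the Model row */
data joint_f;
  set overall_anova;
  where Source = 'Model';
  keep FValue ProbF;
run;

proc print data=joint_f;
run;
```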