jopalmoe
Calcite | Level 5

TL;DR: I want to run a balancing test (a joint F-test) of whether six control variables (foed_kom_num, foed_mmyy, spacing_mon, mor_alder, far_alder, and andel_mor_foed) can jointly predict the variable of interest, 'brother'. Thank you!

Dear everyone, this is my first post in this community, and I hope someone has time to help. I use SAS 9.4 (English with DBCS).

I'm working on my thesis and replicating parts of the article "Brothers Increase Women’s Gender Conformity" by Anne Brenøe (2021), using slightly different data and variables. In Table 1, Panel B, she reports a balancing test (a joint F-test) of whether the seven control variables in her OLS model jointly predict the variable of interest. Her model (simplified) is y = alpha_0 + alpha_1 * brother + X * delta + error term, where 'brother' is the variable of interest and X is a vector of the seven control variables. The result of the balancing test is reported as:

joint F-statistic:  0.92
prob > F:           0.92

Please note that she reports only a single joint result (0.92) rather than a separate test for each control variable.

My model follows the same structure, y = alpha_0 + alpha_1 * brother + X * delta + error term, where 'brother' is again the variable of interest. However, my vector X contains only six control variables: foed_kom_num, foed_mmyy, spacing_mon, mor_alder, far_alder, and andel_mor_foed. (The names are in Danish so that I can follow your answers; I hope that works even though they are not in English.)

Variable formats:
brother: 1 if boy, 2 if girl
foed_kom_num: numeric best12.
foed_mmyy: numeric best12.
spacing_mon: numeric best12.
mor_alder: numeric best12.
far_alder: numeric best12.
andel_mor_foed: numeric best12.
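
For reference, here is how I understand the balancing regression and the joint F-statistic, written out as a sketch. The symbols gamma, u, q, k and the SSE subscripts are my own notation, not Brenøe's. The restricted model is the intercept-only regression, which is the comparison my code below makes.

```latex
% Balancing regression: the variable of interest on the controls
\text{brother}_i = \gamma_0 + X_i \gamma + u_i, \qquad H_0:\ \gamma = 0

% Joint F-statistic: intercept-only (restricted, r) vs. full model (f)
F = \frac{(\text{SSE}_r - \text{SSE}_f)/q}{\text{SSE}_f/(n - k - 1)},
\qquad q = k = 6,
\qquad p = \Pr\left(F_{q,\,n-k-1} > F\right)
```

Here n is the number of observations used in both regressions and k is the number of controls, so SSE_f / (n - k - 1) is the mean squared error of the full model.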

 

I've spent a long time searching and asking ChatGPT, but I can't get a single joint result. The closest I've come is the code below. (Note: I would like to use an OLS-based approach, because my supervisor is not fond of logistic approaches, and I believe Brenøe also uses an OLS-based test.)

Hope someone can help. Thanks a lot in advance. Best, Jo

 

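/* Step 0: Keep only observations with no missing values in any model  */
/* variable, so the full and reduced models use the same sample.       */
/* (analysis is a new working data set.)                               */
data analysis;
   set your_data;
   if cmiss(brother, foed_kom_num, foed_mmyy, spacing_mon,
            mor_alder, far_alder, andel_mor_foed) = 0;
run;

/* Step 1: Run the Full Model (brother on the six controls) */
proc glm data=analysis outstat=full_stats noprint;
   model brother = foed_kom_num foed_mmyy spacing_mon mor_alder far_alder andel_mor_foed;
run;
quit;

/* Step 2: Run the Reduced Model (Intercept Only) on the same sample */
proc glm data=analysis outstat=reduced_stats noprint;
   model brother = ;
run;
quit;

/* Step 3: Extract the error SS and DF from both models and compute the F-statistic */
data f_test;
   merge full_stats    (where=(_TYPE_='ERROR') rename=(SS=SSE_full    DF=DF_full))
         reduced_stats (where=(_TYPE_='ERROR') rename=(SS=SSE_reduced DF=DF_reduced));

   /* Numerator df = number of restrictions (6 unless a control is collinear) */
   df_num = DF_reduced - DF_full;
   df_den = DF_full;

   /* Compute F-statistic */
   F_stat = ((SSE_reduced - SSE_full) / df_num) / (SSE_full / df_den);

   /* Upper-tail p-value of the F distribution */
   p_value = sdf('F', F_stat, df_num, df_den);
run;

/* Step 4: Print the F-statistic and p-value */
proc print data=f_test noobs;
   var F_stat df_num df_den p_value;
run;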

 

3 Replies
sbxkoenk
SAS Super FREQ

Hello,

What's wrong with the F-test that PROC GLM provides by default?

It is a joint test of whether the model has any explanatory value, that is, whether all slope coefficients are zero.
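A minimal sketch of getting that single default F-test, assuming the data set is called your_data (as in the original code) and all seven variables are numeric. The ODS OUTPUT statement only saves the overall ANOVA table to a data set; overall_anova is an arbitrary name, and OverallANOVA is, to my knowledge, the ODS table name PROC GLM uses (ODS TRACE ON will show the exact name in your release).

```sas
/* Balancing test in one step: regress brother on the six controls.    */
/* Assumes your_data contains numeric versions of all seven variables. */
proc glm data=your_data;
   /* Save the overall ANOVA table (Model / Error / Corrected Total rows) */
   ods output OverallANOVA=overall_anova;
   model brother = foed_kom_num foed_mmyy spacing_mon
                   mor_alder far_alder andel_mor_foed;
run;
quit;

/* The Model row holds the joint F-statistic and its p-value */
proc print data=overall_anova noobs;
   where Source = 'Model';
   var Source DF FValue ProbF;
run;
```

PROC GLM drops observations with a missing value in any model variable, so this F-statistic should match a two-model comparison only when both models are fit to that same complete-case sample.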

 

If you want additional F-tests, you can use the TEST statement in PROC GLM.
Here's the doc:

The GLM Procedure
TEST Statement
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_glm_syntax24.htm
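For an explicit joint hypothesis on chosen coefficients, a sketch using PROC REG instead (swapped in here; PROC GLM's TEST statement is mainly for requesting F-tests with other effects as the error term). In PROC REG, one TEST statement with comma-separated restrictions gives a single joint F-test, and a variable name on its own means "coefficient = 0". The label all_zero and the data set name joint_test are arbitrary, and your_data is assumed to hold numeric versions of all seven variables.

```sas
/* Joint test that all six control coefficients are zero (balancing test). */
/* Assumes your_data contains numeric versions of all seven variables.    */
proc reg data=your_data;
   model brother = foed_kom_num foed_mmyy spacing_mon
                   mor_alder far_alder andel_mor_foed;
   /* Comma-separated restrictions in one TEST statement -> one joint F */
   all_zero: test foed_kom_num, foed_mmyy, spacing_mon,
                  mor_alder, far_alder, andel_mor_foed;
   /* Save the test result (TestANOVA is the ODS table for TEST) */
   ods output TestANOVA=joint_test;
run;
quit;
```

Testing all six slopes reproduces the model F-test in the ANOVA table; leaving names out of the TEST statement tests only a subset of the controls.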

 

BR, Koen

JackieJ_SAS
SAS Employee

To add to the comments of @sbxkoenk: as I understand your problem, the default F-test that PROC GLM reports is the same test your code computes. For example, here is the output from one of the examples in the PROC GLM documentation:

[Screenshot: output from a PROC GLM documentation example]

The test you want is in the columns labeled "F Value" and "Pr > F" in the upper right of the first table.

Ksharp
Super User
As others have said, why not use the default F-test of PROC GLM?
That said, it is probably of little use, because it almost always yields a p-value < 0.05.
If you want to compare a full model with a submodel, I would recommend using a likelihood ratio test. Check @Rick_SAS's blog:
https://blogs.sas.com/content/iml/2024/03/27/likelihood-ratio-test.html
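
For an OLS model the two approaches are closely linked. Assuming normally distributed errors with the variance estimated by maximum likelihood (a standard textbook derivation, not taken from the linked post), the LR statistic for dropping all k controls is a monotone function of the joint F-statistic. Here SSE_r and SSE_f are the residual sums of squares of the intercept-only and full models, n is the number of observations, q = k is the number of restrictions, and F is the joint F-statistic.

```latex
% Gaussian log-likelihood at the ML estimates:
%   \ell = -\tfrac{n}{2}\left[\ln(2\pi) + \ln(\text{SSE}/n) + 1\right]
LR = 2(\ell_f - \ell_r) = n \ln\frac{\text{SSE}_r}{\text{SSE}_f}
   = n \ln\left(1 + \frac{q\,F}{n - k - 1}\right)
   \;\overset{a}{\sim}\; \chi^2_q
```

Because LR increases with F, both tests order models the same way, and in large samples they usually lead to the same conclusion.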
