I was told to use the dataset "birthweight" (attached to the question) to determine the normality of each of the following variables:
Mother's height:
proc reg data= birthw.birthweight;
model birthweight = mppwt;
run;
Baby’s Length:
proc reg data= birthw.birthweight;
model birthweight =length;
run;
Mother's number of cigarettes:
proc reg data= birthw.birthweight;
model birthweight =mnocig;
run;
Baby Weight: not sure how to do this
Gestation:
proc reg data= birthw.birthweight;
model birthweight =gestation;
run;
My professor stated that we can use any method to determine this, even histograms. However, I am uncertain how to do so, and I was wondering whether the code I have is a correct way of determining normality.
I'm not sure what your professor said, or if you understood the professor properly. Normality of the response variable is NOT required for a regression, and isn't something you would usually test. https://blogs.sas.com/content/iml/2018/08/27/on-the-assumptions-and-misconceptions-of-linear-regress...
However, normality of the residuals is something you would want to test, and is a requirement of the statistical testing performed by the regression. (It is not a requirement to actually fit the least-squares regression line).
To test the normality of the residuals, you have to have PROC REG output the residuals, and these can then be tested in a number of ways, including PROC UNIVARIATE and also Q-Q plots. You can also just look at a histogram to get some idea if the residuals are normally distributed.
No testing of normality of the original variables used in PROC REG is required or necessary.
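As a minimal sketch of the residual check described above, using the gestation model from the original post: the output data set name work.reg_out and the variable name resid are made up for illustration, and this is only one of several reasonable ways to do it.

```sas
/* Fit the regression and save the residuals to a new data set.        */
/* work.reg_out and resid are illustrative names, not from the thread. */
proc reg data=birthw.birthweight;
   model birthweight = gestation;
   output out=work.reg_out r=resid;
run;
quit;

/* Examine the residuals: the NORMAL option requests tests for         */
/* normality; HISTOGRAM and QQPLOT give the visual checks.             */
proc univariate data=work.reg_out normal;
   var resid;
   histogram resid / normal;
   qqplot resid / normal(mu=est sigma=est);
run;
```

In the PROC UNIVARIATE output, the "Tests for Normality" table and the two plots are the places to look. The same pattern should work for the other models by changing the MODEL statement.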
Here's one way
https://www.stat.purdue.edu/~tqin/system101/method/QQplot_sas.htm
There's more than one way to do this though, so this may not align with what your professor is expecting.
You will want to review your course notes.
@saza wrote:
I think that is what my professor is asking me. The code worked, but I am unsure what to use to prove normality. What values should I be looking at?
@saza wrote:
Well, this was the original question:
Using SAS examine the normality of the variables for mother’s height, baby’s length, mother’s number of cigarettes, baby weight, gestation.
Are any of the variables non-normally distributed? How so?
The code I provided earlier is what I was planning to use, but I wasn't sure whether it was correct.
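If the assignment really is about the distribution of each variable on its own, a regression is not needed for that. A minimal sketch with PROC UNIVARIATE might look like the following. The variable names are taken from the code posted earlier; note that mppwt is what that code used under "Mothers height", so the actual height variable in the data set may have a different name.

```sas
/* Examine each variable directly, without fitting a regression.       */
/* Variable names come from the earlier post; mppwt is an assumption   */
/* for mother's height and may need to be replaced.                    */
proc univariate data=birthw.birthweight normal;
   var mppwt length mnocig birthweight gestation;
   histogram mppwt length mnocig birthweight gestation / normal;
   qqplot mppwt length mnocig birthweight gestation / normal(mu=est sigma=est);
run;
```

This handles birthweight in the same step as the other variables. For each variable, the skewness and kurtosis in the "Moments" table, the "Tests for Normality" table, and the shape of the histogram and Q-Q plot are what you would describe when answering "How so?".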
And I'm telling you that you don't have to assess the variables for normality to use them in a regression.
To follow up on @PaigeMiller, consider this as an example. Variable: height of adult humans. A quick plot will show that this is bimodal (clustered on genetic males and genetic females) and thus not normal. However, if you look at the residuals after fitting a model such as this:
model height = sex;
and outputting the residuals, you can test for normality or you can visually evaluate normality using QQ plots and histograms of the residuals, using PROC UNIVARIATE. Take a look at this really good web page regarding normality testing:
https://towardsdatascience.com/stop-testing-for-normality-dba96bb73f90
and you will see that it is really a futile effort.
SteveDenham
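The height example can be tried on simulated data. This is only a sketch: every number below (sample size, means, standard deviation, seed) is made up for illustration, the data set and variable names are hypothetical, and PROC GLM is used here because PROC REG has no CLASS statement for a character variable like sex.

```sas
/* Simulated heights: all values are invented for illustration only.   */
data work.heights;
   length sex $1;
   call streaminit(2025);
   do i = 1 to 500;
      if rand('Bernoulli', 0.5) then do;
         sex = 'M';
         height = rand('Normal', 178, 5);
      end;
      else do;
         sex = 'F';
         height = rand('Normal', 162, 5);
      end;
      output;
   end;
   drop i;
run;

/* Raw variable: a mixture of two groups, so expect two humps.         */
proc univariate data=work.heights normal;
   var height;
   histogram height / normal;
run;

/* Fit height = sex and save the residuals.                            */
proc glm data=work.heights;
   class sex;
   model height = sex;
   output out=work.height_res r=resid;
run;
quit;

/* Residuals: the group difference has been removed by the model.      */
proc univariate data=work.height_res normal;
   var resid;
   histogram resid / normal;
   qqplot resid / normal(mu=est sigma=est);
run;
```

Comparing the two histograms shows the point: the raw variable is built as a two-group mixture, while the residuals are what remains once the group means are accounted for.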