saza
Quartz | Level 8

I was told to use the dataset "birthweight" (attached to the question) to determine the normality of each of the following variables:

Mother's height:

proc reg data= birthw.birthweight;
model birthweight = mppwt;
run;

Baby’s Length:

proc reg data= birthw.birthweight;
model birthweight =length;
run;

Mother's number of cigarettes:

proc reg data= birthw.birthweight;
model birthweight =mnocig;
run;

Baby Weight: not sure how to do this

Gestation: 

proc reg data= birthw.birthweight;
model birthweight =gestation;
run;

My professor stated that we can use any method to determine this, even histograms. However, I am uncertain how to do so and was wondering whether the code I have is a correct way of determining normality.



 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
The original assumption of using PROC REG was flawed; using PROC UNIVARIATE is correct.
Typically, you assess the normality of a variable on its own, not in relation to other variables.


9 REPLIES 9
PaigeMiller
Diamond | Level 26

I'm not sure what your professor said, or if you understood the professor properly. Normality of the response variable is NOT required for a regression, and isn't something you would usually test. https://blogs.sas.com/content/iml/2018/08/27/on-the-assumptions-and-misconceptions-of-linear-regress...

 

However, normality of the residuals is something you would want to test, and is a requirement of the statistical testing performed by the regression. (It is not a requirement to actually fit the least-squares regression line).

 

To test the normality of the residuals, you have to have PROC REG output the residuals, and these can then be tested in a number of ways, including PROC UNIVARIATE and also Q-Q plots. You can also just look at a histogram to get some idea if the residuals are normally distributed.
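
The steps described above can be sketched roughly as follows. This is only an illustration, assuming the model from the original post (birthweight on gestation); the output dataset name work.reg_resid and the residual variable name resid are made up for the example.

```sas
/* Fit the regression and save the residuals to a new dataset. */
/* work.reg_resid and resid are hypothetical names.            */
proc reg data=birthw.birthweight;
   model birthweight = gestation;
   output out=work.reg_resid r=resid;
run;
quit;

/* Examine the residuals: formal tests, histogram, and Q-Q plot. */
proc univariate data=work.reg_resid normal;
   var resid;
   histogram resid / normal;
   qqplot resid / normal(mu=est sigma=est);
run;
```

The NORMAL option on the PROC UNIVARIATE statement requests the tests for normality; the two plot statements give the visual checks.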

--
Paige Miller
saza
Quartz | Level 8
Oh wow. I'm basically the person in the example. I misinterpreted the question; however, I think the code I typed could be helpful in the questions after this one.

To examine normality, I could have just used
proc univariate data= birthw.birthweight;
var mppwt;
run;

Could I simply replace the variable with the one being asked for in each case?
PaigeMiller
Diamond | Level 26

No testing of normality of the original variables used in PROC REG is required or necessary.

--
Paige Miller
saza
Quartz | Level 8
well this was the original question

Using SAS examine the normality of the variables for mother’s height, baby’s length, mother’s number of cigarettes, baby weight, gestation.
Are any of the variables non-normally distributed? How so?

and the codes I provided earlier were the ones I was planning on using but wasn't sure if they were correct
Reeza
Super User
The original assumption of using PROC REG was flawed; using PROC UNIVARIATE is correct.
Typically, you assess the normality of a variable on its own, not in relation to other variables.
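
As a minimal sketch of that approach, one PROC UNIVARIATE step can cover all five variables at once. The variable names are the ones used in the original post and are assumed to match the dataset (in particular, whether mppwt really holds mother's height should be checked against the data dictionary).

```sas
/* One pass over all five variables, each assessed on its own. */
/* Variable names are taken from the original post.            */
proc univariate data=birthw.birthweight;
   var mppwt length mnocig birthweight gestation;
   /* Histogram per variable with a fitted normal curve overlaid. */
   histogram mppwt length mnocig birthweight gestation / normal;
run;
```

Each histogram can then be compared against its overlaid normal curve to see skewness, heavy tails, or spikes (for example, a spike at zero for number of cigarettes, if many mothers did not smoke).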

saza
Quartz | Level 8
I think that is what my professor is asking me. The code worked, but I am unsure what to use to assess normality. What values should I be looking at?
Reeza
Super User

Here's one way

https://www.stat.purdue.edu/~tqin/system101/method/QQplot_sas.htm

 

There's more than one way to do this though, so this may not align with what your professor is expecting. 

You will want to review your course notes.
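
One possible way to narrow the output down to the values in question is sketched below. It assumes the gestation variable from the original post and uses ODS SELECT to keep only the moments, the normality tests, and the Q-Q plot; this is one option among several and may not be what the course expects.

```sas
/* Keep only the output most relevant to a normality check. */
ods select Moments TestsForNormality QQPlot;

proc univariate data=birthw.birthweight normal;
   var gestation;
   /* Q-Q plot with a reference line from the estimated mean and SD. */
   qqplot gestation / normal(mu=est sigma=est);
run;
```

In the Moments table, skewness and kurtosis near zero are consistent with normality. The Tests for Normality table reports the Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling statistics with p-values, where small p-values suggest a departure from normality. In the Q-Q plot, points falling close to the reference line suggest an approximately normal variable.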

 


@saza wrote:
I think that is what my professor is asking me. The code worked, but I am unsure what to use to assess normality. What values should I be looking at?

 

PaigeMiller
Diamond | Level 26

@saza wrote:
well this was the original question

Using SAS examine the normality of the variables for mother’s height, baby’s length, mother’s number of cigarettes, baby weight, gestation.
Are any of the variables non-normally distributed? How so?

and the codes I provided earlier were the ones I was planning on using but wasn't sure if they were correct

And I'm telling you that you don't have to assess the variables for normality to use them in a regression.

--
Paige Miller
SteveDenham
Jade | Level 19

To follow up on @PaigeMiller, consider this example. Variable: height of adult humans. A quick plot will show that this is bimodal (clustered on genetic males and genetic females) and thus not normal. However, if you look at the residuals after fitting a model such as this:

 

model height = sex;

and outputting the residuals, you can test for normality or you can visually evaluate normality using QQ plots and histograms of the residuals, using PROC UNIVARIATE.  Take a look at this really good web page regarding normality testing:

 

https://towardsdatascience.com/stop-testing-for-normality-dba96bb73f90 

 

and you will see that it is really a futile effort.
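
A rough sketch of the height example, using sashelp.class as a stand-in: that dataset ships with SAS and has Height and Sex, but it contains 19 students rather than adults, so it only illustrates the mechanics. PROC GLM is used here (an assumption, not something specified above) because Sex is a character variable and needs a CLASS statement; work.height_resid and resid are hypothetical names.

```sas
/* Fit height on sex and save the residuals.                   */
/* sashelp.class is a stand-in; work.height_resid is made up.  */
proc glm data=sashelp.class;
   class sex;
   model height = sex;
   output out=work.height_resid r=resid;
run;
quit;

/* Assess the residuals, not the raw heights. */
proc univariate data=work.height_resid normal;
   var resid;
   histogram resid / normal;
   qqplot resid / normal(mu=est sigma=est);
run;
```

The raw variable can look non-normal because it mixes two groups, while the residuals, with the group means removed, are what the model's assumptions concern.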

 

SteveDenham

 

