Barney1998

Good morning everyone! I have a sample of 200 people that contains several categorical variables and 3 continuous variables. I would like to test whether there is a statistically significant difference in each continuous variable across the levels of each categorical variable. The problem is that none of my three continuous variables is normally distributed. What can I do?

P.S. Some of my categorical variables have more than 6 levels.

Thank you!
Have a nice day!

PaigeMiller

Use PROC GLM. There is no requirement that the raw data have a normal distribution; the normality assumption concerns the model errors (residuals), not the raw responses.

https://blogs.sas.com/content/iml/2018/08/27/on-the-assumptions-and-misconceptions-of-linear-regress...

Example:

proc glm data=have;
     class class_variable_name;
     model continuousvar1 continuousvar2 continuousvar3=class_variable_name;
run;

The F-test compares the means across all levels of class_variable_name. You can add a MEANS statement to PROC GLM to get more information, for example pairwise t-tests (T option) with the results summarized in a lines display (LINES option):

means class_variable_name / t lines;
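Since the normality assumption in a linear model concerns the residuals rather than the raw data, one way to check it is to save the residuals and test them. This is a minimal sketch, assuming SASHELP.CARS as stand-in data (Type, with 6 levels, as the categorical variable; MSRP, Horsepower and Weight as the three continuous variables); the names glm_resid, r_msrp, r_hp and r_weight are hypothetical.

```sas
/* Stand-in data (assumption): SASHELP.CARS ships with SAS.          */
/* Type plays the categorical variable; MSRP, Horsepower and Weight  */
/* play the three continuous variables.                              */
proc glm data=sashelp.cars plots=none;
   class Type;
   model MSRP Horsepower Weight = Type;
   means Type / t lines;
   /* One residual variable per dependent variable, in MODEL order */
   output out=glm_resid r=r_msrp r_hp r_weight;
run;
quit;

/* Test the residuals, not the raw responses, for normality */
proc univariate data=glm_resid normal;
   var r_msrp r_hp r_weight;
   qqplot r_msrp r_hp r_weight / normal(mu=est sigma=est);
run;
```

The NORMAL option prints tests such as Shapiro-Wilk, and the Q-Q plots are often more informative than the p-values with a few hundred observations.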

 

--
Paige Miller
whymath
Paige, I think Rick Wicklin's post is about simple linear regression, not ANOVA.
StatDave

If you are concerned about normality, you can use a nonparametric test such as the Kruskal-Wallis test in PROC NPAR1WAY. Because you will be doing multiple tests (one for each combination of continuous response variable and categorical variable), you might want to use a p-value adjustment method to control the overall familywise error rate. You can do this by applying an adjustment method, such as Holm's step-down Bonferroni method, to the multiple p-values in PROC MULTTEST.
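For a single categorical variable, the test itself is one PROC NPAR1WAY call. This is a minimal sketch, assuming SASHELP.CARS as stand-in data (Type as the categorical variable; MSRP, Horsepower and Weight as the continuous ones) and a hypothetical output data set name kw_type. With more than two CLASS levels, the WILCOXON option produces the Kruskal-Wallis test; the DSCF option, available in recent SAS/STAT releases, adds Dwass-Steel-Critchlow-Fligner pairwise comparisons as a nonparametric follow-up.

```sas
/* Stand-in data (assumption): SASHELP.CARS, Type has 6 levels */
proc npar1way data=sashelp.cars wilcoxon dscf;
   class Type;
   var MSRP Horsepower Weight;
   /* One row per VAR variable; P_KW holds the Kruskal-Wallis p-value */
   output out=kw_type wilcoxon;
run;
```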

 

An efficient way to do all of this is to rearrange your data from 200 observations into 200*k observations, where k is the number of your categorical variables. Each block of 200 contains your three continuous variables and a variable holding the values of one of your categorical variables (the array used for this requires all of the categorical variables to be the same type, either all numeric or all character). You can then use BY processing in a single call of PROC NPAR1WAY to do all of the analyses. Use the OUTPUT statement to save the Kruskal-Wallis test results in a data set, and then run PROC MULTTEST on that data set to adjust the p-values. All of this is illustrated below. The ODS statements suppress the displayed results of the many NPAR1WAY runs.

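/* Stack the data: one block of rows per categorical variable.    */
/* All variables in the array must be the same type.              */
data a;
   set <your-data>;
   array x (*) <list-of-your-categorical-variables>;
   do byvar=1 to dim(x);
      c=x(byvar);   /* level of the byvar-th categorical variable */
      output;
   end;
run;
proc sort data=a;
   by byvar;
run;
/* Suppress the displayed output of the many NPAR1WAY analyses */
ods exclude all;
proc npar1way data=a wilcoxon;
   by byvar;
   class c;
   var <your-continuous-variables>;
   output out=out wilcoxon;   /* P_KW = Kruskal-Wallis p-value */
run;
ods select all;
/* Holm step-down Bonferroni adjustment of all Kruskal-Wallis p-values */
proc multtest inpvalues(P_KW)=out holm;
run;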
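A runnable instance of the same approach, assuming SASHELP.CARS as stand-in data with the character variables Type and Origin as the categorical variables; the names long, catname and kw are hypothetical. It also stores each categorical variable's name (via the VNAME function) so that the tests can be identified afterward.

```sas
/* Stand-in data (assumption): two character categorical variables */
data long;
   set sashelp.cars(keep=Type Origin MSRP Horsepower Weight);
   length catname $32 c $8;
   array x (*) Type Origin;           /* both character, so one array works */
   do byvar = 1 to dim(x);
      catname = vname(x(byvar));      /* name of the variable in this block */
      c = x(byvar);
      output;
   end;
   keep byvar catname c MSRP Horsepower Weight;
run;

proc sort data=long;
   by byvar catname;
run;

ods exclude all;
proc npar1way data=long wilcoxon;
   by byvar catname;
   class c;
   var MSRP Horsepower Weight;
   output out=kw wilcoxon;
run;
ods select all;

/* List the p-values in data set order to match MULTTEST test numbers */
proc print data=kw;
   var catname _VAR_ P_KW;
run;

proc multtest inpvalues(P_KW)=kw holm;
run;
```

PROC MULTTEST numbers the tests in the order of the input data set, which is why the PROC PRINT listing is included.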
