
Compare means

Barney1998 — Sun, 24 Mar 2024 08:43:42 GMT

Good morning everyone! I have a sample of 200 people with several categorical variables and 3 continuous variables. I would like to test whether each continuous variable differs significantly across the levels of each categorical variable. The problem is that none of my three continuous variables follows a normal distribution. What can I do?

P.S. Some of my categorical variables have more than 6 levels.

Thank you!
Have a nice day!

Re: Compare means

PaigeMiller — Sun, 24 Mar 2024 10:36:49 GMT

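Use PROC GLM. There is no requirement that the raw data have a normal distribution; the normality assumption applies to the model's errors (residuals), not to the response variable itself.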

https://blogs.sas.com/content/iml/2018/08/27/on-the-assumptions-and-misconceptions-of-linear-regression.html

Example:

proc glm data=have;
     class class_variable_name;
     /* a separate one-way ANOVA is fit for each response listed before the equals sign */
     model continuousvar1 continuousvar2 continuousvar3=class_variable_name;
run;
quit;

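The overall F-test compares the means across the levels of class_variable_name. You can add a MEANS statement to PROC GLM, after the MODEL statement, to get more information. In the statement below, the T option requests pairwise t tests and LINES marks groups of means that are not significantly different.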

means class_variable_name/t lines;
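For reference, here is a minimal sketch of the complete step with the MEANS statement in place, followed by a check of the residuals, since those (not the raw data) are what the normality assumption concerns. The data set name resid and the variable names res1-res3 are placeholders I made up; have, class_variable_name, and continuousvar1-continuousvar3 are the same placeholders used above.

```sas
ods graphics on;

proc glm data=have plots=diagnostics;
   class class_variable_name;
   model continuousvar1 continuousvar2 continuousvar3 = class_variable_name;
   /* pairwise t tests, with lines joining means that do not differ significantly */
   means class_variable_name / t lines;
   /* save one residual variable per response (names are hypothetical) */
   output out=resid r=res1 res2 res3;
run;
quit;

/* normality tests and Q-Q plots for the residuals, not the raw responses */
proc univariate data=resid normal;
   var res1 res2 res3;
   qqplot res1 res2 res3 / normal(mu=est sigma=est);
run;
```

If the residual plots look badly skewed or heavy-tailed, that would be a reason to consider a transformation or a nonparametric test instead.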

Re: Compare means

StatDave — Sun, 24 Mar 2024 19:39:55 GMT

If you are concerned about normality, you can use a nonparametric test like the Kruskal-Wallis test in PROC NPAR1WAY. And since you'll be doing multiple tests - one test for each combination of continuous response variable and categorical variable - you might want to control the overall familywise error rate. This can be done by applying a p-value adjustment method, like Holm's stepdown Bonferroni method, to the multiple p-values in PROC MULTTEST.
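For a single categorical variable, the Kruskal-Wallis test might look like the sketch below. The names have, class_variable_name, and continuousvar1-continuousvar3 are placeholders, not names from your data. The DSCF option is an addition of mine: it requests Dwass-Steel-Critchlow-Fligner pairwise comparisons, which can help when a variable has many levels, and it can be dropped if you only want the overall test.

```sas
proc npar1way data=have wilcoxon dscf;
   /* grouping variable; with more than two levels WILCOXON gives the Kruskal-Wallis test */
   class class_variable_name;
   /* each response variable is analyzed separately */
   var continuousvar1 continuousvar2 continuousvar3;
run;
```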

An efficient way to do all of this is to rearrange your data from 200 observations into 200*k observations, where k is the number of your categorical variables. Each block of 200 will have your three continuous variables and a variable containing one of your categorical variables. You can then use BY processing in a single call of PROC NPAR1WAY to do all of the analyses. Use the OUTPUT statement to save the Kruskal-Wallis test results in a data set and then use PROC MULTTEST on that data set to do the p-value adjustment. All of that is illustrated below. The ODS statements are used to avoid displaying the results from the multiple NPAR1WAY runs.

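data a;
   set <your-data>;
   /* all variables in the array must be the same type (all character or all numeric) */
   array x (*) <list-of-your-categorical-variables>;
   do byvar=1 to dim(x);
      c=x(byvar);   /* c holds the value of the byvar-th categorical variable */
      output;       /* one copy of the observation per categorical variable */
   end;
run;
proc sort data=a;
   by byvar;
run;
ods exclude all;   /* suppress displayed output from the BY-group analyses */
proc npar1way data=a wilcoxon;
   by byvar;
   class c;
   var <your-continuous-variables>;
   output out=out wilcoxon;   /* saves the Kruskal-Wallis p-value as P_KW */
run;
ods select all;    /* restore displayed output */
/* Holm (step-down Bonferroni) adjustment of the Kruskal-Wallis p-values */
proc multtest inpvalues(P_KW)=out holm;
run;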
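To make the reshaping concrete, here is a sketch of the same approach applied to SASHELP.CARS, assuming that sample data set is available in your installation. The choice of variables and the names long, kw, adjusted, and catvar are mine and purely illustrative. The VNAME call records which categorical variable each block came from, and OUT= on PROC MULTTEST is used so the adjusted p-values can be printed next to the BY variables.

```sas
data long;
   set sashelp.cars(keep=type origin drivetrain msrp horsepower weight);
   length catvar $ 32 c $ 16;
   /* three character categorical variables, so one character array works */
   array x (*) type origin drivetrain;
   do byvar = 1 to dim(x);
      catvar = vname(x(byvar));   /* name of the categorical variable */
      c = x(byvar);               /* its value for this observation */
      output;
   end;
   drop type origin drivetrain;
run;

proc sort data=long;
   by byvar catvar;
run;

ods exclude all;
proc npar1way data=long wilcoxon;
   by byvar catvar;
   class c;
   var msrp horsepower weight;
   output out=kw wilcoxon;        /* one row per BY group and response */
run;
ods select all;

/* 3 categorical variables x 3 responses = 9 p-values to adjust */
proc multtest inpvalues(P_KW)=kw holm out=adjusted;
run;

proc print data=adjusted;
run;
```

With your own data you would swap in your data set and variable lists; if your categorical variables mix character and numeric types, they would need to be converted to one type before going into the array.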

Re: Compare means

whymath — Mon, 25 Mar 2024 02:06:36 GMT

Paige, I think Rick Wicklin's post is about simple linear regression, not ANOVA.