Good morning everyone! I have a sample of 200 people with several categorical variables and 3 continuous variables. I would like to test whether each continuous variable differs significantly across the levels of each categorical variable. The problem is that none of my three continuous variables follows a normal distribution. What can I do?
P.S. Some of my categorical variables have more than 6 levels.
Thank you!
Have a nice day!
Use PROC GLM. There is no requirement that the raw data be normally distributed; the normality assumption in ANOVA applies to the model errors (residuals), not to the response itself.
Example:
proc glm data=have;
   class class_variable_name;
   model continuousvar1 continuousvar2 continuousvar3=class_variable_name;
run;
The F-test compares the means across all levels of class_variable_name at once. You can add the MEANS statement to PROC GLM, after the MODEL statement, to get pairwise comparisons of the level means:
   means class_variable_name / t lines;
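As a rough sketch of how these statements might fit together, here is a version run on the SASHELP.CARS sample data set. The data set, the variables Type and MPG_City, and the output names resid and residual are stand-ins chosen for illustration, not part of the original question. The residuals are saved so that the normality assumption can be checked where it actually applies.

```sas
/* One-way ANOVA: does mean MPG_City differ across the levels of Type? */
proc glm data=sashelp.cars;
   class Type;
   model MPG_City = Type;
   means Type / t lines;            /* pairwise t tests, shown as a lines display */
   output out=resid r=residual;     /* save the residuals for checking */
run;
quit;

/* Normality tests on the residuals, not on the raw response */
proc univariate data=resid normal;
   var residual;
run;
```

If the residuals look badly non-normal, a nonparametric test or a transformation of the response may be a safer choice.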
Paige Miller
If you are concerned about normality, you can use a nonparametric test such as the Kruskal-Wallis test in PROC NPAR1WAY. Since you'll be doing multiple tests, one for each combination of continuous response variable and categorical variable, you might want to use a p-value adjustment method to control the familywise error rate. One such method is Holm's step-down Bonferroni method, which PROC MULTTEST can apply to the set of p-values.
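To make the two pieces concrete, here is a minimal sketch of each on its own. It assumes the SASHELP.CARS sample data set is available; the data set pvals and its three raw p-values are made up purely to show what the Holm adjustment does.

```sas
/* A single Kruskal-Wallis test: MPG_City across the levels of Type. */
/* With more than two class levels, WILCOXON requests Kruskal-Wallis. */
proc npar1way data=sashelp.cars wilcoxon;
   class Type;
   var MPG_City;
run;

/* Holm adjustment of three made-up raw p-values (illustrative only) */
data pvals;
   input test $ raw_p;
   datalines;
A 0.01
B 0.04
C 0.03
;
run;

proc multtest inpvalues(raw_p)=pvals holm;
run;
```

Holm's method sorts the p-values (0.01, 0.03, 0.04), multiplies them by 3, 2, and 1, and then forces the results to be non-decreasing, so the adjusted values for A, B, and C should be 0.03, 0.06, and 0.06.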
An efficient way to do all of this is to rearrange your data from 200 observations into 200*k observations, where k is the number of your categorical variables. Each block of 200 will have your three continuous variables and a variable containing one of your categorical variables. You can then use BY processing in a single call of PROC NPAR1WAY to do all of the analyses. Use the OUTPUT statement to save the Kruskal-Wallis test results in a data set and then use PROC MULTTEST on that data set to do the p-value adjustment. All of that is illustrated below. The ODS statements are used to avoid displaying the results from the multiple NPAR1WAY runs.
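/* Stack the data: one copy of each observation per categorical variable. */
/* The array requires all listed variables to be the same type (all character or all numeric). */
data a;
   set <your-data>;
   array x (*) <list-of-your-categorical-variables>;
   do byvar=1 to dim(x);
      c=x(byvar);   /* c holds the value of the byvar-th categorical variable */
      output;
   end;
run;
proc sort data=a; by byvar; run;
ods exclude all;   /* suppress displayed output from the many NPAR1WAY analyses */
proc npar1way data=a wilcoxon;
   by byvar;
   class c;
   var <your-continuous-variables>;
   output out=out wilcoxon;   /* saves the Kruskal-Wallis p-value as P_KW */
run;
ods select all;    /* restore displayed output */
proc multtest inpvalues(P_KW)=out holm;
run;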
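For anyone who wants to try the whole workflow before substituting their own variables, the following is a sketch on the SASHELP.CARS sample data set. The choice of Type, Origin, and DriveTrain (all character) as the categorical variables, the three continuous variables, and the names long, catvar, kw, and adj are assumptions made for illustration. The extra variable catvar records which categorical variable each block came from, so the adjusted p-values are easier to read.

```sas
/* Stack the data: one copy of each car per categorical variable */
data long;
   set sashelp.cars;
   length c $ 8 catvar $ 32;
   array x (*) Type Origin DriveTrain;   /* all character, so one array works */
   do byvar = 1 to dim(x);
      c = x(byvar);                      /* level of the current categorical variable */
      catvar = vname(x(byvar));          /* name of the current categorical variable */
      output;
   end;
   keep byvar catvar c MPG_City MPG_Highway Horsepower;
run;

proc sort data=long;
   by byvar catvar;
run;

/* Kruskal-Wallis test for each continuous variable within each block */
ods exclude all;
proc npar1way data=long wilcoxon;
   by byvar catvar;
   class c;
   var MPG_City MPG_Highway Horsepower;
   output out=kw wilcoxon;
run;
ods select all;

/* Holm adjustment across all 3 x 3 = 9 tests; OUT= keeps the identifying variables */
proc multtest inpvalues(P_KW)=kw holm out=adj;
run;

proc print data=adj;
run;
```

Adjusting across all nine tests at once treats them as one family. Whether that is the right family is a judgment call that depends on the questions being asked.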