Hello everyone,
I want to analyze the data below and find the correlations among the variables. I have a categorical variable and continuous (numeric) variables. Which test is appropriate, and which output object is the most precise? I have used PROC GLM here. Is it correct? Which SAS procedure is apt?
I want to find out the below:
Characteristics vs SVI Outcomes
- Race_Ethnicity correlated with RPL_themes
- Age vs RPL_themes
data test;
input @1 Age 1-2 @3 RPL_Themes_WI 3-10 @11 race_ethnicity $ 11-44;
datalines;
89 0.1453 Asian
35 0.0619 White or Caucasian
69 0.0158 White or Caucasian
27 0.5367 Hispanic
46 0.9835 Black or African American
24 0.9353 White or Caucasian
50 0.041  Hispanic
34 0.5151 White or Caucasian
63 0.7683 White or Caucasian
32 0.0669 White or Caucasian
47 0.5324 White or Caucasian
75 0.1511 White or Caucasian
33 0.7475 White or Caucasian
85        Black or African American
39        White or Caucasian
57 0.0468 White or Caucasian
36        White or Caucasian
56 0.0813 White or Caucasian
44 0.5144 White or Caucasian
69 0.3619 White or Caucasian
56 0.0331 White or Caucasian
50 0.3381 White or Caucasian
54 0.7252 Hispanic
49 0.2489 White or Caucasian
51 0.6194 White or Caucasian
66 0.6784 White or Caucasian
46 0.6288 White or Caucasian
65 0.5554 White or Caucasian
54 0.4094 White or Caucasian
40 0.9007 American Indian or Alaska Native
35 0.9137 Black or African American
42 0.7971 Black or African American
63 0.918  Hispanic
58 0.9576 White or Caucasian
33 0.946  White or Caucasian
50 0.7626 White or Caucasian
75 0.8    White or Caucasian
60 0.7504 White or Caucasian
44 0.918  White or Caucasian
57 0.5705 White or Caucasian
61 0      White or Caucasian
54 0.677  White or Caucasian
74 0.4101 White or Caucasian
63 0.3799 White or Caucasian
65 0.4755 White or Caucasian
71 0.6986 Black or African American
68        Asian
33 0.2518 White or Caucasian
60        White or Caucasian
54 0.4612 White or Caucasian
;
run;
proc glm data = test;
class Race_Ethnicity;
model RPL_themes_WI = Race_Ethnicity;
lsmeans Race_Ethnicity / cl pdiff adjust= tukey;
run;
quit;
Thank you!
@RAVI2000 wrote:
Hello everyone,
I want to analyze the data below and find the correlations among the variables. I have a categorical variable and continuous (numeric) variables. Which test is appropriate, and which output object is the most precise? I have used PROC GLM here. Is it correct? Which SAS procedure is apt?
I want to find out the below:
Characteristics vs SVI Outcomes
- Race_Ethnicity correlated with RPL_themes
- Age vs RPL_themes
PROC GLM can be an appropriate analysis for this data, depending on what you want to do. You asked for correlations, but PROC GLM does not compute "correlations" in the strict statistical sense of the word. It does find predictive relationships, if they exist.
Can you explain more about what you are trying to do and why you want "correlations"?
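If all you want is a number that quantifies the linear (or rank) association between the two continuous variables, a minimal sketch with PROC CORR might look like the following. It assumes the data set is named test as in the original post and that Age and RPL_Themes_WI are the only pair of interest. The choice of Pearson plus Spearman is my assumption, not something the question specifies.

```sas
/* Sketch only: association between Age and RPL_Themes_WI.       */
/* Observations with a missing value in either variable are      */
/* left out of the coefficient for that pair.                    */
proc corr data=test pearson spearman;
   var Age RPL_Themes_WI;
run;
```

This does not cover Race_Ethnicity, which is categorical, so a correlation coefficient is not the natural summary there.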
@RAVI2000 wrote:
I would like to see how race_ethnicity is effecting the RPL_Themes
If you stop here, then GLM is correct. But then you add
... and how age is effecting the RPL_Themes.
and so now maybe there's some confusion over words. Correlation does not imply causation. You could compute some sort of correlation between these two variables, but it's not clear what you mean by "effecting" (which should probably be "affecting"), and even with that correction it's not clear what you mean.
If you want to quantify the relationship between the two variables, that's possible, but you haven't said that's what you want. If you want to find predictive relationships, then we're back to GLM, which is really only useful when one variable predicts the other. It does not fit the case you seem to be describing, where variable 1 predicts variable 2 and, at the same time, variable 2 predicts variable 1.
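If the goal turns out to be prediction in one direction only, with RPL_Themes_WI as the response and both Race_Ethnicity and Age as predictors, a sketch of that model could look like this. Treating RPL_Themes_WI as the response and putting both predictors in a single model are my assumptions about what is wanted, not something stated in the thread.

```sas
/* Sketch only: RPL_Themes_WI predicted by race/ethnicity and age. */
proc glm data=test;
   class Race_Ethnicity;
   /* SOLUTION prints the parameter estimates, including the Age slope */
   model RPL_Themes_WI = Race_Ethnicity Age / solution;
   /* Group means adjusted for Age, with Tukey-adjusted comparisons */
   lsmeans Race_Ethnicity / cl pdiff adjust=tukey;
run;
quit;
```

With several race/ethnicity groups containing only one or two observations in the posted sample, the group comparisons from a model like this would be very imprecise.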