Hello!
I have a project at work where I'm being asked to analyze 7 variables (5 categorical (Yes/No) and 2 numerical)
and their correlation with disease status (variable ADV_HF): 1 = they have the disease; 0 or blank = they don't have it.
I have not used multivariate analysis before and the different types are a little overwhelming.
Based on the SAS forums, I am under the impression that I shouldn't use Proc Reg since I have categorical variables, so should I use Proc GLM or Proc Corr? Will it make a huge difference?
What I've got so far is:
proc glm data= Hetal.ES_Regression;
Class Adv_HF;
model age_diag fam_Hx_ES fam_Hx_SD hx_sync LBBB EF_Reg AF_prior ;
1. I'm not sure how, or whether, to use the CONTRAST and MANOVA statements.
2. How do I specify in the CLASS statement that 1 = with disease and 0 = without disease?
3. Am I missing any other key data step in this analysis?
Thank you!
Accepted Solutions
I have a project at work where I'm being asked to analyze 7 variables (5 categorical (Yes/No) and 2 numerical) and their correlation with disease status (variable ADV_HF): 1 = they have the disease; 0 or blank = they don't have it.
If the specific request you have is simply to analyze correlations, then you would use PROC CORR.
If the underlying reason is to fit a model, you should use PROC LOGISTIC (which is appropriate when your response variable is binary).
This is not (at least the way I use the word) a "multivariate" analysis, and no MANOVA would work here anyway. Multivariate would imply to me that you have multiple response variables, and if the multiple response variables are continuous, that is the only time when any MANOVA would work. So none of this applies to your situation, as I understand it.
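As a rough sketch of the preparation step for the logistic route: the original post says a blank ADV_HF means "no disease", but PROC LOGISTIC drops observations whose response is missing, so the blanks would need to be recoded to 0 first. The code below assumes ADV_HF is numeric, borrows the dataset name hetal.es_regre and the predictor names from later in this thread, and writes to a made-up output dataset work.es_model.

```sas
/* Recode blank (missing) ADV_HF to 0, since the post says blank = no disease. */
/* Assumption: ADV_HF is a numeric variable. work.es_model is a hypothetical name. */
data work.es_model;
  set hetal.es_regre;
  if missing(adv_hf) then adv_hf = 0;
run;

/* Check the recode: only the levels 0 and 1 should remain. */
proc freq data=work.es_model;
  tables adv_hf / missing;
run;

/* Minimal logistic fit on the two numeric predictors.            */
/* EVENT='1' tells SAS to model the probability that ADV_HF = 1;  */
/* without it, PROC LOGISTIC models the lower ordered level (0).  */
proc logistic data=work.es_model;
  model adv_hf(event='1') = age_diag EF_reg;
run;
```

Checking the PROC FREQ table before fitting is a cheap way to confirm that no observations will be silently excluded.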
Paige Miller
Hi @PaigeMiller , thanks for responding.
I decided to use Proc Corr with the following code:
and it keeps giving me this error in my log:
Why does it keep telling me my variables do not match the type prescribed for this list? What am I doing wrong here?
Do I need to denote which are categorical? Also, do I need to use the BY statement to specify that I want these variables compared between those who have the disease (1) and those who don't (0)?
https://stats.idre.ucla.edu/sas/dae/logit-regression/
Note the use of the PARAM=REF option on the CLASS statement. You will want to do that. Additionally, check this example out:
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_examples02.htm&docsetVer...
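A small sketch of what PARAM=REF looks like in practice, using one of the Yes/No predictors from this thread. It assumes LBBB is stored as 0/1 (if it is stored as "Yes"/"No", the REF= value would have to be 'No' instead) and that the dataset is hetal.es_regre, as in the later posts.

```sas
/* PARAM=REF requests reference-cell coding, so the LBBB estimate     */
/* is a comparison against the reference level rather than the       */
/* effect coding that PROC LOGISTIC uses by default.                  */
/* REF='0' (assumed coding) makes "no LBBB" the reference level.      */
proc logistic data=hetal.es_regre;
  class LBBB (ref='0') / param=ref;
  model adv_hf(event='1') = LBBB;
run;
```

With reference coding, the exponentiated LBBB coefficient is the odds ratio for LBBB = 1 versus LBBB = 0, which is usually the easier number to report.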
Thanks for replying @Reeza!
I don't think PROC LOGISTIC works in this case because we're not looking at a specific question or outcome. We merely want to see whether there is a correlation between the variables and whether or not they have the disease.
My variables are potential risk factors, and we want to see if there's any correlation between these and disease status.
So I think I'd use PROC CORR, yes?
Nope, you have a binary outcome variable so using PROC CORR is not suitable here. You really are looking for logistic regression here and the odds ratios, whether you do it one variable at a time or a full model.
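To illustrate the "one variable at a time" option, here is a sketch with one numeric and one Yes/No predictor; the same pattern would be repeated for each remaining risk factor. The dataset and variable names are taken from later posts in this thread, and the 0/1 coding of the response is an assumption.

```sas
/* Single-predictor model for a numeric risk factor.             */
/* The "Odds Ratio Estimates" table gives the odds ratio per     */
/* one-unit increase in age_diag.                                 */
proc logistic data=hetal.es_regre;
  model adv_hf(event='1') = age_diag;
run;

/* Single-predictor model for a Yes/No risk factor.              */
/* CLASS with PARAM=REF gives an odds ratio against the          */
/* reference level of AF_prior.                                   */
proc logistic data=hetal.es_regre;
  class AF_prior / param=ref;
  model adv_hf(event='1') = AF_prior;
run;
```

The full model simply lists all seven predictors on one MODEL statement; its odds ratios are then adjusted for the other variables, so they can differ from the one-at-a-time results.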
@hpatel3 wrote:
Thanks for replying @Reeza!
I don't think PROC LOGISTIC works in this case because we're not looking at a specific question or outcome. We merely want to see whether there is a correlation between the variables and whether or not they have the disease.
My variables are potential risk factors, and we want to see if there's any correlation between these and disease status.
So I think I'd use PROC CORR, yes?
It worked! I got results! (I think)
My only question left is about status. Do I need to specify anywhere that event status=1 means developing the disease and event status=0 means not developing it? Or does SAS automatically assume 0=no event and 1=event? My code and results are as follows:
For a univariate analysis with a single binary outcome, I would recommend ANOVA or a t-test for the continuous variables and a chi-square test for the categorical variables.
For your output, you have Pearson correlation coefficients and SAS makes no assumptions regarding the value of 0/1 being a particular record type. You probably want Tau-b or Tau-c instead.
https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient
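A sketch of how those tests might be requested, assuming the dataset hetal.es_regre and the variable names used later in this thread, and assuming ADV_HF has exactly two non-missing levels (blank values would have to be recoded to 0 beforehand, since missing class values are excluded).

```sas
/* Two-sample t-tests: compare each numeric variable between       */
/* the ADV_HF = 1 and ADV_HF = 0 groups.                            */
proc ttest data=hetal.es_regre;
  class adv_hf;
  var age_diag EF_reg;
run;

/* Chi-square tests for each Yes/No variable against ADV_HF.       */
/* MEASURES adds association measures, including Kendall's tau-b   */
/* and Stuart's tau-c, to the output.                               */
proc freq data=hetal.es_regre;
  tables (Hx_Sync FHX_ES FHX_SD LBBB AF_prior) * adv_hf / chisq measures;
run;
```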
PROC CORR can produce a correlation-like number called Kendall's Tau for categorical variables. However, they probably do need to be converted to category numbers (a numeric variable) in order for PROC CORR to process them.
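A possible sketch of that conversion, shown for one variable. It assumes LBBB is stored as a character variable holding "Yes"/"No", which is a guess based on the type-mismatch error reported earlier; LBBB_n and work.es_numeric are made-up names.

```sas
/* Build a numeric 0/1 copy of a character Yes/No variable.          */
/* Assumption: LBBB holds the text values "Yes" and "No".             */
data work.es_numeric;
  set hetal.es_regre;
  if upcase(strip(LBBB)) = 'YES' then LBBB_n = 1;
  else if upcase(strip(LBBB)) = 'NO' then LBBB_n = 0;
  /* any other value leaves LBBB_n missing */
run;

/* KENDALL requests Kendall's tau-b; when it is the only correlation */
/* option given, Pearson correlations are not printed.                */
proc corr data=work.es_numeric kendall;
  var LBBB_n;
  with adv_hf;
run;
```

The same IF/ELSE pattern would be repeated for the other Yes/No variables if they are also stored as text.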
Paige Miller
@PaigeMiller, I converted to categorical variables and I do see Kendall's tau, but I figured Pearson's would be the number to look at. Why do you suggest tau instead?
Should I be using pearson's on the continuous variables and Tau on the categorical ones?
Yes, as @Reeza pointed out, and I omitted, you want Kendall's Tau-B (or maybe it's a lowercase b) which is appropriate for computing a correlation-like measure when you have two ordinal variables. You use Pearson when you have two continuous variables.
I still have issues with your requirement that you want a measure like correlation when you have predictor (X) variables and response (Y) variables. Correlation is not meant for that case; some measure of how well the X variables predict Y is the appropriate statistic.
I can't agree with this statement from Reeza
For a univariate analysis with a single binary outcome, I would recommend ANOVA or a t-test for the continuous variables
This isn't correlation, which seems to be what you are asking for, although I don't understand why; and it also seems to reverse the role of X and Y. You don't do ANOVA or t-tests with binary Y, you do it for binary X. For continuous X variables, and binary Y, logistic regression is still what I would use, and the measure you want is the odds ratio or the slope of the logistic regression.
Paige Miller
As I stated, you seem to have reversed X and Y. Logistic regression is what you use with binary Y.
But, the whole issue remains unclear as to what the original poster really wants to achieve, and so I think until he clarifies the situation, I'm going to pause here. On the one hand he wants correlation but on the other hand he was talking about fitting a model with PROC REG or PROC GLM.
Paige Miller
After going back and reviewing, I don't think PROC REG or PROC CORR would give me the best results. I originally said that I want the correlation between variables, and I still do. I thought PROC CORR, GLM, or REG would give me that, which is why I was trying to use them. But someone on another thread said that PROC LOGISTIC works for both continuous and categorical variables, so I believe I can just use that. I don't think I need to fit a model or anything.
So TO SEE IF THERE IS A CORRELATION between my risk factors and whether they develop the disease, I tried this:
libname Hetal "\\tuftsmc\home\hpatel3\SAS Datasets";
run;
proc logistic data=hetal.es_regre;
class age_diag EF_reg Hx_Sync FHX_ES FHX_SD LBBB AF_prior * Hx_Sync FHX_ES FHX_SD LBBB AF_prior;
model adv_hf (event="1")= age_diag EF_REG Hx_sync FHX_ES FHX_SD LBBB AF_prior;
run;
However got this error:
I'm not sure exactly what should be done based on this error. Is it asking me to clarify which variables are numeric and which are character? I'm not sure how to write this out in my code.
My class statement is: all variables * categorical variables <-- That is what I was supposed to do, correct?
And my model statement is: disease status variable = all variables <--is this correct as well?
Thanks!
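For comparison, here is a sketch of how the CLASS and MODEL statements are usually written for this kind of model. The CLASS statement lists only the categorical predictors, separated by spaces, with no asterisk; the two numeric variables (age_diag and EF_reg) stay out of CLASS so they are treated as continuous. It assumes the variable names from the code above are correct and that ADV_HF is coded 0/1. Since the error text is not shown in the post, this may not be the only change needed.

```sas
/* CLASS: only the five Yes/No predictors, no "*" between lists.    */
/* PARAM=REF gives reference-cell coding for the class effects.     */
/* MODEL: response on the left, all seven predictors on the right.  */
proc logistic data=hetal.es_regre;
  class Hx_Sync FHX_ES FHX_SD LBBB AF_prior / param=ref;
  model adv_hf(event='1') = age_diag EF_reg
                            Hx_Sync FHX_ES FHX_SD LBBB AF_prior;
run;
```

The resulting "Odds Ratio Estimates" table would then describe how each risk factor relates to the odds of ADV_HF = 1, adjusted for the others.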