I was given a cleaned dataset and am trying to run a multivariate analysis on it. The person I am running it for is not sure how to run it, and I have never conducted a multivariate analysis like this before. They have asked that I do my best not to recode or clean the data, and that is causing me some problems.
I have a dichotomous, binary outcome variable (y/n; coded as 1/2) and 8 predictors. The predictors are:
Ultimately I am trying to see whether any of the predictors are associated with the outcome, whether there are any interactions, and whether there are any relationships or patterns within a location (for example, are certain locations associated with higher rates of the systemic diseases and, if so, is there an effect on the outcome).
We have someone to write the code, but we need to provide a list of tests that should be done. I am no statistics wizard, so I would love any suggestions on tests our coder should look at.
Thank you!
Logistic regression is the usual starting point for this type of analysis. I'm not sure how recoding causes any issues; SAS will handle the values correctly if they're described as below.
Logistic regression is pretty straightforward and there are a lot of references out there. Start with the second example in the documentation, and then look at the UCLA tutorials (found via Google), which are useful if you're looking to code it yourself.
When you have a variable like county, you may want to look at how many records you have per county, if you haven't already.
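As a rough illustration of that per-county check, here is a minimal Python sketch (in SAS itself a frequency table would do the same job). The records, the county labels, and the `min_records` threshold are all made up for the example; only the 1/2 outcome coding comes from the question.

```python
from collections import Counter

# Hypothetical (county, outcome) records; outcome is coded 1/2 as in the question.
records = [("A", 1), ("A", 2), ("A", 2), ("B", 2), ("C", 1), ("C", 2)]

def sparse_counties(records, min_records):
    """Return per-county record counts and the counties below min_records."""
    counts = Counter(county for county, _ in records)
    sparse = sorted(c for c, n in counts.items() if n < min_records)
    return counts, sparse

# min_records=2 is an arbitrary illustrative threshold, not a recommendation.
counts, sparse = sparse_counties(records, min_records=2)
print(dict(counts), sparse)
```

Counties that land in the sparse list are the ones most likely to cause trouble once the other predictors are crossed with county.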
I know that in my state almost any report drops/excludes certain counties because of low population. For relatively common occurrences, such as traffic fatalities, these counties may have none for a given period of time.
When you combine such a variable with others such as sex, age, race/ethnicity, multiple diseases, and insurance, you may very well find that many of the combinations do not occur.
From your data description, if you do not have at least 13,824 records per county, you won't have all your combinations included (8 x 2 x 8 x 3 x 3 x 3 x 4 = 13,824).
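The arithmetic behind that figure can be checked with a short Python sketch. The level counts are taken straight from the product quoted above; which predictor each count belongs to is not stated here, so the list is left unlabeled.

```python
import math

# Number of levels for each non-county predictor, as quoted in the reply above.
levels = [8, 2, 8, 3, 3, 3, 4]

# Each distinct combination of levels is one "cell"; the product is the cell count.
cells_per_county = math.prod(levels)
print(cells_per_county)  # 13824
```

Note that this is only a lower bound: with exactly 13,824 records in a county you would need one record in every cell, so in practice far more records are needed before every combination shows up.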
Sample size can raise problems with coding schemes and may require some reconsideration.