I am comparing 1-year survival (yes/no) for a group of patients. I have three continuous variables (age, Karnofsky scale, and comorbidities) and three categorical variables: HLA match (two categories), related (two categories), and disease risk (three categories). I'm trying to pick the best test to use but can't quite nail it down. Can I use these variables as is, or would I be better off converting the continuous variables into categorical ones? Either way, which would be the best test and pairwise comparison?
Thanks
I don't think you will get there with a simple hypothesis test.
You need a statistical model.
A model that explains/predicts your binary target.
You could consider logistic regression, for example.
Koen
> I am comparing the 1 year survival for a group of patients
You do not mention having data for the time at which patients died, so I assume you are modeling the probability of survival at the end of the year, conditional on the covariates in the model. I think a logistic regression is feasible, but you need to ask yourself which group of patients you are trying to compare. Look at the EFFECTPLOT statement in PROC LOGISTIC or PROC PLM, as discussed in this article: https://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html
Depending on how you specify the EFFECTPLOT statement, you can "slice and dice" the visualization of the model in many ways. For example, you could visualize the predicted probability as a function of age for each level of the 'disease risk' categorical variable, for specified values of the other explanatory variables. By default, continuous variables are set to their mean values. I like to specify a reference value for the classification variables.
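As a rough sketch of that approach, something like the following might be a starting point. The data set name (work.patients) and all variable names are hypothetical placeholders, since the original post does not give them, and the response is assumed to be coded 1 = alive at one year. The continuous predictors are left continuous.

```sas
/* Sketch only: work.patients and every variable name are assumed, not from the post */
ods graphics on;

proc logistic data=work.patients;
   /* Reference-cell coding for the three categorical predictors */
   class hla_match related disease_risk / param=ref;
   /* Model the probability that survived_1yr = 1 (assumed coding);
      age, karnofsky, and comorbidities enter as continuous effects */
   model survived_1yr(event='1') = age karnofsky comorbidities
                                   hla_match related disease_risk;
   /* Predicted probability versus age, one curve per disease-risk level */
   effectplot slicefit(x=age sliceby=disease_risk);
   /* Odds ratios for all pairwise comparisons of the disease-risk levels */
   oddsratio disease_risk / diff=all;
run;
```

The ODDSRATIO statement is one way to get the pairwise comparisons asked about in the original question. If you want the plot evaluated at particular covariate values rather than the defaults, the AT option on the EFFECTPLOT statement is the place to set them.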
As @Rick_SAS alluded to, logistic regression is probably ok to use if all you have is 1-year survival (yes/no). However, if you actually have survival times for each person (e.g., person 1 survived 5 years, person 2 survived 6 months, person 3 was lost to follow-up at 3 years), you should generally not use logistic regression and instead prefer a model tailored to survival data. One possibility is proportional hazards regression, which can be implemented using PROC PHREG.
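If survival times are available, a minimal PROC PHREG sketch might look like this. Again, the data set and variable names are hypothetical: time_months is assumed to hold the follow-up time, and died is assumed to be 1 for a death and 0 for a censored observation.

```sas
/* Sketch only: all names below are assumed, not taken from the post */
proc phreg data=work.patients;
   /* Reference-cell coding for the categorical predictors */
   class hla_match related disease_risk / param=ref;
   /* time_months*died(0): a value of 0 for died marks a censored time */
   model time_months*died(0) = age karnofsky comorbidities
                               hla_match related disease_risk;
   /* Hazard ratios for all pairwise comparisons of the disease-risk levels */
   hazardratio disease_risk / diff=all;
run;
```

This uses the full follow-up time for each patient, including those lost to follow-up, instead of reducing everyone to a yes/no status at one year.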
To your question about categorizing continuous variables, you should almost always avoid doing that. Rather, use a model that can accommodate continuous predictors.