michelconn
Quartz | Level 8

I am comparing 1-year survival (yes/no) for a group of patients. I have three continuous variables (age, Karnofsky scale, and comorbidities) and three categorical variables (HLA match (two categories), related (two categories), and disease risk (three categories)). I'm trying to pick the best test to use but can't quite nail it down. Can I use these variables as is? Or would I be better off converting the continuous variables into categorical ones? Either way, which would be the best test and pairwise comparison?

 

Thanks

1 ACCEPTED SOLUTION

sbxkoenk
SAS Super FREQ

I don't think you will get there with a simple hypothesis test.

You need a statistical model.

A model that explains/predicts your binary target.

You could consider logistic regression, for example.

 

Koen


Rick_SAS
SAS Super FREQ

> I am comparing the 1 year survival for a group of patients

 

You do not mention having data for the time at which patients died, so I assume you are modeling the probability of survival at the end of the year, conditional on the covariates in the model. I think a logistic regression is feasible, but you need to ask yourself which group of patients you are trying to compare.  Look at the EFFECTPLOT statement in PROC LOGISTIC or PROC PLM, as discussed in this article: https://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html

 

Depending on how you specify the EFFECTPLOT statement, you can "slice and dice" the visualization of the model in many ways. For example, you could visualize the predicted probability as a function of age for each level of the 'disease risk' categorical variable, for specified values of the other explanatory variables. By default, the mean values of the continuous variables are used. I like to specify a reference value for the classification variables.
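
To make the idea concrete, here is a minimal sketch of such a call. It is only an illustration: the data set WORK.PATIENTS, every variable name, and the level labels ('Yes', 'Matched', 'Related', 'Low') are hypothetical placeholders and would need to match your own data.

```sas
/* Sketch only: data set, variable names, and level labels are hypothetical */
ods graphics on;

proc logistic data=work.patients;
   /* Reference-cell coding; the ref= levels are assumed labels */
   class hla_match(ref='Matched') related(ref='Related')
         disease_risk(ref='Low') / param=ref;
   /* Model the probability that survived_1yr equals 'Yes' */
   model survived_1yr(event='Yes') = age karnofsky comorbidities
                                     hla_match related disease_risk;
   /* Predicted probability versus age, one curve per disease-risk level.
      Class variables are fixed at the levels given in AT; continuous
      covariates not listed in AT are held at their means. */
   effectplot slicefit(x=age sliceby=disease_risk) /
              at(hla_match='Matched' related='Related') clm;
run;
```

The CLM option adds confidence limits to each curve. Changing the levels in AT shows the same slices for a different group of patients.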

 

Mike_N
SAS Employee

As @Rick_SAS alluded to, logistic regression is probably ok to use if all you have is 1-year survival (yes/no). However, if you actually have survival times for each person (e.g., person 1 survived 5 years, person 2 survived 6 months, person 3 was lost to follow-up at 3 years), you should generally not use logistic regression and instead prefer a model tailored to survival data. One possibility is proportional hazards regression, which can be implemented using PROC PHREG. 
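
If follow-up times are available, a proportional hazards fit might look roughly like the sketch below. The data set, the variable names (months_followed, died), and the coding died=0 for a censored observation are all assumptions made for illustration.

```sas
/* Sketch only: data set, variable names, and codings are hypothetical */
proc phreg data=work.patients;
   class hla_match(ref='Matched') related(ref='Related')
         disease_risk(ref='Low') / param=ref;
   /* died = 0 marks a censored follow-up time (e.g., lost to follow-up) */
   model months_followed*died(0) = age karnofsky comorbidities
                                   hla_match related disease_risk;
   /* Hazard ratio for a 10-year increase in age */
   hazardratio 'Age, per 10 years' age / units=10;
run;
```

Unlike the yes/no outcome, this uses each patient's full follow-up time, including patients who were lost to follow-up before one year.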

 

To your question about categorizing continuous variables, you should almost always avoid doing that. Rather, use a model that can accommodate continuous predictors. 
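
As one possible illustration, the continuous variables can go into the MODEL statement as they are, and the UNITS statement can report odds ratios on a more interpretable scale without any binning. The data set and variable names below are hypothetical.

```sas
/* Sketch only: data set and variable names are hypothetical */
proc logistic data=work.patients;
   class hla_match related disease_risk / param=ref;
   /* age, karnofsky, and comorbidities enter as continuous predictors */
   model survived_1yr(event='Yes') = age karnofsky comorbidities
                                     hla_match related disease_risk;
   /* Report odds ratios per 10-unit change instead of per 1 unit */
   units age=10 karnofsky=10;
run;
```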
