Solved: Re: Statistical Analysis for Two groups (categorical and continuous covariates)

michelconn · Posted 01-25-2024 04:23 PM

I am comparing 1-year survival (yes/no) for a group of patients. I have three continuous variables (age, Karnofsky scale, and comorbidities) and three categorical variables (HLA match (two categories), related (two categories), and disease risk (three categories)). I'm trying to pick the best test to use but can't quite nail it down. Can I use these variables as is? Or would I be better off converting the continuous variables into categorical ones? Either way, which would be the best test and pairwise comparison?

Thanks

sbxkoenk · Posted 01-26-2024 05:11 AM

I don't think you will get there with a simple hypothesis test.

You need a statistical model.

A model that explains/predicts your binary target.

You could consider logistic regression, for example.

Koen

Rick_SAS · Posted 01-26-2024 08:42 AM

> I am comparing the 1 year survival for a group of patients

You do not mention having data for the time at which patients died, so I assume you are modeling the probability of survival at the end of the year, conditional on the covariates in the model. I think a logistic regression is feasible, but you need to ask yourself which group of patients you are trying to compare. Look at the EFFECTPLOT statement in PROC LOGISTIC or PROC PLM, as discussed in this article: https://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html

Depending on how you specify the EFFECTPLOT statement, you can "slice and dice" the visualization of the model in many ways. For example, you could visualize the probability as a function of age for each level of the 'disease risk' categorical variable, for specified values of the other explanatory variables. By default, the mean values of the continuous variables are used. I like to specify a reference value for the classification variables.
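
As a rough illustration of the kind of plot described above, here is a minimal sketch. The data set name (work.patients), all variable names, the coding of the response (1 = alive at one year), and the value chosen for the Karnofsky score are assumptions made up for this example, so adjust them to your data.

```sas
/* Sketch only: work.patients and all variable names are hypothetical. */
ods graphics on;

proc logistic data=work.patients;
   class hla_match related disease_risk / param=ref;
   /* surv1yr is assumed to be coded 1 = alive at one year, 0 = not */
   model surv1yr(event='1') = age karnofsky comorbidities
                              hla_match related disease_risk;
   /* Predicted probability versus age, one curve per disease-risk level.
      karnofsky=80 is an arbitrary illustrative value; covariates not
      listed in AT() are held at the procedure's default values. */
   effectplot slicefit(x=age sliceby=disease_risk) / at(karnofsky=80) noobs;
run;
```

Changing the X= and SLICEBY= variables, or the values in the AT option, gives a different slice through the same fitted model.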

Mike_N · Posted 01-26-2024 10:05 AM

As @Rick_SAS alluded to, logistic regression is probably ok to use if all you have is 1-year survival (yes/no). However, if you actually have survival times for each person (e.g., person 1 survived 5 years, person 2 survived 6 months, person 3 was lost to follow-up at 3 years), you should generally not use logistic regression and instead prefer a model tailored to survival data. One possibility is proportional hazards regression, which can be implemented using PROC PHREG.
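
If you do have survival times, a proportional hazards model might look something like the sketch below. The data set, the variable names (months for follow-up time, status for the event indicator), and the censoring code 0 are all assumptions for illustration, not something taken from your data.

```sas
/* Sketch only: data set, variable names, and censoring code are hypothetical. */
proc phreg data=work.patients;
   class hla_match related disease_risk / param=ref;
   /* months = follow-up time; status = 0 marks a censored observation */
   model months*status(0) = age karnofsky comorbidities
                            hla_match related disease_risk / risklimits;
run;
```

The RISKLIMITS option requests confidence limits for the hazard ratios. You would still want to check the proportional hazards assumption before relying on the estimates.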

To your question about categorizing continuous variables, you should almost always avoid doing that. Rather, use a model that can accommodate continuous predictors.
