Quartz | Level 8

## Statistical Analysis for Two groups (categorical and continuous covariates)

I am comparing 1-year survival (yes/no) for a group of patients. I have three continuous variables (age, Karnofsky scale, and comorbidities) and three categorical variables (HLA match (two categories), related (two categories), and disease risk (three categories)). I'm trying to pick the best test to use but can't quite nail it down. Can I use these variables as is, or would I be better off converting the continuous variables into categorical ones? Either way, which would be the best test and pairwise comparison?

Thanks

1 ACCEPTED SOLUTION

SAS Super FREQ

## Re: Statistical Analysis for Two groups (categorical and continuous covariates)

I don't think you will get there with a simple hypothesis test.

You need a statistical model that explains or predicts your binary target.

You could consider logistic regression, for example.

Koen

SAS Super FREQ

## Re: Statistical Analysis for Two groups (categorical and continuous covariates)

> I am comparing the 1 year survival for a group of patients

You do not mention having data for the time at which patients died, so I assume you are modeling the probability of survival at the end of the year, conditional on the covariates in the model. I think a logistic regression is feasible, but you need to ask yourself which group of patients you are trying to compare.  Look at the EFFECTPLOT statement in PROC LOGISTIC or PROC PLM, as discussed in this article: https://blogs.sas.com/content/iml/2016/06/22/sas-effectplot-statement.html

Depending on how you specify the EFFECTPLOT statement, you can "slice and dice" the visualization of the model in many ways. For example, you could visualize the predicted probability as a function of age for each level of the 'disease risk' categorical variable, for specified values of the other explanatory variables. By default, continuous variables are held at their mean values. I like to specify a reference value for the classification variables.
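A minimal sketch of that kind of plot, assuming a data set named `work.patients` with variables `survived_1yr`, `age`, `karnofsky`, `comorbidities`, `hla_match`, `related`, and `disease_risk`. All of these names, and the level labels such as 'Matched' and 'Low', are hypothetical and need to be replaced with whatever is in your data. The ODDSRATIO statement is included only as one possible way to get pairwise comparisons among the disease-risk levels.

```sas
/* Sketch only: data set, variable names, and level labels are hypothetical */
ods graphics on;

proc logistic data=work.patients;
   /* Reference levels are assumed labels; use the values in your data */
   class hla_match(ref='Matched') related(ref='Yes') disease_risk(ref='Low') / param=ref;

   /* Model the probability that survived_1yr = 'Yes' */
   model survived_1yr(event='Yes') = age karnofsky comorbidities
                                     hla_match related disease_risk;

   /* Pairwise odds ratios among the three disease-risk levels */
   oddsratio disease_risk / diff=all;

   /* Predicted probability vs. age, one curve per disease-risk level;   */
   /* the other class variables are fixed with AT, and the continuous    */
   /* covariates not listed are held at their means by default           */
   effectplot slicefit(x=age sliceby=disease_risk)
              / at(hla_match='Matched' related='Yes') clm;
run;
```

Changing the `x=` and `sliceby=` variables, or the values in the AT option, gives the other "slices" of the same fitted model.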

SAS Employee

## Re: Statistical Analysis for Two groups (categorical and continuous covariates)

As @Rick_SAS alluded to, logistic regression is probably OK to use if all you have is 1-year survival (yes/no). However, if you actually have survival times for each person (e.g., person 1 survived 5 years, person 2 survived 6 months, person 3 was lost to follow-up at 3 years), you should generally not use logistic regression and should instead prefer a model tailored to survival data. One possibility is proportional hazards regression, which can be implemented using PROC PHREG.
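If survival times are available, a proportional hazards fit might look roughly like the sketch below. The data set `work.patients`, the time variable `surv_months`, the censoring indicator `status` (assumed here to be 0 for censored and 1 for death), and the covariate names are all hypothetical.

```sas
/* Sketch only: data set and variable names are hypothetical */
proc phreg data=work.patients;
   class hla_match related disease_risk / param=ref;

   /* status = 0 marks censored observations (assumed coding) */
   model surv_months*status(0) = age karnofsky comorbidities
                                 hla_match related disease_risk;

   /* Pairwise hazard ratios among the disease-risk levels */
   hazardratio disease_risk / diff=all;
run;
```

Note that the continuous covariates enter the model as they are, with no need to categorize them first.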

To your question about categorizing continuous variables, you should almost always avoid doing that. Rather, use a model that can accommodate continuous predictors.
