hanson4022
Calcite | Level 5

I've been running in circles for a while trying to figure out whether this is possible with PROC LOGISTIC (or similar procedures), so hopefully someone here has some ideas. The basic question is, "Are two (or more) logistic models different?" I'm doing a basic survival analysis with 4 variables in my data set: dosage, deaths at the given dose, sample size at the dose, and a categorical treatment regimen. What I'm trying to do is build a model for each treatment regimen that predicts the percentage surviving after a specific dose, and then see whether the models differ significantly between treatment regimens.

I used PROC LOGISTIC to produce a model for each treatment, but what options are out there for comparing the models, other than testing for differences in parameter estimates or looking for overlap in the confidence intervals drawn by the plots=effect option? Here's a quick dummy set of data:

DATA dosedata;
   INPUT Treatment Dose Deaths n;
   DATALINES;
1 5 30 30
1 10 25 30
1 15 10 30
1 20 0 30
2 5 30 30
2 10 30 30
2 15 25 30
2 20 20 30
3 5 10 30
3 10 8 30
3 15 5 30
3 20 0 30
3 25 0 30
3 30 0 30
;
run;

/* One dose-response model per treatment (data are already sorted by treatment) */
proc logistic data=dosedata plots=effect plots=ROC;
   by treatment;
   model deaths/n = dose / lackfit;
run;
quit;

Using the BY statement, PROC LOGISTIC fits a separate model for each treatment. It's on the right track, as I get a different intercept and slope estimate for each model. At this point, though, is there a test roughly equivalent to a multiple comparisons test of means, except for the entire model? I could compare point estimates, such as the dosage at which 25% mortality occurs in each model, but I would like to examine the entire binomial distribution instead if that's doable. I've seen the CONTRAST statement for PROC LOGISTIC, but it appears to be only for contrasting parameter estimates. I could do two separate tests, one for differences in intercepts and the other for differences in slopes between treatments. I may be overthinking this, though, and there might be a much simpler way (and procedure) to address this question. I haven't found many hints when googling this question, so any thoughts on directions to check out are much appreciated.

Accepted Solutions
1zmm
Quartz | Level 8

Another possibility is to make TREATMENT a classification variable in PROC LOGISTIC and to include TREATMENT and its interaction with DOSE [=TREATMENT*DOSE] as independent variables in the MODEL statement:

   proc logistic data=dosedata plots=effect plots=ROC;
      CLASS TREATMENT / PARAM=REFERENCE REF=FIRST;
      model deaths/n = dose TREATMENT dose*TREATMENT / lackfit risklimits;
   run;
   quit;

Treatment #1 is the reference treatment to which treatments #2 and #3 are compared.  The overall model is

     Y = b0 + b1*dose + b2*(Treatment=2) + b3*(Treatment=3) + b4*dose*(Treatment=2) + b5*dose*(Treatment=3),  where Y = logit(p) = log(p/(1-p)) and p is the probability of death (estimated from deaths/n)

If Treatment=1, then Y=b0 + b1*dose.

If Treatment=2, then Y=b0 + b1*dose + b2*1 + b3*0 + b4*dose*1 + b5*dose*0 = (b0 + b2) + (b1 + b4)*dose.

If Treatment=3, then Y=b0 + b1*dose + b2*0 + b3*1 + b4*dose*0 + b5*dose*1 = (b0 + b3) + (b1 + b5)*dose.

The model for treatment #2 differs statistically significantly from that for treatment #1 if either b2, b4, or both do not equal zero.

The model for treatment #3 differs statistically significantly from that for treatment #1 if either b3, b5, or both do not equal zero.

The model for treatment #3 differs statistically significantly from that for treatment #2 if either (b3 - b2), (b5 - b4), or both do not equal zero.  The statistical tests for the first two of these comparisons are shown, one coefficient at a time, in the standard PROC LOGISTIC output, "Analysis of Maximum Likelihood Estimates".  The statistical test for the last of these comparisons can be obtained with a CONTRAST statement.
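
As a minimal sketch of the CONTRAST approach (the labels are made up, and the coefficient order assumes the PARAM=REFERENCE REF=FIRST coding above, so that the two TREATMENT parameters belong to treatments #2 and #3 in that order), each statement below tests an intercept difference and a slope difference jointly:

```sas
proc logistic data=dosedata;
   class treatment / param=reference ref=first;
   model deaths/n = dose treatment dose*treatment;
   /* Each comma-separated row is one linear combination of the parameters; */
   /* the rows of a single CONTRAST statement are tested together.          */
   contrast 'Trt 2 vs 1: intercept and slope' treatment  1 0, dose*treatment  1 0;  /* b2 = 0 and b4 = 0   */
   contrast 'Trt 3 vs 1: intercept and slope' treatment  0 1, dose*treatment  0 1;  /* b3 = 0 and b5 = 0   */
   contrast 'Trt 3 vs 2: intercept and slope' treatment -1 1, dose*treatment -1 1;  /* b3 = b2 and b5 = b4 */
run;
```

Each CONTRAST statement here gives a single Wald chi-square test with 2 degrees of freedom, which is closer to a test of "the whole model" for a pair of treatments than reading the individual coefficient tests one at a time.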

5 REPLIES
Damien_Mather
Lapis Lazuli | Level 10

I think what you are looking for can be done with PROC PHREG.

You cannot use the trial/event count style of model specification in PHREG so you will have to use (or synthesise) disaggregate data at the individual subject level first.

TREATMENT becomes a (class) covariate along with the other effects. Rather than angsting over getting the right error distribution specification (a common symptom amongst us), PHREG is semi-parametric in the sense that the baseline distribution is found empirically. What is of interest is the level of significance for the differences amongst the treatment level effects, relative to a defined baseline treatment level, with other covariates also being controlled for. You can control for them by explicitly including them as model effects OR as part of a strata specification, depending on what suits your application better. The time-to-hazard (in this case death) is best coded for your application as only 2 values, 1 or 2: observed dead at time 1, or censored at time 2, i.e. not yet dead (a certainty for us all, but just not at the time of observation). In the MODEL statement you specify the censoring value as well as the response (time to hazard) variable.

The value of including other covariates is that you want the 'lift' or 'cut' to the empirical baseline distribution of survival over time owing to the treatment levels to be consistent, i.e. NOT result in crossovers amongst the predicted survival curves for the different treatment levels (in your case lines - only two points each).
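
One way the disaggregation and the 1/2 time coding described above might look, as a rough sketch only: the names dose_long, died, and time are invented here, and the dose*treatment interaction is an assumption carried over from the question rather than something this reply specifies.

```sas
/* Expand each events/trials row into n individual-level rows */
data dose_long;
   set dosedata;
   do i = 1 to n;
      died = (i <= deaths);     /* 1 = death observed, 0 = censored       */
      time = ifn(died, 1, 2);   /* time 1 if dead, time 2 if still alive  */
      output;
   end;
   drop i;
run;

proc phreg data=dose_long;
   class treatment / param=ref ref=first;
   /* died(0) tells PHREG that died = 0 marks a censored observation */
   model time*died(0) = dose treatment dose*treatment;
run;
```

Because the DO loop needs whole numbers of deaths, this expansion only works directly on unadjusted counts.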

Let me know if this helps.

Cheers.

Damien

hanson4022
Calcite | Level 5

I haven't looked into PHREG yet, so that looks like an interesting option. However, I originally decided to go with the events/trials format because I'm using an Abbott correction to account for a control group's mortality. It gives you an adjusted proportion of mortality for each of the other groups, which you can then use to calculate the adjusted number of death events. However, rounding error comes into play when you try to split the adjusted mortality for the sample into one row per individual. Say you had an adjusted mortality of 50% out of 15 individuals. You would have 7.5 death events. You could then round down to have 7 rows under the death variable with a 1 for the event, or round up to 8 rows signifying the event, but that "half" of an event gets rounded away when you try to reduce the data to a binary outcome per individual. That results in a different proportion mortality than the one you get from the Abbott correction. I would like to avoid that rounding error if possible, but the only procs I've been able to find with the events/trials syntax are LOGISTIC and LIFEREG. It doesn't seem like I can make the kind of strata comparisons that PROC LIFETEST or PHREG can do with one-individual-per-row data sets either.
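
To make the rounding issue concrete, here is a small sketch using Abbott's formula, adjusted = (observed - control) / (1 - control). The 9 deaths out of 15 and the 20% control mortality are made-up values chosen only to reproduce the 7.5-event case, and abbott_demo and its variable names are hypothetical:

```sas
data abbott_demo;
   n         = 15;      /* individuals in the treated group (from the example above) */
   deaths    = 9;       /* observed deaths (assumed for illustration)                */
   p_control = 0.20;    /* control-group mortality (assumed for illustration)        */

   p_obs      = deaths / n;
   p_adj      = (p_obs - p_control) / (1 - p_control);   /* Abbott's formula         */
   adj_deaths = p_adj * n;                               /* fractional event count   */

   /* Mortality implied by rounding to a whole number of individual rows */
   p_round_down = floor(adj_deaths) / n;
   p_round_up   = ceil(adj_deaths) / n;

   put p_adj= adj_deaths= p_round_down= p_round_up=;
run;
```

Neither rounded proportion matches the adjusted proportion, which is the discrepancy that the events/trials form avoids.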

If worse comes to worse I can always make comparisons of the dosage required to reach a given mortality between treatments (i.e. 10, 50, 90%, etc.), or just visually point out areas where the confidence intervals don't overlap. I'm at least at a point where I can say something, but it doesn't exactly seem that rigorous.

hanson4022
Calcite | Level 5

I originally wrote the class statement format off because I was getting entirely different models than when I used the by statement. Turns out I forgot to add the interaction term to the model statement. Guess I was right in that it turns out I was missing something really simple. Thanks.

Doc_Duke
Rhodochrosite | Level 12

In PROC LOGISTIC, you can use the ROCCONTRAST statement to compare two (or more) models. 
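
For reference, a minimal sketch of the ROCCONTRAST syntax, with made-up ROC labels. Note that this compares areas under the ROC curves for different sets of effects fitted to the same data, which speaks to predictive discrimination and is a different question from whether the dose-response curves differ between treatments.

```sas
proc logistic data=dosedata;
   class treatment / param=reference ref=first;
   model deaths/n = dose treatment dose*treatment;
   /* Each ROC statement names a subset of the MODEL effects to refit */
   roc 'Dose only'        dose;
   roc 'Dose + treatment' dose treatment;
   /* Compare the other ROC curves against the dose-only curve */
   roccontrast reference('Dose only') / estimate;
run;
```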

