logistic regression with statistical comparison of inflection point an...

RobinN · Posted 10-28-2020 08:13 AM

Hello,

I am trying to model the germination behavior of fungal spores. Therefore I've counted the germination rate at given times and now would like to perform a regression to a logistic function of this data to (statistically) compare the inflection point and the slope between different groups. From my mathematical standpoint, this should be easily possible as these are the key parameters that would be estimated by the regression of a standard logistic function in the form of f(x)= L/(1+e^(-k(x-x0))) where x0 is the inflection point, k the slope and L the function's maximum value ).

Example data would be:

Organism	Time after inoculation	Germination rate in %
FungiA	9	0
FungiA	12	5
FungiA	15	11
FungiA	18	52
FungiA	21	87
FungiA	24	93
FungiA	96	95
FungiB	9	0
FungiB	12	0
FungiB	15	1
FungiB	18	3
FungiB	21	8
FungiB	24	15
FungiB	27	21
FungiB	96	88

As I'm new to SAS I rely on the methods that are already known in my working group and we've tried a lot of calculations based on probit or logistic models but as they both model the probability converging to 1, which doesn't account for spores that are dead and will never germinate (in some cases up to 40%) it is for me not sufficient to describe the data. My research regarding this topic couldn't help me either.

May some of you have ideas on how this could be calculated and give me a hint on which procedures might be helpful. I'm working with the SAS University Edition

Thanks a lot for your help and input

Robin

SteveDenham · Posted 10-28-2020 08:57 AM

For data of this sort, I like a 4 parameter logistic model, and I implement it in either PROC NLIN or NLMIXED. Here is some sample code that I have used for vector potency fit to the 4 parameter logistic:

proc nlmixed data=long maxiter=5000 fdh=central tech=quanew;* hs=2 linesearch=5 ;
parms a0=8 b0=6 c0=12 d0=20 a1=-2 b1=1 c1=-1 d1=-7 a2=-2 b2=6 c2=-1 d2=-6 s2=.001 mu=0;
if (t1=0 and t2=0) then do;
a=a0;
b=b0;
c=c0;
d=d0;
end;
else if (t1=1 and t2=0) then do;
a=a0+a1;
b=b0+b1;
c=c0+c1;
c=d0+d1;
end;
else do;
a=a0 + a2;
b=b0 + b2;
c=c0 + c2;
d=d0 + d2;
end;
range=a-d;
factor=(dose/c);
denom=1+factor**b;
pred=((a-d)/(1+(dose/c)**b))+d;
myderiv=(pred-d)/denom*(-b/c)*factor**(b-1);

model value ~ normal(pred,s2);/*
estimate 'ed10' c*((1/9)**(1/b));
estimate 'ed20' c*((.25)**(1/b));
estimate 'ed80' c*(4**(1/b));
estimate 'ed90' c*((9)**(1/b));
estimate 'linear range' c*(4**(1/b))-c*((.25)**(1/b));
estimate 'extended range' c*(9**(1/b))-c*((1/9)**(1/b));
estimate 'a for TA1' a0+a1;
estimate 'b for TA1' b0+b1;
estimate 'c for TA1' c0+c1;
estimate 'd for TA1' d0+d1;
estimate 'a for TA2' a0+a2;
estimate 'b for TA2' b0+b2;
estimate 'c for TA2' c0+c2;
estimate 'd for TA2' d0+d2;
estimate 'Relative potency TA1 v Ref' (c0+c1)/c0;
estimate 'Relative potency TA2 v Ref' (c0+c2)/c0;
estimate 'Relative potency TA1 v TA2' (c0+c1)/(c0+c2);
predict pred out=predout1;
predict myderiv out=derivout1;
*ods exclude convergencestatus dimensions fitstatistics iterhistory;* parameters
			parameterestimates additionalestimates specifications;
ods output parameterestimates=outref additionalestimates=estref fitstatistics=fitref;
run;

Good starting values are critical. The method fits the comparator values as linear deviations from the control values (t1 and t2 in the data are 0), while the comparators have t1 =1, t2= 0 or t1=0, t2=1. Your data would readily go in as something like:.

proc nlmixed data=yourdata maxiter=5000 fdh=central tech=quanew;* hs=2 linesearch=5 ;
/* This part is critical to get good starting parameters.  Graphical methods should help. Plugged in values from data for a0, a1, d0 and d1 */
parms a0=0 b0=6 c0=12 d0=95 a1=0 b1=1 c1=-1 d1=-7 s2=.001;
if (organism='FungiA') then do;
a=a0;
b=b0;
c=c0;
d=d0;
end;
else if (organism='FungiB') then do;
a=a0+a1;
b=b0+b1;
c=c0+c1;
c=d0+d1;
end;

range=a-d;
factor=(time/c);
denom=1+factor**b;
pred=((a-d)/(1+(time/c)**b))+d;
myderiv=(pred-d)/denom*(-b/c)*factor**(b-1);

model value ~ normal(pred,s2);
/* These are for FungiA.  FungiB estimates can be obtained by replacing c0 with c0+c1 and b0 with b0+b1 */
estimate 'ed10' c0*((1/9)**(1/b0));
estimate 'ed20' c0*((.25)**(1/b0));
estimate 'ed80' c0*(4**(1/b0));
estimate 'ed90' c0*((9)**(1/b0));
estimate 'linear range' c0*(4**(1/b0))-c0*((.25)**(1/b0));
estimate 'extended range' c0*(9**(1/b0))-c0*((1/9)**(1/b0));
estimate 'a for FungiB' a0+a1;
estimate 'b for FungiB' b0+b1;
estimate 'c for FungiB' c0+c1;
estimate 'd for FungiB' d0+d1;
/* Potency is probably not the best description here, so insert what you want */
estimate 'Relative potency FungiB v FungiA' (c0+c1)/c0;
/* These output datasets are for further analysis or for diagnostics, respectively */
predict pred out=predout1;
predict myderiv out=derivout1;
*ods exclude convergencestatus dimensions fitstatistics iterhistory;* parameters
			parameterestimates additionalestimates specifications;
ods output parameterestimates=outref additionalestimates=estref fitstatistics=fitref;*/
run;

I hope this gets you started.

SteveDenham

StatDave · Posted 10-28-2020 09:31 AM

If you have the numerator and denominator counts in each of the rates, then you can use the events/trials syntax for the response in the PROC LOGISTIC MODEL statement. You can then use the general method in this note to compare slopes among the groups. You would then have code something like the following. You could add plotting statements like shown in the note, or statements to make specific comparisons.

proc logistic;
class group;
model numgerm/total = group time group*time / noint;
run;

If a 4-parameter logistic model is of particular interest, see this note.

SteveDenham · Posted 10-28-2020 11:19 AM

Thanks, @StatDave . I never thought about rescaling based on 1/phi - the note points out how to do that for a single model. The code I presented above essentially fits two 4P logistic models at the same time, so that relative numbers are calculated with the "pooled" error. I think that separate scale estimates should be estimated for each of the levels of group, and that the QL calculation should be modified accordingly (product of each separately) to accommodate both. Is that how you interpret the note? For my approach, this would require n+1 fits for comparison of n levels of group - first fitting each level separately to get the individual scale parameters, followed by the full analysis, enabling group "comparisons" and relative calculations, such as relative potency.

To @RobinN , be careful about just plugging your data into my code. The code was developed for a continuous response, and the response variable is assumed to have a normal distribution for the errors. If you are fitting the proportions, this assumption is not good, and the method using quasi-likelihood referred to in the note needs to be implemented.

Finally, this may be a case where the asymmetric five parameter logistic model ought to be at least examined (note that this is different than a five parameter hormesis logistic model). FungiB looks to me to have an asymmetrical time response curve.

SteveDenham

StatDave · Posted 10-28-2020 12:49 PM

For fitting the 4-parameter model, I was referring to the section below what you are referring to. See the "4- and 5-parameter logistic models" section of the note below the "Fractional logistic model" section you are referring to. It should be possible to enhance the NLMIXED code shown to do the separate group curves model that I show being done in PROC LOGISTIC in the first note I mentioned in my earlier reply.

SteveDenham · Posted 10-28-2020 01:05 PM

The fractional logistic model section deals with using PROC GLIMMIX to fit log concentration to the response, with an overdispersion specification. Everything I was about referred to the PROC NLMIXED methods. My question, and I think it applies to the OP as well, deals with how to incorporate a scale parameter, and whether it could be adapted to my method of fitting multiple curves (with either separate scale parameters or some sort of weighted-pooled scale parameter) in a single setting. For a normally distributed variable with a homogeneous variance estimate common to all groups, this would apply to my data. For a quasi-likelihood situation where there is no estimate of the variance, this would apply to the data of the OP. This would enable the statistical comparison of the inflection points and slopes. I apologize to both of you for not making that point clearly.

SteveDenham

StatDave · Posted 10-28-2020 01:27 PM

Sorry, I misunderstood. It should be possible to add a group variable into the model and use it in interaction with the parameters so that separate parameters of the basic model are obtained for each group. If that is done in the first code to obtain the predicted values, then it seems to me the same approach can be used to obtain the estimate of phi and then the final model can be fit using the same group-enhanced model as the first step.

SteveDenham · Posted 10-28-2020 01:34 PM

Sweet. Now I just need to work on re-parameterizing stuff so that I can use a quasi-likelihood approach for proportions.

@RobinN , I must say I dumped a lot of code on you, but it did not recognize that your response was a proportion, rather than a continuous/count variable where a lognormal distribution is appropriate. If I have some additional spare time, I will try to come up with code that is taken directly from the SASnote, so that you have a reference.

SteveDenham

RobinN · Posted 10-29-2020 10:33 AM

@SteveDenham

@StatDave

First of all:

Many thanks to both of you!

As I've already pointed out I'm new to SAS so I think - at the moment - it looks like it's too complicated for me to compare different fungi with one code. As it's possible to estimate single parameters I will simply compare them by their confidence interval. I will try to estimate the slope at b50 (for the 4 parameter logistic model) with the first derivation of the function at this point. But I've got some problem with fitting my data to the model.

As the data for one fungus is gappy the model is'nt able to fit the data properly. I don't know if it's therefore just impossible to fit these data or if I'm missing something.

Here is the code I've used to fit the data from one repetition of fungus A, which works fine

data fungi;                                                                                                                              
        input time germrate;   
        datalines;                                                                                                                                                                                                                                                  
96    .95                                                                                                         
24    .93
21    .87
18    .52
15    .11
12    .05
9     .01
      ;   
      run;
      
 proc nlmixed data=fungi;
        parms bu=.95 bl=.001 b50=.20;
        mu = bu + (bl-bu)/(1+(time/b50)**bs);
        l = mu**germrate * (1-mu)**(1-germrate);
        ql = log(l);
        model germrate ~ general(ql);
        predict mu out=out1;
        ods output parameterestimates=pe;
        run;
        
 proc sql noprint;
        select count(Parameter) into :nparm from pe;
        quit;
      title "Scale parameter"; 
      title2 "Pearson Chi-square / DF";
      proc sql;
        select sum(phi_i) format=best12. as Scale into :phi from
        (select (germrate-pred)**2/(pred*(1-pred)*(count(germrate)-&nparm)) as phi_i from out1);
        quit;
      title;
      
 proc nlmixed data=fungi;
        parms bu=.98 bl=.001 b50=.20;
        mu = bu + (bl-bu)/(1+(time/b50)**bs);
        l = mu**germrate * (1-mu)**(1-germrate);
        ql = (1/&phi)*log(l);
        model germrate ~ general(ql);
        predict mu out=FPLout;
        run;
        
 proc sgplot data=fplout noautolegend;
        band upper=upper lower=lower x=time;
        scatter y=germrate x=time;
        series  y=pred x=time;
        xaxis label="time after inoculation" ;
        yaxis label="germination rate" values=(0 to 1 by 0.1) ;
        title "4-parameter Logit Model";
        run;

but with this data from fungi B (+ some imaginary data for t=120 and t=150 to suggest the upper limit) and renewed parm starting points, it doesn't work. Any further ideas?

data fungi;                                                                                                                              
        input time germrate;   
        datalines; 
150   .89        
120   .89
96    .88        
27    .21
24    .15
21    .08
18    .01
15    .01
12    .01
9     .01
      ;   
      run;
      
 proc nlmixed data=fungi;
        parms bu=.90 bl=.01 b50=41;
        mu = bu + (bl-bu)/(1+(time/b50)**bs);
        l = mu**germrate * (1-mu)**(1-germrate);
        ql = log(l);
        model germrate ~ general(ql);
        predict mu out=out1;
        ods output parameterestimates=pe;
        run;
        
 proc sql noprint;
        select count(Parameter) into :nparm from pe;
        quit;
      title "Scale parameter"; 
      title2 "Pearson Chi-square / DF";
      proc sql;
        select sum(phi_i) format=best12. as Scale into :phi from
        (select (germrate-pred)**2/(pred*(1-pred)*(count(germrate)-&nparm)) as phi_i from out1);
        quit;
      title;
      
 proc nlmixed data=fungi;
        parms bu=.90 bl=.01 b50=.40;
        mu = bu + (bl-bu)/(1+(time/b50)**bs);
        l = mu**germrate * (1-mu)**(1-germrate);
        ql = (1/&phi)*log(l);
        model germrate ~ general(ql);
        predict mu out=FPLout;
        run;
        
 proc sgplot data=fplout noautolegend;
        band upper=upper lower=lower x=time;
        scatter y=germrate x=time;
        series  y=pred x=time;
        xaxis label="time after inoculation" ;
        yaxis label="germination rate" values=(0 to 1 by 0.1) ;
        title "4-parameter Logit Model";
        run;

Furthermore, do you have any suggestions on how to provide the data from different repetitions to this model, as every time I've tried this in the manner I'm used to from other models it doesn't work either.

SteveDenham · Posted 10-30-2020 08:52 AM

@RobinN , You have a bad starting value for b50 in the final code block. This parameter should be on the hours scale, and be 40 rather than 0.4. With that change, everything else runs fine on my machine.

Now the question about reps. I would add a RANDOM statement, and modify the MODEL and PARMS statements I honestly am not sure that this will work the way I think it ought to. Here are the steps: Make sure that each rep is uniquely identified - let's call this variable repno.

proc nlmixed data=fungi;
        parms bu=.95 bl=.001 b50=20 logsig= -1;
        mu = (bu + (bl-bu)/(1+(time/b50)**bs)) + repvar;
        l = mu**germrate * (1-mu)**(1-germrate);
        ql = log(l);
        model germrate ~ general(ql) ;
        random repvar ~ normal(0,exp(2*logsig)) subject=repno out=EB;
 
        predict mu out=out1;
        ods output parameterestimates=pe;
        run;

I got most of this from the examples in the NLMIXED documentation that use the RANDOM statement, and merely adds in an additional random factor for mu. If you want to get really technical, you could add random effects to the parameters that reflect variability due to rep, as in the first example that fits a one compartment PK model.

Note that this approach involves a new parameter, logsig. I gave a starting value of -1 for this, which would correspond to a standard deviation between reps of 0.36. That is just a starting value and is probably too large. This parameterization [exp(2*logsig)] guarantees a positive value for the variance.

Unfortunately, I will leave the rest of this exercise to you, as I don't have rep data.

SteveDenham

logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Re: logistic regression with statistical comparison of inflection point and slope between groups

Ready to join fellow brilliant minds for the SAS Hackathon?