BookmarkSubscribeRSS Feed
Corn_
Calcite | Level 5

So in my data, I'm dealing with nonlinear growth data with a logistic growth trend.  Because of this I want to use NLMixed using SAS 9.2, but I'm having trouble designing the code because our data is much more complicated than the model provided in the SAS documentation.  While the example consisted of simply 5 trees, our data is a factorial of 4 populations and 4 planting densities while at the same time being planted in 4 different environments giving us a population*density*environment random effect (see, much trickier).

We've already done Proc NLIN and we have the A, B, and C parameters for our logistic curves for each of the 16 population*density combinations.  How exactly would we account for that in the parms and model statements?  The goal of this analysis is to look for both the main effects of population and density, along with a possible interaction of the two.

Is our situation remotely salvageable, or should we abandon NLMIXED and just try GLIMMIX? 

16 REPLIES 16
SteveDenham
Jade | Level 19

I would definitely proceed with NLMIXED in this case.  If you can share your NLIN code, I think we can make a reasonable stab at the NLMIXED code needed.

I want to think about that three way random effect as well.  I see density as a fixed effect, but population and environment could be either.  (See the animal breeding literature for examples where environment is a fixed nuisance effect to be removed in calculating BLUPs.)  I would think of environment as the true random effect, with the subject being individual within population by density.

Anyway, if your NLIN code addresses fixed effects as parameters, rather than doing a BY variable analysis, it should be relatively painless to translate into a nonlinear mixed model.

Be sure to scour the SAS-L archives for examples as well.

Steve Denham

Corn_
Calcite | Level 5

Here is a basic version of my NLIN code for one of the populations at all 4 densities.  I've done this for all 16 combinations.  As you can see, I really haven't done it where fixed effects are parameters, I've instead broken them all up.  It seems I've given us a bit more pain to deal with.

proc nlin data=bscb13;

parms A=232.6 B=1332000 C=0.2065;

model length = A/(1+(B*EXP(-C*days)));

run;

proc nlin data=bscb16;

parms A=185.2 B=563354 C=0.1951;

model length = A/(1+(B*EXP(-C*days)));

run;

proc nlin data=bscb19;

parms A=234.3 B=1738760 C=0.2020;

model length = A/(1+(B*EXP(-C*days)));

run;

proc nlin data=bscb112;

parms A=189.7 B=1686900000 C=0.3048;

model length = A/(1+(B*EXP(-C*days)));

run;

As you can see, they're all fairly different when it comes to the parameters.

We've been thinking of both density and population as fixed effects because with only 4 different levels for each, you can't have a very good estimate of variance.  Additionally, we're mainly focusing on the data we have, so we're not looking to make giant inferences about all of the species.    Again, with only 4 environments, we don't have a good idea of the variance, but with our 64 population*density*environment classes, we have a better idea.  Each data point in this set is an entirely different plant (we had to hack them down to get the data), so we can't really use individual plants as the subject.  However, each field we used had 9 replicates so would population*density*rep(environment) be a good subject while the random effect would be population*density*environment?  This is the closest you're probably going to get to the individual trees in the example.

SteveDenham
Jade | Level 19

First off, we need to do something about those B values--when it comes to NLMIXED and exponentiating, I just shudder when I see numbers of that magnitude.  Could you fit instead:

length = A/(1 + exp (B' - C * days)); ?  By putting the leading coefficient inside the exponentiating, it will scale it down to something reasonable so that when we calculate the covariances between the parameters we won't get shot in the foot by highly influential observations.  It looks to me like A and C are drawn from single distributions, but without confidence bounds on them I can't be sure.  Thus, I think the fixed effects may have the greatest effect on B.

So now we start the coding.  The SAS-L archives will help with this.

Steve Denham

Corn_
Calcite | Level 5

Is the logistic equation flexible enough to work like that?

I tried the new equation and just got a whole lot of this:

Execution Errors for OBS 1:

NOTE: Missing values were generated as a result of performing an operation on missing

      values.       Each place is given by (number of times) AT

      (statement)/(line):(column).

SteveDenham
Jade | Level 19

In the famous words of L. Frank Baum, "Pay no attention to that man behind the curtain."  I am afraid that my advice is not only wrong, but dangerously wrong..  The correct formula for moving the B parameter into the exp() function should be something like;

B*exp(-C*t) = exp(B') * exp(-C*t) = exp(-B'*C*t).  This parameterization results in B and C being inversely related with correlation = -1, so that's not good.

So stay with the current parameterization.  Check SAS-L for examples where treatment groupings are coded.  I agree that the subject will be population*density*rep(environment) for the only random effect.

Steve Denham

Corn_
Calcite | Level 5

I've been looking through SAS-L and the SAS books and online resources, and I think I may have made a terrible mistake.  Is it even possible to run a test for type III fixed effects using NLMIXED?  If not, I think I may be barking up the wrong tree and need to go with either GENMOD or GLIMMIX.

SteveDenham
Jade | Level 19

The CONTRAST statement will allow you to do the tests of interest.  Once the groups are defined, multiple df CONTRASTs that are the equivalent of Type III tests can be used.  Note that only a between-within type of df can be used in NLMIXED in random effects models.

So, how about going at the parameters as "repeated" measures?  Get all of the A, B, C parameters in a long dataset, identifying them from the population, density and environment, with weights = 1/standard error of the parameter from the NLIN runs.  It should then be a 192 record dataset.

Then fit:

proc mixed data=newlongdata;

class param population density environment nlinrun;

model param_value=density|population|param/ddfm=kr(firstorder);

random intercept density population density*population/subject=environment;

repeated param/subject=nlinrun type=un;

weight nlwt;

lsmeans <whatever is of interest goes here>/<diff or slicediff as needed>;

Now if you also have the covariances of the nonlinear curve parameters, you might be able to use a PARMS statement to incorporate those as well, although GLIMMIX would probably afford more flexibility here through the PARMSDATA option.

Steve Denham

Corn_
Calcite | Level 5

Let's just assume that we did want to go with GLIMMIX because the other bit sounds really complicated.  Would this code work?  I'm really the first person in our lab to majorly have to work with nonlinear data, so I'm kind of the guinea pig for NLIN, NLMIXED, and GLIMMIX.

proc glimmix data=short;

class breed rep environment;

model length = breed|density|GD10 / ddfm=satterth dist=poisson link=log;

random rep(environment) breed*density*rep(environment);

run;

I'm pretty new to the whole ideas of dist and link, so would poisson and log cause GLIMMIX to treat it as a logistic rather than linear model?  Also, I know you are a fan of kr2 for ddfm, but I was recommended satterthwaite for this.

SteveDenham
Jade | Level 19

Well, that code will probably run into some problems.  It treats density and GD10 as continuous variables, which is OK, but it treats them as linear inputs.

Second, modeling length this way assumes that it follows a Poisson distribution, usually used for counts, not for a continuous variable.  The link=log means that log(length) is fit to the linear model.

Third, is this supposed to fit your raw data, i.e., the data you are currently using NLIN to fit to the independent time variable called days?  Is that what GD10 is?

GLIMMIX is designed to fit generalized LINEAR mixed models.  The generalized part means that the response variable follows some known distribution, and the independent variables have a linear relationship with the dependent variable in some sense.  NLIN fits a nonlinear model, where the independent variables have a nonlinear relationship.

So the only way I see to get the nonlinearity accounted for is to either use NLMIXED with proper class variables (GENMOD might be useful to generate the dummy variables representing the classes, although output from MIXED could also be ginned up), or to analyze the parameter values obtained from individual NLIN runs, and treat those parameters as the response variables.  We know they are correlated, so it is probably best to treat them as R side variables.  What I think you are looking for is the effect of your design variables on those parameters, and the two stage method is the only one I can think of that does not involve a major amount of coding in NLMIXED.

Steve Denham

Corn_
Calcite | Level 5

Well, it's becoming increasingly clear that I have very little idea of what I'm actually doing, but what better purpose in life than to serve as a warning to others. 

GD10 (growing degree days divided by 10) has replaced days since it is a more reliable measure of time for growth data, and is supposed to be continuous as a covariate.  Density really should be set up as a class statement variable, so I went ahead and did that.  However, running the proc only led to it not converging, even with iterations increased to 100, so looks like we're back to square one no matter what.

Poisson distribution is used for counts, but would gamma distribution or some other one work here?  The NLOPTIONS page states that most models fit with the GLIMMIX procedure typically have one or more nonlinear parameters, so does it have any use with my conundrum?  The data does take a logistic shape over time, with the shape of that logistic curve being affected by density and population, so is there any way to treat the length-GD10 relationship as logistic rather than linear?   I would like to stay away from trying to utilize A, B, and C as dependent variables to be analyzed because that takes us into the cold, dark pit that is multivariate statistics.  So let's see what we can do with NLMIXED or GLIMMIX.  I've been looking through the SAS-L archives like you've said, but everything there that has to do with NLMIXED seems to be about people wanting to find the values of parameters to model their equations, while I simply want to find type III fixed effects for density, population, and density*population.


Thanks for being so patient with me, I know it seems like we're taking one step forward and two steps back.

SteveDenham
Jade | Level 19

Yeah, I think it is time to take one really big backward step.  For instance, can we suppose that these are random samples of a population of growth curves, where the effects of interest (environment, population, density) are additive?  Because if we can, then this can all be addressed in GLIMMIX (or MIXED for that matter).  This is going to depend on treating the time variable as a class variable.  The shape of the response curve could be anything--we just know the values at specific points along the curve.  And we don't need to assume that a three, four or five parameter logistic curve fits (unless we are interested in the logistic curve parameter estimates themselves).  So, before getting hopes up too high, how many discrete values of GD10 are available in your data?  Are they the same across population and density?

Steve Denham

Corn_
Calcite | Level 5

We've pretty much been acting like population and density effects will be additive, however I'm not so sure about environment.  We've been treating that, and its interaction with population and density, as a source of error in the random statement.  Since we already have the logistic curve parameters from NLIN, we can pretty much cross that off the list.  We have a total of 42 different GD10 points.  While they should technically contain each population/density combination, things like bad weather and poor growth made it so that in several of the points we're missing one or more of these combinations.  Time constraints also made it so that some days had less samples taken than others, leading to somewhat unbalanced data.  Additionally, only one or two environments is represented at each GD10 mark since different environments had different temperatures and therefore different GDD values.  Also, since the last two GD10 values were final harvest values, we have vastly more data for those two points than any of the other ones, if that matters.

jrbrauer
Fluorite | Level 6

I think Steve's suggestion to consider growth curve modeling using proc mixed (or glimmix) is worth pursuing as an initial step before addressing additional complications. If a logistic function fits the observed growth trajectories, then could you simply log transform the dependent variable, then estimate a basic log-lin growth model (i.e., regress the log transformed length variable on the covariates)? To do so, you would start by modifying the proc mixed code that Steve offered earlier (replacing length with loglength, adding density as a class variable, and exploring the most appropriate covariance structure with adjustments to the random statement). Then, the average "effect" parameters that you are looking for can be back-transformed after estimation to represent effects on raw length (rather than on the log of length).

(P.S. - I'm sorry in advance if I grossly misunderstand your issue or if this is an overly simplistic application to your data. Also, in my experience, you are in good hands with Steve's help.)

Best,

Jon

SteveDenham
Jade | Level 19

Jon, I wouldn't even bother log transforming.  If there is a variance heterogeneity problem, I'd address that with a covariance structure that allowed for heterogeneity by time point.  The great thing about having the independent variable as a class variable is that you get the equivalent of a semi-parametric spline through the time points, with more flexibility than including the independent variable as a continuous variable (or several polynomial variables, to approximate the curve).  In fact, in GLIMMIX, you can specify an EFFECT that truly splines the categorical time variable.  This is the direction I want to point the OP in.

1."Fluff" the dataset so that all possible GDD are seen in all combinations of the fixed effects. Insert missing values for the dependent variable as needed.

2. If you have STAT12.1 or higher, use the EFFECT statement in GLIMMIX to define a spline, which will represent the growth curve much more accurately than a three parameter logistic curve.

3. Steal liberally from the code in Example 41.6 Radial Smoothing of Repeated Measures Data. Look especially at the code that generates Output 41.6.9 Observed and Predicted Profiles.

4. If you don't have access to SAS 9.4/STAT12.3, go from door to door in the department until you find out who has the license keys, and get an update.

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 2956 views
  • 5 likes
  • 4 in conversation