About sld

sld · 08-13-2018

If I were to answer only your last question, I would say no. But I think there are many additional considerations.

Because the variance is a function of the mean for the Poisson distribution (specifically, the variance equals the mean), there may be no need for "variance stabilization" as in models assuming a normal distribution; the variance "naturally" increases as the mean increases. However, overdispersion may exist for a particular data set in a particular model. There are different ways to deal with overdispersion, including using the negative binomial distribution rather than the Poisson, or fitting a quasi-Poisson model using random _residual_.

I doubt that your current model is correct. We first need clarification on the design: did you measure the count on the same leaf in each of the 13 intervals, or did you measure counts on different leaves in different intervals? I presume leaves are nested within sites, but are leaves also clustered by plants (which are nested within sites) or subject to some other design constraint?

Other thoughts:

To model count per unit leaf area, you need to use ln(leafarea) as the offset, not plain leafarea.

When you incorporate previous_para as a simple covariate in the model, you are assuming that previous_para is not affected by either climate or season. That seems unlikely, and if previous_para is affected by them, accounting for that is a nontrivial modeling problem. You may want to consider a structural equation model.

"subject=climate" in RANDOM statements is wrong if climate is a fixed effects factor.

The assignment of 13 (consecutive?) intervals to 3 seasons could be arbitrary and hard to justify. Or it could be fine, as long as you can adequately justify how and why you categorized what might otherwise be seen as a continuous-scale variable. You may want to consider regressing on interval and dropping season.

You may want to consider including interactions among climate, season, and interval(season).
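A minimal sketch of the offset and overdispersion points, assuming simulated data with hypothetical variable names (count, leafarea, log_area, site, climate). The fixed effects are deliberately reduced to climate because the design questions above are still open, so this is an illustration of syntax, not a recommended model for your data.

```sas
/* Simulate hypothetical leaf-level counts: expected count is proportional to leaf area */
data leaves;
  call streaminit(2018);
  do site = 1 to 8;
    climate = mod(site, 2);                  /* hypothetical 2-level climate factor */
    site_eff = rand('normal', 0, 0.3);       /* random site effect */
    do leaf = 1 to 15;
      leafarea = 5 + 15*rand('uniform');     /* leaf area between 5 and 20 */
      mu = leafarea * exp(-1 + 0.4*climate + site_eff);
      count = rand('poisson', mu);
      log_area = log(leafarea);              /* offset must be on the log (link) scale */
      output;
    end;
  end;
  drop site_eff mu;
run;

/* Count per unit leaf area: log link with ln(leafarea) as the offset.          */
/* DIST=NEGBIN allows Var = mu + k*mu**2; DIST=POISSON would force Var = mu.     */
proc glimmix data=leaves method=laplace;
  class site climate;
  model count = climate / dist=negbin link=log offset=log_area solution;
  random intercept / subject=site;           /* site, not climate, as the subject */
run;
```

Swapping DIST=NEGBIN for DIST=POISSON gives the model in which the variance equals the mean, which can be a useful point of comparison.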
Many of the problems that I see in your model specification are not specific to using a Poisson distribution rather than a normal distribution. So I recommend careful study of SAS® for Mixed Models, Second Edition, which deals largely with normal error models but is still a valuable resource for you. Other model considerations are specific to GL(M)Ms, for which Generalized Linear Mixed Models: Modern Concepts, Methods and Applications is an excellent (if dense) resource. It's not only the syntax that is difficult; the concepts can be as well 🙂 I hope this helps.

sld · 08-02-2018

@kc wrote:
> I wonder, Why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoint(s) for the pieces is(are) (for example, at 6 months), or do you need to estimate the location(s) of the breakpoint(s)?
>
> Well, from prior clinical knowledge, there is no significant effect of treatment on summary score beyond the 6 month timepoint. Therefore, there is need for only 2 breakpoints, one each at 1 month and 6 month. I included only one breakpoint at 6 month in my code as an example.

I'm thinking of a breakpoint as a value at which the slope changes, i.e., the boundary between the segments. You don't have any data prior to 1 month, so you can't have a breakpoint there.

> Multiple methods (paired t-tests, ANCOVA, mixed effects models) are employed in analyzing these data. Also, an underlying assumption of longitudinal growth curve models is that the missing data is missing at random.

It is certainly convenient to assume that data are missing completely at random. But convenience does not necessarily make it true. If data are not MCAR, then any statistical method will be subject to bias.

> So, any help with the syntax in running a piecewise regression to fill the table in my original post would be great. Let me know if more data (in CSV format this time) would be helpful in carrying out this task!

Here is some code to consider; I provide no guarantees, so you'll want to understand it thoroughly. It includes some graphics that might help you understand what the model is doing and confirm visually that it might be doing what you want.
```sas
/* Create variable for breakpoint at time = 0.5 */
data plr;
  set plr;
  /* 0.5 before the breakpoint, time after it; differs from the hinge     */
  /* max(time - 0.5, 0) only by a constant, which the intercepts absorb   */
  time_6 = max(time, 0.5);
run;

/* Check time_6 against time */
proc tabulate data=plr;
  class time time_6;
  table time, time_6;
run;

/* Fit random coefficients model */
proc glimmix data=plr;
  class subject_id group;
  /* same effects as group|time group|time_6, with the group main effect listed once */
  model summary_score = group time group*time time_6 group*time_6 / solution;
  random intercept time time_6 / subject=subject_id type=un g gcorr;  /* random intercepts, random slopes */
  output out=plr_out2 pred(noblup)=predpa pred=pred;
run;

proc sort data=plr_out2 out=plr_out2_srt;
  by time;
run;

/* Plot fitted regression for each subject and population-averaged regression */
proc sgpanel data=plr_out2_srt;
  panelby group;
  series x=time y=pred / group=subject_id markers;
  series x=time y=predpa / lineattrs=(thickness=2 color=black);
run;

/* Plot population-averaged regression by group in one figure for comparison */
proc sgplot data=plr_out2_srt;
  series x=time y=predpa / group=group;
run;

/* Plot observed data for each subject with its regression */
proc sgpanel data=plr_out2_srt;
  panelby subject_id / columns=3;
  series x=time y=pred / markers lineattrs=(thickness=2 color=black);
  series x=time y=summary_score / markers;
run;
```

I'll give you a hint about acquiring the mean comparisons that you want: use the LSMEANS statement with the AT option. I hope this helps.
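One way to act on the LSMEANS/AT hint, sketched against simulated data in the shape of PLR (subject_id, group, time in years, summary_score). The visit schedule, the effect sizes, and the random-intercept-only RANDOM statement are assumptions made to keep the example small. The key detail is that AT must set time and time_6 to a consistent pair, because time_6 is a function of time.

```sas
/* Hypothetical stand-in for PLR: 2 groups x 10 subjects, 7 visits, time in years */
data plr;
  call streaminit(2018);
  array months[7] _temporary_ (1 3 6 9 12 18 24);
  do subject_id = 1 to 20;
    group = 1 + (subject_id > 10);
    b0 = rand('normal', 0, 2);                 /* subject-specific intercept */
    do j = 1 to 7;
      time = months[j] / 12;
      time_6 = max(time, 0.5);                 /* slope changes at 0.5 years */
      /* group 2 declines until 6 months, then levels off */
      summary_score = 50 + b0 + (group = 2)*(10*time_6 - 10*time)
                      + rand('normal', 0, 3);
      output;
    end;
  end;
  drop j b0;
run;

proc glimmix data=plr;
  class subject_id group;
  model summary_score = group time group*time time_6 group*time_6 / solution;
  random intercept / subject=subject_id;       /* random intercepts only, to keep the sketch simple */
  /* AT fixes both covariates; time_6 must equal max(time, 0.5) at each AT value */
  lsmeans group / at (time time_6) = (0.083333 0.5) diff;   /* about 1 month */
  lsmeans group / at (time time_6) = (0.5 0.5) diff;        /* 6 months */
  lsmeans group / at (time time_6) = (1 1) diff;            /* 12 months */
run;
```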

sld · 08-02-2018

Your desired comparisons could be obtained by using time as a categorical factor in a mixed model ANOVA, rather than by using time as a continuous factor in a random coefficients regression.

When I plot your data using

```sas
proc sgpanel data=plr noautolegend;
  panelby group;
  series x=time y=summary_score / group=subject_id markers lineattrs=(pattern=1);
run;
```

I wonder: why do you want a piecewise linear regression? How many breakpoints do you need? Do you know where the breakpoint(s) for the pieces is(are) (for example, at 6 months), or do you need to estimate the location(s) of the breakpoint(s)?

Is your actual dataset larger than the one you posted? (In the future, please post as a CSV file rather than Excel.) Few subjects in your posted dataset have values for all 7 times (5 out of 27 subjects), and many have only 1 value (8 out of 27). Only 14 subjects have data at both 1 month and 6 months. Can you sensibly fit a model (with or without random slopes) for multiple linear pieces using a data set that is so incomplete? I'd say, probably not. I also would be concerned about potential bias in either ANOVA or regression models due to missing data and why data are missing. I hope this helps.
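A sketch of the categorical-time alternative, using simulated data shaped like PLR with roughly 20% of visits randomly removed. The variable month, the visit schedule, and the Kenward-Roger DDFM choice are illustrative assumptions. SLICEDIFF=month compares the two groups at each time.

```sas
/* Hypothetical stand-in for PLR with incomplete follow-up */
data plr;
  call streaminit(2018);
  array months[7] _temporary_ (1 3 6 9 12 18 24);
  do subject_id = 1 to 20;
    group = 1 + (subject_id > 10);
    b0 = rand('normal', 0, 2);
    do j = 1 to 7;
      month = months[j];
      summary_score = 50 + b0 - 6*(group = 2)*min(month, 6)/6 + rand('normal', 0, 3);
      if rand('uniform') < 0.8 then output;    /* roughly 20% of visits missing */
    end;
  end;
  drop j b0;
run;

/* Time as a categorical factor: mixed model ANOVA with a random intercept per subject */
proc glimmix data=plr;
  class subject_id group month;
  model summary_score = group|month / ddfm=kr;
  random intercept / subject=subject_id;
  lsmeans group*month / slicediff=month;       /* group difference at each month */
run;
```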

sld · 07-31-2018

If the response is a proportion, then I probably would use the beta distribution rather than the binomial. The binomial distribution is theoretically appropriate either for binary (Bernoulli) data (taking values of 0 or 1) or for responses specified as "number of successes" out of "number of trials", using the syntax MODEL events/trials = <fixed-effects> </ model-options>; where "events" is the number of activated cells and "trials" is the total number of cells.
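A minimal GLIMMIX sketch of the events/trials form, assuming hypothetical variables activated (number of activated cells), total (total number of cells), and t_type; the counts are simulated and the model has no random effects.

```sas
/* Hypothetical cell counts: activated cells out of total cells per sample */
data cells;
  call streaminit(2018);
  do t_type = 'CD4', 'CD8';
    do rep = 1 to 6;
      total = 200 + floor(100*rand('uniform'));
      p = ifn(t_type = 'CD4', 0.30, 0.45);
      activated = rand('binomial', p, total);
      output;
    end;
  end;
  drop p;
run;

/* Binomial response specified as events/trials */
proc glimmix data=cells;
  class t_type;
  model activated/total = t_type / dist=binomial link=logit solution;
  lsmeans t_type / ilink;    /* ILINK reports estimates on the proportion scale */
run;
```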

sld · 07-31-2018

The documentation may provide the answer to your question: The GENMOD Procedure: Type 3 Analysis. Among other things, it depends upon whether you have missing cells, whether you have additional factors in the MODEL statement, whether you have interactions, etc.

sld · 07-31-2018

I'm glad you found the Community to be helpful. Before we drop this topic, I want to follow up because I'm not sure that I am entirely happy with your model.

(1) The response variable is activated and the distribution is binomial. Does activated take values of 0 or 1, or is it a proportion?

(2) An important distinction between the normal distribution and the distributions available in generalized linear (mixed) model procedures is that the variance is a function of the mean for these non-normal distributions. Consequently, although there are residuals, there is no such thing as a separate residual variance: once the mean is estimated, the variance is also known. For that reason, I do not think I would use random _residual_ / subject=T_type(donor) type=un; at least, not without a lot of thought about what it is doing and whether it is valid.
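For reference, the standard variance functions behind point (2), with μ the mean (for the binomial, μ is the success probability and n the number of trials; n = 1 gives binary data):

```latex
\begin{aligned}
\text{Normal:} \quad & \operatorname{Var}(Y) = \sigma^2 && \text{(a separate parameter, free of } \mu)\\
\text{Poisson:} \quad & \operatorname{Var}(Y) = \mu && \text{(determined once } \mu \text{ is estimated)}\\
\text{Binomial, } Y \text{ events out of } n: \quad & \operatorname{Var}(Y/n) = \mu(1-\mu)/n && \text{(determined once } \mu \text{ is estimated)}
\end{aligned}
```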

sld · 07-30-2018

Your current model assumes a linear relationship between sessionScore and sessionContinuous, rather than some sort of smoothed curve. For an example of smoothed curves fit to individual subjects, you might find this GLIMMIX example useful: Example 47.6 Radial Smoothing of Repeated Measures Data. Documentation about the radial smoother is here: Radial Smoothing Based on Mixed Models. Splines are available using the EFFECT statement in GLIMMIX: see Rick Wicklin's blog Regression with restricted cubic splines in SAS and the links within, as well as the GLIMMIX documentation. I would not expect a graphic with 702 separate curves to be visually useful (too much overplotting, I would think), but you could plot a small sample of the 702. I hope this helps.
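A minimal sketch of the EFFECT-statement approach, assuming a simulated data set with hypothetical variables subject, sessionContinuous, and sessionScore. The cubic B-spline with three equally spaced knots is an arbitrary choice, not a recommendation, and only the first five subjects are plotted to limit overplotting.

```sas
/* Hypothetical stand-in: 30 subjects, 10 sessions each, nonlinear trend */
data sessions;
  call streaminit(2018);
  do subject = 1 to 30;
    u = rand('normal', 0, 1);                  /* subject-specific shift */
    do sessionContinuous = 1 to 10;
      sessionScore = 10 + u + 4*log(sessionContinuous) + rand('normal', 0, 1);
      output;
    end;
  end;
  drop u;
run;

/* Cubic B-spline basis with 3 equally spaced knots in sessionContinuous */
proc glimmix data=sessions;
  class subject;
  effect spl = spline(sessionContinuous / degree=3 knotmethod=equal(3));
  model sessionScore = spl / solution;
  random intercept / subject=subject;
  output out=fit pred(noblup)=pred_pa pred=pred_subj;
run;

/* Plot a small sample of subjects rather than all of them */
proc sgplot data=fit(where=(subject <= 5));
  series x=sessionContinuous y=pred_subj / group=subject;
  scatter x=sessionContinuous y=sessionScore / group=subject;
run;
```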

sld · 07-27-2018

Hmm. You can ignore this post from Community_Guide that is attributed to me. I don't know how it happened. I have more specific comments in my other post.

sld · 07-27-2018

Do you have a repeated measures design? In other words, is Y measured at each of several TIMEPOINTs (0, 3, 6, 9, 12) for each subject? Or is each subject measured at only one timepoint? The choice of appropriate procedure and the specification of the correct statistical model depends on the experimental design; if you have repeated measures, then you would use MIXED or GLIMMIX rather than GLM. Although you have a "continuous" factor (Essai?), it apparently has only two levels; I would incorporate it as a categorical (classification) factor in the statistical model. You could use TIMEPOINT as a continuous factor, but you say that you do not expect it to have a linear relationship with the response Y. So I would initially incorporate TIMEPOINT as a categorical factor. You might find that you can fit some form of linear model (e.g., curvilinear, like the quadratic specification in your MODEL statement) or a spline, but without seeing the data, we cannot tell. If COMPONENT, ESSAI and TIMEPOINT are categorical factors (and so included in the CLASS statement), then you would have a 3-way factorial which may, or may not, include all possible interactions. If multiple observations are made on each subject, then you would have a mixed model. I hope this helps. Follow up as need be.
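A sketch of the categorical-TIMEPOINT repeated measures model, assuming (hypothetically) that each subject receives one COMPONENT x ESSAI combination and is measured at all five timepoints. The simulated data, the three COMPONENT levels, and the AR(1) covariance structure are assumptions to check against your actual design.

```sas
/* Hypothetical stand-in: 24 subjects, 3 components x 2 essai levels, 5 timepoints */
data rm;
  call streaminit(2018);
  do subject = 1 to 24;
    component = 1 + mod(subject, 3);
    essai = 1 + mod(floor((subject - 1) / 3), 2);
    u = rand('normal', 0, 1);                  /* subject effect */
    do timepoint = 0, 3, 6, 9, 12;
      y = 20 + component + essai + sqrt(timepoint) + u + rand('normal', 0, 1);
      output;
    end;
  end;
  drop u;
run;

/* All three factors categorical; repeated measures over TIMEPOINT within subject */
proc mixed data=rm;
  class subject component essai timepoint;
  model y = component|essai|timepoint / ddfm=kr;
  repeated timepoint / subject=subject type=ar(1);   /* equally spaced times: AR(1) is one candidate */
  lsmeans timepoint;
run;
```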

sld · 07-23-2018

Some "more friendly" (i.e., less mathematical) introductions to distributional choices: For the beta distribution, A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. For binomial, etc., The arcsine is asinine: the analysis of proportions in ecology. Looking forward to seeing what you find out....

sld · 07-23-2018

Yes, DHARMa is an R package, not SAS. The "23/45" location does not direct me to the text that I think you are referring to (the document does not have page numbers, and the numbering depends on your particular computer). DHARMa functions can accommodate and assess overall overdispersion, but from my reading, I do not think DHARMa assesses overdispersion that varies across levels of predictor variables.

Although GLIMMIX has the ability to specify heterogeneous variances, the SAS code in your message does not specify heterogeneous variances in any way (note what you have commented out using /* ... */). For distributions other than the normal, I think great care would be needed in specifying heterogeneous variances. I am not in a position to advise you about that coding: I have not done much of that sort of modeling, and when I undertake it in the future, I would do it only in the framework of a lot of simulation exercises, where you know the answer and see how well you can recover it. But maybe you are not trying to specify heterogeneous variances in combination with a gamma distribution? It is not clear from your message. I hope this helps.
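A skeleton of the kind of simulation exercise described above: simulate from a gamma model whose parameters you know, fit it in GLIMMIX, and check whether the estimates recover the truth. It deliberately uses a simple homogeneous-variance model (hypothetical variables rep, group, y) as a starting point; extending it to heterogeneous variances is the part that needs the care mentioned above. Turning printed output off with ODS EXCLUDE ALL while collecting estimates with ODS OUTPUT is a common SAS simulation idiom.

```sas
/* Simulate 50 data sets from a known gamma model:            */
/* log(mean) = 1 + 0.5*group, shape 4 (so Var = mean**2 / 4)  */
data gam;
  call streaminit(2018);
  do rep = 1 to 50;
    do group = 0 to 1;
      mu = exp(1 + 0.5*group);
      do i = 1 to 200;
        y = mu * rand('gamma', 4) / 4;        /* mean mu, variance mu**2/4 */
        output;
      end;
    end;
  end;
run;

/* Fit the same model to every replicate, suppressing printed output */
ods exclude all;
proc glimmix data=gam;
  by rep;
  model y = group / dist=gamma link=log solution;
  ods output ParameterEstimates=pe;
run;
ods exclude none;

/* Average estimates should be near the true values 1 (intercept) and 0.5 (group) */
proc means data=pe mean std;
  class Effect;
  var Estimate;
run;
```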

sld · 07-23-2018

The design is definitely a mixed model. The MIXED procedure assumes that the response is normally distributed (conditional on the predictors); GLIMMIX allows other distributional assumptions, among them normal, beta and binomial. Generally speaking (there are always exceptions), a percent (or proportion) response uses either a beta or a binomial distribution, so GLIMMIX is typically more appropriate than MIXED. A proportion that is obtained as a ratio of counts (e.g., number of "successes" out of number of "trials") calls for a binomial distribution; a proportion measured directly calls for a beta distribution, as noted by @plf515.

If I have a two-way (here, 2x2) factorial, I usually specify the model as A x B rather than as a single factor with 4 levels. But the statistical model is the same either way; it's just a different parameterization, and one form may deliver what you want more directly than the other. You get to choose.

I would consider the following model AS A STARTING POINT (for a beta distribution; syntax would differ for binomial):

```sas
proc glimmix data=work.cd4cd8;
  class donor a b t_type;
  model activation = a | b | t_type / dist=beta;
  random intercept a*b / subject=donor;
  lsmeans a | b | t_type / ilink;
run;
```

Keep in mind that there are a lot of options that might be better than the default options implied in the code above. Default options are not always the best choice for generalized linear mixed models. You may want to start with a normal distribution assumption within GLIMMIX before attempting more challenging models, even though a normal distribution is probably not a valid choice. It's good to get your feet under you with general linear mixed models before you dive into generalized linear mixed models. I hope this helps.
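For the binomial case, a sketch of the same starting-point model with an events/trials response, assuming hypothetical count variables activated and total in place of the proportion activation. The data are simulated, and METHOD=LAPLACE is one option among several, not a requirement.

```sas
/* Hypothetical counts: 6 donors x (2x2 A x B) x 2 T_TYPE cell populations */
data cd4cd8;
  call streaminit(2018);
  do donor = 1 to 6;
    d = rand('normal', 0, 0.3);                /* donor effect */
    do a = 1 to 2;
      do b = 1 to 2;
        ab = rand('normal', 0, 0.2);           /* donor x a x b effect */
        do t_type = 'CD4', 'CD8';
          eta = -1 + 0.3*(a = 2) + 0.5*(b = 2) + 0.4*(t_type = 'CD8') + d + ab;
          total = 300;
          activated = rand('binomial', 1/(1 + exp(-eta)), total);
          output;
        end;
      end;
    end;
  end;
  drop d ab eta;
run;

/* Binomial version of the starting-point model */
proc glimmix data=cd4cd8 method=laplace;
  class donor a b t_type;
  model activated/total = a|b|t_type / dist=binomial;
  random intercept a*b / subject=donor;
  lsmeans a*b*t_type / ilink;
run;
```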

sld · 07-23-2018

To elaborate on the responses by @Reeza and @PaigeMiller: clearly, you need to use a procedure for data that are binary or binomial. GLM is definitely not the correct procedure, because it assumes that the response is normally distributed (conditional on the predictors).

In your data snippet, it does not look like each individual cow is independent of all other cows. Does each line in your data snippet represent one cow? If so, it seems that there are one or more cows receiving a particular treatment at each of four farms. Cows on the same farm receiving the same treatment are subsamples, and the statistical model should incorporate cows accordingly.

Assuming that FARM is a fixed effects factor, I see two options, one of which uses the LOGISTIC procedure and one of which uses the GLIMMIX procedure (you could also use GENMOD):

(1) Combine the data over multiple cows on the same farm receiving the same treatment, so that a new response is defined by the number of cows with outcome=1 (i.e., number of "successes") out of the total number of cows. You could then use the LOGISTIC (or GENMOD) procedure with a binomial distribution using the "events/trials" response specification. See http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_syntax22.htm&docsetVersion=14.3&locale=en

(2) Use the data in the current format in the GLIMMIX procedure, specifying a mixed model with a RANDOM statement that clusters cows within sets of cows at a given farm receiving the same treatment.

Both approaches will produce the same results, but the first approach using LOGISTIC is more intuitive, and that's what I would recommend. I hope this helps. I think you will want to do some studying about logistic regression (or in this case, logit models, because farm and treatment are categorical) and how to implement these models using SAS.
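A sketch of option (1), assuming hypothetical cow-level variables farm, trt, and a 0/1 outcome; the data are simulated. PROC MEANS collapses cows within each farm x treatment combination into events (sum of outcome) and trials (number of cows), which PROC LOGISTIC then reads with the events/trials syntax.

```sas
/* Hypothetical cow-level binary data: 4 farms x 2 treatments x 8 cows */
data cows;
  call streaminit(2018);
  do farm = 1 to 4;
    do trt = 'A', 'B';
      do cow = 1 to 8;
        p = ifn(trt = 'A', 0.3, 0.5);
        outcome = rand('bernoulli', p);
        output;
      end;
    end;
  end;
  drop p;
run;

/* Collapse cows within farm x treatment into events/trials */
proc means data=cows noprint nway;
  class farm trt;
  var outcome;
  output out=agg(drop=_type_ _freq_) sum=events n=trials;
run;

/* Logit model with farm and treatment as categorical fixed effects */
proc logistic data=agg;
  class farm trt / param=glm;
  model events/trials = farm trt;
  oddsratio trt;
run;
```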

sld · 07-19-2018

Your code might work correctly, or it might not. Among other things, whether your code is correct depends on the data structure (what observations exist for what levels of what factors), on how the data set is sorted (because the class variable is not in the CLASS statement), and on how you have coded the levels of your random effects factors.

How many classes do you have? What does the tabulation of age x gender look like for classes? In other words, how many classes do you have for each combination of age (how many levels?) and gender (2 levels, presumably), and what are the marginal totals for age and gender? At the class level, notice that you have a two-way factorial (either without or with interaction) of age and gender. There is intuitive value in recognizing this ANOVA for class-level data as you set up a mixed model with students nested within classes (which is likely a form of split-plot design). If you do not have adequate replication (i.e., multiple classes) of teacher age and gender, then your ability to model the effects of teacher age and gender on student outcomes is limited or even nonexistent.

The MIXED procedure is able to do multivariate analysis using Kronecker products, as discussed here. However, I recommend starting simply, with univariate analyses, and making sure those work correctly before attempting multivariate analysis. SAS® for Mixed Models, Second Edition is an excellent resource that addresses many aspects of your modeling challenge.
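A sketch of the class-level tabulation followed by a students-within-classes model, assuming hypothetical variables class_id, age (teacher age group), gender (teacher gender), and score. The data are simulated with two classes per age x gender cell; with CLASS_ID in the CLASS statement, PROC MIXED identifies classes by level rather than by sort order.

```sas
/* Hypothetical student-level data: 12 classes, one teacher each, 20 students per class */
data students;
  call streaminit(2018);
  do class_id = 1 to 12;
    age = 1 + mod(class_id, 3);                /* 3 hypothetical teacher age groups */
    gender = 1 + (class_id > 6);
    u = rand('normal', 0, 2);                  /* class effect */
    do student = 1 to 20;
      score = 70 + 2*age - 3*(gender = 2) + u + rand('normal', 0, 5);
      output;
    end;
  end;
  drop u;
run;

/* One record per class, then count classes in each age x gender cell */
proc sort data=students(keep=class_id age gender) out=classes nodupkey;
  by class_id;
run;

proc freq data=classes;
  tables age*gender / norow nocol nopercent;
run;

/* Students nested within classes */
proc mixed data=students;
  class class_id age gender;
  model score = age|gender / ddfm=kr;
  random intercept / subject=class_id;
run;
```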

sld · 07-18-2018

I would say that your interpretation of my interpretation is correct, and consistent with the results you get. An argument could be made that there is little need for Type I error control among these 9 comparisons: no mean is used in more than one comparison, and the overall test of interaction assesses whether all 9 differences are equal (versus the alternative that at least two differences are unequal). If you wanted control, then, as you note, you could save the SLICEDIFF output to a data set using ODS and then use MULTTEST. You could also specify the 9 comparisons in an LSMESTIMATE statement with the ADJUST option.
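A sketch of the SLICEDIFF-plus-MULTTEST route, assuming a hypothetical 9 x 2 factorial (A with 9 levels, B with 2) so that slicing by A yields 9 comparisons that share no means. Holm's step-down adjustment is an illustrative choice. The SLICEDIFF p-values (column Probt) are copied into a variable named raw_p, which MULTTEST reads from an INPVALUES= data set.

```sas
/* Hypothetical 9 x 2 factorial with 4 replicates per cell */
data fact;
  call streaminit(2018);
  do a = 1 to 9;
    do b = 1 to 2;
      do rep = 1 to 4;
        y = 10 + 0.5*a + (b = 2)*(a/3) + rand('normal', 0, 1);
        output;
      end;
    end;
  end;
run;

/* B1 vs B2 within each level of A: 9 comparisons */
proc glimmix data=fact;
  class a b;
  model y = a|b;
  lsmeans a*b / slicediff=a;
  ods output SliceDiffs=sd;
run;

/* MULTTEST expects the raw p-values in a variable named raw_p */
data sd_p;
  set sd;
  raw_p = Probt;
run;

proc multtest inpvalues=sd_p holm;
run;
```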

Date Last Visited: 01-22-2021 05:52 PM
