SAS Support Communities

wateas

I did tinker with two-part and zero-inflated models a bit. Please take this with a grain of salt, as I'm a n00b, but what I found is that the models fell apart with the factorial structure of my study. I understand that these types of models require substantial data, and once my experimental units are split across the treatments, there are only 4 replicates in a blocked design. I did have some success with zero-inflated and hurdle models when using a simpler model with a single continuous predictor (yield).

wateas

Thanks all for the input. @SteveDenham setting the iteration limit to a higher value allowed the procedure to execute properly, and the results look good. The residuals look to be the best that I've seen (minimal fanning and patterns), and the results themselves are congruent with my knowledge of the data. I don't appear to have issues with the lack of an upper bound, as only a few of the observations get close to the total plot area (~500 sq ft; the highest observations are ~350 sq ft).
Just a few remaining questions:
Is adding a small constant (+1) to the response variable kosher for preventing the loss of data from zeros getting tossed out? From my perspective, most of these plots have at least some small degree of weed pressure, so it makes sense in that regard. I'm wondering more whether there is some statistical subtlety that I may not be realizing.
Can I use the ilink option to obtain lsmeans in the original units of the response variable as shown below?
    lsmeans Trt_Amend_App / lines adjust=tukey ilink;
    lsmeans Trt_CC / lines ilink;
Thank you all!
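For readers following along, the fix described above might look roughly like the sketch below. The thread does not show the exact statement used to raise the iteration limit, so NLOPTIONS MAXITER= and the value 500 are assumptions; the MAXOPT= option on the PROC GLIMMIX statement is another limit that may matter for pseudo-likelihood methods such as RSPL. With ILINK, the LSMEANS tables should also report estimates on the inverse-linked (data) scale, which here is the scale of Area_Weed_Plus and therefore still includes the +1 offset.

```sas
/* Sketch only: gamma GLMM with a raised iteration limit and ILINK lsmeans. */
proc glimmix data=df plots=studentpanel method=rspl;
  class Trt_Amend_App Trt_CC Block;
  model Area_Weed_Plus = Trt_Amend_App | Trt_CC / dist=gamma link=log;
  random Block;
  /* Assumed way of raising the limit; 500 is an arbitrary illustrative value. */
  nloptions maxiter=500;
  /* ILINK adds estimates back-transformed through the inverse of the log link. */
  lsmeans Trt_Amend_App / lines adjust=tukey ilink;
  lsmeans Trt_CC / lines ilink;
run;
```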

wateas

Thanks for your reply @SteveDenham. To answer your first question, the plots are identical in size, or at least very similar. The code below shows how I've attempted to fit the model using a gamma distribution in GLIMMIX. I've added a small constant (+1) to avoid zeros (variable named "Area_Weed_Plus"). I've followed the code with a few screenshots showing the output and the failure to converge.
    proc glimmix data=df plots=studentpanel method=rspl;
      class Trt_Amend_App Trt_CC ID_S Block;
      model Area_Weed_PLUS = Trt_Amend_App | Trt_CC / dist=gamma link=log;
      random Block;
    run;
So far, the most promising model that I have is the normal distribution in GLIMMIX with the arcsine-transformed response variable. That model shows some fanning in the residuals, but so have the other models that I've tried. The gamma and binomial models successfully converge when Block is not included as a random effect. Thank you for your time and attention.

wateas

You're not concerned about that fanning in the residuals plot? I'm still getting accustomed to interpreting residuals plots, so it's hard for me to know what constitutes an issue vs. a "moderate" deviation that the model should be robust enough to handle.

wateas

@StatDave wrote:
If you know the number of square feet in each plot and you measure the number of square feet with weeds above the required height, then the ratio is just a binomial proportion that you could model with an ordinary logistic model. No need to do any transformation. With a data set containing one observation per plot and with a variable containing the number of affected square feet in the plot and another with the total number of square feet in the plot, you could use the events/trials response syntax in PROC LOGISTIC to fit the model.
    model Naffected/Ntotal = ... ;
You could include your BLOCK variable in the model if you have blocks of plots. Or, if you really want to use a random effect, then you could use the same model syntax, with DIST=BINOMIAL, in PROC GLIMMIX.
(end of quote)
I've only seen the binomial distribution used in the context of binary outcomes (yes/no, live/die, etc.). It makes sense to me why I could use the binomial distribution, but I'm concerned that a reviewer would take issue with using a response variable that is continuous in nature. If you happen to be aware of any examples or tutorials of using a binomial distribution for a continuous response variable, anything you can pass along would be much appreciated!
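A minimal sketch of the GLIMMIX version of that suggestion, for illustration only. Naffected and Ntotal are the placeholder names from the quoted reply and would have to be created first; the constant plot area of 500 sq ft and the rounding of Area_Weed to whole square feet are assumptions, not something confirmed in this thread.

```sas
/* Build events/trials variables (hypothetical derivation). */
data df_binom;
  set df;
  Ntotal    = 500;               /* assumed total plot area, sq ft */
  Naffected = round(Area_Weed);  /* whole square feet with tall weeds */
run;

/* Events/trials response with Block as a random effect. */
proc glimmix data=df_binom method=rspl;
  class Trt_Amend_App Trt_CC Block;
  model Naffected/Ntotal = Trt_Amend_App | Trt_CC / dist=binomial link=logit;
  random Block;
run;
```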

wateas

I have data on the severity of weed pressure for an agricultural study with a factorial RCBD. The weed pressure variable is the area, in square feet, within a plot where weeds were measured to be above a certain height. It is an atypical metric for weediness, and I have not been able to find anything like it in the literature, so I'm unsure of how to analyze these data. The raw, untransformed variable "Area_Weed", in units of square feet, exhibits right-skewness. A few of the observations are zero. The histogram is shown below:
One approach that I've tried is to convert the response variable "Area_Weed", in square feet, to a proportion (weed area divided by total plot area) and then apply an arcsine transformation. The residuals plot exhibits non-constant variance. The code and residuals are shown below:
    proc glimmix data=df plots=studentpanel method=rspl;
      class Trt_Amend_App Trt_CC Block;
      model Area_Weed_PROP_ANG = Trt_Amend_App | Trt_CC / ddfm=kr2;
      random Block;
    run;
Considering an alternative distribution for the untransformed response variable "Area_Weed", which is non-negative and right-skewed, the gamma distribution seems promising, aside from the fact that it can't accommodate zeros in the data. If I add a small constant (+1), then I can get the model to run using PROC GENMOD, with the caveat that I can't include the variable "Block" as a random effect.
    proc genmod data=df plots=all;
      class Trt_Amend_App Trt_CC Block;
      model Area_Weed_PLUS = Trt_Amend_App | Trt_CC / dist=gamma link=log;
      ods output ParameterEstimates=pe;
      output out=outmean pred=mu;
    run;
My questions amount to the following:
Based on what I've presented here, is the gamma distribution appropriate for my response variable, which is non-negative, right-skewed, and includes a few zeros?
If I use PROC GENMOD and a gamma distribution, how might I go about evaluating the residual plots? What might be a good setting for the plots= option aside from "all"?
If I attempt to incorporate Block as a random effect using PROC GLIMMIX, the model does not converge. How can I address this issue?
Thank you for reading. Please let me know if I can provide any other information.
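For context, the derived response variables used in the two models could be built with a DATA step along these lines. This is a sketch under assumptions: Plot_Area and Area_Weed_PROP are hypothetical names, the output data set name df2 is made up, and a constant plot area of 500 sq ft is assumed; only Area_Weed, Area_Weed_PROP_ANG, and Area_Weed_PLUS appear in the post.

```sas
/* Derive the proportion, its arcsine square-root transform, and the +1 offset. */
data df2;
  set df;
  Plot_Area          = 500;                          /* assumed plot size, sq ft */
  Area_Weed_PROP     = Area_Weed / Plot_Area;        /* proportion of plot, 0 to 1 */
  Area_Weed_PROP_ANG = arsin(sqrt(Area_Weed_PROP));  /* angular transform, radians */
  Area_Weed_PLUS     = Area_Weed + 1;                /* shifts zeros so gamma can be fit */
run;
```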

wateas · ‎12-23-2024

I can't use LSMEANS for a multinomial distribution. The issue is not interpreting the content of the solution table, but rather the odds ratios table. The multinomial model is modeling the probabilities of a lower value of the response variable "weed_rating" (screenshot below). The odds ratio table is easier to interpret when the first level has a lower response score than the second level, as the odds ratio estimate is a measure of the odds that the first level will have a LOWER score than the second level. So if the estimate is 10, then the odds of a lower score are 10 times higher for the first level than for the second. When the odds ratio is less than 1, the interpretation is not as intuitive to me, so it would be advantageous if I could rearrange the factor level order to avoid that and make my life easier. Here is a good link for reference and further detail. Of course, I can simply rename my factor levels manually so they will have the desired ordering, but I thought I would try for a more sophisticated approach first. It is easy to reorder factor levels in R, so I thought there might be a similarly easy solution in SAS. Thanks for your help.

wateas · ‎12-23-2024

I'm conducting ANOVA for a multinomial, ordinal response variable in proc glimmix. I would like to alter the order of factor levels to make interpretation of odds ratios easier, but short of renaming the factor levels, I'm not sure how to do it. The variable is "Trt_CC" (called "factor2" in the code below) and the levels are "CC" and "no_CC". The default ordering is alphabetical, but I would like "no_CC" ordered first to help me interpret the procedure output. So far, I've only been successful at renaming the factor levels:
    /* Change order of factor levels in factor2 such that no_CC is first. */
    proc format;
      value $trtfmt 'no_CC' = 1 'CC' = 2;
    run;
    /* Response_Var is an ordered, categorical variable with 5 levels. */
    proc glimmix data=df;
      class Site Block factor1 factor2;
      format factor2 $trtfmt.;
      model Response_Var = factor1 | factor2 | Site / dist=multinomial link=cumlogit oddsratio(diff=all) solution;
      random Block(Site) / solution;
    run;
The above code successfully renames the levels of "factor2" as numbers and provides the desired ordering, but that has obvious drawbacks with respect to interpretability. Any advice as to how to change the ordering of "factor2" so that "no_CC" is before "CC" would be appreciated. Thanks for reading.
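One possible variation on the format approach, sketched here under the assumption that the class levels are being sorted by their formatted values (which is what the numbered format relies on): keep the original name inside the label and add a sorting prefix, so the output stays readable. The labels '1_no_CC' and '2_CC' are made up for illustration.

```sas
/* Labels sort in the desired order but still show the original level names. */
proc format;
  value $trtfmt 'no_CC' = '1_no_CC'
                'CC'    = '2_CC';
run;

proc glimmix data=df;
  class Site Block factor1 factor2;
  format factor2 $trtfmt.;
  model Response_Var = factor1 | factor2 | Site
        / dist=multinomial link=cumlogit oddsratio(diff=all) solution;
  random Block(Site) / solution;
run;
```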

wateas · ‎11-26-2024

Hey, now that I'm looking at it again, I appear to have made an error with my strings! So the problem is solved now. Thanks for your help.

wateas · ‎11-26-2024

Edited: There was an error in my factor level strings causing the issue. The code below works otherwise. Thanks.
Hey Ksharp - thank you for your response. From running PROC FREQ, I can easily see the level values, but I'm not sure exactly what you mean when you say that I need to use their 'FORMATTED values'. PROC CONTENTS indicates that these factors are all type "char". This code returned the same error, "cannot find control level for effect...":
    lsmeans Trt_Amend_App * Trt_CC / pdiff=control('Control-Fert-nan' 'no_CC') adjust=dunnett;

wateas · ‎11-25-2024

Using a mixed effects model in PROC GLIMMIX, I would like to determine the differences between lsmeans and a chosen control. I'm trying to mimic the method outlined in the documentation to the best of my ability, but I get an error indicating "cannot find control level".
    proc glimmix data=df_y1 plots=studentpanel;
      class Trt_Amend_App Trt_CC ID_S Block;
      model Yield_Grain_Mg_ha = Trt_Amend_App | Trt_CC | ID_S / ddfm=kr2;
      random Block(ID_S);
      lsmeans Trt_Amend_App * Trt_CC / pdiff=control('7' '2') adjust=dunnett;
    run;
The control corresponds to the 7th level of factor Trt_Amend_App and the 2nd level of Trt_CC. Any help clarifying this issue would be much appreciated.
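A sketch of the form that ended up working in the follow-up posts, assuming the control is named by the formatted level values (here 'Control-Fert-nan' and 'no_CC', the strings quoted in the later reply) rather than by ordinal positions such as '7' and '2'. The PROC FREQ step is just one way to list the exact level strings to copy.

```sas
/* List the exact level values of each factor. */
proc freq data=df_y1;
  tables Trt_Amend_App Trt_CC / nocum nopercent;
run;

/* Dunnett comparisons against the control combination, named by level value. */
proc glimmix data=df_y1 plots=studentpanel;
  class Trt_Amend_App Trt_CC ID_S Block;
  model Yield_Grain_Mg_ha = Trt_Amend_App | Trt_CC | ID_S / ddfm=kr2;
  random Block(ID_S);
  lsmeans Trt_Amend_App * Trt_CC
          / pdiff=control('Control-Fert-nan' 'no_CC') adjust=dunnett;
run;
```

The quoted strings have to match the level values exactly; a mistyped string produces the same "cannot find control level" error.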

wateas · ‎11-22-2024

Regarding this issue in "Build Models", I see now that variables can be set to have a role of "partition", but that option appears to disappear after the pipeline has been run.

wateas · ‎11-22-2024

Thanks for your response. Regarding your suggestion to address the issue in the "Explore and Visualize" module, if I select "new partition" after right-clicking Block I get an error: "Cannot set this item as a partition because it is already in use by a partitionable analytic". So apparently some data item in my project is interfering. Regardless, it's good to know that I have an option to do this in the Explore and Visualize module. What about the "Build Pipelines" module? From the partition data menu in project settings: "Note: These settings are active only when a partition variable is not set within the data. Using a data source with a pre-defined partition variable or manually selecting a partition variable will override these settings." How might I go about establishing a pre-defined partition variable?
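One way such a variable might be prepared before loading the data is a DATA step that maps blocks to partitions. This is only a sketch: the data set names df and df_part, the variable name Partition, and the 1/0/2 coding are all assumptions, and how the interface expects the values to be coded is not confirmed here. It also assumes Block is numeric with values 1 to 4 and that the blocks are of equal size, which is what gives roughly a 50/25/25 split.

```sas
/* Assign whole blocks to training, validation, and test partitions. */
data df_part;
  set df;
  /* Assumes numeric Block; quote the values if Block is character. */
  select (Block);
    when (1, 2) Partition = 1;   /* training, about 50% */
    when (3)    Partition = 0;   /* validation, about 25% */
    when (4)    Partition = 2;   /* test, about 25% */
    otherwise   Partition = .;   /* unexpected block value */
  end;
run;
```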

wateas · ‎11-14-2024

I'm working in SAS Visual Analytics version 4.0. I would like to partition data in Build Models, but the corresponding menu in project settings is grayed out (see screenshot below). Additionally, I would like to partition data in a specific way using my blocking variable "Block". Based on the options shown above, there does not appear to be a way to do this. The "Explore and Visualize" module does seem to have this capability, but my blocking variable is not listed as an option for some reason (see screenshot below). Available are a number of categorical variables and, for some reason, a single continuous variable. Note that the variable "Block" is visible in the data pane as a categorical variable with 4 levels. Ideally, my training, validation, and test partitions would be 50%, 25%, and 25%, respectively, built from any of blocks 1, 2, 3, or 4. Thanks for reading!