About sld

sld · 04-01-2017

A well-posed question 🙂

First, create a balanced data set so that you aren't trying to juggle the impacts of unbalanced data while you sort out syntax.

    data newtest;
      call streaminit(33445);
      do id=1 to 10;
        rid=rand('normal'); *random effect for subject=id;
        do trt=1 to 3;
          if trt in (2,3) then trt2=2; else trt2=trt;
          do time=1 to 2;
            y=trt + trt*time + rand('normal') + rid;
            output;
          end;
        end;
      end;
    run;

    proc tabulate data=newtest;
      class trt trt2;
      table trt, trt2;
    run;

Then run your two models. Note that the estimates of the difference now match, but SEs and DFs do not.

The fundamental difference in the two models lies in the REPEATED statement. The first model, using

    repeated time / subject=id(trt) type=cs;

identifies 30 subjects (10 IDs for each of 3 TRTs). But the REPEATED statement in the second model, using

    repeated time / subject=id(trt2) type=cs;

identifies only 20 subjects (10 IDs for each of 2 TRT2s). Consequently SEs and DFs differ.

If my experiment randomly assigned 3 treatments to 10 subjects per treatment so that I actually had 30 subjects in total, I would use the first model rather than the second because the first model preserves the experimental design; the second makes up a new one.
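To see the subject counts directly, one option is to count the distinct id-within-treatment combinations in the simulated data. This is only a sketch; it assumes the newtest data set from the DATA step above, and the column aliases are made up for illustration.

```sas
/* Assumes the newtest data set created by the DATA step above. */
proc sql;
  /* Distinct id-within-trt combinations: the subjects implied by subject=id(trt) */
  select count(*) as n_subjects_trt
    from (select distinct id, trt from newtest);

  /* Distinct id-within-trt2 combinations: the subjects implied by subject=id(trt2) */
  select count(*) as n_subjects_trt2
    from (select distinct id, trt2 from newtest);
quit;
```

The two counts should come out as 30 and 20, matching the subject counts described for the two REPEATED statements.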

sld · 03-30-2017

This looks like the same study on which you've posted multiple questions and gotten multiple responses. If you are continuing to have issues with normality, perhaps it is because your "score" is inherently non-normal. If you provide the community with more information about your data--specifically, what the nature of your "score" is, what it measures, what values it takes, or even the dataset itself--someone may be more likely to be able to suggest a solution.

That said, if the response (conditional on the predictors) is non-normal then, well, it's non-normal. Whether you transform to achieve normality or use a generalized linear model with a non-normal distribution, there will be interpretation issues with respect to the original scale. It's just the nature of the beast.

sld · 03-28-2017

Oh, yes, switch to MIXED or even (my favorite) GLIMMIX!

This example in the MIXED procedure documentation may produce what you are looking for, if what you want is a random coefficients model:
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect034.htm

Using that example as a template, and throwing in a plot, your code would look like

    proc sgplot data=example;
      reg x=X y=Y / group=ID;
    run;

    proc mixed data=example;
      class ID;
      model Y = X / solution;
      random Intercept X / type=un sub=ID solution;
    run;

"type=un" specifies random intercepts, random slopes, and covariance between intercepts and slopes. You may find that a simpler structure (e.g., setting covariance to zero, or only random intercepts) provides a better fit to your data.

The estimates of the regression parameters (intercept and slope) for the population of IDs are produced by the SOLUTION option on the MODEL statement.
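The simpler covariance structures mentioned above could be tried along these lines. This is an untested sketch that reuses the data set and variable names from the code above; comparing fit statistics such as AIC across the three fits is one common way to choose among them, since the fixed effects are the same in each.

```sas
/* Random intercepts and random slopes, with their covariance set to zero */
proc mixed data=example;
  class ID;
  model Y = X / solution;
  random Intercept X / type=un(1) sub=ID;
run;

/* Random intercepts only */
proc mixed data=example;
  class ID;
  model Y = X / solution;
  random Intercept / sub=ID;
run;
```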

sld · 03-25-2017

To address your research question, I think that you would want to compare the slope of the regression of HeartBeat on DrinkLevel for the pre level of Trial to the slope of the regression of HeartBeat on DrinkLevel for the post level of Trial, keeping in mind that these two regressions exist for (and are paired by) each patient. Do you agree?

If so, then this is a form of random coefficients model, and I would consider this code (which, of course, I have not tested and cannot guarantee) using variables as defined by @peter9:

    proc glimmix data=have;
      class Trial Patient_ID;
      model HeartBeat = Trial | DrinkLevel / solution;
      random intercept / subject=Patient_ID;
      random intercept DrinkLevel / subject=Patient_ID*Trial;
    run;

For this model to work well, you'd want a fair number of different DrinkLevels within each Trial and/or "well-behaved" data.
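Before fitting, it may help to look at the per-patient regressions within each level of Trial. The following is a sketch only, assuming the same data set and variable names as the GLIMMIX code above (have, Trial, Patient_ID, HeartBeat, DrinkLevel).

```sas
/* One panel per Trial level, one fitted line per patient */
proc sgpanel data=have;
  panelby Trial;
  reg x=DrinkLevel y=HeartBeat / group=Patient_ID;
run;
```

If each patient has only two or three distinct DrinkLevels per Trial, the fitted lines will be poorly determined, which is the situation the caveat above warns about.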

sld · 03-17-2017

That's helpful, thanks. ANCOVA might not be a bad approach after all.

When I'm embarking on an ANCOVA, the first thing I do is plot data. Let post be the post-intervention score, pre be the pre-intervention score, and exposure be the binary "whether prior exposure" variable. Plot post versus pre, distinguishing by exposure, and add a reference line that depicts pre=post:

    proc sgplot data=have;
      scatter x=pre y=post / group=exposure;
      lineparm x=0 y=0 slope=1;
    run;

Things to look for:

(1) Are the two exposure scatters sitting on top of each other (implying no effect of exposure), or are they shifted in some way (up or down, left or right)?

(2) Are the relationships between post and pre for each exposure group linear? (In its basic form, ANCOVA assumes linearity, as well as normality and homogeneity of variance.)

(3) Are the relationships between post and pre for each exposure group parallel to the reference line? If so, then the difference between post and pre does not depend upon the value of pre--the difference is constant. In this case, you could use the model that @Reeza suggested

    proc glm data=have;
      class exposure;
      model diff = exposure;
    run;

where diff = (post - pre). Essentially, this is your paired t-test with exposure added.

(4) If one or both relationships are not parallel to the reference line, then the difference between post and pre depends upon the value of pre, and an analysis of the response diff = (post - pre) would be a non-optimal choice. Perhaps, for example, the difference increases as pre increases. ANCOVA is useful in this scenario.

    proc glm data=have;
      class exposure;
      model post = pre exposure;
    run;

(5) If the two relationships are not parallel to each other--if the slopes of the two linear regressions are not equal--then add interaction to the model.

    proc glm data=have;
      class exposure;
      model post = pre exposure pre*exposure;
    run;

Chapter 7 in this text deals with ANCOVA and would probably be useful:
https://www.sas.com/store/books/categories/usage-and-reference/sas-for-linear-models-fourth-edition/prodBK_56655_en.html

Note that the covariate (here, pre) is centered in the mathematical model: the mean of X is subtracted from each value of X.

Another resource here:
https://onlinecourses.science.psu.edu/stat502/node/183
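The model in item (3) needs the difference score to exist as a variable. A minimal sketch of that step, assuming the data set have contains post, pre, and exposure as defined above (the output data set name have_diff is made up for illustration):

```sas
/* Create the difference score used as the response in item (3) */
data have_diff;
  set have;
  diff = post - pre;
run;

proc glm data=have_diff;
  class exposure;
  model diff = exposure;
run;
```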

sld · 03-16-2017

(1) Are you looking at the distribution of the post values all together prior to fitting a model, or are you looking at the distribution of the residuals after fitting the model?

The normality assumption applies to the response variable y conditional on the predictor variables x. For ANOVA, this means that y is normally distributed within each treatment level; clearly, if you have few replicates within each treatment level, your ability to assess the normality assumption is limited or even nonexistent. For regression, this means that y is normally distributed within each level of x; again, if you don't have lots of replicates within each level of x, you will not be able to assess this assumption prior to fitting a model. What are we to do? We look at the distribution of the residuals.

(2) Are post and pre in your study measuring the same variable?

If pre is "whether or not the subject had prior exposure to the task performed", then it sounds like a categorical variable (yes/no), and if so, it is not a continuous covariate as needed for ANCOVA and it should be listed in the CLASS statement. Is post measured on a continuous scale? Is covariate measured on a categorical scale (as I would expect given that it is in the CLASS statement)?

I'm beginning to think that you are on the wrong track entirely and that the appropriate model might be something like a two-way factorial ANOVA-like model. But you haven't provided enough information to tell. If you don't provide enough detail about your study design and your variables, you risk getting a correct answer to the wrong question.
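One way to look at the residuals rather than the raw response is sketched below. The data set and variable names (have, post, pre, cov) and the model itself are placeholders, since the actual model has not been established in this thread; substitute whatever model you end up fitting.

```sas
/* Fit the model and save residuals and predicted values (names are placeholders) */
proc glm data=have;
  class cov;
  model post = pre cov;
  output out=resids r=resid p=pred;
run;

/* Normality checks on the residuals, not on the raw post values */
proc univariate data=resids normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;

/* Residuals versus predicted values, to eyeball homogeneity of variance */
proc sgplot data=resids;
  scatter x=pred y=resid;
  refline 0 / axis=y;
run;
```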

sld · 03-16-2017

    model change = pre cov pre*cov;

would not be appropriate. You could augment the code provided by @Ksharp as

    model post = pre cov pre*cov;

The interaction allows the regression of post on pre to have different slopes for each value of cov.

As @Ksharp notes, these models fall under analysis of covariance. You'll want to get up to speed with ANCOVA before you try to make sense of your results; ANCOVA is trickier than it appears at first glance, IMO. See this example in the GLM documentation:
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_glm_sect049.htm

sld · 03-16-2017

Is the MIXED procedure also telling you that the estimated G matrix is not positive definite?

Most likely, this result implies that there is little variation among reps for this response variable. The procedure has set the estimate to zero (hence, the SE is missing) and continued on its merry way. See Section III in this paper:
http://support.sas.com/resources/papers/proceedings12/332-2012.pdf

In the documentation
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect028.htm
under Parameter Constraints, you'll find:

"For some data sets the final estimate of a parameter might equal one of its boundary constraints. This is usually not a cause for concern, but it might lead you to consider a different model. For instance, a variance component estimate can equal zero; in this case, you might want to drop the corresponding random effect from the model. However, be aware that changing the model in this fashion can affect degrees-of-freedom calculations."

If that doesn't seem to be the reason, then you'll want to post your code and an example dataset so that people here have more to work with.

sld · 03-16-2017

Do you mean the interaction of group and strata at each value of time? In other words, a two-way interaction rather than three-way? If so, then you can do this using the LSMESTIMATE statement, with the AT option to specify an appropriate value for baseline_score at which to estimate the lsmeans. It's worth taking the time to figure out how to use the nonpositional syntax for LSMESTIMATE. See https://support.sas.com/resources/papers/proceedings11/351-2011.pdf

sld · 03-06-2017

You're welcome. I'm happy it's making sense.

Using AIC based on ML works for comparing models that differ in fixed and/or random effects. (AIC based on REML is good only for models that differ in random effects.) Variance estimates under REML are thought to be less biased than those under ML (although the extent of bias depends on sample size), so sometimes you'll see people select a model using ML, then refit with REML. But if your sample size is adequate and you are not aiming for a parsimonious model, this may not matter much to you this time.

Regarding your question about interpretation: in this case (with no covariates measured at Level1), I don't see that moving from a 3-level model to a 2-level model by averaging over Level1 alters your interpretation; it just makes the model simpler.
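The "select with ML, then refit with REML" workflow could look something like the sketch below. The data set name (have) and the two candidate models are placeholders chosen for illustration, not your actual models; the variable names follow the ones used elsewhere in this thread.

```sas
/* Step 1: compare candidate fixed-effects structures using ML-based AIC */
proc mixed data=have method=ml;
  class level3;
  model y = x / solution;
  random intercept / subject=level3;
run;

proc mixed data=have method=ml;
  class level3;
  model y = x xmean / solution;
  random intercept / subject=level3;
run;

/* Step 2: refit whichever model was chosen using REML (the default) */
proc mixed data=have method=reml;
  class level3;
  model y = x xmean / solution;
  random intercept / subject=level3;
run;
```

The AIC values to compare appear in the Fit Statistics table of each ML fit.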

sld · 02-25-2017

I'm happy my response was coherent and, better yet, useful to you.

You get to choose how to build your model. If the number of observations is small relative to the number of predictor variables, then you may have to implement some form of variable selection to avoid overfitting. If the number of observations is large, then you might throw everything in (assuming no problems with multicollinearity, etc.) even if some predictor variables are not significant. Or you might want a parsimonious model.

If you are assessing the support for fixed effects factors, e.g., Xmean, and want to use an information criterion to decide, then remember that you must use a true maximum likelihood method (method=ml) rather than the default (method=reml) for the IC to be valid.

I'd try something like this (which I have not tested). It's close to what you have but includes everything apart from the kitchen sink, and the naming of things is different. Your new dataset looks fine.

    /* Compute means over Level1 within Level2*Level3 */
    proc sort data=SampleData;
      by level3 level2;
    run;
    proc means data=SampleData noprint;
      by level3 level2;
      id c d e f x b;
      var y;
      output out=SampleData_means mean= ;
    run;

    /* Compute Xmean and Bmean (means of variables measured at Level2) */
    proc means data=SampleData_means noprint;
      by level3;
      var x b;
      output out=level3_means mean= xmean bmean;
    run;

    /* Merge means with data */
    data SampleData_means_2;
      merge SampleData_means level3_means;
      by level3;
    run;

    /* Fit really busy, very full model */
    proc mixed data=SampleData_means_2 covtest;
      class level3;
      model y = c d e f xmean bmean x b / ddfm=satterth s;
      random intercept x b / subject=level3 type=un;
    run;

Regarding unequal numbers of levels of Level1: Paraphrasing and quoting from a chapter from Carl Schwarz's online text
http://people.stat.sfu.ca/~cschwarz/Stat-650/Notes/PDFbigbook-SAS/SAS-part017.pdf :
If the number of levels of Level1 varies among levels of Level2, then levels of Level2 with more levels of Level1 will have smaller sampling uncertainty than levels of Level2 with fewer levels of Level1. Consequently, "process + sampling error is not constant over the entire regression line. Estimates are still unbiased, but not fully efficient." (p 1104)

The analysis using means is approximate to the extent that the Level1 data are unbalanced across the levels of Level2. If the imbalance is smaller, then the approximation is "less approximate". Hoping that makes sense.

If you don't have a lot of levels for Level3, you may have estimation problems with type=UN. You could try UN(1), which sets the covariance to zero; then you just have random intercepts and random slopes. (My datasets rarely allow estimation of the covariance; they are just too small.)
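To gauge how unbalanced Level1 is, one option is to summarize the automatic _FREQ_ variable that PROC MEANS writes to its output data set. This sketch assumes the SampleData_means data set created by the code above.

```sas
/* _FREQ_ holds the number of Level1 observations behind each Level2*Level3 mean */
proc means data=SampleData_means n min max mean;
  var _freq_;
run;
```

If the minimum and maximum are equal, the design is balanced at Level1; the further apart they are, the more approximate the analysis of means becomes. Note that _FREQ_ counts rows, so it includes any observations with a missing y.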

sld · 02-23-2017

Thank you for clarifying your data set structure. I'm more sure now that we're on the same page.

Focusing on the covariate X: Because there are multiple (X,Y) observations (one for each level of Level2) within each level of Level3, we (or more properly, the statistical model) are able to fit a regression of Y on X for each level of Level3; as you note, this set of regressions may have appreciable variance among intercepts, variance among slopes, and covariance between intercepts and slopes. These (co)variances are derived from the multiple Level3 regressions. Consequently, although you can assess whether there are random intercepts and random slopes, I'd say that assessment is "among" levels of Level3; there is no random intercept/slope among levels of Level2 because the model is using the different levels of Level2 (within each level of Level3) to define the regressions. I hope that makes sense.

I failed to define "Xmean" and "Bmean". Xmean is the mean of the X values over the levels of Level2 for each level of Level3--it's like moving the X values up a tier, from Level2 to Level3, as if Xmean were measured at Level3. (Bmean is the analogous mean of B.) I hope that makes sense, too. This concept is addressed in the Singer paper (SES and MEANSES) I linked in an earlier response. Although I didn't intend them as centered variables, they certainly could be, and are in the Singer paper. If you center them correctly, both should be variable (i.e., not constant zero, although the mean would be zero).

Should you center? Your call. If the model includes interactions (including polynomial terms, like X*X), then centering is very useful and potentially does reduce collinearity. In a model without interactions, it's less critical, I think. Centering doesn't hurt; you just have to rescale results to undo centering if you want results on the original scales.

Should you include Xmean and Bmean? Again, your call.

If it were me, because there are no covariates at Level1, I would compute the mean Y over the levels of Level1 for each level of Level2 within each level of Level3 and then use the mean Y as the response in the simpler, two-level model. Nothing wrong with an easier life 🙂 You would then be able to omit the second RANDOM statement. If the number of levels of Level1 is the same for all combinations of Level2 and Level3, then the statistical tests for fixed effects will be very similar, if not identical, to those from the three-level model. If the number of levels of Level1 varies dramatically among combinations of Level2 and Level3, then I might keep the three-level model.

I haven't looked in any detail at the paper you found with the macro for assessing assumptions. If you adequately understand how the macro is addressing assumptions, and know what the assumptions are and how to extract what you need from the MIXED procedure, you theoretically would be able to extend the methods to a three-level model. In a sense, your statistical model is a multiple regression in a mixed model, so you have all the assumptions associated with multiple regression plus the assumptions associated with a mixed model. A busy task, but not horribly difficult. Good luck!
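If you do decide to center, the Level3 mean and the within-Level3 centered covariate can be computed in one step. This is a sketch under assumptions: level2_data is a hypothetical data set with one row per Level2 unit, and x_c is a made-up name for the centered variable.

```sas
/* level2_data is a hypothetical data set: one row per Level2 unit */
proc sql;
  create table centered as
  select *,
         mean(x)     as xmean,   /* mean of x over Level2 units within each Level3 */
         x - mean(x) as x_c      /* x centered within its Level3 unit */
  from level2_data
  group by level3;
quit;
```

PROC SQL writes a note to the log about remerging summary statistics with the original data; that is expected here. Within each level of Level3, x_c averages to zero but still varies from row to row, which is the "variable, not constant zero" behavior described above.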

sld · 02-23-2017

Either I'm not understanding your study design or it is not described correctly.

I'm looking at the sample dataset you attached. If variables C, D, E, and F are covariates measured at Level3, then I would expect to see the same value for each variable (C, D, E, or F) for all observations with the same value of Level3. For example, if Level3=1 then C=12 regardless of the values of Level2 and Level1. However, I see different values for C for Level3=1. Likewise, I would expect to see the same value for variable X or B for all observations with the same value of Level2. I'm perplexed.

IF your dataset had an appropriate multilevel structure (which I am not yet convinced of) and IF I correctly understand your design (which I also am not yet convinced of), then I would consider the following model:

    proc mixed data=SampleData covtest;
      class level2 level3;
      model Y = C D E F Xmean Bmean X B / ddfm=satterthwaite s;
      random intercept X B / subject=level3 type=un;
      random intercept / subject=level2;
    run;

BUT I would think of this model as merely a first attempt, definitely not a final model. Even if this model is correct to some degree, there are many data characteristics and assumptions to be assessed (normality, homogeneity of variance, linearity, multicollinearity issues, which TYPE to use in the RANDOM statement, etc.).

If you want to respond to this message, please post a reply rather than edit your original message. It will be easier to track the discussion that way.
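A quick way to check whether a supposed Level3 covariate really is constant within Level3 is to count its distinct values per Level3 unit. This sketch assumes the variable names in the attached SampleData (level3, c); the same query can be repeated for D, E, and F, or grouped by level2 to check X and B.

```sas
/* List any Level3 units in which C takes more than one value */
proc sql;
  select level3, count(distinct c) as n_c_values
  from SampleData
  group by level3
  having count(distinct c) > 1;
quit;
```

An empty result would mean C behaves like a Level3 covariate; any rows returned point to the Level3 units where it does not.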

sld · 02-15-2017

Look to the LSMEANS birdtype output. Each line reports the predicted estimates and the test of whether "Estimate" is equal to zero for the specified birdtype. Note that the mean of the "Estimate" column over the 3 birdtypes is equal to the "Estimate" for intercept. The same thing is roughly true for the "Mean" column, but not exactly due to the nonlinear nature of the logit link.
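The effect of the nonlinear link can be seen with a small numeric illustration. The three logit-scale values below are hypothetical and are not taken from your output; the point is only that back-transforming the mean logit and averaging the back-transformed values give close but different answers.

```sas
data _null_;
  /* Hypothetical logit-scale estimates for three birdtypes (not from the thread) */
  array eta[3] _temporary_ (-0.6 -0.2 0.2);
  array p[3] p1-p3;
  do i = 1 to 3;
    p[i] = 1 / (1 + exp(-eta[i]));              /* inverse logit of each estimate */
  end;
  mean_eta  = (eta[1] + eta[2] + eta[3]) / 3;   /* mean on the logit scale */
  p_of_mean = 1 / (1 + exp(-mean_eta));         /* back-transformed mean logit */
  mean_of_p = mean(of p1-p3);                   /* mean on the proportion scale */
  put p_of_mean= mean_of_p=;
run;
```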

sld · 02-15-2017

To be clear, I did not recommend heterogeneous variances by Group. You can test whether the heterogeneous variances model fits better using the COVTEST statement, or by doing a likelihood ratio test by hand. I would do that test before deciding to go with a more complex model; the homogeneous variances model might be good enough.

The intercept is the mean over pair, day, and birdtype. The p-value tests whether the intercept (on the logit scale) is equal to zero. A zero value on the logit scale is equivalent to a value of 0.5 on the proportion scale. If you look at the ESTIMATE output, you'll see that the point estimate for intercept on the proportion scale ("Mean") is 0.4486, with a 95% CI of (0.4257, 0.4717), which does not include the value 0.50 and so is consistent with the intercept test reported in the Type III Tests of Fixed Effects table. Birds prefer CFL over LED by a relatively small margin (55% to 45%).

If you had evidence of a difference among birdtypes, you would use the LSMEANS output, possibly adding pertinent LSMEANS statements for interactions if interactions were significant. For these data, proportionLED is similar for birdtypes: none of the terms that include birdtype is significant. Hence, the intercept is a valid summary.

What is the difference between birdtype "CFL laye" and "LED laye"?
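The correspondence between the two scales can be checked by hand. This is a small sketch using only the value 0.4486 quoted above; the variable names are made up for illustration.

```sas
data _null_;
  /* Inverse logit: a logit-scale value of 0 maps to a proportion of 0.5 */
  p_at_zero = 1 / (1 + exp(-0));
  /* Logit: the reported proportion-scale estimate mapped back to the logit scale */
  eta_hat = log(0.4486 / (1 - 0.4486));
  put p_at_zero= eta_hat=;
run;
```

Because 0.4486 is below 0.5, eta_hat comes out negative, which is why a test of "intercept = 0" on the logit scale is the same as a test of "proportion = 0.5".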

