About StatDave

StatDave

I suspect that you ultimately want to make inferences about the effects of L and N at the overall population level, which suggests using a marginal, population-averaged model such as a GEE model. That could be done with a model like the following using your CHANNEL variable, which seems to be unique for each row. This treats the 3 rows in each L/N/V/week combo as independent. It includes DAT (week) as a continuous, linear effect in the model since it reflects time. TYPE=AR allows the correlations across the weeks to depend on the amount of time between measurements, but you could choose some other structure.

    proc gee data=fallphysiology;
      class channel Light Ncon Var;
      model A = Light | Ncon | Var | DAT / type3;
      repeated subject=channel / type=ar corrw;
    run;

Note that the estimated correlations are very small or even negative, suggesting that there might not be a need to allow for correlation among the repeated measurements. Indeed, allowing correlation might further reduce the standard errors, making some effects even more significant.

The following fits the model without allowing correlations among repeated measurements. As you noted with your mixed model analysis, many effects in the model are still highly significant, including multiple interactions, presumably because the variability in the data is even smaller than the effect sizes. So, the code also plots the fitted model showing the effect of each of L, N, and V over time within each combination of the other two variables, and provides tests of those effects.

    proc genmod data=fallphysiology;
      class Light Ncon Var;
      model A = Light | Ncon | Var | DAT / type3;
      effectplot slicefit(x=dat plotby=ncon*var);
      slice Light*Ncon*Var / sliceby=ncon*var plots=none;
      effectplot slicefit(x=dat plotby=light*var);
      slice Light*Ncon*Var / sliceby=light*var plots=none;
      effectplot slicefit(x=dat plotby=light*ncon);
      slice Light*Ncon*Var / sliceby=light*ncon plots=none;
    run;

StatDave

Since you use the SLICEDIFF= option in the LSMEANS statement, I assume that you are using PROC GLIMMIX to fit your model since this option isn't in the LSMEANS statement in other procedures. Regardless, I don't see a problem with you using the statements you propose as long as they are preplanned comparisons of interest. The first LSMEANS statement you show provides comparisons of the treatments in each combination of YEAR and LOCATION. If you are concerned about the multiple comparisons it provides, you could add the ADJUST= option to apply a suitable multiple comparison method. Or you could save the p-values from the multiple tests (using an ODS OUTPUT statement) and use one of the multiple comparison adjustments available in PROC MULTTEST.

However, if what you want is a single test of the treatment difference across the YEAR and LOCATION combinations, you could specify:

    lsmeans treatment / ilink diff e;

I don't know what your response distribution is, so the ILINK option gives the individual treatment means for that distribution. The DIFF option compares the treatments averaged over a balanced distribution of YEAR and LOCATION levels as shown by the E option. Since the LSMEANS statement simply estimates linear combinations of the model parameters, it is always advisable to use the E option to see how each estimate is calculated. If you are not comfortable with a balanced distribution given the nature of your data, you can consider using the OM option to change it.

But another option is to simply treat all observations as being in one of the treatments and compute their predicted means, then do the same treating all observations as being in another treatment, and compare the averages of those two sets of predicted values. The YEAR and LOCATION values are not restricted in this approach as they are when using LSMEANS. Those averages are the predictive margins for treatment. This can be done using the Margins macro. For example, the following fits the 3-way interaction model (using PROC GENMOD), computes the predictive margins for treatment, and provides a test comparing them.

    %margins(data=your_data, class=year location treatment,
             response=y, model=year|location|treatment,
             margins=treatment, diff=all)
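As a rough illustration of the PROC MULTTEST route mentioned above, the following sketch adjusts a set of raw p-values. The data set PVALS, the labels, and the p-values themselves are made up for illustration; in practice the p-values would come from the table saved with ODS OUTPUT, whose name and p-value column depend on the procedure and statement used.

```sas
/* Hypothetical raw p-values standing in for a table saved with ODS OUTPUT */
data pvals;
  input test $ raw_p;
  datalines;
c1 0.004
c2 0.020
c3 0.031
c4 0.250
;

/* INPVALUES(raw_p)= names the variable that holds the unadjusted p-values. */
/* HOLM and FDR request step-down Bonferroni and false discovery rate       */
/* adjustments; other adjustment options could be used instead.             */
proc multtest inpvalues(raw_p)=pvals holm fdr;
run;
```

The adjusted p-values appear alongside the raw ones, so each comparison can be judged against the chosen adjustment.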

StatDave

Since you say you have a binary response and are using a Poisson response distribution, I assume that what you mean by "RERI" is relative risk. If so, this approach seems more convoluted than necessary. While I don't see a clear statement in terms of your predictor levels as to what you want to estimate, it might be that you want to estimate the relative risks for the separate effects of your 3-level predictor and of your binary medical condition predictor. If that is correct, then simplify this by using your separate variables in the model and using SLICE statements to do the estimation.

For example, assuming your outcome response is coded 1,0, that your adequate/inadequate/excessive predictor is called LEVEL with values 0=inadequate, 1=adequate, 2=excessive, and that MEDCOND has values 'No' and 'Yes', the following gives you the relative risk for each level of each predictor at each level of the other predictor (allowing for the predictors to have significant interaction).

    proc genmod data=mydata;
      class level(ref='0') medcond(ref='No') id;
      model outcome = level|medcond [covariates] / dist=poisson link=log;
      repeated subject=id / type=exch;
      slice level*medcond / sliceby=level means exp cl;
      slice level*medcond / sliceby=medcond means exp cl;
    run;

StatDave

The results of that CONTRAST give you a test of the difference, with respect to lower levels of your ordinal response, between the treatments averaged over the two times. You can see this by adding the E option in the CONTRAST statement, which shows the coefficients used in the linear combination of the model parameters that is estimated. The exponentiated result from the EXP option gives you an odds ratio comparing the treatments. But since there seems to be a significant interaction between treatments and time, this could be misleading. As with any regression model where interaction is found, it is best to look separately at the effect of one of the variables at each level of the other variable. It's also always helpful to get a summarizing plot of the fitted model so that you can visualize the effect of interest. You can do both of these by adding an EFFECTPLOT statement and a SLICE statement.

The arthritis data in the example titled "Alternating Logistic Regression for Ordinal Multinomial Data" in the PROC GEE documentation is similar to your situation. Using just two of the visits in that data and an interaction model like yours, the following does the analysis comparing the treatments at each visit. Note that in these data, the interaction is not significant.

    proc genmod data=data9.arthritis;
      where visit in (1,3);
      class id treatment visit;
      model Rating = Treatment|visit / dist=mult;
      repeated subject=id;
      effectplot interaction(x=visit sliceby=treatment);
      estimate 'trt diff' treatment 1 -1 / exp e;
      slice treatment*visit / sliceby=visit e ilink means plots=none;
    run;

The plot shows the difference between the treatments at each visit on each of the cumulative probabilities estimated by the ordinal model. The results from the SLICE statement show, separately for the two visits, the cumulative probability estimates for each treatment. An overall test of the treatment difference is then given, again for each visit. As suggested by the plot for each of the cumulative probabilities, the treatment difference is a little bit larger at visit 3 than at visit 1. This is seen in the two overall tests - in visit 1, the overall test p-value is not significant, but it is significant in visit 3.

StatDave

Values in the Estimate column of the first LS-means table are on the logit (log odds) scale. In the second table, the differences table, they are differences of log odds, which are log odds ratios. When you add the ILINK option, the values in the Mean column are on the event probability scale. The Estimate column is still on the log odds scale. With the ODDSRATIO option, the Odds Ratio column in the differences table contains the odds ratio estimates, obtained by simply exponentiating the differences of the log odds.
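The relationship between these scales can be checked by hand. The following is only a sketch with made-up logit values (they are not from any fitted model); it shows that the inverse link turns a log odds into a probability and that exponentiating a difference of log odds gives the same odds ratio as dividing the two odds directly.

```sas
data _null_;
  /* Hypothetical LS-mean estimates on the logit scale */
  logit_a = 0.8;
  logit_b = -0.3;

  /* Inverse logit: the kind of value ILINK reports in the Mean column */
  p_a = 1 / (1 + exp(-logit_a));
  p_b = 1 / (1 + exp(-logit_b));

  /* Difference of log odds: the Estimate in the differences table */
  log_or = logit_a - logit_b;

  /* Exponentiated difference: the kind of value ODDSRATIO reports */
  odds_ratio = exp(log_or);

  /* Same odds ratio computed from the two probabilities */
  or_check = (p_a / (1 - p_a)) / (p_b / (1 - p_b));

  put p_a= p_b= log_or= odds_ratio= or_check=;
run;
```

The ODDS_RATIO and OR_CHECK values written to the log agree, which is why the Odds Ratio column can be obtained without leaving the logit scale.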

StatDave · ‎08-14-2025

Regarding the overall relative risk, in the model with interaction it would be the average of the two differences of ANYBPZ in the HEALTH levels (or vice versa if preferred). You could do that with an LSMESTIMATE statement like this:

    lsmestimate anybpz*health 'anybpz RR' 0.5 0.5 -0.5 -0.5 / exp cl;

But since there is no strong evidence of an interaction, that estimate would probably be pretty close to the estimate from a model without interaction, which you would get from your LSMEANS statement with the DIFF and EXP options. I'm assuming you are fitting a model to a binary response with DIST=POISSON LINK=LOG and an offset so that you are modeling the log of the event probability.
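To see what the 0.5 0.5 -0.5 -0.5 coefficients do, here is a small sketch using made-up LS-means on the log scale. The four values are hypothetical, and the cell order (HEALTH varying fastest within ANYBPZ) is an assumption that should be confirmed with the E option.

```sas
data _null_;
  /* Hypothetical log-scale LS-means for the anybpz*health cells:        */
  /* m11, m12 = first anybpz level; m21, m22 = second anybpz level       */
  m11 = -1.2;  m12 = -1.0;
  m21 = -1.9;  m22 = -1.8;

  /* Relative risk for anybpz within each health level */
  rr_h1 = exp(m11 - m21);
  rr_h2 = exp(m12 - m22);

  /* Linear combination used by the LSMESTIMATE statement, exponentiated */
  rr_avg = exp(0.5*m11 + 0.5*m12 - 0.5*m21 - 0.5*m22);

  /* Equivalent form: geometric mean of the two stratum-specific RRs */
  rr_geo = sqrt(rr_h1 * rr_h2);

  put rr_h1= rr_h2= rr_avg= rr_geo=;
run;
```

Because the averaging happens on the log scale, the exponentiated estimate is the geometric mean of the two relative risks, so it always lies between them.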

StatDave · ‎08-13-2025

The main effect tests in the Parameter Estimates and Type3 tables show that ANYBPZ is strongly significant but HEALTH is not. So, it is not surprising to see that the two comparisons in the LS-means differences table that are significant differ on ANYBPZ. You might find it more useful to use a SLICE statement to analyze the interaction. The following will give tests of the ANYBPZ difference in each level of HEALTH:

    slice anybpz*health / sliceby=health;

StatDave · ‎08-09-2025

The ROC curve area and its confidence interval resulting from any binary-response model or classifier can be computed using the ROC statement in PROC LOGISTIC. Several examples are shown in this note. As can be seen in the note, the computations depend only on the predicted and actual classifications from the model/classifier. The method used is a nonparametric method based on U statistic theory as discussed in "Receiver Operating Characteristic Curves" in the Details section of the PROC LOGISTIC documentation. As such, I believe that if you use the predicted classifications from a proper analysis of your survey data, as could be done using PROC SURVEYLOGISTIC, then the area and confidence interval can be obtained using the ROC statement in PROC LOGISTIC. You can further investigate by seeing the DeLong et al. paper cited in the above documentation section which details the method including the variance computation.
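A minimal sketch of that idea follows, assuming the predicted event probabilities have already been saved from the survey analysis. The data set PREDS, its variables Y and P, and all the values are made up for illustration. It relies on the NOFIT option and the PRED= option of the ROC statement, so check that your release supports them.

```sas
/* Hypothetical actual responses (y) and predicted event probabilities (p) */
data preds;
  input y p @@;
  datalines;
1 0.81 0 0.32 1 0.64 0 0.45 1 0.58
0 0.12 1 0.39 0 0.51 1 0.92 0 0.27
;

/* NOFIT skips model fitting; the ROC statement with PRED= builds the   */
/* curve from the supplied probabilities and reports the area with its  */
/* standard error and confidence limits.                                */
proc logistic data=preds;
  model y(event='1') = / nofit;
  roc 'Externally predicted' pred=p;
run;
```

Only the pairing of actual and predicted values enters the computation, which is why predictions from another procedure can be used here.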

StatDave · ‎08-05-2025

This is covered in "Odds Ratio Estimation" in the Details section of the PROC LOGISTIC documentation.

StatDave · ‎08-04-2025

The estimated odds ratios are the 1-unit changes in odds starting at each of the values you specified in the AT option. That is, the first estimate, 1.010, is the change in odds from 174 to 175. If what you want are the changes for more than 1 unit, then use the UNITS statement in addition to the ODDSRATIO statement.

StatDave · ‎07-30-2025

As mentioned in the note I referred you to in your direct standardization post, the directly standardized rate, DSR, is just a weighted crude rate, where the weight is the exposure proportion of the particular stratum in the reference population, P. Given that proportion, DSR=P*CR and stderr(DSR)=P*stderr(CR), where CR is the crude rate. This is easily computed by hand, or you can use the NLEST macro to do it for you after using the Margins macro call as I showed, but adding COVOUT in OPTIONS= in the Margins macro call. For example, replace P with its numerical value in the following and use it after the Margins macro call.

    %nlest(inest=_margins, incovb=_covmarg, f=b_p1*P, label=direct stdzd rate)
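The by-hand version of that calculation is just a rescaling. The sketch below uses made-up values for P, the crude rate, and its standard error; none of them come from the thread's data.

```sas
data _null_;
  /* Hypothetical inputs */
  P     = 0.25;     /* exposure proportion of the stratum in the reference population */
  CR    = 0.012;    /* crude rate for the stratum */
  se_CR = 0.002;    /* standard error of the crude rate */

  /* P is treated as a known constant, so it scales both the estimate */
  /* and its standard error                                           */
  DSR    = P * CR;
  se_DSR = P * se_CR;

  put DSR= se_DSR=;
run;
```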

StatDave · ‎07-29-2025

This note might prove helpful.

StatDave · ‎07-29-2025

The easiest way to obtain an estimated rate for each CID is to simply average the predicted rates within each CID. Using the SCORE statement in PROC PLM to obtain the predicted rates as described in this note, the following produces a data set of the average predicted rates for the CIDs:

    proc genmod data=sample_data;
      class x1 cid;
      model y = x1-x4 / d=poisson link=log offset=log_offset;
      repeated subject=cid / type=exch;
      store out=mod;
    run;
    proc plm source=mod;
      score data=sample_data out=pred pred stderr lclm uclm / nooffset ilink;
    run;
    proc means data=pred;
      class cid;
      var predicted;
    run;

This is essentially a predictive margin, which is an average predicted value, but averaged only over the predicted values in a CID. It might be slightly better to get the sum of the predicted counts, not rates, in a CID and then divide it by the sum of the offset values in the CID. You can use the P= option in the OUTPUT statement to get the predicted counts.

But the above only provides point estimates, not standard errors. To obtain predictive margin point estimates, standard errors, and confidence intervals, you could use the Margins macro, including the WITHIN= option to identify a specific CID. For example:

    %margins(data=sample_data, class=x1 cid, response=y,
             model=x1 x2 x3 x4, dist=poisson, offset=log_offset,
             geesubject=cid, geecorr=exch,
             within=cid='AC-1010', options=cl rate)

By itself, the macro only allows you to do this for one CID at a time, so you would need to run the macro for each CID. Or you could do them all in one shot by using the Margins macro with the RunBY macro as described in the Margins macro documentation and shown in the last two examples in the Results tab there.
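The alternative mentioned above, summing predicted counts and dividing by summed exposure, could be sketched as follows. This reuses the SAMPLE_DATA variable names from the code above and assumes that LOG_OFFSET is the log of the exposure, so that exp(log_offset) recovers it. The output names PREDCNT, PREDCOUNT, and CIDRATE are arbitrary.

```sas
/* Refit the GEE model and save the predicted counts (offset included) */
proc genmod data=sample_data;
  class x1 cid;
  model y = x1-x4 / d=poisson link=log offset=log_offset;
  repeated subject=cid / type=exch;
  output out=predcnt p=predcount;
run;

/* Rate per CID = total predicted count / total exposure              */
/* Assumes log_offset = log(exposure)                                 */
proc sql;
  create table cidrate as
  select cid,
         sum(predcount) / sum(exp(log_offset)) as rate
  from predcnt
  group by cid;
quit;

proc print data=cidrate;
run;
```

Like the averaging approach, this gives point estimates only.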

StatDave · ‎07-27-2025

LS-means as computed by the LSMEANS statement are defined for fixed effects. So, I assume that you consider your categorical variable to be a fixed effect and you are just using the RANDOM statement and ZERO= option to do the equivalent of creating a set of dummy variables which you would then include in the model as described in the ZERO= option description. That probably explains the warning you get since no dummy variables are created.

In any case, I think the way to estimate LS-means involves processing the OUTPOST= data set as you mentioned. You can use a DATA step to compute the linear combination of model parameters that defines the LS-mean for each observation. If you need to know the coefficients of that linear combination, you can fit a model with the same effects in MIXED or GLIMMIX and use an LSMEANS statement with the E option. That prints a table showing the coefficients of the linear combination for each LS-mean. With that done, you can easily compute a reasonable confidence interval for the LS-means by using the quantile options in PROC UNIVARIATE.

For example, using the example titled "Random-Effects Model" in the Getting Started section of the MCMC documentation, the following computes the point estimates and quantile-based 95% confidence intervals for the Gender LS-means.

    data lsm;
      set postout;
      lsmgf0 = b0;
      lsmgf1 = b0 + b1;
    run;
    proc univariate data=lsm noprint;
      var lsmgf0 lsmgf1;
      output out=out mean=mngf0 mngf1 pctlpre=gf0p gf1p pctlpts=2.5 97.5;
    run;
    proc print;
    run;

If you insist on computing HPD style intervals, then that should be possible with a DATA step (probably using the LAG function after sorting) following the computation of an HPD interval as described in "Summary Statistics > Highest Posterior Density (HPD) Interval" in the "Introduction to Bayesian Analysis Procedures" chapter of the SAS/STAT User's Guide.

Of course, for this simple example of a single fixed effect model, it's easier to just reparameterize the model without an intercept and with separate male and female dummy variables in the data and associated parameters defined in MCMC. Then the estimates and HPD intervals are directly provided in the MCMC Posterior Summaries table.

StatDave · ‎07-24-2025

If you want the hazard ratios comparing the predicted hazard from each observation with the predicted hazard when each observation is at a fixed reference value, then you can do that by saving the predicted log hazards using the XBETA= option in the OUTPUT statement for the original data and for data with all values set to the reference. Then compute the hazard ratios from the difference and plot them. For example:

    data ref;
      set Myeloma;
      protein=10;
      time=.;
    run;
    data both;
      set Myeloma ref(in=inref);
      ref=inref;
    run;
    proc phreg data=both;
      effect spl=spline(protein/naturalcubic);
      model Time*VStatus(0)=spl;
      output out=out xbeta=xb;
    run;
    data xb;
      set out;
      where ref=0;
      lhz=xb;
      keep lhz protein;
    run;
    data xbref;
      set out;
      where ref=1;
      lhzref=xb;
      keep lhzref;
    run;
    data hr;
      merge xb xbref;
      hr=exp(lhz-lhzref);
    run;
    proc sort data=hr;
      by protein;
    run;
    proc sgplot data=hr noautolegend;
      pbspline y=hr x=protein;
      refline 1 / axis=y;
    run;

Re: Seeking advice on repeated measurement code

Re: Using SliceDiff to address a factorial interaction

Re: Calculating RERI with Confidence Intervals

Re: I need help with a repeated measures ordinal logistic regression a...

Re: Interpreting LSMEANS in PROC LOGISTIC

Re: Interpreting interaction from proc genmod

Re: Interpreting interaction from proc genmod

Re: ROC curve and c-statistic with proper confidence intervals for Com...

Re: Need help with interpreting a logistic regression result with rest...

Re: Need help with interpreting a logistic regression result with rest...

Re: Model for Correlated data

Re: Adjusted rate from a GEE model

Re: Regression based standardization of rates

Re: Adjusted rate from a GEE model

Re: PROC MCMC Include Random Variables in BEGINNODATA/ENDNODATA Statem...

Re: How to Plot Spline Curve Using a Reference Value in PROC PHREG?