sld
Rhodochrosite | Level 12

Here's a link to a comment by Bill Venables about dist=normal link=log that might be of interest:

https://stat.ethz.ch/pipermail/r-help/2004-December/062683.html

Susan

jrbrauer
Fluorite | Level 6

Ok, now that I have a better idea about your data/question and some models are converging, let's see if I can help (or muddy the waters even more)...

"I am trying to analyze a dataset where each subject has 12 repeated measures (quarterly over 3 years). I want to extract subject specific estimates of the time slope..."

** This statement implies that you want subject-specific solutions, not population-averaged as estimated in GEE, so I would stick with Glimmix (though I agree with Steve's implied point that there are a lot of ways to tackle the question you seem to be asking). **

"to evaluate if the subjects are changing significantly over time."

** More on this later. 


"The code I currently have consistently suggests that each subject is demonstrating a highly significant increase over time. This seems unlikely but I'm not sure how to adjust my syntax to run a more accurate model. Does anyone know how/why this model would find the slope coefficient for time significant for all cases?"

** Why is this unlikely?

** Just to be sure, you are interpreting the "fixed" effect slope coefficient and not the random coefficient, right? 

** Is a linear growth trajectory adequate for these data? (I'd be surprised if so with 12 time points, given likely temporal variability in billing cycles, subject payments, and new procedures, but I have no idea.) Does inspection of subject-specific XY plots of billing "counts" by quarterly billing period confirm a consistent linear increase for most/all subjects?
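One way to eyeball the subject-specific trajectories is a small-multiples plot. This is only a sketch: it assumes the same dataset and variable names used in the models in this post (dataset, code, period_count, billing_count), and the panel layout is an arbitrary choice.

```sas
/* Sort so each series is drawn in time order within a code */
proc sort data=dataset out=dataset_sorted;
    by code period_count;
run;

/* One panel per procedure code: billing count by quarterly period */
proc sgpanel data=dataset_sorted;
    panelby code / columns=4 rows=3;
    series x=period_count y=billing_count / markers;
run;
```

With many procedure codes this produces several pages of panels, so it may be easier to plot a random subset of codes first.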

"A quick description of the study: We are creating a trending report which should flag procedure codes (subjects) that are showing a significant increase in the number of times it was billed over the time period being analyzed (3 years, by quarter). The outcome variable is being treated as a count (bounded at 0 but not necessarily whole numbers). ... To answer some of the above questions- I am getting some fractional values because I am attempting to analyze medical billing data. For some procedures (like those involving anesthesia), billing is done in 15 minute increments (thus, we could end up with counts at 1.5 as a hour and a half of anesthetic administration). This is not generally the case, however. If I additionally log transformed these counts, would a lognormal distribution be better fit?"

** Does it make sense to code your billing counts consistently across subjects and procedures? If it is possible and makes sense to do so, I would. In other words, if one procedure generally equals 1 bill count, then an anesthetic administration billed as 1.5 hrs (or six 15-minute increments) should also be coded as 1 bill count. It is important that the Y values mean the same thing (as much as possible) in all cases.

** I personally recommend against indiscriminately logging the Y values to attain a more normal distribution of residuals or a better model fit. Doing so typically has substantial implications for interpretation of results that are too easily overlooked. In this case, I would try to recode to create a count (or at least a discrete whole-integer) variable and then apply the appropriate modeling strategy while specifically diagnosing and addressing sources of heteroskedasticity or other residual/fit issues.

"In regards to the concerns of zero inflation: This was absolutely a concern initially. However, I have filtered for a complete case analysis where there is observed data in all the time points."

** Does this mean subjects' observations are only retained if they have an observation > 0? In other words, a subject that received one bill in time 2, 3, and 4 but no bills in 1, 5, 6, 7+ periods would only have three observations? Or, is a subject coded as having a 'zero' observation if they did not receive a bill in a particular billing period (e.g., in the example, the subject billing variable would be coded 011100000000 across the 12 observations)? The answer to this question is essential for interpreting your models.
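A quick check of how the data are actually coded might look like the following. It is a sketch that assumes the dataset and variable names used in the models in this post; the output column names (n_obs, n_zero, min_count) are made up for illustration.

```sas
/* Per procedure code: how many rows exist, how many are zeros,
   and the smallest recorded billing count */
proc sql;
    select code,
           count(*)               as n_obs,
           sum(billing_count = 0) as n_zero,
           min(billing_count)     as min_count
    from dataset
    group by code
    order by n_obs;
quit;
```

If every code shows n_obs = 12 and n_zero = 0, the complete-case filter has removed every code with a zero quarter. Codes with fewer than 12 rows would suggest that unbilled quarters are dropped rather than coded as zero.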

Assuming you want "to evaluate if [and how] the subjects are changing significantly over time" overall, then I would personally start by estimating the following model (before estimating subject-specific fixed effects models):

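proc glimmix data=dataset;
    class code;  /* code = procedure code, i.e. the "subject" */
    /* Negative binomial model for counts with a linear time trend */
    model billing_count = period_count / dist=nb;
    /* Random intercept and slope per code, unstructured covariance */
    random intercept period_count / subject=code type=un;
run;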

** Does this model converge? If not, try the simpler "type=vc" (Like Susan, I am skeptical that an autoregressive structure is appropriate for these data, but I could certainly be wrong).

** Does this model fit better than alternatives that specify "dist=normal" (or "dist=p")? (Compare -2LL and BIC across models)
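One caveat on that comparison: with random effects, GLIMMIX defaults to a pseudo-likelihood method, and pseudo-likelihood fit statistics are not comparable across models. A sketch of one way to get comparable -2LL and BIC values is to fit each candidate with method=laplace. The output dataset names (fit_nb, fit_poisson) are made up, and I have not checked whether these fits converge on your data.

```sas
/* Negative binomial, true (Laplace-approximated) likelihood */
proc glimmix data=dataset method=laplace;
    class code;
    model billing_count = period_count / dist=nb solution;
    random intercept period_count / subject=code type=un;
    ods output FitStatistics=fit_nb;
run;

/* Same model with a Poisson distribution for comparison */
proc glimmix data=dataset method=laplace;
    class code;
    model billing_count = period_count / dist=poisson solution;
    random intercept period_count / subject=code type=un;
    ods output FitStatistics=fit_poisson;
run;
```

Swap in dist=normal for the other alternative, and drop to type=vc if type=un does not converge.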

** Does this model fit better than alternatives that specify non-linear growth parameters? (Compare significance of fixed effects coefficients for quadratic/cubic terms and IC statistics). For example, compare the following:

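proc glimmix data=dataset;
    class code;
    /* Adds a quadratic time term to the fixed effects */
    model billing_count = period_count period_count*period_count / dist=nb;
    /* Random intercept and linear slope per code, independent variances */
    random intercept period_count / subject=code type=vc;
run;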

** Also try including the quadratic term (period_count*period_count) in the random statement and check significance of coefficient.
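A sketch of that variant follows, again assuming the same dataset and variable names. The solution options and the output dataset name (blups) are additions of mine so you can also see the subject-specific estimates asked about in the original question.

```sas
proc glimmix data=dataset;
    class code;
    /* solution prints the fixed-effects estimates */
    model billing_count = period_count period_count*period_count
          / dist=nb solution;
    /* Quadratic term added to the random effects; solution requests
       the code-specific random-effect predictions */
    random intercept period_count period_count*period_count
          / subject=code type=vc solution;
    /* Save the code-specific predictions for later inspection */
    ods output SolutionR=blups;
run;
```

Each row of blups is a code-specific deviation from the corresponding fixed effect, so a code's own slope is the fixed slope plus its deviation.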

That's enough for now. I need some sleep... ;)

Good luck!

Jon
