rsanchez87
Obsidian | Level 7

Hi All, 

 

I am conducting a time series analysis of rates of health care utilization, and I understand PROC AUTOREG is the most appropriate option in SAS. However, I do not have SAS/ETS. I have read several articles that use PROC REG with the DWPROB option to test for autocorrelation. Other sources have used PROC GLM or PROC GLIMMIX, but I do not think these can account for autocorrelation. We have designed this study well, but the intervention itself may not be strong. Are there any SAS alternatives to PROC AUTOREG (outside SAS/ETS) that I can leverage?

 

The proc reg and proc glimmix models are:

 

PROC REG DATA = DATA;
	MODEL CHANGE = TIME POST TIME_AFTER / DWPROB;   /* DWPROB: Durbin-Watson test for autocorrelated residuals */
RUN;

PROC GLIMMIX DATA = DATA;
	MODEL CHANGE = TIME POST TIME_AFTER / SOLUTION CHISQ CORRB;
	RANDOM _RESIDUAL_;                              /* residual variance only; no autocorrelation structure specified */
RUN;

 

The Durbin-Watson test returns a value of 1.980, which suggests to me that there is no first-order autocorrelation in the residuals.

  

PROC REG and PROC GLIMMIX produce the same results - I am not sure whether this is expected.

 

Parameter Estimates
Effect       Estimate    Standard Error   DF   t Value   Pr > |t|
Intercept     0.03472    0.1190           80    0.29     0.7713
Time          0.01221    0.004228         80    2.89     0.0050
Post          0.04177    0.1800           80    0.23     0.8171
time_after   -0.02239    0.007764         80   -2.88     0.0050
 

This output would then suggest that the intervention resulted in a statistically significant reduction in rates, albeit a very small one (roughly a 2% reduction). 

 

Looking forward to your ideas, questions, concerns, clarifications, all! 

 

Description of data: 

Change = change in rates of utilization between intervention and control

Time = sequential number for the time period (84 records representing all months between 2011 and 2017)

Post = 0 or 1 value for "no intervention" (0) or "intervention" (1) period

Time_After = sequential counter starting at 1 in the first intervention month (01/01/2015); 0 before the intervention
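
For illustration, these variables could be derived from the month counter roughly like this (a sketch only - the dataset and variable names are placeholders, not my actual program):

DATA EXAMPLE;
	SET RATES;                      /* hypothetical input: one record per month with TIME and CHANGE */
	POST = (TIME >= 49);            /* intervention flag: 1 from 01/2015 (month 49) onward */
	TIME_AFTER = MAX(TIME - 48, 0); /* 0 before the intervention, then 1, 2, ... 36 */
RUN;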

 

Full data below: 

SRC,TIME,EFFECTIVE_PERIOD,POST,PERIOD,INT_RATE,CONTROL_RATE,INT_CLAIMS,CONTROL_CLAIMS,CHANGE,CLAIMS_CHANGE,TIME_AFTER
ALL,1,1/1/2011,0,NO INTERVENTION,4.6,4.06,1377,1219,0.54,158,0
ALL,2,2/1/2011,0,NO INTERVENTION,3.91,3.67,1265,1111,0.24,154,0
ALL,3,3/1/2011,0,NO INTERVENTION,4.6,4.43,1566,1440,0.17,126,0
ALL,4,4/1/2011,0,NO INTERVENTION,3.42,3.86,1137,1155,-0.44,-18,0
ALL,5,5/1/2011,0,NO INTERVENTION,3.88,4.04,1339,1274,-0.16,65,0
ALL,6,6/1/2011,0,NO INTERVENTION,3.69,4.28,1182,1207,-0.59,-25,0
ALL,7,7/1/2011,0,NO INTERVENTION,3.79,3.56,1208,1027,0.23,181,0
ALL,8,8/1/2011,0,NO INTERVENTION,3.62,3.62,1229,1102,0,127,0
ALL,9,9/1/2011,0,NO INTERVENTION,4.57,4.15,1386,1129,0.42,257,0
ALL,10,10/1/2011,0,NO INTERVENTION,4.47,4.23,1354,1220,0.24,134,0
ALL,11,11/1/2011,0,NO INTERVENTION,4.86,4.24,1459,1256,0.62,203,0
ALL,12,12/1/2011,0,NO INTERVENTION,3.94,3.85,1233,1129,0.08,104,0
ALL,13,1/1/2012,0,NO INTERVENTION,4.59,3.84,1473,1227,0.75,246,0
ALL,14,2/1/2012,0,NO INTERVENTION,4.41,4.1,1369,1333,0.31,36,0
ALL,15,3/1/2012,0,NO INTERVENTION,4.51,4.57,1487,1405,-0.07,82,0
ALL,16,4/1/2012,0,NO INTERVENTION,4.18,4.02,1346,1310,0.16,36,0
ALL,17,5/1/2012,0,NO INTERVENTION,4.97,4.65,1664,1411,0.33,253,0
ALL,18,6/1/2012,0,NO INTERVENTION,4.62,3.81,1409,1228,0.81,181,0
ALL,19,7/1/2012,0,NO INTERVENTION,3.9,4.17,1254,1269,-0.27,-15,0
ALL,20,8/1/2012,0,NO INTERVENTION,4.21,3.74,1267,1151,0.47,116,0
ALL,21,9/1/2012,0,NO INTERVENTION,3.82,4.14,1104,1221,-0.32,-117,0
ALL,22,10/1/2012,0,NO INTERVENTION,4.39,4.27,1223,1276,0.13,-53,0
ALL,23,11/1/2012,0,NO INTERVENTION,3.95,4.05,1128,1156,-0.11,-28,0
ALL,24,12/1/2012,0,NO INTERVENTION,3.63,3.34,996,978,0.29,18,0
ALL,25,1/1/2013,0,NO INTERVENTION,5.12,3.88,1457,1025,1.24,432,0
ALL,26,2/1/2013,0,NO INTERVENTION,3.56,3.18,1106,894,0.38,212,0
ALL,27,3/1/2013,0,NO INTERVENTION,4.02,3.73,1206,1018,0.29,188,0
ALL,28,4/1/2013,0,NO INTERVENTION,4.37,4.09,1337,1130,0.28,207,0
ALL,29,5/1/2013,0,NO INTERVENTION,4.12,3.91,1204,1121,0.21,83,0
ALL,30,6/1/2013,0,NO INTERVENTION,3.86,3.58,1037,983,0.27,54,0
ALL,31,7/1/2013,0,NO INTERVENTION,3.83,3.89,1076,1022,-0.06,54,0
ALL,32,8/1/2013,0,NO INTERVENTION,3.96,4.07,1000,1118,-0.11,-118,0
ALL,33,9/1/2013,0,NO INTERVENTION,4.7,3.93,1084,991,0.77,93,0
ALL,34,10/1/2013,0,NO INTERVENTION,4.7,4.13,1213,1147,0.57,66,0
ALL,35,11/1/2013,0,NO INTERVENTION,4.24,3.19,1098,872,1.05,226,0
ALL,36,12/1/2013,0,NO INTERVENTION,3.71,3.16,935,831,0.54,104,0
ALL,37,1/1/2014,0,NO INTERVENTION,4.77,3.55,1263,891,1.22,372,0
ALL,38,2/1/2014,0,NO INTERVENTION,3.92,3.33,999,903,0.59,96,0
ALL,39,3/1/2014,0,NO INTERVENTION,5.26,4.54,1406,1135,0.72,271,0
ALL,40,4/1/2014,0,NO INTERVENTION,5.01,4.64,1339,1219,0.37,120,0
ALL,41,5/1/2014,0,NO INTERVENTION,5.18,4.69,1317,1203,0.49,114,0
ALL,42,6/1/2014,0,NO INTERVENTION,4.48,4.39,1175,1061,0.09,114,0
ALL,43,7/1/2014,0,NO INTERVENTION,4.39,4.36,1163,1092,0.03,71,0
ALL,44,8/1/2014,0,NO INTERVENTION,3.72,3.81,1033,1003,-0.09,30,0
ALL,45,9/1/2014,0,NO INTERVENTION,5.39,4.18,1415,1079,1.21,336,0
ALL,46,10/1/2014,0,NO INTERVENTION,5.18,4.41,1381,1210,0.77,171,0
ALL,47,11/1/2014,0,NO INTERVENTION,4.09,3.56,1155,923,0.53,232,0
ALL,48,12/1/2014,0,NO INTERVENTION,4.72,3.87,1278,1079,0.85,199,0
ALL,49,1/1/2015,1,INTERVENTION,5.15,4.22,1418,1106,0.93,312,1
ALL,50,2/1/2015,1,INTERVENTION,4.3,3.73,1174,994,0.57,180,2
ALL,51,3/1/2015,1,INTERVENTION,6.13,4.56,1631,1221,1.57,410,3
ALL,52,4/1/2015,1,INTERVENTION,5.42,4.66,1473,1300,0.77,173,4
ALL,53,5/1/2015,1,INTERVENTION,4.74,3.83,1276,1150,0.91,126,5
ALL,54,6/1/2015,1,INTERVENTION,4.81,4.15,1325,1144,0.66,181,6
ALL,55,7/1/2015,1,INTERVENTION,4.81,4.25,1391,1131,0.56,260,7
ALL,56,8/1/2015,1,INTERVENTION,4.49,4.44,1278,1221,0.05,57,8
ALL,57,9/1/2015,1,INTERVENTION,4.53,4.46,1252,1135,0.07,117,9
ALL,58,10/1/2015,1,INTERVENTION,5.52,4.61,1493,1245,0.91,248,10
ALL,59,11/1/2015,1,INTERVENTION,4.59,4.33,1268,1200,0.26,68,11
ALL,60,12/1/2015,1,INTERVENTION,4.68,4.63,1363,1220,0.05,143,12
ALL,61,1/1/2016,1,INTERVENTION,4.81,4.2,1267,1168,0.61,99,13
ALL,62,2/1/2016,1,INTERVENTION,4.36,4.76,1241,1258,-0.41,-17,14
ALL,63,3/1/2016,1,INTERVENTION,6.01,4.84,1679,1370,1.17,309,15
ALL,64,4/1/2016,1,INTERVENTION,4.94,4.73,1408,1379,0.2,29,16
ALL,65,5/1/2016,1,INTERVENTION,4.84,4.49,1407,1263,0.35,144,17
ALL,66,6/1/2016,1,INTERVENTION,5.9,5.07,1472,1415,0.84,57,18
ALL,67,7/1/2016,1,INTERVENTION,5.06,4.73,1316,1344,0.33,-28,19
ALL,68,8/1/2016,1,INTERVENTION,5.59,4.98,1454,1338,0.61,116,20
ALL,69,9/1/2016,1,INTERVENTION,4.99,5.14,1321,1271,-0.16,50,21
ALL,70,10/1/2016,1,INTERVENTION,5.09,4.56,1343,1310,0.53,33,22
ALL,71,11/1/2016,1,INTERVENTION,4.95,4.41,1336,1174,0.54,162,23
ALL,72,12/1/2016,1,INTERVENTION,4.61,4.43,1289,1234,0.18,55,24
ALL,73,1/1/2017,1,INTERVENTION,4.4,4.58,1185,1356,-0.18,-171,25
ALL,74,2/1/2017,1,INTERVENTION,4.66,4.44,1247,1187,0.22,60,26
ALL,75,3/1/2017,1,INTERVENTION,5.59,4.89,1480,1399,0.7,81,27
ALL,76,4/1/2017,1,INTERVENTION,4.24,4.33,1170,1244,-0.09,-74,28
ALL,77,5/1/2017,1,INTERVENTION,5.7,4.64,1496,1336,1.06,160,29
ALL,78,6/1/2017,1,INTERVENTION,5.16,4.46,1451,1270,0.7,181,30
ALL,79,7/1/2017,1,INTERVENTION,4.41,3.92,1318,1200,0.49,118,31
ALL,80,8/1/2017,1,INTERVENTION,5.3,4.2,1455,1268,1.11,187,32
ALL,81,9/1/2017,1,INTERVENTION,5.17,4.65,1434,1276,0.52,158,33
ALL,82,10/1/2017,1,INTERVENTION,5.05,5.15,1379,1421,-0.1,-42,34
ALL,83,11/1/2017,1,INTERVENTION,5.27,4.89,1316,1367,0.38,-51,35
ALL,84,12/1/2017,1,INTERVENTION,4.35,4.2,1243,1196,0.15,47,36

 

Thank you!

 

Some References: 

1. (USES PROC GLIMMIX) Wong, EC. Analysing Phased Intervention with Segmented Regression and Stepped Wedge Designs: https://www.lexjansen.com/wuss/2014/74_Final_Paper_PDF.pdf

 

2. (USES PROC AUTOREG) Penfold, R. Use of Interrupted Time Series Analysis in Evaluating Health Care Quality Improvements https://www.academicpedsjnl.net/article/S1876-2859(13)00210-6/pdf

 

 

PGStats
Opal | Level 21

What's missing in your first reference? It seems to cover the use of GLIMMIX with AR(1) covariance structure for segmented models quite well.
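
For instance, a minimal sketch with the model from the original post (untested; the AR(1) residual structure follows that paper):

PROC GLIMMIX DATA = DATA;
	MODEL CHANGE = TIME POST TIME_AFTER / SOLUTION CHISQ CORRB;
	RANDOM _RESIDUAL_ / TYPE = AR(1);   /* first-order autoregressive residual covariance */
RUN;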

PG
rsanchez87
Obsidian | Level 7

Thank you - I had overlooked that in my logic. After adding the AR(1) structure, the estimate for Time_After is -0.0223 (p = 0.0105). 

 

How should I interpret the AR(1) output: 

AR(1) estimate: 0.05051

AR(1) SE: 0.115

Residual estimate: 0.1655

Residual SE: 0.026

 

Thank you for your help. 

PGStats
Opal | Level 21

About the AR(1) estimate:

 

The estimate +/- SE interval includes zero (0.05051 +/- 0.115 spans roughly -0.06 to 0.17), so I would consider the autocorrelation insignificant. Keep the term in anyway; it doesn't hurt.

PG
rsanchez87
Obsidian | Level 7

Thanks again for the clarification. 

 

I had to update the methodology for counting visits in our process, which changed the values of our outcome variable. 

 

Running the program again, I get an AR(1) estimate of 0.3354 (SE 0.1134). Does this mean there is autocorrelation? 

 

I am using PROC GLIMMIX to run the stats - that is the only applicable procedure I have access to. Code below:

 

PROC GLIMMIX DATA = OUTFILE._MEDS0_&SRC. 
	PLOTS = RESIDUALPANEL;
	MODEL ENCOUNTERS_DIFF = TIME POST TIME_AFTER POST*TIME_AFTER / SOLUTION CHISQ CORRB; 
	OUTPUT OUT = _GLIMMIX_POPULATION_&SRC. PRED = REGPRED;   /* save predicted values */
	ODS OUTPUT PARAMETERESTIMATES = _MEDS0_&SRC._GLIMMIX;    /* save parameter estimates to a dataset */
	RANDOM _RESIDUAL_ / TYPE = AR(1);                        /* first-order autoregressive residual covariance */
	TITLE "PROC GLIMMIX - REGRESSION FOR: &SRC."; 
RUN;

Essentially, this is a time series comparing an intervention group and a control group over 84 months (2011 - 2017). The intervention began in 2015 (the 49th month). The goal is to evaluate the impact on the intervention group relative to its own pre-intervention trend and to the comparison group. 

 

If the AR(1) statistic is showing autocorrelation, is this model appropriate? Also, the AIC value is 970 with a Chi-Square of 8847. 

 

I also have the data at the individual level, but there the AR(1) increases and the AIC value skyrockets. Also, the distribution isn't Gaussian; it appears to be negative binomial to me (as this is health care utilization data). 

 

Thoughts on how to handle this panel/time series data?

 

 

Thank you for your help!

 
PGStats
Opal | Level 21

That the AR(1) term is significant is not a problem at all. Quite the contrary: it ensures the validity of your inference in the presence of first-order autocorrelation.

 

AIC (or better, AICC) is useful for comparing models fitted on the same data.  The AIC cannot be used to compare models fitted on different datasets.

 

If your residuals are far from normal, you can either try to transform the data, or try fitting the data to a different distribution.

 

hth

PG
rsanchez87
Obsidian | Level 7

Thanks again. 

 

I am using health care utilization data, which is typically composed of a scattering of high utilizers, many low utilizers, and many zeroes. 

 

This is count data per month per person for 84 months (N = 6,017 persons). How would you assess the distribution here? 

 

Raw data: 

[Attached image: member_level_encounters_scattter.png - scatter plot of member-level encounter counts]

 

PGStats
Opal | Level 21

Histograms would be a lot more useful to guess the possible distribution(s).
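
For example, something along these lines (a quick sketch; dataset and variable names assumed):

PROC SGPLOT DATA = MEMBER_COUNTS;   /* hypothetical member-month dataset */
	HISTOGRAM ENCOUNTERS;           /* monthly encounter count per member */
RUN;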

PG
rsanchez87
Obsidian | Level 7

Apologies, please see below. 

 

[Attached image: member_level_encounters_histogram.png - histogram of member-level encounter counts]

 

Poisson or negative binomial? How would you proceed? 

 

 

Some thoughts: 

1. The data is first-order autoregressive (i.e., what happens last month influences this month). I rolled the data up to the year-quarter level, and it remains autoregressive. Not surprising - there is no way around it, as this is real-world healthcare data. Still, I would need PROC AUTOREG - or can I transform the data or account for the autocorrelation in some other way, akin to a two-step model (a rough sketch of what I mean is below the list)? 😞

 

2. Most members have utilization of 0 in any given month. Would it make sense to roll the data up to the year-quarter or year level to reduce the zeroes? 
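
To illustrate what I mean by a two-step model, something along these lines in Base SAS only (a rough, untested sketch - dataset and variable names are placeholders, and PROC AUTOREG or the GLIMMIX AR(1) approach would do this properly):

/* Step 1: fit OLS, keep the residuals, and estimate rho from the lag-1 residual regression */
PROC REG DATA = POP;                        /* POP: one record per month, sorted by TIME */
	MODEL CHANGE = TIME POST TIME_AFTER;
	OUTPUT OUT = OLS_OUT R = RESID;
RUN; QUIT;

DATA LAGGED;
	SET OLS_OUT;
	LAG_RESID = LAG(RESID);                 /* residual from the previous month */
RUN;

PROC REG DATA = LAGGED OUTEST = RHO_EST;    /* the slope of this regression estimates rho */
	MODEL RESID = LAG_RESID / NOINT;
RUN; QUIT;

/* Step 2: quasi-difference the outcome and predictors with the estimated rho and refit OLS.
   The first month is lost to the lag; the slopes are the autocorrelation-adjusted estimates,
   and the original intercept is the new intercept divided by (1 - rho). */
DATA TRANSFORMED;
	IF _N_ = 1 THEN SET RHO_EST (KEEP = LAG_RESID RENAME = (LAG_RESID = RHO));
	SET OLS_OUT;
	CHANGE_STAR     = CHANGE     - RHO * LAG(CHANGE);
	TIME_STAR       = TIME       - RHO * LAG(TIME);
	POST_STAR       = POST       - RHO * LAG(POST);
	TIME_AFTER_STAR = TIME_AFTER - RHO * LAG(TIME_AFTER);
RUN;

PROC REG DATA = TRANSFORMED;
	MODEL CHANGE_STAR = TIME_STAR POST_STAR TIME_AFTER_STAR;
RUN; QUIT;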

 

Thanks for your help!

 

PGStats
Opal | Level 21

Real-life data is never pure Poisson, except maybe radioactive counts. So if you try Poisson, add an overdispersion term. The very long tail, however, suggests a negative binomial distribution. In either case, there appears to be an overabundance of zeros, so if you consistently get poor fits at zero, you should try adding a zero-inflation term.

 

My strategy would be: start from the simplest model (Poisson), add overdispersion, then add zero inflation, then switch to NB and ZINB. Use AICC to compare the fits.

 

If you have access to SAS/ETS, use PROC COUNTREG; otherwise, use GLIMMIX.
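
For instance, the first and last steps of that ladder might look like this in GLIMMIX (a sketch only; the member-level dataset and variable names are assumed):

/* Poisson fit: a Pearson chi-square / DF well above 1 flags overdispersion */
PROC GLIMMIX DATA = MEMBER_COUNTS;
	MODEL ENCOUNTERS = TIME POST TIME_AFTER / DIST = POISSON LINK = LOG SOLUTION;
RUN;

/* Negative binomial fit: compare its AICC against the Poisson fit */
PROC GLIMMIX DATA = MEMBER_COUNTS;
	MODEL ENCOUNTERS = TIME POST TIME_AFTER / DIST = NEGBIN LINK = LOG SOLUTION;
RUN;

A member-level random intercept (RANDOM INTERCEPT / SUBJECT = MEMBNO, with METHOD = LAPLACE on the PROC statement so the information criteria stay comparable) could then be added to handle repeated measures on the same people.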

 

That said, I was wondering whether many of your care utilisation counts refer to the same people over time, and if so, whether you should account for that. 

PG
rsanchez87
Obsidian | Level 7

THANK YOU SO MUCH! I was exploring those options, but I confused myself. I am going to start fresh with that plan of action.

 

Unfortunately, I don't have SAS/ETS, so GLIMMIX it is. I will follow your guidance and report back. 

 

I am looking at the data in two ways: (1) population level, and (2) individual level. 

 

At the population level, there are 84 records, one for each month of the study, with columns indicating when the intervention started, utilization for the intervention and control groups, the difference in utilization rates between intervention and control, and a counter for the time after the intervention. Our outcome variable here is the difference in utilization rates. The rates are nicely, roughly normally distributed - but they still arise from count data. However, the difference is always negative - the intervention rates are always less than the control rates. Are Poisson, NegBin, and/or ZINB still appropriate for negative values? 

 

At the population level,  the GLIMMIX is as follows: 

PROC GLIMMIX DATA = OUTFILE._MEDS1_&SRC. 
	PLOTS = RESIDUALPANEL ;
	MODEL RATE_DIFF = TIME POST TIME_AFTER POST*TIME_AFTER / SOLUTION CHISQ CORRB ;  
	OUTPUT OUT = _GLIMMIX_POPULATION_&SRC. PRED = PREDICTED RESID = RESIDUALS; 
	ODS OUTPUT PARAMETERESTIMATES = _MEDS0_&SRC._GLIMMIX; 
	RANDOM _RESIDUAL_ / TYPE = ARMA(1,1); 
	TITLE "PROC GLIMMIX - REGRESSION FOR: &SRC."; 
RUN;

At the member level, we add the same variables as above, except we include the member identifier. The outcome variable here is the count of utilization. Code:

PROC GLIMMIX DATA = OUTFILE.RATES_MEDS00_&SRC.
	PLOTS = RESIDUALPANEL;
	CLASS MEMBNO;
	MODEL ENCOUNTERS = TIME POST TIME_AFTER / SOLUTION CHISQ CORRB; 
	OUTPUT OUT = _GLIMMIX_SUBJECT_&SRC. PRED = REGPRED; 
	ODS OUTPUT PARAMETERESTIMATES = RATES_MEDS00_&SRC._GLIMMIX; 
	RANDOM _RESIDUAL_ / SUBJECT = MEMBNO TYPE = ARMA(1,1); 
	TITLE "PROC GLIMMIX - MEMBER LEVEL REGRESSION UNADJUSTED FOR: &SRC."; 
RUN;

So, to your question: yes, the utilization counts refer to the same people over time. Am I appropriately accounting for this with the SUBJECT = MEMBNO option? MEMBNO is the individual identifier.

 

Ideally, I need the analysis at the individual level so I can adjust for additional variables. 

 

Let me know what you think; I'll be glad to fill in any holes and/or share data. Thank you.
