rsanchez87
Obsidian | Level 7

Hi All, 

 

I am conducting a time series analysis of rates of health care utilization, and I understand PROC AUTOREG is the most appropriate option in SAS. However, I do not have SAS/ETS. I have read several articles that use PROC REG with the DWPROB option to test for autocorrelation. Other sources have used PROC GLM or PROC GLIMMIX, but I do not think these can account for autocorrelation. We have designed this study well, but the intervention itself may not be strong. Are there any SAS alternatives to PROC AUTOREG (outside SAS/ETS) that I can leverage?

 

The proc reg and proc glimmix models are:

 

PROC REG DATA = DATA;
	MODEL CHANGE = TIME POST TIME_AFTER / DWPROB;   /* DWPROB: Durbin-Watson test for autocorrelated residuals */
RUN;

PROC GLIMMIX DATA = DATA;
	MODEL CHANGE = TIME POST TIME_AFTER / SOLUTION CHISQ CORRB;
	RANDOM _RESIDUAL_;                              /* residual variance only; no autocorrelation structure specified */
RUN;

 

The Durbin-Watson test returns a value of 1.980, which suggests to me that there is no first-order autocorrelation in the residuals.

  

PROC REG and PROC GLIMMIX produce the same results - I am not sure whether this is expected.

 

Parameter Estimates
Effect       Estimate    Standard Error   DF   t Value   Pr > |t|
Intercept     0.03472    0.1190           80    0.29     0.7713
Time          0.01221    0.004228         80    2.89     0.0050
Post          0.04177    0.1800           80    0.23     0.8171
time_after   -0.02239    0.007764         80   -2.88     0.0050
 

This output would then suggest that the intervention resulted in a statistically significant reduction in rates, albeit a very small one (roughly a 2% reduction). 

 

Looking forward to your ideas, questions, concerns, clarifications, all! 

 

Description of data: 

Change = change in rates of utilization between intervention and control

Time = sequential number for the time period (84 records representing all months between 2011 and 2017)

Post = 0 or 1 value for "no intervention" (0) or "intervention" (1) period

Time_After = sequential counter starting at 1 in the first intervention month (01/01/2015); 0 before the intervention
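
For illustration, these variables could be derived from the month counter roughly like this (a sketch only - the dataset and variable names are placeholders, not my actual program):

DATA EXAMPLE;
	SET RATES;                      /* hypothetical input: one record per month with TIME and CHANGE */
	POST = (TIME >= 49);            /* intervention flag: 1 from 01/2015 (month 49) onward */
	TIME_AFTER = MAX(TIME - 48, 0); /* 0 before the intervention, then 1, 2, ... 36 */
RUN;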

 

Full data below: 

SRC,TIME,EFFECTIVE_PERIOD,POST,PERIOD,INT_RATE,CONTROL_RATE,INT_CLAIMS,CONTROL_CLAIMS,CHANGE,CLAIMS_CHANGE,TIME_AFTER
ALL,1,1/1/2011,0,NO INTERVENTION,4.6,4.06,1377,1219,0.54,158,0
ALL,2,2/1/2011,0,NO INTERVENTION,3.91,3.67,1265,1111,0.24,154,0
ALL,3,3/1/2011,0,NO INTERVENTION,4.6,4.43,1566,1440,0.17,126,0
ALL,4,4/1/2011,0,NO INTERVENTION,3.42,3.86,1137,1155,-0.44,-18,0
ALL,5,5/1/2011,0,NO INTERVENTION,3.88,4.04,1339,1274,-0.16,65,0
ALL,6,6/1/2011,0,NO INTERVENTION,3.69,4.28,1182,1207,-0.59,-25,0
ALL,7,7/1/2011,0,NO INTERVENTION,3.79,3.56,1208,1027,0.23,181,0
ALL,8,8/1/2011,0,NO INTERVENTION,3.62,3.62,1229,1102,0,127,0
ALL,9,9/1/2011,0,NO INTERVENTION,4.57,4.15,1386,1129,0.42,257,0
ALL,10,10/1/2011,0,NO INTERVENTION,4.47,4.23,1354,1220,0.24,134,0
ALL,11,11/1/2011,0,NO INTERVENTION,4.86,4.24,1459,1256,0.62,203,0
ALL,12,12/1/2011,0,NO INTERVENTION,3.94,3.85,1233,1129,0.08,104,0
ALL,13,1/1/2012,0,NO INTERVENTION,4.59,3.84,1473,1227,0.75,246,0
ALL,14,2/1/2012,0,NO INTERVENTION,4.41,4.1,1369,1333,0.31,36,0
ALL,15,3/1/2012,0,NO INTERVENTION,4.51,4.57,1487,1405,-0.07,82,0
ALL,16,4/1/2012,0,NO INTERVENTION,4.18,4.02,1346,1310,0.16,36,0
ALL,17,5/1/2012,0,NO INTERVENTION,4.97,4.65,1664,1411,0.33,253,0
ALL,18,6/1/2012,0,NO INTERVENTION,4.62,3.81,1409,1228,0.81,181,0
ALL,19,7/1/2012,0,NO INTERVENTION,3.9,4.17,1254,1269,-0.27,-15,0
ALL,20,8/1/2012,0,NO INTERVENTION,4.21,3.74,1267,1151,0.47,116,0
ALL,21,9/1/2012,0,NO INTERVENTION,3.82,4.14,1104,1221,-0.32,-117,0
ALL,22,10/1/2012,0,NO INTERVENTION,4.39,4.27,1223,1276,0.13,-53,0
ALL,23,11/1/2012,0,NO INTERVENTION,3.95,4.05,1128,1156,-0.11,-28,0
ALL,24,12/1/2012,0,NO INTERVENTION,3.63,3.34,996,978,0.29,18,0
ALL,25,1/1/2013,0,NO INTERVENTION,5.12,3.88,1457,1025,1.24,432,0
ALL,26,2/1/2013,0,NO INTERVENTION,3.56,3.18,1106,894,0.38,212,0
ALL,27,3/1/2013,0,NO INTERVENTION,4.02,3.73,1206,1018,0.29,188,0
ALL,28,4/1/2013,0,NO INTERVENTION,4.37,4.09,1337,1130,0.28,207,0
ALL,29,5/1/2013,0,NO INTERVENTION,4.12,3.91,1204,1121,0.21,83,0
ALL,30,6/1/2013,0,NO INTERVENTION,3.86,3.58,1037,983,0.27,54,0
ALL,31,7/1/2013,0,NO INTERVENTION,3.83,3.89,1076,1022,-0.06,54,0
ALL,32,8/1/2013,0,NO INTERVENTION,3.96,4.07,1000,1118,-0.11,-118,0
ALL,33,9/1/2013,0,NO INTERVENTION,4.7,3.93,1084,991,0.77,93,0
ALL,34,10/1/2013,0,NO INTERVENTION,4.7,4.13,1213,1147,0.57,66,0
ALL,35,11/1/2013,0,NO INTERVENTION,4.24,3.19,1098,872,1.05,226,0
ALL,36,12/1/2013,0,NO INTERVENTION,3.71,3.16,935,831,0.54,104,0
ALL,37,1/1/2014,0,NO INTERVENTION,4.77,3.55,1263,891,1.22,372,0
ALL,38,2/1/2014,0,NO INTERVENTION,3.92,3.33,999,903,0.59,96,0
ALL,39,3/1/2014,0,NO INTERVENTION,5.26,4.54,1406,1135,0.72,271,0
ALL,40,4/1/2014,0,NO INTERVENTION,5.01,4.64,1339,1219,0.37,120,0
ALL,41,5/1/2014,0,NO INTERVENTION,5.18,4.69,1317,1203,0.49,114,0
ALL,42,6/1/2014,0,NO INTERVENTION,4.48,4.39,1175,1061,0.09,114,0
ALL,43,7/1/2014,0,NO INTERVENTION,4.39,4.36,1163,1092,0.03,71,0
ALL,44,8/1/2014,0,NO INTERVENTION,3.72,3.81,1033,1003,-0.09,30,0
ALL,45,9/1/2014,0,NO INTERVENTION,5.39,4.18,1415,1079,1.21,336,0
ALL,46,10/1/2014,0,NO INTERVENTION,5.18,4.41,1381,1210,0.77,171,0
ALL,47,11/1/2014,0,NO INTERVENTION,4.09,3.56,1155,923,0.53,232,0
ALL,48,12/1/2014,0,NO INTERVENTION,4.72,3.87,1278,1079,0.85,199,0
ALL,49,1/1/2015,1,INTERVENTION,5.15,4.22,1418,1106,0.93,312,1
ALL,50,2/1/2015,1,INTERVENTION,4.3,3.73,1174,994,0.57,180,2
ALL,51,3/1/2015,1,INTERVENTION,6.13,4.56,1631,1221,1.57,410,3
ALL,52,4/1/2015,1,INTERVENTION,5.42,4.66,1473,1300,0.77,173,4
ALL,53,5/1/2015,1,INTERVENTION,4.74,3.83,1276,1150,0.91,126,5
ALL,54,6/1/2015,1,INTERVENTION,4.81,4.15,1325,1144,0.66,181,6
ALL,55,7/1/2015,1,INTERVENTION,4.81,4.25,1391,1131,0.56,260,7
ALL,56,8/1/2015,1,INTERVENTION,4.49,4.44,1278,1221,0.05,57,8
ALL,57,9/1/2015,1,INTERVENTION,4.53,4.46,1252,1135,0.07,117,9
ALL,58,10/1/2015,1,INTERVENTION,5.52,4.61,1493,1245,0.91,248,10
ALL,59,11/1/2015,1,INTERVENTION,4.59,4.33,1268,1200,0.26,68,11
ALL,60,12/1/2015,1,INTERVENTION,4.68,4.63,1363,1220,0.05,143,12
ALL,61,1/1/2016,1,INTERVENTION,4.81,4.2,1267,1168,0.61,99,13
ALL,62,2/1/2016,1,INTERVENTION,4.36,4.76,1241,1258,-0.41,-17,14
ALL,63,3/1/2016,1,INTERVENTION,6.01,4.84,1679,1370,1.17,309,15
ALL,64,4/1/2016,1,INTERVENTION,4.94,4.73,1408,1379,0.2,29,16
ALL,65,5/1/2016,1,INTERVENTION,4.84,4.49,1407,1263,0.35,144,17
ALL,66,6/1/2016,1,INTERVENTION,5.9,5.07,1472,1415,0.84,57,18
ALL,67,7/1/2016,1,INTERVENTION,5.06,4.73,1316,1344,0.33,-28,19
ALL,68,8/1/2016,1,INTERVENTION,5.59,4.98,1454,1338,0.61,116,20
ALL,69,9/1/2016,1,INTERVENTION,4.99,5.14,1321,1271,-0.16,50,21
ALL,70,10/1/2016,1,INTERVENTION,5.09,4.56,1343,1310,0.53,33,22
ALL,71,11/1/2016,1,INTERVENTION,4.95,4.41,1336,1174,0.54,162,23
ALL,72,12/1/2016,1,INTERVENTION,4.61,4.43,1289,1234,0.18,55,24
ALL,73,1/1/2017,1,INTERVENTION,4.4,4.58,1185,1356,-0.18,-171,25
ALL,74,2/1/2017,1,INTERVENTION,4.66,4.44,1247,1187,0.22,60,26
ALL,75,3/1/2017,1,INTERVENTION,5.59,4.89,1480,1399,0.7,81,27
ALL,76,4/1/2017,1,INTERVENTION,4.24,4.33,1170,1244,-0.09,-74,28
ALL,77,5/1/2017,1,INTERVENTION,5.7,4.64,1496,1336,1.06,160,29
ALL,78,6/1/2017,1,INTERVENTION,5.16,4.46,1451,1270,0.7,181,30
ALL,79,7/1/2017,1,INTERVENTION,4.41,3.92,1318,1200,0.49,118,31
ALL,80,8/1/2017,1,INTERVENTION,5.3,4.2,1455,1268,1.11,187,32
ALL,81,9/1/2017,1,INTERVENTION,5.17,4.65,1434,1276,0.52,158,33
ALL,82,10/1/2017,1,INTERVENTION,5.05,5.15,1379,1421,-0.1,-42,34
ALL,83,11/1/2017,1,INTERVENTION,5.27,4.89,1316,1367,0.38,-51,35
ALL,84,12/1/2017,1,INTERVENTION,4.35,4.2,1243,1196,0.15,47,36

 

Thank you!

 

Some References: 

1. (USES PROC GLIMMIX) Wong, EC. Analysing Phased Intervention with Segmented Regression and Stepped Wedge Designs: https://www.lexjansen.com/wuss/2014/74_Final_Paper_PDF.pdf

 

2. (USES PROC AUTOREG) Penfold, R. Use of Interrupted Time Series Analysis in Evaluating Health Care Quality Improvements https://www.academicpedsjnl.net/article/S1876-2859(13)00210-6/pdf

 

 

PGStats
Opal | Level 21

What's missing in your first reference? It seems to cover the use of GLIMMIX with AR(1) covariance structure for segmented models quite well.
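
For instance, a minimal sketch with the model from the original post (untested; the AR(1) residual structure follows that paper):

PROC GLIMMIX DATA = DATA;
	MODEL CHANGE = TIME POST TIME_AFTER / SOLUTION CHISQ CORRB;
	RANDOM _RESIDUAL_ / TYPE = AR(1);   /* first-order autoregressive residual covariance */
RUN;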

PG
rsanchez87
Obsidian | Level 7

Thank you - I had overlooked that in my logic. After adding the AR(1) structure, the estimate for Time_After is -0.0223 (p = 0.0105). 

 

How should I interpret the AR(1) output: 

AR(1) estimate: 0.05051

AR(1) SE: 0.115

Residual estimate: 0.1655

Residual SE: 0.026

 

Thank you for your help. 

PGStats
Opal | Level 21

About the AR(1) estimate:

 

The estimate +/- SE interval includes zero (0.05051 +/- 0.115 spans roughly -0.06 to 0.17), so I would consider the autocorrelation insignificant. Keep the term in anyway; it doesn't hurt.

PG
rsanchez87
Obsidian | Level 7

Thanks again for the clarification. 

 

I had to update the methodology for counting visits in our process, which changed the values of our outcome variable. 

 

Running the program again, I get an AR(1) estimate of 0.3354 (SE 0.1134). Does this mean there is autocorrelation? 

 

I am using PROC GLIMMIX to run the stats - that is the only applicable procedure I have access to. Code below:

 

PROC GLIMMIX DATA = OUTFILE._MEDS0_&SRC. 
	PLOTS = RESIDUALPANEL;
	MODEL ENCOUNTERS_DIFF = TIME POST TIME_AFTER POST*TIME_AFTER / SOLUTION CHISQ CORRB; 
	OUTPUT OUT = _GLIMMIX_POPULATION_&SRC. PRED = REGPRED;   /* save predicted values */
	ODS OUTPUT PARAMETERESTIMATES = _MEDS0_&SRC._GLIMMIX;    /* save parameter estimates to a dataset */
	RANDOM _RESIDUAL_ / TYPE = AR(1);                        /* first-order autoregressive residual covariance */
	TITLE "PROC GLIMMIX - REGRESSION FOR: &SRC."; 
RUN;

Essentially, this is a time series comparing an intervention group and a control group over 84 months (2011 - 2017). The intervention began in 2015 (the 49th month). The goal is to evaluate the impact on the intervention group relative to its own pre-intervention trend and to the comparison group. 

 

If the AR(1) statistic is showing autocorrelation, is this model appropriate? Also, the AIC value is 970 with a Chi-Square of 8847. 

 

I also have the data at the individual level, but there the AR(1) increases and the AIC value skyrockets. Also, the distribution isn't Gaussian; it appears to be negative binomial to me (as this is health care utilization data). 

 

Thoughts on how to handle this panel/time series data?

 

 

Thank you for your help!

 
PGStats
Opal | Level 21

That the AR(1) term is significant is not a problem at all. Quite the contrary: it ensures the validity of your inference in the presence of first-order autocorrelation.

 

AIC (or better, AICC) is useful for comparing models fitted on the same data.  The AIC cannot be used to compare models fitted on different datasets.

 

If your residuals are far from normal, you can either try to transform the data, or try fitting the data to a different distribution.

 

hth

PG
rsanchez87
Obsidian | Level 7

Thanks again. 

 

I am using health care utilization data, which is typically composed of a scattering of high utilizers, many low utilizers, and many zeroes. 

 

This is count data per month per person for 84 months (N = 6,017 persons). How would you assess the distribution here? 

 

Raw data: 

[Attached image: member_level_encounters_scattter.png - scatter plot of member-level encounter counts]

 

PGStats
Opal | Level 21

Histograms would be a lot more useful to guess the possible distribution(s).
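
For example, something along these lines (a quick sketch; dataset and variable names assumed):

PROC SGPLOT DATA = MEMBER_COUNTS;   /* hypothetical member-month dataset */
	HISTOGRAM ENCOUNTERS;           /* monthly encounter count per member */
RUN;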

PG
rsanchez87
Obsidian | Level 7

Apologies, please see below. 

 

[Attached image: member_level_encounters_histogram.png - histogram of member-level encounter counts]

 

Poisson or negative binomial? How would you proceed? 

 

 

Some thoughts: 

1. The data is first-order autoregressive (i.e., what happens last month influences this month). I rolled the data up to the year-quarter level, and it remains autoregressive. Not surprising - there is no way around it, as this is real-world healthcare data. Still, I would need PROC AUTOREG - or can I transform the data or account for the autocorrelation in some other way, akin to a two-step model (a rough sketch of what I mean is below the list)? 😞

 

2. Most members have utilization of 0 in any given month. Would it make sense to roll the data up to the year-quarter or year level to reduce the zeroes? 
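
To illustrate what I mean by a two-step model, something along these lines in Base SAS only (a rough, untested sketch - dataset and variable names are placeholders, and PROC AUTOREG or the GLIMMIX AR(1) approach would do this properly):

/* Step 1: fit OLS, keep the residuals, and estimate rho from the lag-1 residual regression */
PROC REG DATA = POP;                        /* POP: one record per month, sorted by TIME */
	MODEL CHANGE = TIME POST TIME_AFTER;
	OUTPUT OUT = OLS_OUT R = RESID;
RUN; QUIT;

DATA LAGGED;
	SET OLS_OUT;
	LAG_RESID = LAG(RESID);                 /* residual from the previous month */
RUN;

PROC REG DATA = LAGGED OUTEST = RHO_EST;    /* the slope of this regression estimates rho */
	MODEL RESID = LAG_RESID / NOINT;
RUN; QUIT;

/* Step 2: quasi-difference the outcome and predictors with the estimated rho and refit OLS.
   The first month is lost to the lag; the slopes are the autocorrelation-adjusted estimates,
   and the original intercept is the new intercept divided by (1 - rho). */
DATA TRANSFORMED;
	IF _N_ = 1 THEN SET RHO_EST (KEEP = LAG_RESID RENAME = (LAG_RESID = RHO));
	SET OLS_OUT;
	CHANGE_STAR     = CHANGE     - RHO * LAG(CHANGE);
	TIME_STAR       = TIME       - RHO * LAG(TIME);
	POST_STAR       = POST       - RHO * LAG(POST);
	TIME_AFTER_STAR = TIME_AFTER - RHO * LAG(TIME_AFTER);
RUN;

PROC REG DATA = TRANSFORMED;
	MODEL CHANGE_STAR = TIME_STAR POST_STAR TIME_AFTER_STAR;
RUN; QUIT;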

 

Thanks for your help!

 

PGStats
Opal | Level 21

Real-life data is never pure Poisson, except maybe radioactive counts. So if you try Poisson, add an overdispersion term. The very long tail, however, suggests a negative binomial distribution. In either case, there appears to be an overabundance of zeros, so if you consistently get poor fits at zero, you should try adding a zero-inflation term.

 

My strategy would be: start from the simplest model (Poisson), add overdispersion, then add zero inflation, then switch to NB and ZINB. Use AICC to compare the fits.

 

If you have access to SAS/ETS, use PROC COUNTREG; otherwise, use GLIMMIX.
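
For instance, the first and last steps of that ladder might look like this in GLIMMIX (a sketch only; the member-level dataset and variable names are assumed):

/* Poisson fit: a Pearson chi-square / DF well above 1 flags overdispersion */
PROC GLIMMIX DATA = MEMBER_COUNTS;
	MODEL ENCOUNTERS = TIME POST TIME_AFTER / DIST = POISSON LINK = LOG SOLUTION;
RUN;

/* Negative binomial fit: compare its AICC against the Poisson fit */
PROC GLIMMIX DATA = MEMBER_COUNTS;
	MODEL ENCOUNTERS = TIME POST TIME_AFTER / DIST = NEGBIN LINK = LOG SOLUTION;
RUN;

A member-level random intercept (RANDOM INTERCEPT / SUBJECT = MEMBNO, with METHOD = LAPLACE on the PROC statement so the information criteria stay comparable) could then be added to handle repeated measures on the same people.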

 

That said, I was wondering whether many of your care utilisation counts refer to the same people over time, and if so, whether you should account for that. 

PG
rsanchez87
Obsidian | Level 7

THANK YOU SO MUCH! I was exploring those options, but I confused myself. I am going to start fresh with that plan of action.

 

Unfortunately, I don't have SAS/ETS, so GLIMMIX it is. I will follow your guidance and report back. 

 

I am looking at the data in two ways: (1) population level, and (2) individual level. 

 

At the population level, there are 84 records, one for each month of the study, with columns indicating when the intervention started, utilization for the intervention and control groups, the difference in utilization rates between intervention and control, and a counter for the time after the intervention. Our outcome variable here is the difference in utilization rates. The rates are nicely, roughly normally distributed - but they still arise from count data. However, the difference is always negative - the intervention rates are always less than the control rates. Are Poisson, NegBin, and/or ZINB still appropriate for negative values? 

 

At the population level,  the GLIMMIX is as follows: 

PROC GLIMMIX DATA = OUTFILE._MEDS1_&SRC. 
	PLOTS = RESIDUALPANEL ;
	MODEL RATE_DIFF = TIME POST TIME_AFTER POST*TIME_AFTER / SOLUTION CHISQ CORRB ;  
	OUTPUT OUT = _GLIMMIX_POPULATION_&SRC. PRED = PREDICTED RESID = RESIDUALS; 
	ODS OUTPUT PARAMETERESTIMATES = _MEDS0_&SRC._GLIMMIX; 
	RANDOM _RESIDUAL_ / TYPE = ARMA(1,1); 
	TITLE "PROC GLIMMIX - REGRESSION FOR: &SRC."; 
RUN;

At the member level, we add the same variables as above, except we include the member identifier. The outcome variable here is the count of utilization. Code:

PROC GLIMMIX DATA = OUTFILE.RATES_MEDS00_&SRC.
	PLOTS = RESIDUALPANEL;
	CLASS MEMBNO;
	MODEL ENCOUNTERS = TIME POST TIME_AFTER / SOLUTION CHISQ CORRB; 
	OUTPUT OUT = _GLIMMIX_SUBJECT_&SRC. PRED = REGPRED; 
	ODS OUTPUT PARAMETERESTIMATES = RATES_MEDS00_&SRC._GLIMMIX; 
	RANDOM _RESIDUAL_ / SUBJECT = MEMBNO TYPE = ARMA(1,1); 
	TITLE "PROC GLIMMIX - MEMBER LEVEL REGRESSION UNADJUSTED FOR: &SRC."; 
RUN;

So, to your question: yes, the utilization counts refer to the same people over time. Am I appropriately accounting for this with the SUBJECT = MEMBNO option? MEMBNO is the individual identifier.

 

Ideally, I need the analysis at the individual level so I can adjust for additional variables. 

 

Let me know what you think; I'll be glad to fill in any holes and/or share data. Thank you.
