Dear SAS Community,
I am working with monthly phosphorus observations collected from a sewage treatment plant, as reported in the book by Hipel and McLeod (Time Series Modelling of Water Resources and Environmental Systems, 1994). The data are entered in the code below.
According to the authors, a pollution abatement procedure implemented in February 1974 led to a decrease in the mean level of the series. This change appears to be visually supported by the data. Throughout the book, the authors apply a logarithmic transformation to the series to stabilize the variance.
To formally detect the change in the mean level, I first ensured that the transformed series was detrended and deseasonalized via the UCM procedure. As shown below, the procedure reports an additive outlier in April 1975, and I am not sure whether that is reasonably close to the actual intervention date of February 1974.
I also experimented with the SAS CUSUM procedure. The one-sided CUSUM test, however, shows several large spikes, the largest occurring in January 1975. My interpretation is that, although the mean change took effect in early 1974, the CUSUM statistic may need some time to accumulate enough evidence before it can reliably signal the shift.
Could you please advise whether the approach I have taken is appropriate? Perhaps detrending or deseasonalizing the series this way is not the right approach? And is there a more precise method in SAS for detecting this type of structural change in a time series?
Thank you in advance for your insights.
title 'Phosphorus Data, January 1972 - December 1977';

data Data_Phosphorous;
   input Yt @@;
   Date = intnx('month','1Jan1972'd,_N_-1);
   format Date monyy.;
   logYt = log(Yt);   /* log transform to stabilize the variance */
   datalines;
0.47   0.51  0.35  0.19  0.33   0.1524 0.365  0.65  0.825 1     0.385 0.9
0.295  0.14  0.22  0.2   0.14   0.4    0.2144 0.495 1.1   0.59  0.27  0.3
0.3064 0.065 0.24  0.058 0.079  0.065  0.12   0.091 0.058 0.12  0.12  0.11
0.46   0.15  0.086 0.028 0.1342 0.11   0.36   0.18  0.065 0.13  0.12  0.19
0.15   0.107 0.047 0.055 0.08   0.071  0.121  0.108 0.169 0.066 0.079 0.104
0.157  0.14  0.07  0.056 0.042  0.116  0.106  0.094 0.097 0.05  0.079 0.114
;
run;

/* Decompose the log series; CHECKBREAK tests the level component for breaks */
proc ucm data=Data_Phosphorous;
   id Date interval=month;
   model logYt;
   irregular;
   level plot=smooth checkbreak;
   estimate;
   forecast lead=0 plot=decomp outfor=ucm_all;
run;

/* In-control mean and standard deviation from the pre-intervention period
   (first 25 observations: January 1972 - January 1974) */
proc means data=ucm_all(obs=25);
   var s_irreg;
   output out=stats mean=mu0 std=sigma0;
run;

/* Store the computed values as macro variables */
data _null_;
   set stats;
   call symputx("mu0", mu0);
   call symputx("sigma0", sigma0);
run;

title 'One-sided CUSUM Chart';
proc cusum data=ucm_all;
   xchart s_irreg*Date /
      mu0      = &mu0     /* target mean for process */
      sigma0   = &sigma0  /* known standard deviation */
      delta    = 1        /* shift (in sigma units) to be detected */
      h        = 2        /* cusum parameter h */
      k        = 0.5      /* cusum parameter k */
      scheme   = onesided /* one-sided decision interval */
      tableall            /* print all summary tables */
      cinfill  = ywh
      cframe   = bigb
      cout     = salmon
      cconnect = salmon
      climits  = black
      coutfill = bilg;
   label s_irreg = 'Cusum of Detrended and Deseasonalized LogYt';
run;

options gstyle;
If you have longitudinal data without too much autocorrelation (serial correlation), you can look here:
Otherwise (in the case of time series with serious autocorrelation), you can look here:
Ciao,
Koen
Dear Koen,
Thank you for sending these very useful links.
I will mostly be working with autocorrelated and nonstationary series, and I found the link to your Nile example particularly helpful. It clearly illustrates how to:
fit an appropriate ARIMA model to the series, and
use that model as the basis for outlier detection.
This approach is extremely valuable.
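For the phosphorus series above, I imagine the same idea would look roughly like this (a minimal sketch on my part, not your Nile code; the AR(1) order is just a placeholder that I would still need to confirm with the usual identification diagnostics):

/* Sketch: model-based outlier and level-shift detection with PROC ARIMA.
   The AR(1) specification is only a placeholder, not an identified model. */
proc arima data=Data_Phosphorous;
   identify var=logYt;                          /* inspect ACF/PACF first */
   estimate p=1 method=ml;                      /* placeholder AR(1) fit */
   outlier type=(additive shift) alpha=0.05 id=Date;  /* additive outliers
                                                         and level shifts */
run;

If a level shift (TYPE=SHIFT) were flagged near FEB1974, that would correspond directly to the abatement date.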
That said, I’d appreciate any thoughts on the UCM and CUSUM issues I mentioned above. Thanks again.
@sasalex2024 wrote:
That said, I’d appreciate any thoughts on the UCM and CUSUM issues I mentioned above. Thanks again.
This is how I would use the CUSUM method for Structural Break Detection / Change Point Detection in time series:
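In rough outline (a minimal sketch only, and I am assuming the detrended/deseasonalized residuals s_irreg in ucm_all from your PROC UCM step as input): accumulate the standardized deviations from the overall mean, and take the date at which the cumulative sum is furthest from zero as the change-point candidate.

/* Overall mean and standard deviation of the residual series */
proc means data=ucm_all noprint;
   var s_irreg;
   output out=overall mean=mu std=sigma;
run;

/* Standardized cumulative sum of deviations from the overall mean */
data cusum_cp;
   if _n_ = 1 then set overall(keep=mu sigma);
   set ucm_all;
   cusum + (s_irreg - mu) / sigma;
run;

/* The candidate change point is where |CUSUM| peaks */
proc sql;
   select Date format=monyy7., cusum
   from cusum_cp
   having abs(cusum) = max(abs(cusum));
quit;

The peak of the cumulative sum is the classical change-point estimate, and it usually sits much closer to the true break date than the first out-of-control signal on a control chart does.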
Ciao,
Koen