sasalex2024
Quartz | Level 8

Dear SAS Community,

I am working with monthly phosphorus observations collected from a sewage treatment plant, as reported in the book by Hipel and McLeod (Time Series Modelling of Water Resources and Environmental Systems, 1994). The data are entered in the code below.

According to the authors, a pollution abatement procedure implemented in February 1974 led to a decrease in the mean level of the series. This change appears to be visually supported by the data. Throughout the book, the authors apply a logarithmic transformation to the series to stabilize the variance.

To formally detect the change in the mean level, I first ensured that the transformed series was both detrended and deseasonalized via the UCM procedure. As shown below, the procedure reports an additive outlier in April 1975. I am not sure if this is reasonably close to the actual intervention date.

I also experimented with the SAS CUSUM procedure. However, the one-sided CUSUM test indicates several large spikes, with the largest occurring in January 1975. My interpretation is that, although the mean change began at the start of 1974, it may take some time before CUSUM can reliably detect it.
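
For reference, a sketch of the textbook one-sided (upper) CUSUM recursion, written with the same symbols as the chart parameters below. This is the standard form and may differ in detail from what PROC CUSUM computes internally; $x_t$ denotes the charted value (here `s_irreg`), which is my notation.

```latex
z_t = \frac{x_t - \mu_0}{\sigma_0}, \qquad
S_0 = 0, \qquad
S_t = \max\bigl(0,\; S_{t-1} + z_t - k\bigr), \qquad
\text{signal when } S_t > h .
```

Because $S_t$ has to accumulate past $h$ before a signal is raised, some delay between the onset of a shift and its detection is expected.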

Could you please advise whether the approach I have taken is appropriate? In particular, is detrending and deseasonalizing the series this way a reasonable choice? Is there a more precise method in SAS for detecting this type of structural change in a time series?

Thank you in advance for your insights.

title 'Phosphorus Data, January 1972 - December 1977';
data Data_Phosphorous;
input Yt @@;
Date = intnx('month','1Jan1972'd,_N_-1);
format date monyy.;
logYt=log(Yt);
datalines;
0.47 0.51 0.35 0.19 0.33 0.1524 0.365 0.65 0.825 1 0.385 0.9 0.295 0.14 0.22 0.2 0.14 0.4 0.2144 0.495 1.1 0.59 0.27 0.3 0.3064 0.065
0.24 0.058 0.079 0.065 0.12 0.091 0.058 0.12 0.12 0.11 0.46 0.15 0.086 0.028 0.1342 0.11 0.36 0.18 0.065 0.13 0.12 0.19 0.15 0.107
0.047 0.055 0.08 0.071 0.121 0.108 0.169 0.066 0.079 0.104 0.157 0.14 0.07 0.056 0.042 0.116 0.106 0.094 0.097 0.05 0.079 0.114
;
run;

proc ucm data=data_phosphorous;
id date interval=month;
model logYt;
irregular;
level plot=smooth checkbreak;
estimate;
forecast lead=0 plot=decomp outfor=ucm_all;
run;

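/* Baseline statistics from the first 25 months (Jan 1972 - Jan 1974), before the February 1974 intervention */
proc means data=ucm_all(obs=25);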
var s_irreg;
output out=stats mean=mu0 std=sigma0;
run;

/* Store computed values as macro variables */
data _null_;
set stats;
call symputx("mu0", mu0);
call symputx("sigma0", sigma0);
run;

title 'One-sided CUSUM Chart';
proc cusum data=ucm_all;
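xchart s_irreg*Date /
mu0 = &mu0 /* target mean, estimated from the baseline months */
sigma0 = &sigma0 /* standard deviation, estimated from the baseline months */
delta = 1 /* shift to be detected, in standard-deviation units (a positive value targets an upward shift) */
h = 2 /* cusum parameter h (decision interval) */
k = 0.5 /* cusum parameter k (reference value) */
scheme = onesided /* one-sided decision interval */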
tableall /* table */
cinfill = ywh
cframe = bigb
cout = salmon
cconnect = salmon
climits = black
coutfill = bilg;
label s_irreg = 'Cusum of Detrended and Deseasonalized LogYt';
run;
options gstyle;
sbxkoenk
SAS Super FREQ

If you have longitudinal data without too much autocorrelation (serial correlation), you can look here:

Otherwise (in the case of a time series with serious autocorrelation), you can look here:

Ciao,

Koen

sasalex2024
Quartz | Level 8

Dear Koen,

Thank you for sending these very useful links.

I will mostly be working with autocorrelated and nonstationary models, and I found the link to your Nile example particularly helpful. It clearly illustrates how to:

  1. fit an appropriate ARIMA model to the series, and

  2. use that model as the basis for outlier detection.

This approach is extremely valuable.
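
A minimal sketch of those two steps applied to the phosphorus series, assuming the Data_Phosphorous data set from my first post. The AR(1) order, MAXNUM=3, and ALPHA=0.01 are placeholders chosen only for illustration, not values selected from diagnostics.

```sas
/* Sketch only: the AR(1) order is an assumption, not a model chosen from diagnostics */
proc arima data=Data_Phosphorous;
   /* Step 1: identify and fit a model for the log-transformed series */
   identify var=logYt noprint;
   estimate p=1 method=ml;
   /* Step 2: search the fitted model for additive outliers and level shifts */
   outlier type=(additive shift) id=Date maxnum=3 alpha=0.01;
run;
quit;
```

A level shift flagged near February 1974 would be consistent with the intervention date reported by Hipel and McLeod, but whether one appears depends on the model actually fitted.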

That said, I’d appreciate any thoughts on the UCM and CUSUM issues I mentioned above. Thanks again.

 

sbxkoenk
SAS Super FREQ

@sasalex2024 wrote:

 

That said, I’d appreciate any thoughts on the UCM and CUSUM issues I mentioned above. Thanks again.


This is how I would use the CUSUM method for Structural Break Detection / Change Point Detection in time series:

  • Make an appropriate time-series model for your time series (it should fit the data well).
    Use PROC ARIMA or PROC SSM (for linear state space models) or PROC UCM or ...
  • Then proceed to model-based break detection methods that use CUSUM and/or CUSUMSQ statistics.
  • CUSUM and CUSUMSQ statistics are computed using the cumulative sum and the cumulative sum of squares of the "one-step-ahead residuals", respectively.
  • With only a little bit of effort, you can use these residuals to do the CUSUM/CUSUMSQ-based change point detection as a post-fit diagnostic step.
  • The CUSUM/CUSUMSQ-based change point diagnostics can be complementary to the De Jong-Penzer algorithm-based change point diagnostics.
  • In particular, the CUSUM/CUSUMSQ-based change point diagnostics can be useful to detect changes in the variance of the process.
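
The steps above could be sketched roughly as follows, assuming the Data_Phosphorous data set from the original post and the one-step-ahead residuals that PROC UCM writes to the OUTFOR= data set in the variable RESIDUAL. The data set names ucm_res, res_stats, and cusum_diag and the variables res_sd, res_ss, cusum, and cusumsq are hypothetical, and the simple scaling used here is one common choice rather than a prescribed method. No significance bounds are computed.

```sas
/* Fit the model and keep the one-step-ahead residuals */
proc ucm data=Data_Phosphorous;
   id Date interval=month;
   model logYt;
   irregular;
   level;
   estimate;
   forecast lead=0 outfor=ucm_res;
run;

/* Scale factors: residual standard deviation and total sum of squares */
proc means data=ucm_res noprint;
   var residual;
   output out=res_stats std=res_sd uss=res_ss;
run;

/* Accumulate the scaled residuals (CUSUM) and their squares (CUSUMSQ) */
data cusum_diag;
   if _n_ = 1 then set res_stats(keep=res_sd res_ss);
   set ucm_res(where=(residual is not missing));
   cusum   + residual / res_sd;      /* running sum of standardized residuals */
   cusumsq + residual**2 / res_ss;   /* running share of the sum of squares, ends at 1 */
run;

/* Plot both statistics against time */
proc sgplot data=cusum_diag;
   series x=Date y=cusum;
   series x=Date y=cusumsq / y2axis;
run;
```

A sustained drift of the CUSUM line away from zero points to a change in level, while a CUSUMSQ path that departs strongly from a straight diagonal points to a change in variance.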

Ciao,

Koen