This article explains how you can automatically insert and label reference lines in your line chart to indicate events in your data. The focus is on automating this task. You can always add reference lines manually with the REFLINE statement in the SGPLOT procedure. The method described here uses output values from SAS analytics procedures (e.g., PROC ADAPTIVEREG or PROC X13) to position and label the reference lines automatically, so you do not have to rewrite the REFLINE code every time you re-run the analysis.
The article uses two examples. The first example shows how you can automatically position the reference lines. It explains how you can use the output data set shown on the left to add reference lines as shown in the graph on the right.
The second example extends the task from example 1 by also automatically adding a label to the reference line based on the analytic output.
This article shows how you can use a SAS DATA step with random number generators and a SAS informat to simulate monthly time series data with specific patterns like trends, seasonal variation, breakpoints, and outliers. It outlines options to analyze the course of the time series with analytical methods to identify breakpoints and outliers.
This tip is taken from the book Applying Data Science - Business Case Studies Using SAS. In case study 2 I show methods to detect structural changes like breakpoints and outliers in your data.
You can also find more details on the detection of structural changes in my YouTube video Detecting Structural Changes and Outliers in Longitudinal Data. There is a related article on SAS Communities on how to Simulate timeseries data with a SAS DATA Step and SAS Functions.
In order to automatically display events as vertical reference lines in the line chart, the following steps are needed:
In the ADAPTIVEREG procedure, you use the ODS OUTPUT statement to save the knot points that will be shown on the x-axis.
proc adaptivereg data= AirlinePassengersSmooth plots=all ;
model pass_smooth_backwTr = date/ maxbasis=11;
ods output BWDParams=KnotPoints;
output out=flights_adpt predicted=pred_adpt;
run;
The content and structure of this table look as follows:
This data set can now be used to generate SAS code that displays the vertical reference lines. The following statements use the values in the KNOT variable of the KNOTPOINTS data set and combine them with the REFLINE statement syntax.
filename reflines 'c:/tmp/reflines_brk.sas';
data _NULL_;
set KnotPoints;
where name not in ("Basis0", "Basis1");
file reflines;
put @04 "refline " knot " / axis = x;";
run;
The content of the file reflines_brk.sas in the above example is the following:
refline 14792 / axis = x;
refline 15553 / axis = x;
refline 15096 / axis = x;
refline 11596 / axis = x;
refline 15706 / axis = x;
You can see that SAS statements that can be used directly in the SGPLOT procedure were generated automatically.
Finally, you use the SGPLOT procedure to create the line chart with the reference lines.
proc sgplot data=flights_adpt;
series x=date y=passengers;
series x=date y=pass_smooth_backwTr/lineattrs=(pattern=4);
series x=date y=pred_adpt/lineattrs=(pattern=3);
%include reflines;
run;
The output of this procedure looks as follows:
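As a variation on the file-based approach, the REFLINE statements can also be collected in a macro variable, which avoids writing to disk. The following is only a sketch: it assumes the KnotPoints and flights_adpt data sets created above, and the macro variable name reflines_brk is a hypothetical name chosen for illustration.

```sas
/* Sketch: collect the REFLINE statements in a macro variable instead of a file. */
/* Assumes the KnotPoints data set created by the ODS OUTPUT statement above.    */
proc sql noprint;
  select catx(' ', 'refline', knot, '/ axis = x;')
    into :reflines_brk separated by ' '   /* hypothetical macro variable name */
    from KnotPoints
    where name not in ("Basis0", "Basis1");
quit;

/* The macro variable resolves to one REFLINE statement per knot point. */
proc sgplot data=flights_adpt;
  series x=date y=passengers;
  series x=date y=pass_smooth_backwTr / lineattrs=(pattern=4);
  series x=date y=pred_adpt / lineattrs=(pattern=3);
  &reflines_brk
run;
```

If the WHERE clause selects no rows, PROC SQL does not create the macro variable, so this variant fits best when at least one knot point is expected. The file-based approach has the advantage that the generated statements can be inspected and edited.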
To illustrate the location of the automatically identified outliers, you might want to add reference lines to the line plot. To do this, you follow the same procedure as for the automatically detected breakpoints above. This example extends that approach by also adding an automatic label to each reference line.
In the X13 procedure, you use the ODS OUTPUT statement to store the list of detected outliers in a SAS data set.
proc x13 data=flights_911 date=date;
var passengers;
arima model=( (0,1,1)(0,1,1) );
outlier;
ods output RegParameterEstimates=RegParameterEstimates;
run ;
This outputs the following information in a SAS data set:
Next you use a DATA _NULL_ step to write the REFLINE statements to a SAS file.
filename reflines 'c:/tmp/reflines_outliers.sas';
data _NULL_;
set RegParameterEstimates;
file reflines;
Date=cats("'01",compress(substr(regvar,3,length(regvar))),"'d");
put @04 "refline " Date " / axis = x label = '"regvar "';";
run;
The RegParameterEstimates data set is used as the input data set. The content of the file REFLINES_OUTLIERS.SAS in the above example is the following:
refline '01SEP2001'd / axis = x label = 'LS SEP2001 ';
refline '01NOV2001'd / axis = x label = 'LS NOV2001 ';
refline '01DEC2002'd / axis = x label = 'AO DEC2002 ';
refline '01DEC2003'd / axis = x label = 'LS DEC2003 ';
You can now use this file directly and include it in your PROC SGPLOT code.
proc sgplot data=flights_911;
series x=date y=passengers;
yaxis label='Passengers';
xaxis label="Date";
%include reflines;
run;
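The DATE expression in the DATA _NULL_ step of this example can be easier to follow with a single value. The sketch below takes the RegVar value 'LS SEP2001' from the generated file and shows the text date literal built by that expression. As an alternative, it reads the same text with the MONYY7. informat, assuming that the text after the two-letter type code always has the form MONYYYY. The data set name date_demo and the variable names are hypothetical.

```sas
/* Worked example with one RegVar value taken from the generated file above */
data date_demo;
  length regvar $ 20 DateLiteral $ 12;
  regvar = 'LS SEP2001';

  /* Approach used in the article: build a SAS date literal as text */
  DateLiteral = cats("'01", compress(substr(regvar, 3)), "'d");

  /* Alternative (assumption: the text after the type code is MONYYYY): */
  /* read it with the MONYY7. informat to get a numeric SAS date value  */
  DateValue = input(compress(substr(regvar, 3)), monyy7.);

  /* Write both results to the log, the numeric value raw and formatted */
  put DateLiteral= / DateValue= / DateValue= date9.;
run;
```

Either form can be written into a REFLINE statement. The numeric value corresponds to the way the knot points are written in the first example, while the date literal is easier to read in the generated file.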
Alternatively, you might want to add specific labels to each vertical reference line. In that case, you copy and paste the SAS statements from the REFLINES_OUTLIERS.SAS file into your SGPLOT procedure statements. Then you customize the labels in the LABEL option of the REFLINE statement.
The following example code creates the output below:
proc sgplot data=flights_911;
series x=date y=passengers;
yaxis label='Passengers';
xaxis label="Date";
refline '01SEP2001'd / axis = x label="Level Shift at 9/11";
refline '01NOV2001'd / axis = x label="Level Shift +";
refline '01DEC2002'd / axis = x label="Outlier +";
refline '01DEC2003'd / axis = x Label="Level Shift -";
run;
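If the labels only need to spell out the outlier type, the manual copy-and-paste step can possibly be avoided by mapping the type code in the DATA _NULL_ step. The following sketch uses a small stand-in data set with the four RegVar values shown earlier. The names RegParameterEstimates_demo, refl_lbl, DateLiteral, and TypeLabel are hypothetical, and the mapping assumes that the first two characters of RegVar hold the X13 outlier type (LS, AO, or TC).

```sas
/* Stand-in for RegParameterEstimates, using the four RegVar values shown above */
data RegParameterEstimates_demo;
  input regvar $ 1-20;
  datalines;
LS SEP2001
LS NOV2001
AO DEC2002
LS DEC2003
;
run;

filename refl_lbl temp;   /* temporary file; use a path if you want to keep it */

data _null_;
  set RegParameterEstimates_demo;
  file refl_lbl;
  length DateLiteral $ 12 TypeLabel $ 20;
  DateLiteral = cats("'01", compress(substr(regvar, 3)), "'d");
  /* Map the two-letter type code to a descriptive label */
  select (upcase(substr(regvar, 1, 2)));
    when ('LS') TypeLabel = 'Level Shift';
    when ('AO') TypeLabel = 'Additive Outlier';
    when ('TC') TypeLabel = 'Temporary Change';
    otherwise   TypeLabel = substr(regvar, 1, 2);
  end;
  /* +(-1) moves the pointer back over the blank that list output appends */
  put @04 "refline " DateLiteral "/ axis = x label = '" TypeLabel +(-1) "';";
run;
```

The generated file can then be included in PROC SGPLOT with %include refl_lbl; in the same way as before. Labels that carry business meaning, such as "Level Shift at 9/11", still need to be edited by hand.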
You can also define simple business rules to position the reference lines. For example:
Pass_Change_PrevMonth = dif(passengers);
EasterMonth = (month(date)=month(holiday('Easter',year(date))));
SAS Communities Article: 3 ways to consider movable holidays in SAS
filename refl_dif 'c:/tmp/sasensei/reflines_dif.sas';
data flights_911_flag;
set flights_911;
Pass_Change_PrevMonth = dif(passengers);
EasterMonth = (month(date) = month(holiday('Easter',year(date))));
run;
** Be careful when combining DIF/LAG with IF statements in a datastep;
data _null_;
set work.flights_911_flag;
format Date 8.;
file refl_dif;
if month(date)=4 and EasterMonth=0 then do;
if Pass_Change_PrevMonth<0 then put @04 "refline " Date " / axis = x label = 'March-Easter April Decrease';";
else put @04 "refline " date " / axis = x label = 'March-Easter April Increase';";
end;
run;
proc sgplot data=flights_911;
series x=date y=passengers;
yaxis label='Passengers';
xaxis label="Date";
%include refl_dif;
run;
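The comment in the code above warns about combining DIF or LAG with IF statements. The reason is that these functions keep their own queue of previous values, and the queue is only updated when the function call actually executes. The following toy example (the data set name dif_demo and the values are made up for illustration) contrasts an unconditional call with a conditional one.

```sas
data dif_demo;
  input passengers @@;
  /* Unconditional call: DIF sees every observation */
  dif_all = dif(passengers);
  /* Conditional call: the DIF queue is only updated in even-numbered rows */
  if mod(_n_, 2) = 0 then dif_cond = dif(passengers);
  datalines;
10 12 15 11 18 20
;
run;

proc print data=dif_demo;
run;
```

In row 4, dif_all is -4 (11 - 15), but dif_cond is -1 (11 - 12), because the conditional call last executed in row 2. This is why the example above computes Pass_Change_PrevMonth unconditionally in a separate DATA step and applies the IF conditions only afterwards.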
See also the article Simulate timeseries data with a SAS DATA Step and SAS Functions and the YouTube video Detecting Structural Changes and Outliers in Longitudinal Data.
Nice article @gsvolba! I suspect that we'll see some more "level shift" events in travel data -- and everything else -- from this year!
thank you @ChrisHemedinger
agree that we have to draw a lot of reference lines into this years line charts - so the solution to automatically insert these reference lines is even more important
😉😉