Automatically highlight data-driven events with reference lines in line-charts

Started 05-07-2020 by
Modified 06-02-2020 by

This article explains how you can automatically insert and label reference lines in a line chart to indicate events in your data. The focus is on automating this task. You can always add reference lines manually with the REFLINE statement in the SGPLOT procedure. The method described here, however, uses output values from SAS analytic procedures (for example, PROC ADAPTIVEREG or PROC X13) to position and label the reference lines automatically. You then do not have to rewrite the REFLINE code every time you re-run the analysis.

The article uses two examples. The first example shows how you can position the reference lines automatically. It explains how you can use the output data set shown on the left to add reference lines like those in the graph on the right.

Brkpnt_Output.PNG  c6f4_Break51.png

The second example extends Example 1 by also adding a label to each reference line automatically, based on the analytic output.

Arima_output.PNG  c7f7_x13outlierslabel1.png

This article shows how you can use a SAS DATA step with random number generators and a SAS informat to simulate monthly time series data with specific patterns such as trends, seasonal variation, breakpoints and outliers. It outlines options to analyze the course of the time series with analytical methods that identify breakpoints and outliers.

DSCS_Cover_klein.jpg

This tip is taken from the book Applying Data Science - Business Case Studies Using SAS. In case study 2, I show methods to detect structural changes such as breakpoints and outliers in your data.

You can also find more details on detecting structural changes in my YouTube video "Detecting Structural Changes and Outliers in Longitudinal Data". There is also a related SAS Communities article on how to "Simulate timeseries data with a SAS DATA Step and SAS Functions".

Overview of the necessary steps

In order to automatically display events as vertical reference lines in the line chart, the following steps are needed:

  1. Use ODS OUTPUT to save the x-axis positions calculated by a SAS procedure in a SAS data set.
  2. Use a SAS DATA step to convert the data set into SAS statements that draw the vertical reference lines and, optionally, label them. Save these program statements in a SAS program file.
  3. Use the SGPLOT procedure to create your line chart. Include the SAS program file that contains the REFLINE statements in the SGPLOT procedure.

Example 1 - Create reference lines for breakpoints calculated by the ADAPTIVEREG procedure

Create the Output Data Sets using ODS OUTPUT

In the ADAPTIVEREG procedure, you use the ODS OUTPUT statement to save the knot points that are to be shown on the x-axis in a SAS data set.

proc adaptivereg data= AirlinePassengersSmooth plots=all ;
 model pass_smooth_backwTr = date/ maxbasis=11;
 ods output BWDParams=KnotPoints;
 output out=flights_adpt predicted=pred_adpt;
run;

The content and structure of this table look as follows:

Brkpnt_Output.PNG

Use a SAS DATA step to create the REFLINE statements

This data set can now be used to generate SAS code that draws the vertical reference lines. The following statements take the values of the KNOT variable in the KNOTPOINTS data set and combine them with the text of the REFLINE statement.

filename reflines 'c:/tmp/reflines_brk.sas';
data _NULL_;
 set KnotPoints;
 where name not in ("Basis0", "Basis1");
 file reflines;
 put @04 "refline " knot " / axis = x;";
run;

  • The FILENAME statement creates a file reference to a file in your directory. You can modify this path and filename. Note that you need write permissions in the directory that you specify.
  • A DATA _NULL_ step is used because no output data set is needed. The code is written only to the file specified above.
  • The data set KNOTPOINTS is used as the input data set. The WHERE statement excludes the records whose NAME is Basis0 or Basis1, so only the records from the third row onward are used.
  • The FILE statement directs the output of the PUT statement in the next line to the file specified above.
  • The PUT statement combines text, for example, “REFLINE”, with the value of the KNOT variable in the KNOTPOINTS data set.

The content of the file REFLINES_BRK.SAS in the above example is the following:

   refline 14792  / axis = x;
   refline 15553  / axis = x;
   refline 15096  / axis = x;
   refline 11596  / axis = x;
   refline 15706  / axis = x;

As you can see, SAS statements that can be used directly in the SGPLOT procedure were generated automatically.
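You can check the generated statements yourself before plotting, for example by echoing the file to the SAS log. This short DATA _NULL_ step is a sketch. It assumes that the fileref REFLINES from the step above is still assigned.

```sas
/* Echo the generated REFLINE statements to the SAS log.          */
/* Assumes the fileref REFLINES from the DATA _NULL_ step above.  */
data _null_;
  infile reflines;      /* read the generated program file        */
  input;                /* load the current line into _INFILE_    */
  put _infile_;         /* write the line unchanged to the log    */
run;
```

If the log shows a date such as 01SEP2001 instead of a number, the variable carried a date format when it was written. In that case, the generated REFLINE statements are not valid.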

Plotting the lines with the SGPLOT procedure

Finally, you use the SGPLOT procedure to create the line chart with the reference lines.

proc sgplot data=flights_adpt;
 series x=date y=passengers;
 series x=date y=pass_smooth_backwTr/lineattrs=(pattern=4);
 series x=date y=pred_adpt/lineattrs=(pattern=3);
 %include reflines;
run;

  • Three SERIES statements plot separate lines for the actual values, the smoothed values, and the predicted values from the multivariate adaptive regression splines model.
  • The PATTERN= suboption of the LINEATTRS= option creates the dashed and dotted line types.
  • The statements for the reference lines are included with the %INCLUDE statement.

The output of this procedure looks as follows:

c6f4_Break51.png
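If you use this approach repeatedly, you can wrap the DATA _NULL_ step from Example 1 in a small macro. The sketch below is one way to do that, not a SAS-supplied tool. The macro name WRITE_REFLINES, its parameters and the EVENTS data set are hypothetical. The two sample dates are taken from Example 2. The sketch also uses a temporary fileref, so no writable directory is needed.

```sas
/* Hypothetical helper macro (not supplied by SAS): writes one      */
/* REFLINE statement per observation of DATA= to the file that the  */
/* fileref FILEREF= points to.                                      */
%macro write_reflines(data=, var=, fileref=);
  data _null_;
    set &data;
    file &fileref;
    /* :best12. writes the raw number (for example a SAS date value) */
    /* even if VAR carries a date format, so the generated code is   */
    /* valid REFLINE syntax                                          */
    put @4 "refline " &var :best12. " / axis = x;";
  run;
%mend write_reflines;

/* Illustrative input: two of the event dates used in Example 2 */
data events;
  input EventDate :date9.;
  format EventDate date9.;
  datalines;
01SEP2001
01DEC2002
;
run;

/* FILENAME ... TEMP creates a temporary file; no write path needed */
filename reflines temp;
%write_reflines(data=events, var=EventDate, fileref=reflines);
```

After the macro call, the file can be included with %INCLUDE REFLINES in PROC SGPLOT exactly as shown above. The temporary file exists only while the fileref is assigned, so it does not accumulate in your directories.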

Example 2 - Create reference lines for outliers detected with the X13 procedure

Overview

To illustrate the location of the automatically identified outliers, you might want to add reference lines to the line plot. To do this, you follow the same procedure as for the automatically detected breakpoints above. This example extends that approach by also adding an automatic label to each reference line.

  1. Save the list of detected events in a SAS data set.
  2. Use this data set to create SAS statements that draw and label the vertical reference lines. Save these program statements in a SAS program file.
  3. Use the SGPLOT procedure to plot the actual values. Include the SAS program file from step 2 in the SGPLOT procedure to create the reference lines with REFLINE statements.

Save the list of detected events in a SAS data set

In the X13 procedure, you use the ODS OUTPUT statement to store the list of detected outliers in a SAS data set.

proc x13 data=flights_911 date=date;
   var passengers;
   arima model=( (0,1,1)(0,1,1) );
   outlier;
   ods output RegParameterEstimates=RegParameterEstimates;
run ;

This saves the following information in a SAS data set:

Arima_output.PNG

Use a SAS DATA step to create the REFLINE statements

Next you use a DATA _NULL_ step to write the REFLINE statements to a SAS file.

filename reflines 'c:/tmp/reflines_outliers.sas';
data _NULL_;
 set RegParameterEstimates;
 file reflines;
  Date=cats("'01",compress(substr(regvar,3,length(regvar))),"'d");
  put @04 "refline " Date " / axis = x label = '"regvar "';";
run;

  • The FILENAME statement creates a file reference to a file in your directory. You can modify this path and filename. Note that you need write permissions in the directory that you specify.
  • A DATA _NULL_ step is used because no output data set is needed. The code is written only to the file specified above.
  • The data set RegParameterEstimates is used as the input data set.
  • The FILE statement directs the output of the PUT statement to the file specified above.
  • The PUT statement combines text, for example, “REFLINE”, with values from the data set. Note that the values shown in the Parameter column of the output are stored in the variable REGVAR in the data set.
    • The SUBSTR, COMPRESS, and CATS functions isolate the month and year from the text string "LS SEP2001" and build the date literal '01SEP2001'd from them. The result is stored in the character variable DATE.
    • The value of REGVAR is also used as the label in the LABEL= option.

The content of the file REFLINES_OUTLIERS.SAS in the above example is the following:

   refline '01SEP2001'd  / axis = x label = 'LS SEP2001 ';
   refline '01NOV2001'd  / axis = x label = 'LS NOV2001 ';
   refline '01DEC2002'd  / axis = x label = 'AO DEC2002 ';
   refline '01DEC2003'd  / axis = x label = 'LS DEC2003 ';

You can now use this file directly and include it in your PROC SGPLOT code.

proc sgplot data=flights_911;
   series x=date y=passengers;
   yaxis label='Passengers';
   xaxis label="Date";
   %include reflines;
run;

c7f7_x13outlierslabel1.png
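Instead of building a date-literal string with SUBSTR, COMPRESS and CATS, you could also let SAS read the month and year with the MONYY7. informat. MONYY7. returns the date value of the first day of that month. The sketch below assumes that REGVAR always has the layout "type MONYYYY", as in the output shown above. OUTLIER_TERMS is a hypothetical stand-in for RegParameterEstimates and holds the four REGVAR values from that output.

```sas
/* Stand-in for RegParameterEstimates with the REGVAR values above */
data outlier_terms;
  infile datalines truncover;
  input regvar $char16.;
  datalines;
LS SEP2001
LS NOV2001
AO DEC2002
LS DEC2003
;
run;

filename reflines temp;
data _null_;
  set outlier_terms;
  file reflines;
  /* 2nd blank-separated word, e.g. "SEP2001" -> date value of 01SEP2001 */
  EventDate = input(scan(regvar, 2, ' '), monyy7.);
  /* :best12. writes the numeric date; +(-1) moves back over the blank */
  /* that list output adds after REGVAR, before the closing quote      */
  put @4 "refline " EventDate :best12. " / axis = x label = '" regvar +(-1) "';";
run;
```

The numeric date value is valid in a REFLINE statement, just like a date literal. With real X13 output, replace OUTLIER_TERMS with RegParameterEstimates.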

Creating Individual Labels

Alternatively, you might want to add specific labels to each vertical reference line. In that case, you copy and paste the SAS statements from the REFLINES_OUTLIERS.SAS file into your SGPLOT procedure code. Then you customize the labels in the LABEL= option of the REFLINE statements.

The following example code creates the output below:

proc sgplot data=flights_911;
   series x=date y=passengers;
   yaxis label='Passengers';
   xaxis label="Date";
   refline '01SEP2001'd  / axis = x label="Level Shift at 9/11";
   refline '01NOV2001'd  / axis = x label="Level Shift +";
   refline '01DEC2002'd  / axis = x label="Outlier +";
   refline '01DEC2003'd  / axis = x label="Level Shift -";
run;

c7f8_x13IndivLabel1.png
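You can also keep the labels automatic and still make them more descriptive by translating the outlier-type prefix of REGVAR into words. The sketch below assumes the "type MONYYYY" layout shown above. It relies on AO, LS and TC being the outlier types that PROC X13 can report. The label wording is only an example. OUTLIER_TERMS is a hypothetical stand-in for RegParameterEstimates.

```sas
/* Stand-in for RegParameterEstimates with two REGVAR values above */
data outlier_terms;
  infile datalines truncover;
  input regvar $char16.;
  datalines;
LS SEP2001
AO DEC2002
;
run;

filename reflines temp;
data _null_;
  set outlier_terms;
  length Label $ 40;
  file reflines;
  /* "SEP2001" -> SAS date value of 01SEP2001 */
  EventDate = input(scan(regvar, 2, ' '), monyy7.);
  /* the first word of REGVAR is the outlier type */
  select (scan(regvar, 1, ' '));
    when ('AO') Label = 'Additive Outlier';
    when ('LS') Label = 'Level Shift';
    when ('TC') Label = 'Temporary Change';
    otherwise   Label = regvar;
  end;
  put @4 "refline " EventDate :best12. " / axis = x label = '" Label +(-1) "';";
run;
```

Unlike the copy-and-paste approach, this keeps the REFLINE statements generated. If the X13 results change, re-running the program updates both the positions and the labels.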

Can I also use this method without analytical procedures?

Yes, you can define simple business rules. For example:

  • Flag every April in a year in which Easter Sunday falls in March rather than in April
  • Indicate whether passenger numbers increase or decline from March to April

Pass_Change_PrevMonth = dif(passengers);
EasterMonth = (month(date) = month(holiday('Easter', year(date))));

SAS Communities Article: 3 ways to consider movable holidays in SAS

filename refl_dif 'c:/tmp/sasensei/reflines_dif.sas';

data flights_911_flag;
 set flights_911;
 Pass_Change_PrevMonth = dif(passengers);
 EasterMonth = (month(date) = month(holiday('Easter',year(date))));
run;

** Be careful when combining DIF/LAG with IF statements in a DATA step;
data _null_;
 set work.flights_911_flag;
 format Date 8.;   /* write the date as a number, which REFLINE accepts */
 file refl_dif;
 if month(date)=4 and EasterMonth=0 then do;
    if Pass_Change_PrevMonth<0 then put @04 "refline " Date  " / axis = x label = 'March-Easter April Decrease';";
    else put @04 "refline " Date  " / axis = x label = 'March-Easter April Increase';";
 end;
run;

proc sgplot data=flights_911;
 series x=date y=passengers;
 yaxis label='Passengers';
 xaxis label="Date";
 %include refl_dif;
run;

easter_flight.PNG
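The comment in the DATA step above warns about combining DIF or LAG with IF statements. The small sketch below, with made-up numbers, shows why. DIF (like LAG) updates its queue only when it is executed. Inside an IF, it therefore returns the change since the last row on which the condition was true, not since the previous row.

```sas
/* Made-up numbers to illustrate the DIF/IF pitfall */
data dif_demo;
  input Month Passengers;
  /* executed on every row: change from the previous row */
  Change_Always = dif(Passengers);
  /* executed only in even months: change from the previous EVEN month */
  if mod(Month, 2) = 0 then Change_InIf = dif(Passengers);
  datalines;
1 100
2 110
3 130
4 120
;
run;

proc print data=dif_demo noobs;
run;
```

For Month 4, Change_Always is -10 (120 minus 130), while Change_InIf is 10 (120 minus 110 from Month 2). This is why the Easter example computes Pass_Change_PrevMonth for every month in a separate DATA step before the IF condition is applied.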

Links

See also the article Simulate timeseries data with a SAS DATA Step and SAS Functions

and the YouTube videos:

Comments

Nice article @gsvolba!  I suspect that we'll see some more "level shift" events in travel data -- and everything else -- from this year!

Thank you @ChrisHemedinger.
I agree that we will have to draw a lot of reference lines into this year's line charts, so the solution to automatically insert these reference lines is even more important.

😉😉
