The purpose of this article is to provide an overview of concepts and SAS tools related to creating and processing timeseries as arrays. The usefulness of SAS timeseries array processing functionality, also known as the SAS data-step-for-timeseries toolbox, is illustrated in three demonstrations.
A timeseries is an indexed set of equally spaced values. Information, or signal, is carried by the order of the values and the distance between them, so sequences need to remain intact when timeseries data is created, explored and processed. A natural way to think about a timeseries in the context of data handling is as an array. An array provides a way to process a sequence of values based on an index and other user-provided attributes. Timeseries data handling based on the idea of array processing is featured in both SAS Viya and SAS 9, and we’ll generally refer to this functionality as the SAS data-step-for-timeseries toolbox.
The purpose of this series of articles is to introduce and explain the tools and to illustrate their usefulness through a series of examples. This article provides an overview of concepts on creating and processing timeseries as arrays. Subsequent articles are previewed here with three demonstrations. Article 2 will focus on timeseries BY group processing. Multiple timeseries arrays are defined and processed using BY group or sub-setting variables. Article 3 focuses on creating user defined subroutines and functions and then using them in an array processing block of syntax. Topics covered in future articles will depend on reader feedback, so let us know what you think and provide suggestions for data-step-for-timeseries topics.
Demo 1, Initial and New Arrays
In the first example, we’ll use the AIR data set in the SASHELP library. This table contains two variables: a count of US airline passengers, AIR, and a time index, DATE. The natural interval of the data is month and there are 144 observations. A portion of this table is shown.
It may be useful to think of the TIMEDATA Procedure (SAS/ETS) processing shown here in two steps. First, selected variables in the input data set are named and initial arrays are created. Second, new arrays are created by operating on elements of initial arrays defined in the first step.
Note that the ID and VARS statements combine to uniquely define the array, AIR. In this case, one observation on passenger count per quarter is derived by averaging the monthly observations in the input table. Then, the DO block syntax creates four new arrays by operating on elements of the array, AIR.
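/* Accumulate monthly AIR to quarterly averages, then build trend and seasonal arrays */
proc timedata data=sashelp.air out=work.air outarray=work.airarray print=(arrays);
   id date interval=quarter accumulate=average format=yymmdd.;
   vars air;
   /* quad_trend is declared here but never assigned in the block below */
   outarrays rw_trend lin_trend quad_trend s4 c4;
   twopi = 2*constant("pi");
   do t = 1 to dim(air);
      /* one-period lag; skip t=1 so that air[0] is never referenced */
      if t > 1 then rw_trend[t] = air[t-1];
      lin_trend[t] = t;
      /* sine and cosine terms with a period of four quarters */
      s4[t] = sin(twopi*t/4);
      c4[t] = cos(twopi*t/4);
   end;
   title "Create arrays for different trends and sinusoids";
run;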
Let’s investigate the results of the TIMEDATA call and discuss some details.
The AIR variable in the WORK.AIRARRAY table is a quarter interval time series. Its first value is the average of the first three values of (month interval) AIR in the input table. The DATE variable has a quarter interval as specified in the ID statement. Four new arrays are created: a one-period lag of AIR (RW_TREND), a linear trend (LIN_TREND), and a sine and cosine pair with a period of four quarters (S4 and C4). These are common timeseries model features.
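One way to convince yourself of what ACCUMULATE=AVERAGE does is to recompute the quarterly averages outside of PROC TIMEDATA and compare them with the AIR column in WORK.AIRARRAY. The sketch below does this with PROC SQL. The table name WORK.AIR_QTR_CHECK and the column names QTR_START and AIR_AVG are made up for this illustration and are not part of the original demo.

```sas
/* Cross-check sketch: quarterly averages of monthly AIR, computed with PROC SQL. */
/* WORK.AIR_QTR_CHECK, QTR_START and AIR_AVG are illustrative names only.         */
proc sql;
   create table work.air_qtr_check as
   select intnx('quarter', date, 0) as qtr_start format=yymmdd10.,  /* first day of the quarter */
          mean(air)                 as air_avg                      /* average of the months in it */
   from sashelp.air
   group by calculated qtr_start
   order by qtr_start;
quit;
```

If the two approaches agree, each AIR_AVG value should match the AIR value for the same quarter in WORK.AIRARRAY.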
Note that SAS functions, commonly found in DATA step syntax, are valid to use in the TIMEDATA procedure. TIMEDATA and other tools implementing the data-step-for-timeseries approach accommodate most of the SAS programming statements and SAS functions that you can use in a DATA step.
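As a small illustration of this point, the sketch below uses the LOG function and an IF-THEN statement, both familiar from the DATA step, inside a TIMEDATA call. The array names LOG_AIR and DLOG_AIR and the output table WORK.AIR_FN are assumptions introduced for this example only.

```sas
/* Sketch: DATA step style functions and statements inside PROC TIMEDATA. */
/* LOG_AIR, DLOG_AIR and WORK.AIR_FN are illustrative names only.         */
proc timedata data=sashelp.air out=_null_ outarray=work.air_fn;
   id date interval=month;
   vars air;
   outarrays log_air dlog_air;
   do t = 1 to dim(air);
      log_air[t] = log(air[t]);                    /* natural log of the series */
      /* first difference of the logs; t=1 is skipped because there is no prior value */
      if t > 1 then dlog_air[t] = log_air[t] - log_air[t-1];
   end;
run;
```

The guard on t mirrors the usual care needed whenever an expression looks backward or forward along the array.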
Demo 2, BY Group Processing
This example will feature SAS Viya. While this software framework is different from SAS 9, shown in the first demonstration, the approach is consistent and the data-step-for-timeseries tools are implemented in a similar way. In general, a table that will be used for BY group processing has sequences stacked on top of each other, and the data is sorted according to the BY variables and the time ID. In the simple table we’ll use here there’s one subsetting variable, and the data has been sorted by BY_GRP and then DATE. BY_GRP values identify two P_STATUS sequences. A subsequent article will illustrate how a table with more BY variables is organized.
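To make the stacked layout concrete, here is a minimal sketch that builds a toy table with the same shape: two groups, each with its own monthly sequence, stored one after the other. The table name WORK.BYGRP_TOY, the start date, the numeric coding of BY_GRP and the P_STATUS values are all made up for illustration; they are not the data used in this demo.

```sas
/* Sketch: a toy table in the stacked, sorted layout described above.  */
/* WORK.BYGRP_TOY and all values are hypothetical, not the demo data.  */
data work.bygrp_toy;
   do by_grp = 1 to 2;                         /* one block of rows per group */
      do m = 0 to 5;                           /* six consecutive months      */
         date = intnx('month', '01JAN2020'd, m);
         p_status = 10*by_grp + m;             /* made-up positive values     */
         output;
      end;
   end;
   format date monyy7.;
   drop m;
run;
```

Because the rows are written group by group and month by month, the table is already ordered by BY_GRP and then DATE.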
Notes on the TSMODEL procedure (SAS Viya/Visual Forecasting) syntax:
proc tsmodel data=mylib.bygrp_in outarray=mylib.bygrp_out outscalar=mylib.scalars outsum=mylib.summary_stats;
   id date interval=month;
   var p_status;
   by by_grp;
   outarray ln_pstatus;
   outscalar sum_sq;
   submit;
      do t = 1 to dim(p_status);
         ln_pstatus[t] = log(p_status[t]);
         sum_sq += ln_pstatus[t]**2;
      end;
   endsubmit;
run;
Let’s investigate the results of the TSMODEL procedure call and discuss some details.
The MYLIB.BYGRP_OUT table contains the two P_STATUS timeseries (groups 1 and 2) defined by the ID, BY and VAR statements, and the two new LN_PSTATUS timeseries created in the SUBMIT block.
The MYLIB.SCALARS table contains the generated system scalars. One scalar is created for each LN_PSTATUS array.
A portion of the columns of the MYLIB.SUMMARY_STATS table is shown. Summary statistics on each of the four timeseries are listed.
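For readers who think in DATA step terms, the per-group scalar can be approximated with ordinary BY group processing. The sketch below assumes a local copy of the input table named WORK.BYGRP_IN that is sorted by BY_GRP and DATE, assumes the accumulator starts at zero for each group, and skips nonpositive or missing P_STATUS values so that LOG is never given an invalid argument. The output table name WORK.SCALARS_CHECK is hypothetical.

```sas
/* Sketch: one sum of squared logs per BY group, written DATA step style.       */
/* WORK.BYGRP_IN (a sorted local copy) and WORK.SCALARS_CHECK are assumed names. */
data work.scalars_check (keep=by_grp sum_sq);
   set work.bygrp_in;
   by by_grp;
   retain sum_sq;
   if first.by_grp then sum_sq = 0;            /* reset at the start of each group  */
   if p_status > 0 then sum_sq = sum_sq + log(p_status)**2;
   if last.by_grp then output;                 /* one row per group                 */
run;
```

The contrast is the point: the DATA step walks the table row by row and needs FIRST. and LAST. logic, while the array approach receives each group's whole sequence at once.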
Demo 3, Creating and Calling a User Defined Subroutine
This example switches back to SAS 9. In the first part, a user-defined subroutine is created. The subroutine is then called in the TIMEDATA procedure to create a new array. The SAS Function Compiler procedure (FCMP, Base SAS) lets you create, test and store SAS functions, CALL routines and subroutines. Here, a subroutine named MYLEAD is created and then stored in a compile library that can be referenced in subsequent steps.
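/* Tell SAS where to look for compiled functions and subroutines */
options cmplib = work.timefnc;
proc fcmp outlib=work.timefnc.funcs;
   subroutine mylead(actual[*], transform[*]);
      outargs transform;   /* TRANSFORM is updated in place */
      actlen = dim(actual);
      /* one-period lead; stop at actlen-1 so that actual[actlen+1] is never referenced */
      do t = 1 to actlen - 1;
         transform[t] = actual[t+1];
      end;
      transform[actlen] = .;   /* no lead value exists for the last period */
   endsub;
run;
quit;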
The ID and VARS statements combine to create the quarter interval array AIR, this time accumulated as quarterly totals. The new array, LEADAIR, is declared in the OUTARRAYS statement and then created by calling the MYLEAD subroutine. The arguments, AIR and LEADAIR, correspond to ACTUAL and TRANSFORM listed in the definition of the subroutine.
proc timedata data=sashelp.air out=work.air2 outarray=airarray2 print=(arrays);
   id date interval=qtr accumulate=total format=yymmdd.;
   vars air;
   outarrays leadair;
   call mylead(air, leadair);
run;
A portion of the OUTARRAY table, AIRARRAY2, is shown.
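The same pattern extends to other transformations. As a sketch, here is a lag counterpart to the lead subroutine, together with a quick check inside the same FCMP step and a TIMEDATA call that uses it. The names MYLAG, LAGAIR and WORK.AIRARRAY3 are hypothetical and introduced only for this example; the compile library is the one already used in this demo.

```sas
/* Sketch: a one-period lag subroutine, a quick test, and a TIMEDATA call. */
/* MYLAG, LAGAIR and WORK.AIRARRAY3 are illustrative names only.           */
options cmplib=work.timefnc;
proc fcmp outlib=work.timefnc.funcs;
   subroutine mylag(actual[*], transform[*]);
      outargs transform;
      transform[1] = .;                 /* no prior value for the first period */
      do t = 2 to dim(actual);
         transform[t] = actual[t-1];
      end;
   endsub;

   /* quick check on a small array before using the routine elsewhere */
   array a[4] (10 20 30 40);
   array b[4];
   call mylag(a, b);
   put b=;
run;

proc timedata data=sashelp.air out=_null_ outarray=work.airarray3;
   id date interval=qtr accumulate=total;
   vars air;
   outarrays lagair;
   call mylag(air, lagair);
run;
```

In the quick check, B should hold a missing value followed by 10, 20 and 30. Testing the routine on a tiny array first makes boundary behavior easy to see before it is applied to real series.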
As you can see, the array approach provides a flexible and efficient way to handle timeseries data. In the next article, we’ll discuss BY group processing in more detail and provide more in-depth examples. Stay tuned for more data-step-for-timeseries action!