I'm working with unevenly-spaced timeseries data from IoT devices. I'm new to SAS and had previously been using the Python traces package for handling this data.
One handy thing that traces enables is "simple analyses of unevenly-spaced time series data without making an awkward / lossy transformation to evenly-spaced representations." For example, if I wanted to know when any of a set of IoT devices was in an "on" state (a logical OR), I could do that very easily with traces TimeSeries objects, without first converting the data to an evenly-spaced representation and without writing any custom code to implement the logical OR over the uneven time steps.
Does SAS have any capabilities for directly performing simple analyses on unevenly-spaced time series data?
Note: Specialized tools for time series are provided under the Forecasting and Econometrics (SAS/ETS) license. You might want to address your question to that community:
https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/bd-p/forecasting_econometrics
It depends on your data structure.
Determining what's an ON state could be as simple as a filter and not require any special package.
Re unevenly spaced time series, I'll let someone else answer.
Thanks Reeza. Really, the key issue for me is the unevenly-spaced aspect of the problem, not the logical OR; that is just an example. I'm trying to avoid going from unevenly-spaced to evenly-spaced data just to do a computation like a logical OR or a sum, and then back again to unevenly-spaced, which is the more concise representation and the more relevant one for later analyses.
Hello -
SSM mastermind @rselukar pointed out to me that unevenly spaced time series are also called longitudinal data. There are 3 examples in the PROC SSM documentation that deal with this type of situation:
Furthermore, he was kind enough to point me to the PROC SSM documentation that describes the types of sequential data SSM can handle:
And of course you may want to check out his SAS Global Forum paper on this very topic: http://support.sas.com/resources/papers/proceedings15/SAS1580-2015.pdf
Many thanks Rajesh!
Udo
I was not aware of the traces package. Taking a quick look at the package link, I realize that you are interested in more basic analysis of unevenly spaced timeseries. The SSM procedure is designed for model-based analysis of such timeseries, and you can do rather sophisticated analysis of such data with it. Without going into too many details, I am going to explain how to use the SSM procedure to do basic interpolation/extrapolation of an unevenly spaced timeseries. Assume that your input data set, test, has two columns, time and y. The time column contains the times associated with the measurements y (the times need not be evenly spaced; in fact, you can even have multiple measurements at the same time point). It is also assumed that the data set test is sorted by time, and that the time points at which you want interpolated/extrapolated values of y are included in the data set with the corresponding y missing. So the first few rows of the data set might look something like this:
time y
1.2 .
1.8 -2.6
2.0 .
8.3 1.3
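One way to build such an input is sketched below, assuming the measurements and the requested time points start out in two separate data sets. The names obs and query are hypothetical, and the values are just the four rows shown above; a real fit would need many more observations.

```sas
/* Observed measurements (only the rows shown above) */
data obs;
   input time y;
   datalines;
1.8 -2.6
8.3 1.3
;

/* Time points where an estimate is wanted; y is left missing */
data query;
   input time;
   y = .;
   datalines;
1.2
2.0
;

/* Stack both sets, then sort by time as the SSM procedure expects */
data test;
   set obs query;
run;

proc sort data=test;
   by time;
run;
```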
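Note that the y values at time points 1.2 and 2.0 are missing. You can obtain a smooth interpolation of the y values as follows:
proc ssm data=test;
   id time;                                /* time index; need not be evenly spaced */
   trend scurve(ps(2)) checkbreak;         /* smooth trend component named scurve */
   irregular noise;                        /* observation noise component */
   model y = scurve noise / print=smooth;
   output out=for press;
run;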
The interpolated/extrapolated values of y are printed. The SSM procedure stores the estimate of the smoothed curve (called smoothed_scurve) in the output data set. You can plot it and see the fit as follows:
proc sgplot data=for;
   series x=time y=smoothed_scurve;
   scatter x=time y=y;
run;
Note that you can specify many more interesting models (see the examples mentioned earlier) and use predictor information if it is available. The CHECKBREAK option in the TREND statement identifies possible locations of abrupt changes in the smoothed curve. There is a lot more you can do as well.
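If only the estimates at the requested time points are of interest, they can be pulled out of the output data set. This is a small sketch that relies on the variable names used above (time, y, smoothed_scurve) and on y being missing exactly at the requested points; the data set name interp is hypothetical.

```sas
/* Keep only the rows where y was missing, i.e. the requested time points */
data interp;
   set for;
   where y is missing;
   keep time smoothed_scurve;
run;

proc print data=interp noobs; run;
```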
Thank you @udo_sas and @rselukar for this information about SSM! I see that it will be very helpful for extrapolation/interpolation, and also more powerful modeling of unevenly spaced timeseries data.
I'm still wondering if there is anything that will do simple arithmetic and logical operations on a set of unevenly-spaced timeseries (such as what would come off of a set of related sensors). Each sensor will send data at different unevenly-spaced times. The example from the traces package website about adding up all the lights switched on in a building (see below) is a good one, but I may also want to do different things, like logical ORs to see when any light is on in the building, finding the average number of lights on in the building, etc. Would SSM help with this? Or something else in SAS?
I think for this type of problem PROC TIMESERIES (SAS/ETS: http://support.sas.com/documentation/cdl/en/etsug/68148/HTML/default/viewer.htm#etsug_timeseries_toc...) and the DATA step might be a better match. PROC TIMESERIES can take different time series that are recorded at different time instances and put them on a uniform time grid of your choice. It has many options to accumulate readings and "fill" the gaps with suitable values (the ACCUMULATE= and SETMISSING= options of the ID statement); e.g., for your sensor data, zero (indicating absence) might be a possible choice. After a data set of all uniformly "filled" series is created, you can do a variety of operations on these columns (summing, and/or, ...) via the DATA step. PROC SSM could also be used for the same data preparation step if you want to do a more model-based interpolation/extrapolation. For your simpler setup, PROC TIMESERIES might be sufficient and will be a whole lot faster.
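A rough sketch of that route, using the example data set t (timeStamp, devId, state) posted in this thread. The one-minute grid, the choice to keep the last reading in each minute and carry it forward, and the names t_sorted, grid, grid_wide, grid_ops and the dev_ prefix are all assumptions made for illustration.

```sas
/* PROC TIMESERIES needs the data sorted by BY group, then by time */
proc sort data=t out=t_sorted;
   by devId timeStamp;
run;

/* One row per device per minute: the last reading in a minute wins,
   and empty minutes take the previous minute's value */
proc timeseries data=t_sorted out=grid;
   by devId;
   id timeStamp interval=minute accumulate=last setmissing=previous;
   var state;
run;

/* Reshape to one column per device (dev_A, dev_B, ...) */
proc sort data=grid;
   by timeStamp devId;
run;

proc transpose data=grid out=grid_wide(drop=_name_) prefix=dev_;
   by timeStamp;
   id devId;
   var state;
run;

/* Column-wise operations on the evenly spaced grid */
data grid_ops;
   set grid_wide;
   OR_state = max(of dev_:);
   NB_ON    = sum(of dev_:);
run;
```

Without START= and END= on the ID statement, each device's series covers only the span of its own readings, so a device's column is missing outside that span. Readings that fall in the same minute are collapsed into one value, which is the lossy part of the transformation.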
Yes, this is basically what I'm doing now, although with a DATA step instead of PROC TIMESERIES. I was hoping to avoid the somewhat awkward and potentially memory-intensive transition from unevenly-spaced to evenly-spaced data for simple arithmetic and logical computations, and then back to unevenly-spaced for further analysis.
Anyhow thanks for your help @rselukar !
Here is the approach I would use to perform simple operations with base SAS coding:
/* Example data */
data t;
   input timeStamp :datetime20. devId :$8. state;
   format timestamp datetime17.;
   datalines;
13SEP2016:01:01:00 A 0
13SEP2016:01:01:10 A 1
13SEP2016:01:01:11 A 0
13SEP2016:01:01:55 B 1
13SEP2016:02:20:00 A 1
13SEP2016:02:30:00 C 0
13SEP2016:02:33:00 B 0
13SEP2016:04:01:00 A 0
13SEP2016:05:01:00 C 1
13SEP2016:05:12:00 C 0
13SEP2016:05:23:00 C 1
13SEP2016:05:35:00 B 1
13SEP2016:06:00:00 A 1
13SEP2016:06:01:00 A 0
;
/* Get the list of devices */
ods output SQL_Results=devFmt;
proc sql number;
   /* NUMBER adds a Row column, which serves as the device index */
   select unique devId, "devFmt" as fmtname, "I" as type
      from t;
   /* Build label assignments of the form v_<Row>="<devId>" */
   select cats("v_", Row, "=", quote(compress(devId)))
      into :devList separated by " "
      from devFmt;
quit;
/* Informat devFmt. maps each device ID to its index */
proc format cntlin=devFmt(rename=(devId=start Row=label)); run;
/* Accumulate device states and perform simple operations */
data wide_t;
   set t;
   label &devList.;
   retain v_:;                           /* carry each device's last known state forward */
   array _v v_:;
   _v{input(devId, devFmt.)} = state;    /* update only the device that reported */
   OR_state = max(of v_:);
   AND_state = min(of v_:);
   NB_ON = sum(of v_:);
   NB_OFF = n(of v_:) - NB_ON;
run;
proc print data=wide_t label noobs; run;
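Because wide_t keeps one row per event, time-based summaries such as the average number of devices on can be weighted by how long each row stays in effect. The following is only a sketch under assumptions: each row is taken to hold until the next event from any device, the last row has no known duration and so drops out of the weighted statistics, and devices that have not reported yet are ignored by SUM and MAX. The names durations, nextStamp and duration are hypothetical.

```sas
/* Look ahead one row to get the time stamp of the next event */
data durations;
   merge wide_t
         wide_t(firstobs=2 keep=timeStamp rename=(timeStamp=nextStamp));
   /* Seconds during which this row's states remain in effect */
   if nextStamp ne . then duration = nextStamp - timeStamp;
run;

/* Time-weighted means: average number of devices on, and
   fraction of time at least one device is on */
proc means data=durations mean sumwgt;
   var NB_ON OR_state;
   weight duration;
run;
```

The data stay in their unevenly spaced, event-based form throughout; no uniform time grid is created.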
Thanks @PGStats ! This looks like the sort of thing I was hoping to find built in to some sort of PROC, but the code looks pretty straightforward. Thanks again!