mbloem
Fluorite | Level 6

I'm working with unevenly-spaced time series data from IoT devices. I'm new to SAS and had previously been using the Python traces package for handling this data.

 

One handy thing that traces enables is "simple analyses of unevenly-spaced time series data without making an awkward / lossy transformation to evenly-spaced representations." For example, if I wanted to know when any of a set of IoT devices were in an "on" state (a logical OR), I could do that very easily with traces TimeSeries objects, without first converting the data to an evenly-spaced representation and without writing any custom code to implement the logical OR based on the uneven time steps.

 

Does SAS have any capabilities for directly performing simple analyses on unevenly-spaced time series data?

1 ACCEPTED SOLUTION

PGStats
Opal | Level 21

Here is the approach I would use to perform simple operations with base SAS coding:

 

/* Example data */
data t;
input timeStamp :datetime20. devId :$8. state;
format timestamp datetime17.;
datalines;
13SEP2016:01:01:00 A 0
13SEP2016:01:01:10 A 1
13SEP2016:01:01:11 A 0
13SEP2016:01:01:55 B 1
13SEP2016:02:20:00 A 1
13SEP2016:02:30:00 C 0
13SEP2016:02:33:00 B 0
13SEP2016:04:01:00 A 0
13SEP2016:05:01:00 C 1
13SEP2016:05:12:00 C 0
13SEP2016:05:23:00 C 1
13SEP2016:05:35:00 B 1
13SEP2016:06:00:00 A 1
13SEP2016:06:01:00 A 0
;

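/* Get the list of devices. The NUMBER option adds a Row column and
   ODS OUTPUT saves the first query's result as data set devFmt. */
ods output SQL_Results=devFmt;
proc sql number;
select unique devId, "devFmt" as fmtname, "I" as type 
from t;
/* Build label assignments such as v_1="A" v_2="B" in macro variable devList */
select cats("v_", Row, "=", quote(compress(devId))) 
    into :devList separated by " " 
from devFmt;
quit;

/* Create the informat devFmt., which maps each devId to its row number */
proc format cntlin=devFmt(rename=(devId=start Row=label)); run;

/* Accumulate device states and perform simple operations */
data wide_t;
set t;
label &devList.;                   /* labels v_1, v_2, ... with the device IDs */
retain v_:;                        /* carry each device's last known state forward */
array _v v_:;
_v{input(devId, devFmt.)} = state; /* update only the device that reported */

OR_state = max(of v_:);            /* 1 if any device seen so far is on */
AND_state = min(of v_:);           /* 1 only if every device seen so far is on */
NB_ON = sum(of v_:);               /* number of devices currently on */
NB_OFF = n(of v_:) - NB_ON;        /* devices with a known state that are off */

run;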

proc print data=wide_t label noobs; run;
PG
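
A possible follow-up, offered as a sketch rather than as part of the original answer: wide_t has one row per original event, so it is still unevenly spaced. To get a compact series for the combined signal, one could keep only the rows where OR_state changes. The names or_changes and prevOR are made up for this illustration, and the step assumes wide_t is in timeStamp order.

```sas
/* Keep only the events at which the combined OR state changes (sketch) */
data or_changes;
   set wide_t;
   prevOR = lag(OR_state);               /* OR state after the previous event */
   if _n_ = 1 or OR_state ne prevOR;     /* first row, or a change in state */
   keep timeStamp OR_state;
run;

proc print data=or_changes noobs; run;
```

The same pattern should work for AND_state or NB_ON by swapping the variable that is compared with its lagged value.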


10 REPLIES 10
PGStats
Opal | Level 21

Note: Specialized tools for time series are provided under the Forecasting and Econometrics (SAS/ETS) license. You might want to address your question to that community:

 

https://communities.sas.com/t5/SAS-Forecasting-and-Econometrics/bd-p/forecasting_econometrics

PG
Reeza
Super User

It depends on your data structure.

Determining what's an ON state could be as simple as a filter and not require any special package. 

 

Re unevenly spaced time series, I'll let someone else answer.

mbloem
Fluorite | Level 6

Thanks Reeza. Really the key issue for me is the unevenly-spaced aspect of the problem, not the logical OR part. The logical OR is just an example. I'm trying to avoid converting from unevenly-spaced to evenly-spaced data to do a computation like a logical OR or a sum, and then converting back again to unevenly-spaced data because that form is more concise and more relevant for later analyses.

udo_sas
SAS Employee

Hello -

SSM mastermind @rselukar pointed out to me that unevenly spaced time series are also called longitudinal data. There are 3 examples in the PROC SSM documentation which deal with this type of situation:

Furthermore, he was kind enough to point me to the PROC SSM documentation that describes the types of sequential data SSM can handle:

And of course you may want to check out his SAS Global Forum paper: http://support.sas.com/resources/papers/proceedings15/SAS1580-2015.pdf on this very topic.

 

Many thanks Rajesh!

Udo

 

 

rselukar
SAS Employee

I was not aware of the traces package.  Taking a quick look at the package link, I realize that you are interested in more basic analysis of unevenly spaced time series.  The SSM procedure is designed for model-based analysis of such time series, and you can do rather sophisticated analysis of such data.  Without going into too many details, I am going to explain how to use the SSM procedure to do basic interpolation/extrapolation of an unevenly spaced time series.  Assume that your input data set, test, has two columns, time and y.  The time column contains the times associated with the measurements y (the times need not be evenly spaced; in fact, you can even have multiple measurements at the same time point).  It is also assumed that the data set test is sorted according to time, and that the time points at which you want interpolated/extrapolated values of y are included in the data set with the corresponding y missing.  So the first few rows of the data set might look something like this:

time    y
1.2     .
1.8     -2.6
2.0     .
8.3     1.3

 

Note y values at time points 1.2 and 2.0 are missing.  You can obtain a smooth interpolation of y values as follows:

 

proc ssm data=test;
   id time;
   trend scurve(ps(2)) checkbreak;
   irregular noise;
   model y = scurve noise / print=smooth;
   output out=for press;
run;

 

The interpolated/extrapolated values of y are printed.  The SSM procedure stores the estimate of the smoothed curve (called smoothed_scurve) in the output data set.  You can plot it and see the fit as follows:

proc sgplot data=for;
   series x=time y=smoothed_scurve;
   scatter x=time y=y;
run;

 

Note that you can specify many more interesting models (see the examples mentioned earlier) and use predictor information if available. The CHECKBREAK option in the TREND statement identifies possible locations of abrupt changes in the smoothed curve, and there is a lot more you can do.
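
The input layout described in this post (observed rows plus rows with y missing at the times where estimates are wanted) could be assembled roughly as follows. This is only a sketch of the data preparation: the data set names obs and wanted are hypothetical, and the y values other than the two rows shown in the post are made up for illustration.

```sas
/* Hypothetical observed measurements at uneven times
   (only the rows at 1.8 and 8.3 come from the post) */
data obs;
   input time y;
   datalines;
1.8 -2.6
3.1 -1.9
4.7 0.2
8.3 1.3
;

/* Times at which estimates are wanted; y is left missing on purpose */
data wanted;
   y = .;
   do time = 1.2, 2.0, 6.0, 9.5;
      output;
   end;
run;

/* Stack both sets and sort by time, as the post requires */
data test;
   set obs wanted;
run;

proc sort data=test;
   by time;
run;
```

A real application would need many more observed rows than this before the smoothing model in the post could be fitted sensibly.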

 

mbloem
Fluorite | Level 6

Thank you @udo_sas and @rselukar for this information about SSM! I see that it will be very helpful for extrapolation/interpolation, and also more powerful modeling of unevenly spaced timeseries data.

 

I'm still wondering if there is anything that will do simple arithmetic and logical operations on a set of unevenly-spaced time series (such as what would come off of a set of related sensors). Each sensor will send data at different unevenly-spaced times. The example from the traces package website about adding up all the lights switched on in a building (see below) is a good one, but I may also want to do different things like logical ORs to see when any light is on in the building, find the average number of lights on in the building, etc. Would SSM help with this? Or something else in SAS?


Capture.PNG
rselukar
SAS Employee

I think for this type of problem PROC TIMESERIES (SAS/ETS: http://support.sas.com/documentation/cdl/en/etsug/68148/HTML/default/viewer.htm#etsug_timeseries_toc...) and the DATA step might be a better match.  PROC TIMESERIES can take different time series that are recorded at different time instances and put them on a uniform time grid of your choice.  It has options to accumulate the readings into each grid interval (the ACCUMULATE= option) and to "fill" the gaps with suitable values (the SETMISSING= option), e.g., for your sensor data zero (indicating absence) might be a possible choice.  After a data set of all uniformly "filled" series is created, you can do a variety of operations on these columns (summing, and/or, ...) via the DATA step.  PROC SSM could also be used for the same data preparation step if you want to do a more model-based interpolation/extrapolation.  For your simpler setup, PROC TIMESERIES might be sufficient and will be a whole lot faster.
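
As a rough sketch of what this could look like for the example data set t from the accepted solution (SAS/ETS is required): the one-minute interval, the START=/END= bounds, the choice of ACCUMULATE=LAST with SETMISSING=PREVIOUS, and the data set names t_sorted, grid, grid_wide and grid_ops are all assumptions made for this illustration.

```sas
/* PROC TIMESERIES expects the data sorted by the BY variable and the time ID */
proc sort data=t out=t_sorted;
   by devId timeStamp;
run;

/* Put every device on the same one-minute grid */
proc timeseries data=t_sorted out=grid;
   by devId;
   id timeStamp interval=minute
      accumulate=last                  /* last reading within each minute */
      setmissing=previous              /* carry the last known state forward */
      start='13SEP2016:01:01:00'dt
      end='13SEP2016:06:01:00'dt;
   var state;
run;

/* One column per device (v_A, v_B, v_C), one row per grid point */
proc sort data=grid;
   by timeStamp devId;
run;

proc transpose data=grid out=grid_wide(drop=_name_) prefix=v_;
   by timeStamp;
   id devId;
   var state;
run;

/* Combine the devices at each grid point */
data grid_ops;
   set grid_wide;
   OR_state = max(of v_:);
   NB_ON    = sum(of v_:);
run;
```

Note that device A's readings at 01:01:00, 01:01:10 and 01:01:11 all fall in the same minute, so only the last one survives. That is the kind of loss the original question was trying to avoid, and a finer interval trades it for a larger data set.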

mbloem
Fluorite | Level 6

Yes this is basically what I'm doing now, although with a DATA step instead of PROC TIMESERIES. I was hoping to avoid the somewhat awkward and potentially memory-intensive transition from unevenly-spaced to evenly spaced for simple arithmetic and logical computations, and then back to unevenly-spaced for further analysis.


Anyhow thanks for your help @rselukar !

mbloem
Fluorite | Level 6

Thanks @PGStats ! This looks like the sort of thing I was hoping to find built in to some sort of PROC, but the code looks pretty straightforward. Thanks again!
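
For the "average number of lights on" question raised earlier in the thread, a time-weighted mean can be sketched from wide_t without moving to an even grid. This is an illustration under assumptions, not a built-in feature: the names avg_on, prevTime, prevOn, duration, onTime and avg_NB_ON are made up, wide_t is assumed to be in timeStamp order with at least two distinct times, and devices that have not reported yet are counted as not on.

```sas
/* Time-weighted mean of NB_ON over the observed span (sketch) */
data avg_on;
   set wide_t end=last;
   prevTime = lag(timeStamp);      /* time of the previous event */
   prevOn   = lag(NB_ON);          /* count that held since the previous event */
   if _n_ > 1 then do;
      duration + (timeStamp - prevTime);            /* total seconds covered */
      onTime   + prevOn * (timeStamp - prevTime);   /* device-seconds spent on */
   end;
   if last then do;
      avg_NB_ON = onTime / duration;
      output;
   end;
   keep duration onTime avg_NB_ON;
run;

proc print data=avg_on noobs; run;
```

Each count is weighted by how long it held, so a state that lasted one second contributes far less than one that lasted an hour.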
