BookmarkSubscribeRSS Feed

Working with Time Series Data in IoT

Started ‎10-06-2022 by
Modified ‎10-05-2022 by
Views 835

612491256.jpg

 

The term IoT, or Internet of Things, refers to the collective network of connected devices and the technology that facilitates communication between devices and the cloud, as well as between the devices themselves. The measurements being sent by the sensors (wearables, connected devices, machinery, etc.) are always captured at a specific moment in time. This makes it vitally important to have the right tools to work with timestamp information be it for visualization of the timestamp data, creation of aggregated fields, or building analytical models.

 

In this article, we provide tips to makes it easier to work with timestamped IoT data using SAS’ Wrapper for Analytics Transfer (SWAT). SWAT is the Python module used to interface with SAS' Cloud Analytics Services (CAS).  It provides the best of both worlds by combining SAS Analytics in the cloud with Python’s large collection of open-source packages.

 

To help demonstrate this, let’s use a simple IoT csv file that has the following fields:

  1. Timestamp –Time at which the measurement was taken
  2. Measure – Measurement from a device taken every minute
  3. Device – Nominal variable indicating the device name

 

Below is a sample record from a csv file:

Timestamp,Measure,Device
2/24/2015 0:00,35,A
2/24/2015 0:01,32,A
2/24/2015 0:02,30,A
2/24/2015 0:03,31,A
2/24/2015 0:04,44,A
2/24/2015 0:05,29,A
2/24/2015 0:06,45,A
2/24/2015 0:07,43,A

 

First, load the data into CAS using the loadtable CAS action. If you load the file 'as is' it will render the timestamp variable to be of type varchar. You will need to ensure the timestamp variable has the right datetime format so it is amenable for any further processing using CAS actions, SAS functions, etc.

 

Tip #1:  Read in the Timestamp variable with the correct informat and format while loading the data using the loadtable CAS action. It ensures that the data type and format for the Timestamp variable is captured in a desired way.

s.loadtable("timestamp_test.csv",                            
              casout= dict(name='Timeseries', replace=True),
              importoptions= dict(filetype='csv', vars=[dict(name='Timestamp', informat='anydtdtm19.', format='DATETIME19.')])
            )

 

As a data analyst/scientist explores the timestamp data, he/she might be interested in sub-setting the data on a time-based condition.

 

Tip #2: The 'where' option in the CASTable object enables sub setting operations on the CAS table quite easily. The convenience of using the CASTable object enables the user to run different sub-setting scenarios on the fly without making changes to the actual in-memory CAS table. 

tbl = s.CASTable('timeseries', where = "Timestamp < '2015-02-24 00:06:00'dt ")

Any other operation you want to perform on the sub-setted data can be applied directly to the  above CASTable object, tbl, by using a variety of CAS actions available to the end user. For example, a simple summary statistic on the Measure variable of the sub-setted data can be achieved by simply applying the summary CAS action directly on the CASTable object, tbl.

tbl.summary(inputs='Measure')

The SWAT package uses many features of the Pandas Python package and other dependencies of Pandas. This enables the user to directly apply Pandas methods on the CASTable object. For example, a frequency count on the Device column can be computed using the value_counts methods of Pandas API. 

tbl['Device'].value_counts()

Since the SWAT API tries to blend the world of CAS and Pandas into a single entity, you have to be aware of whether you are calling a CAS action or a method from the Pandas API and understand the differences in the output objects produced from those operations. This link is very useful to the end user trying to navigate through these concepts. 

 

Often times when exploring the data, one is also interested in creating new computed columns based off certain conditions etc.

 

Tip #3: Let us assume that there is a 3-hour lag in the process of collecting the data. Essentially, we would like to create a new column that captures the time at which the measurements were truly registered by subtracting 3 hours from the current Timestamp variable. A quick way to do this is by use of the options computedvars and computedvars program.  We first declare the variables with the correct format in the computedvars option. The computedvarsprogram allows you to define those calculations. In the code snippet below the SAS function intnx is used to account for the 3-hour difference. We capture this in the new variable StartDate  with the desired datetime format. The second computed variable, MonthName, in the code snippet below,  shows an example of extracting the month value from the Timestamp variable which can be easily done by using the SAS datepart and PUT function with the choice of date format we want. 

tbl1 = s.CASTable('TimeSeries', computedvars=[ dict(format='DATETIME19.', name='StartDate'), 'MonthName' ], computedvarsprogram = "StartDate = intnx('HOUR',TimeStamp,-3,'SAME'); MonthName=put(datepart(Timestamp),MONNAME3.); ")

It is important to note that CASTable objects are client-side views of the data. Setting the parameters as shown in the tips will have no effect on the actual table in the server. However, it is a great tool for on-the-fly analysis during data exploration.

 

Tip #4: Once you have created a CASTable object after experimenting with all your filters and computed columns, you will want to materialize them to the server as an in-memory table. Let’s say we want to materialize the newly computed variables into an actual in-memory CAS table. Any CAS action that reads a CAS table and writes it to a new location (e.g. PartitionIndexShuffle, etc.) can be used to do this. Below, we will use the partition CAS action whose output has a key named casTable (note that this is different from the CASTable object we referenced earlier) that contains the reference to the new in-memory CAS table with the computed columns. 

tbl2 = tbl1.partition().casTable

 

We hope you found this helpful. For additional examples about working with IoT data, please visit developer.sas.com.

Version history
Last update:
‎10-05-2022 04:40 PM
Updated by:
Contributors

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags