BookmarkSubscribeRSS Feed

Tips: Identifying and Locating Missing Values and Gaps in Time Series Data

Started ‎12-11-2014 by
Modified ‎10-05-2015 by
Views 3,349

Long time series are often filled with missing values and gaps in time, but determining if your series has missing values and perhaps locating these values, isn’t always as easy as printing the data.


We will address the issue when a time series contains gaps, such that actual missing values are not observed. For example, you know you have a daily series, but two of the observations are separated by 11 days, such as 01Jan2000 and 12Jan2000. If you quickly print the data, you won’t observe a typical missing value, “.”, because there are none. This isn’t always a practical option with large data sets and multiple by groups.


First we will run PROC TIMEID to determine if there are gaps in the data.

 

proc timeid data=inputdata outintervaldetails=outint;

   id date interval=day;

run;

 

You could also just use


    id date;

 

Omitting the interval is useful if you are unsure of the interval your data follow. This will lead to a best guess based on the data and will detect the best interval, which will be printed in the outintervaldetails  output data set on the INTERVAL variable.


Most of the time we know the interval at which our data are recorded and thus the second line of code should be used rather than omitting the interval=option.

 

proc print  data=outint;

   var timeid start end nspans minspan maxspan msg interval;

run;

 

table1.png

 

These are the variables in the outintervaldetails data set that will give information about the series. If NSPANS=1 this means your data are uniformly spaced and no missing time ids are present. SPANS refers to the number of intervals between time ids.  If NSPANS > 1, then you have gaps in your data.  Notice our data has 17 spans with a minimum of 1 and a maximum of 71.  MINSPAN will tell you the minimal number of spans between your dates. MAXSPAN will tell you the max number of spans between your dates There are many variables in this data set. Information about these can be found in the documentation. If you wish, you can omit the VAR statement and view all of the variables, but for this exercise these are sufficient for the information we need.


Additionally, suppose you wish to know WHEN missing dates occur in the time series. This might be easy in a series of hundreds of observations, but becomes difficult with tens of thousands, especially amidst BY groups.

 

The following code is useful in determining such information.

We will try two approaches. I will only cover PROC TIMEID today but could give this example using PROC TIMESERIES as well.

  1. Using PROC TIMEID, spans component, print where spans >1.
  2. Using PROC TIMESERIES


Using Proc TIMEID :

 

proc timeid data=here.neah outintervaldetails=all print=values;

   id date interval=day  ;

   ods output Decomposition=decomp;

run;


proc print data=decomp;

   where span>1;

run;


Here is a subset of the output

table2.png

Version history
Last update:
‎10-05-2015 02:47 PM
Updated by:

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags