
Tips: Part 2, Identifying and Locating Missing Values and Gaps in Time Series Data

Started 12-18-2014, modified 10-05-2015

Long time series often contain missing values and gaps in time, but determining whether your series has missing values, and locating them, isn't always as easy as printing the data.

As a follow-up to last week's post, I will now explain how to use PROC TIMEDATA to find gaps in a time series.

I realize that in my prior article I said I would be using PROC TIMESERIES, but for this example TIMESERIES and TIMEDATA are interchangeable in the simple code shown here.

You can replace the word TIMEDATA with TIMESERIES, keep the remaining syntax the same, and get the same results.

 

  1. Using PROC TIMEID: spans component, print where spans > 1.
  2. Using PROC TIMEDATA.

 

PROC TIMEDATA will tell you how many missing values you have, but it will not tell you the number of gaps or where they are.

 

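proc timedata data=here.neah outsum=outsum1 out=out1;
   id date interval=day;   /* daily interval: dates absent from the input become missing values in OUT= */
   var varname;            /* replace varname with your series variable */
run;

/* Print the summary data set, which includes the count of missing values */
proc print data=outsum1;
run;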

 

From the OUTSUM data set, you will see this table:


[Image tab1.png: printed OUTSUM data set]
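For comparison, here is how the same step might look with PROC TIMESERIES, as mentioned at the start of the article. This is only a sketch: the output names outsum2 and out2 are hypothetical, and _East__mm_ is assumed to be the series variable because it is the one referenced in the WHERE statements in this article.

```sas
/* Same statements as the PROC TIMEDATA step, with only the procedure name swapped. */
/* outsum2 and out2 are hypothetical output data set names.                          */
proc timeseries data=here.neah outsum=outsum2 out=out2;
   id date interval=day;   /* daily interval, as in the TIMEDATA step */
   var _East__mm_;         /* assumed series variable                  */
run;

proc print data=outsum2;
run;
```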

 

From the OUT data set, you can find where the series is missing by using the following code:

 

proc print data=out1;
   where _East__mm_ = .;
run;
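PROC TIMEDATA does not report the number of gaps itself, but the OUT data set can be collapsed into one row per run of consecutive missing dates. The following is a minimal sketch, assuming out1 is in date order, date is a SAS date value, and _East__mm_ is the series variable. The names gaps1, gap_start, gap_end, and gap_length are hypothetical. Note that a run found this way includes values that were already missing in the input, not only dates that were absent.

```sas
/* Collapse consecutive missing observations in out1 into one row per run. */
data gaps1 (keep=gap_start gap_end gap_length);
   set out1 end=last;
   retain gap_start . gap_end . gap_length 0;
   format gap_start gap_end date9.;

   /* Extend the current run, or start a new one, when the value is missing */
   if missing(_East__mm_) then do;
      if gap_length = 0 then gap_start = date;
      gap_end = date;
      gap_length = gap_length + 1;
   end;

   /* A run ends at the next nonmissing value or at the last observation */
   if gap_length > 0 and (not missing(_East__mm_) or last) then do;
      output;
      gap_length = 0;
   end;
run;

proc print data=gaps1;
run;
```

The number of rows in gaps1 is then the number of runs, and gap_length gives the length of each run in days.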

 

If the data set already contains other kinds of missing values (special missing values such as .A) in addition to gaps in time, the following WHERE statement is more useful because it does not overlook any special-case missing values.

 

   where nmiss(_East__mm_) > 0;

 

Either WHERE statement produces a table of the observations where the variable is missing. The only difference is that if the data contain special missing values as well as gaps, the special missing values will not surface with the first version.
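The difference between the two WHERE statements can be seen on a small made-up data set. This is an illustrative sketch only: the data set special_demo and the variable x are hypothetical and are not part of the example data.

```sas
/* Three observations: a value, an ordinary missing, and a special missing (.A) */
data special_demo;
   x = 1;  output;
   x = .;  output;
   x = .A; output;
run;

/* Comparison with . selects only the ordinary missing value */
proc print data=special_demo;
   where x = .;
run;

/* NMISS counts ordinary and special missing values, so both rows are selected */
proc print data=special_demo;
   where nmiss(x) > 0;
run;
```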

This would be an interesting topic to talk about at another time (Special Case Missing Values).

 

[Image tab2.png: printed OUT data set rows where the variable is missing]

 

Summary:

PROC TIMEID tells you the number of time gaps greater than one interval, and with a little extra code it will locate them.

PROC TIMEDATA tells you how many observations are actually missing.

Which one to use depends on what matters more to you: actual missing values or overall gaps in the data.

Additionally, PROC TIMEID can determine the best interval if you are not sure what it should be.

Both procedures can handle this task and will give you useful results in a short amount of time.
