
Tips: Part 2, Identifying and Locating Missing Values and Gaps in Time Series Data

by SAS Employee Jennifer_beeman_sas_com on 12-18-2014 10:40 AM - edited on 10-05-2015 02:45 PM by Community Manager

Long time series are often filled with missing values and gaps in time, but determining whether your series has missing values, and then locating them, isn’t always as easy as printing the data.

As a follow-up to last week’s post, I will now explain how to use PROC TIMEDATA to find gaps in a time series.

I realize that in my prior article I said I would be using PROC TIMESERIES. For this example, however, TIMESERIES and TIMEDATA are interchangeable: in the simple code shown here, you can replace the word TIMEDATA with TIMESERIES, keep the remaining syntax the same, and get the same results.

 

  1. Using PROC TIMEID (spans component; print where spans > 1)
  2. Using PROC TIMEDATA

 

PROC TIMEDATA will tell you how many missing values you have, but it will not tell you the number of gaps or where they are.

 

proc timedata data=here.neah outsum=outsum1 out=out1;

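   id date interval=day;

   var varname;   /* placeholder: use your series variable, e.g. _East__mm_ as in the WHERE statements in this article */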

run;

 

proc print data=outsum1;

run;

 

From the OUTSUM data set, you will see this table:


tab1.png

 

From the OUT data set, you can find where the series is missing by using the following code:

 

proc print data=out1;

   where _East__mm_= .;

run;

 

If the data set already contains several types of missing values, rather than just gaps in time, the following code is more useful because it does not overlook special-case missing values.

 

   where nmiss(_East__mm_) > 0;

 

Either statement produces a table of the observations where the variable is missing. The only difference is that special missing values (such as .A), if the data contain them in addition to gaps, will not surface with the first statement.
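To make that difference concrete, here is a minimal sketch on a toy data set. The data set name demo and the variable x are hypothetical and exist only for this illustration; they are not part of the Neah Bay data used in this article.

```sas
/* Toy data (hypothetical): one standard missing (.) and one special missing (.A) */
data demo;
   x = 1;  output;
   x = .;  output;
   x = .A; output;
   x = 4;  output;
run;

/* Comparison with . matches only the standard missing value */
proc print data=demo;
   where x = .;
run;

/* NMISS counts any numeric missing value, including special ones such as .A */
proc print data=demo;
   where nmiss(x) > 0;
run;
```

The first PROC PRINT should list only the observation holding the standard missing value, while the second should also list the observation holding .A. Writing the filter as where missing(x); is an alternative that behaves the same way for a single numeric variable.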

This would be an interesting topic to talk about at another time (Special Case Missing Values).

 

tab2.png

 

Summary:

PROC TIMEID tells you the number of time gaps greater than one interval, and with a little extra code it can locate them.

PROC TIMEDATA will tell you how many observations are actually missing.

Which one to use depends on which matters more to you: actual missing values or overall gaps in the data.

Additionally, TIMEID is able to determine the best interval if you are not sure of what it should be.

Both procedures are able to handle this task and will give you desirable results in a short amount of time. 
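If you want to cross-check the location of gaps without either procedure, a plain DATA step can serve as a rough sketch. This is not the PROC TIMEID approach itself; it swaps in the LAG and INTCK functions. It assumes here.neah is sorted by date, that date is a SAS date value on a daily interval, and the names gaps, prev_date, and spans are hypothetical names chosen for this illustration.

```sas
/* Sketch: flag observations that follow a gap of more than one day.  */
/* Assumes here.neah is sorted by date and date is a SAS date value.  */
data gaps;
   set here.neah;
   prev_date = lag(date);                  /* date of the previous observation */
   if _n_ > 1 then
      spans = intck('day', prev_date, date);  /* daily intervals stepped over  */
   if spans > 1;                           /* keep only rows that follow a gap */
   format prev_date date9.;
run;

proc print data=gaps;
   var prev_date date spans;
run;
```

Each printed row marks the first observation after a gap, with prev_date showing the last date recorded before it.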
