SGraham
Calcite | Level 5

Hi folks,

I am interested in developing an approach to impute missing data for a time series of ambient concentrations. I have provided a data set that contains the time variables month, day, and hour, covering an entire year. The variables of interest for imputation are concentration data for 5 years; each year has two types of concentration data, both reported for each hour.

For example, 'hour_2011' contains the hourly average concentration for that particular month, day, and hour in 2011, while 'max5_2011' contains the 5-minute maximum concentration occurring in that same hour.

 

Conditions:

'max5_xxxx' cannot be less than 'hour_xxxx' and cannot be more than 12 times 'hour_xxxx'.
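The bound above can be sketched as a small check, here in Python for illustration rather than SAS. It assumes that "a factor of 12" means max5 may be at most 12 times the hourly value, and that concentrations are non-negative. The function names are hypothetical.

```python
def satisfies_bounds(hour_value, max5_value):
    """Return True when hour <= max5 <= 12 * hour (assumed reading of the condition)."""
    return hour_value <= max5_value <= 12.0 * hour_value


def clamp_max5(hour_value, max5_value):
    """Pull an imputed max5 value back into the interval [hour, 12 * hour]."""
    lower = hour_value
    upper = 12.0 * hour_value  # assumes hour_value >= 0
    return min(max(max5_value, lower), upper)
```

Any imputed max5 value could be passed through `clamp_max5` as a last step, so the pair stays consistent whichever imputation method produced it.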

The missing-data mechanism can be considered random.

There are some patterns in the concentrations: they can increase and then decrease during certain times of the day, but the pattern is tied to when an episode happens to occur rather than to the same times of day every day.

 

Rather than simply using a mean value to impute missing hourly or max5 concentrations, I would like to know whether it is possible to include some variability in estimating both values within a particular year, informed by any trends that may be present, such as:

1) considering the prior and post hour concentrations

2) other days' hourly patterns at similar concentration levels (separating periods when concentrations vary over time from static conditions)

3) patterns occurring in other years?

 

all while preserving a reasonable relationship between the max5 and hourly concentration predictions.
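One possible way to add variability while keeping max5 tied to the hourly value is to draw a max5/hour ratio at random from hours where both values were observed, then apply it to the imputed hourly value. The Python sketch below is only an illustration of that idea, not an established method from this thread. The function name and the choice of donor pairs are assumptions, and the ratio is limited to the range 1 to 12 on the assumption that the stated bound means max5 is at most 12 times the hourly value.

```python
import random


def impute_max5_from_ratio(hour_value, observed_pairs, rng):
    """Impute max5 as hour_value times a max5/hour ratio drawn from observed pairs.

    observed_pairs: iterable of (hour, max5) tuples from hours with both values present.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    ratios = [
        m / h
        for h, m in observed_pairs
        if h is not None and m is not None and h > 0
    ]
    if not ratios:
        return None  # no donor hours available
    ratio = rng.choice(ratios)
    ratio = min(max(ratio, 1.0), 12.0)  # keep hour <= max5 <= 12 * hour
    return hour_value * ratio


# Hypothetical donor hours: (hourly average, 5-minute maximum)
donors = [(1.0, 3.0), (2.0, 4.0), (4.0, 4.5)]
estimate = impute_max5_from_ratio(2.0, donors, random.Random(0))
```

Restricting the donor pairs to hours with a similar concentration level, or to the same year, would be one way to reflect items 2) and 3).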

 

Thank you for your assistance.   

ballardw
Super User

Do you have access to SAS/ETS procedures? I think PROC ARIMA may do what you want, but the data will likely need to be restructured. Your variable names make me think that you have values for multiple time points on each record. The time series procedures in ETS generally require one record per time period, and the date or datetime variable should hold SAS date or datetime values.
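The restructuring described above, from one record per month/day/hour with a column per year to one record per time point, can be sketched as follows. This is shown in Python for illustration rather than as a SAS data step. It assumes the column names follow the 'hour_YYYY' and 'max5_YYYY' pattern from the question, that hours are coded 0 to 23, and that missing values are represented as None. The function name is hypothetical.

```python
from datetime import datetime


def wide_to_long(rows, years):
    """Turn wide rows (one per month/day/hour) into one record per datetime."""
    long_rows = []
    for row in rows:
        for year in years:
            try:
                stamp = datetime(year, row["month"], row["day"], row["hour"])
            except ValueError:
                # Skip dates that do not exist in this year, e.g. Feb 29
                continue
            long_rows.append({
                "datetime": stamp,
                "hour_conc": row.get(f"hour_{year}"),
                "max5_conc": row.get(f"max5_{year}"),
            })
    # Time series tools generally expect records in time order
    long_rows.sort(key=lambda r: r["datetime"])
    return long_rows


# Hypothetical two-year example with one missing hourly value
wide = [{"month": 1, "day": 1, "hour": 0,
         "hour_2011": 1.0, "max5_2011": 2.0,
         "hour_2012": None, "max5_2012": 3.0}]
series = wide_to_long(wide, [2011, 2012])
```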

PGStats
Opal | Level 21

What have you done on your own on this problem?

PG
PGStats
Opal | Level 21

For 1), look at PROC EXPAND for spline-based interpolation or PROC LOESS for local regression. These techniques are only suitable for small gaps in the time series.

PG
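To illustrate the "small holes" limitation, here is a minimal Python sketch that fills only short gaps and leaves longer ones missing. It uses plain linear interpolation between the prior and following observed hours, which is a simpler stand-in and not the spline or loess fit the procedures above would produce. The function name and the `max_gap` threshold of 3 hours are assumptions.

```python
def interpolate_small_gaps(values, max_gap=3):
    """Linearly fill runs of None that are at most max_gap long.

    Gaps at the start or end of the series, or longer than max_gap,
    are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first observed value after the gap
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # not bounded on both sides, or too long to fill
        left, right = filled[start - 1], filled[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap + 1)
            filled[k] = left + frac * (right - left)
    return filled
```

For example, `interpolate_small_gaps([1.0, None, 3.0])` fills the middle hour, while a run of four or more missing hours is left untouched with the default threshold.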
