03-24-2017 12:59 PM
I am interested in developing an approach to impute missing data in a time series of ambient concentrations. The data set I have provided contains the time variables month, day, and hour, covering an entire year. The variables to be imputed are concentration data for 5 years; each year has two concentration variables, both reported hourly.
For example, 'hour_2011' contains the average concentration for that hour of that day and month in 2011, while 'max5_2011' contains the maximum 5-minute concentration occurring within the same hour.
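'max5_xxxx' cannot be less than 'hour_xxxx' and cannot be greater than 12 times 'hour_xxxx' (an hour contains twelve 5-minute periods, so with non-negative concentrations the largest 5-minute value is at most their sum, i.e. 12 times the hourly average).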
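A minimal sketch of where the bounds come from and how an imputed max5 could be forced back inside them. It assumes non-negative concentrations and twelve 5-minute periods per hour; the function names are hypothetical, and this is Python for illustration rather than SAS.

```python
def summarize_hour(five_minute_values: list[float]) -> tuple[float, float]:
    """Return (hourly average, 5-minute maximum) for one hour of readings."""
    hour_avg = sum(five_minute_values) / len(five_minute_values)
    return hour_avg, max(five_minute_values)


def bound_max5(hour_avg: float, max5: float, n_periods: int = 12) -> float:
    """Clip a max5 estimate into [hour_avg, n_periods * hour_avg].

    Assumes non-negative concentrations: the largest of the n_periods
    5-minute values is at least their mean and at most their sum.
    """
    if hour_avg < 0:
        raise ValueError("hour_avg must be non-negative")
    return min(max(max5, hour_avg), n_periods * hour_avg)


# Extreme case: the whole hour's load falls in a single 5-minute period,
# so max5 reaches exactly 12 times the hourly average.
readings = [0.0] * 11 + [24.0]
hour_avg, max5 = summarize_hour(readings)
print(hour_avg, max5)          # 2.0 24.0
print(bound_max5(2.0, 30.0))   # 24.0 (clipped down to 12 * hour_avg)
print(bound_max5(2.0, 1.5))    # 2.0  (clipped up to hour_avg)
```

Applying a clip like this after any imputation step is one simple way to keep the imputed pair physically consistent.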
The missing-data mechanism can be assumed to be random.
There are some patterns in the concentrations: they can rise and then fall at certain times of day, but the point is that when a rise occurs it is followed by a fall, not that it happens at the same times of day every day.
Rather than simply imputing missing hourly or max5 concentrations with a mean value, I would like to know whether I can build some variability into the estimates of both values within each year, informed by any trends that may be present, such as:
1) the concentrations in the preceding and following hours
2) hourly patterns on other days with similar concentration levels (separating periods when concentrations vary over time from static conditions)
3) patterns occurring in other years?
all while preserving a reasonable relationship between the predicted max5 and hourly concentrations.
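One way to get variability rather than a flat mean, in the spirit of point 2, is a nearest-neighbour hot-deck: find the complete day whose observed hours best match the incomplete day, copy its values into the gaps, and derive each missing max5 from the donor's max5/hour ratio clipped to [1, 12]. This is a minimal sketch under stated assumptions (one list of hourly values per day, `None` for missing, target day excluded from the donor pool by the caller); the function names are hypothetical, and it is not a SAS procedure.

```python
from typing import Optional

Day = list[Optional[float]]  # one value per hour; None marks a missing value


def distance(a: Day, b: Day) -> Optional[float]:
    """Mean squared difference over the hours observed in both days."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)


def impute_day(target_hour: Day, target_max5: Day,
               donors_hour: list[Day], donors_max5: list[Day]) -> tuple[Day, Day]:
    """Fill one day's gaps from the most similar fully observed donor day."""
    best, best_d = None, None
    for i, d in enumerate(donors_hour):
        if any(v is None for v in d) or any(v is None for v in donors_max5[i]):
            continue  # only complete donor days are usable
        dist = distance(target_hour, d)
        if dist is not None and (best_d is None or dist < best_d):
            best, best_d = i, dist
    if best is None:
        return list(target_hour), list(target_max5)  # no donor: leave gaps

    hour_out, max5_out = list(target_hour), list(target_max5)
    for h in range(len(hour_out)):
        donor_h, donor_m = donors_hour[best][h], donors_max5[best][h]
        if hour_out[h] is None:
            hour_out[h] = donor_h
            if max5_out[h] is not None:
                # Keep the imputed hour consistent with an observed max5.
                hour_out[h] = min(max(hour_out[h], max5_out[h] / 12.0), max5_out[h])
        if max5_out[h] is None:
            # Borrow the donor's max5/hour ratio, clipped to the allowed [1, 12].
            ratio = donor_m / donor_h if donor_h > 0 else 1.0
            max5_out[h] = hour_out[h] * min(max(ratio, 1.0), 12.0)
    return hour_out, max5_out
```

Because the donor is a real day, its within-day rises and falls carry over into the filled values. Restricting donors to days in the same season, or adding same-date days from other years to the pool, would be natural extensions toward points 1 and 3.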
Thank you for your assistance.
03-24-2017 01:09 PM
Do you have access to SAS/ETS procedures? I think PROC ARIMA may do what you want, but the data will likely need to be restructured. Your variable names suggest that each record holds values for multiple time points. The time series procedures in SAS/ETS generally require one record per time period, with the time stored as a SAS date or datetime value (datetime for hourly data).
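The restructuring described above can be sketched as follows: each wide record (month, day, hour plus one 'hour_<year>' and one 'max5_<year>' column per year) becomes one record per timestamp. This assumes hours are coded 0 to 23 and missing values are `None`; the output field names are hypothetical. In SAS itself the same step would typically be a DATA step with arrays, building the timestamp with `DHMS(MDY(month, day, year), hour, 0, 0)`.

```python
from datetime import datetime


def wide_to_long(rows: list[dict], years: list[int]) -> list[dict]:
    """Turn one-row-per-(month, day, hour) records into one row per timestamp.

    Assumes each row has keys 'month', 'day', 'hour' (0-23) plus
    'hour_<year>' and 'max5_<year>' for every year in `years`.
    """
    out = []
    for year in years:
        for r in rows:
            try:
                ts = datetime(year, r["month"], r["day"], r["hour"])
            except ValueError:
                continue  # e.g. Feb 29 in a non-leap year
            out.append({"datetime": ts,
                        "hour_avg": r[f"hour_{year}"],
                        "max5": r[f"max5_{year}"]})
    out.sort(key=lambda rec: rec["datetime"])  # one chronological series
    return out
```

If the data code hours as 1 to 24 instead, subtract 1 before building the timestamp.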
03-24-2017 11:57 PM
1) Look at PROC EXPAND for spline-based interpolation or PROC LOESS for local regression. These techniques are limited to filling short gaps in the time series.
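To make the "short gaps only" limitation concrete, here is a minimal sketch that fills a run of missing values only when it is at most `max_gap` long and bounded by observed values on both sides. It uses plain linear interpolation as a simpler stand-in for the spline or loess fits mentioned above; `fill_short_gaps` and `max_gap` are hypothetical names, not SAS options.

```python
from typing import Optional


def fill_short_gaps(values: list[Optional[float]], max_gap: int) -> list[Optional[float]]:
    """Linearly interpolate runs of at most max_gap missing values.

    Runs that are longer, or that touch either end of the series, stay missing.
    """
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first observed value after the gap, or n
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # leave long or open-ended gaps for another method
        left, right = out[start - 1], out[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap + 1)
            out[k] = left + frac * (right - left)
    return out


print(fill_short_gaps([1.0, None, 3.0, None, None, None, 7.0, None], 2))
# [1.0, 2.0, 3.0, None, None, None, 7.0, None]
```

Interpolated values like these are smooth and carry no added variability, which is why longer gaps usually call for a model-based or donor-based approach instead.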