Hi folks,

I am interested in developing an approach to impute missing data for a time series of ambient concentrations. I have provided a data set that contains the time variables month, day, and hour, covering an entire year. The variables of interest for imputation are concentration data for 5 years, with two types of concentration reported for each hour in each year. For example, 'hour_2011' contains the hourly average concentration for that particular month, day, and hour in 2011, while 'max5_2011' contains the 5-minute maximum concentration occurring in that same hour.

Conditions: 'max5_xxxx' cannot be less than 'hour_xxxx' and cannot be greater than 12 times 'hour_xxxx'. The missing-data mechanism can be considered random.

There are some patterns in the concentrations: they can increase and then decrease during certain times of the day. The pattern is more that when an increase happens it is followed by a decrease, rather than this occurring at the same times of day every day.

Rather than simply using a mean value to fill in a missing hourly or max5 concentration, I would like to know whether it is possible to include some variability when estimating both values within a particular year, informed by any trends that may be present, such as:
1) the concentrations in the prior and following hours
2) other days' hourly patterns at similar concentration levels (separating periods where concentrations vary over time from periods with static conditions)
3) patterns occurring in other years

All the while, the imputed values should preserve a reasonable relationship between the max5 and the hourly concentrations.

Thank you for your assistance.
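
To make idea 1) and the max5/hour condition concrete, here is a minimal Python sketch, not a recommended method. It assumes one year's data is held in two equal-length lists with None marking missing hours, and that concentrations are non-negative. The function names (interpolate_gaps, clamp, impute_year) are hypothetical. Missing hourly values are filled by linear interpolation between the prior and following observed hours, and a missing max5 value is the hourly value times a max5/hour ratio drawn at random from the hours where both were observed, which is a simple stand-in for "some variability". Every imputed pair is then forced to satisfy hour <= max5 <= 12 * hour.

```python
import random
from bisect import bisect_left


def interpolate_gaps(values):
    """Fill None entries linearly between the nearest observed neighbours.

    Gaps at either end of the series take the nearest observed value.
    """
    obs = [i for i, v in enumerate(values) if v is not None]
    if not obs:
        raise ValueError("series has no observed values")
    out = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        k = bisect_left(obs, i)  # position of the first observed index after i
        if k == 0:
            out[i] = values[obs[0]]
        elif k == len(obs):
            out[i] = values[obs[-1]]
        else:
            lo, hi = obs[k - 1], obs[k]
            w = (i - lo) / (hi - lo)
            out[i] = values[lo] + w * (values[hi] - values[lo])
    return out


def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return min(max(x, lo), hi)


def impute_year(hour, max5, seed=0):
    """Impute missing hourly and max5 values for one year (illustrative only)."""
    rng = random.Random(seed)
    # Empirical max5/hour ratios from hours where both values were observed.
    ratios = [m / h for h, m in zip(hour, max5)
              if h is not None and m is not None and h > 0]
    if not ratios:
        ratios = [1.0]  # assumption: with no observed pairs, use max5 == hour
    hour_interp = interpolate_gaps(hour)
    hour_out, max5_out = [], []
    for h_obs, h_fill, m in zip(hour, hour_interp, max5):
        h = h_obs
        if h is None:
            h = h_fill
            if m is not None:
                # Keep the imputed hourly value consistent with an observed max5.
                h = clamp(h, m / 12.0, m)
        if m is None:
            # Random ratio adds variability; clamp enforces hour <= max5 <= 12*hour.
            m = clamp(h * rng.choice(ratios), h, 12.0 * h)
        hour_out.append(h)
        max5_out.append(m)
    return hour_out, max5_out


if __name__ == "__main__":
    print(impute_year([1.0, None, 3.0, 4.0], [2.0, None, None, 8.0]))
```

This sketch ignores ideas 2) and 3). Drawing the ratio only from days with similar concentration levels, or from the same hours in other years, would be natural extensions of the same structure.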