I am trying to interpolate data that has many missing values using proc expand. My problem is that I do not want to interpolate if two or more years of data are missing. From example below: I want to interpolate by item. Data from 2001-2006 are missing and I do not wish to interpolate values for 2007 from preceding data, because estimates would be inaccurate. But, I want to be able to use the data from 2007 to be able to interpolate values for 2008 on out. So, values in 2007 that are missing will remain missing during the interpolation, but the available values in 2007 (only the december value) can be linked to 2008.
Values are annual moving averages computed monthly, so I will have an annual average for 2007 even though I will have missing values for jan-nov. The omit var is an id var indicating that I wish to omit from interpolation because 2+ years are missing before it. See sample code below.
Example of data:
item year month value Omit
a 2000 1 # .
... ... .. ... ...
a 2000 12 # .
a 2007 1 . 1 *Here is the problem: item is only purchased for these two years but I need moving average to start over here.
a 2007 2 . 1
... ... .. ... ...
a 2007 12 # 1
a 2008 1 # .
... ... .. ... ...
a 2008 12 # .
a 2009 1 # .
... ... .. ... ...
a 2009 12 # .
proc expand data=old method=spline
out=new from=month outtest=test;
by itemclss item;
id date;
where omit~=1; **?????**
convert quantity value / observed=total
transformin=(log)
transformout=(exp);
run;
When I add the where statement, it removes the obs where omit=1, but still interpolates all values between and including 2000 and 2007. So, it creates outrageous values for months in 2001-2006. I just want to skip those observations and restart interpolation in 2007.
Is there any way I can create some sort of where statement, do loop, or macro even to accomplish this? -OR- is there another procedure that computes cubic splines and can handle conditional statements?
I found a solution to this problem. I simply used a count variable in date format to create a mock date variable. If you are having a similar problem and would like specifics, let me know.
If you do not wish to interpolate the value before 2007 Dec, you can set them missing and then interpolation would begin from 2007 Dec.
You said you found a good solution, would you please share it with me? thanks.
Another question, as you described in https://communities.sas.com/thread/39579?start=0&tstart=0
If a lot of continuous missing values occurred in one time series, using proc expand will product the extreme value which is unreasonable.
In my experience, using mean to fill will not make sense if the series have several high values.
I try to discard the period having too long missing or fill the missing use the value in the same point (e.g., use 2011 Aug to fill 2012 Aug ).but until now, I do not have a good solution for it.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.