I'd like to extrapolate missing values based on a fitted curve of existing data points, using SAS 9.4. Here's my data:
Year NPs_per_pop
2008 .
2009 .
2010 0.3624
2011 0.3971
2012 0.4366
2013 0.4804
2014 0.5291
2015 0.5859
Graphing the data looks like this:
I'd like to estimate 2008 and 2009 based on a fitted curve of the existing 2010-2015 values. Since I couldn't figure out how to "forecast" into the past, I first reversed the order of the data set, so that the first observation is from 2015 and the last is from 2008:
Order NPs_per_pop
1 0.5859
2 0.5291
3 0.4804
4 0.4366
5 0.3971
6 0.3624
7 .
8 .
The closest I've been able to come is estimating based on the straight line of best fit, using proc esm:
proc esm data=original out=estimated lead=2;
forecast NPs_per_pop / model=linear;
run;
which produces this (when the order is reversed again, so it goes from 2008-2015):
The linear model is an OK fit, but not great. One reason is that the values should always be positive, which suggests something closer to an exponential model. I tried the different options of proc esm (e.g., transform=logistic), but nothing seemed to populate 2008 and 2009 with a fitted curve. Any help on how to do that would be appreciated!
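One way to get an always-positive, exponential-shaped fit is to regress the log of the series on the year and back-transform the predictions. The following is only a sketch under assumptions: the data set name have and the variables log_np, pred_log, and pred_np are made up for illustration. It relies on the fact that PROC REG leaves observations with a missing response out of the fit but still writes predicted values for them to the OUT= data set, so no reversing of the series is needed.

```sas
data have;
   input Year NPs_per_pop;
   /* take logs only where a value exists; 2008 and 2009 stay missing */
   if NPs_per_pop > 0 then log_np = log(NPs_per_pop);
   datalines;
2008 .
2009 .
2010 0.3624
2011 0.3971
2012 0.4366
2013 0.4804
2014 0.5291
2015 0.5859
;

/* straight line on the log scale = exponential curve on the original scale */
proc reg data=have noprint;
   model log_np = Year;
   output out=fit p=pred_log;   /* predictions for all rows, including 2008-2009 */
run;
quit;

data fit;
   set fit;
   pred_np = exp(pred_log);     /* back-transform; always positive */
run;

proc print data=fit noobs;
   var Year NPs_per_pop pred_np;
run;
```

Whether a log-linear fit is appropriate depends on the data; plotting pred_np against the observed values is a quick check.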
You could try PROC LOESS
or
proc reg data=have;
model y= x x^2 ;
quit;
@Rick_SAS wrote a blog about it before.
Thanks for your response.
Proc reg gave me an error under the exponent operator (**), saying "ERROR 22-322: Syntax error, expecting one of the following: a name, ;, -, /,:, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_, {."
proc reg data=reg;
model NPs_per_pop = year year**2;
quit;
I tried the syntax of proc loess in one of the examples given, but it wouldn't produce a smoothed plot of my variables:
ods graphics on;
proc loess data=reg;
ods output OutputStatistics = Fit
FitSummary=Summary;
model NPs_per_pop = year / degree=2 select=AICC(steps) smooth = 0.6 1.0
direct alpha=.01;
run;
ods graphics off;
Either way, though, neither procedure seems to produce an extrapolation of missing data points. Could you give a little more detail on how you were thinking those procedures would fill in missing values based on a fitted curve?
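On the PROC REG error: the MODEL statement in PROC REG takes variable names only, so an expression such as year**2 is rejected. One workaround, sketched here under the assumption that the data set reg holds the variables year and NPs_per_pop as in the code above, is PROC GLM, which accepts a crossed effect like year*year. The names quad_fit and pred are hypothetical. Like PROC REG, PROC GLM excludes rows with a missing response from the fit but still computes predicted values for them, which is how the 2008 and 2009 rows get filled in.

```sas
proc glm data=reg;
   /* year*year is the quadratic term; GLM builds it internally */
   model NPs_per_pop = year year*year;
   /* pred is populated for every row with a nonmissing year */
   output out=quad_fit p=pred;
run;
quit;

proc print data=quad_fit noobs;
   var year NPs_per_pop pred;
run;
```

An equivalent route would be to create the squared variable in a DATA step first and keep using PROC REG.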
Like PROC ESM, PROC SSM is part of SAS/ETS. You could use it for model-based backcasting, interpolation, and forecasting. It is a bit more involved than ESM. Anyway, here is one possibility:
data test;
input year y@@;
year = mdy(1,1, year);
format year date.;
datalines;
2008 .
2009 .
2010 0.3624
2011 0.3971
2012 0.4366
2013 0.4804
2014 0.5291
2015 0.5859
;
proc ssm data=test;
id year interval=year;
trend curve(ps(2));
irregular wn;
model y = curve wn;
output out=for press;
run;
proc sgplot data=for;
scatter x=year y=y;
series x=year y=smoothed_curve;
reg x=year y=y;
run;
See the attached fit.
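To read off the estimated values rather than only plot them, the output data set can be printed directly. This is a small sketch that assumes the PROC SSM step above has run and that the OUT= data set for contains the smoothed_curve variable used in the SGPLOT step.

```sas
/* list the smoothed trend for the years with no observed value */
proc print data=for noobs;
   where y = .;
   var year y smoothed_curve;
run;
```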
Thanks for your response. It looks like my computer doesn't have sufficient memory for proc ssm, though. I got the following error message:
proc ssm data=reg;
id year interval=year;
trend curve(ps(2));
irregular wn;
model NPs_per_pop = curve wn;
output out=for press;
run;
ERROR: Insufficient memory for data reading.
I'll try to find a computer with more memory to run the code you suggested. Thanks again.
That is very strange. SSM should not have memory issues for such a small problem, even on a basic computer. Anyway, keep me posted.
The SSM procedure handles a variety of models, and its syntax and output might take some getting used to. You can see Example 3 ("Backcasting, Forecasting, and Interpolation") in the SSM doc for an additional example. If you just want the back-casted values in your case, you can use the following modification of the code (print=smooth option in the MODEL statement):
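One difference between the two runs may be worth checking. The earlier example converts year to a SAS date with mdy(1,1,year) before using interval=year, while the failing run reads data set reg directly. If reg stores year as a plain number such as 2008 (an assumption; the thread does not show how reg was built), the ID values would not be the yearly dates that interval=year expects. Whether that explains the memory error is not confirmed here. A sketch of the conversion, with the hypothetical names reg_dates and year_date:

```sas
data reg_dates;
   set reg;
   /* turn the numeric year into a SAS date (January 1 of that year) */
   year_date = mdy(1, 1, year);
   format year_date year4.;
run;

proc ssm data=reg_dates;
   id year_date interval=year;
   trend curve(ps(2));
   irregular wn;
   model NPs_per_pop = curve wn;
   output out=for press;
run;
```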
proc ssm data=test;
id year interval=year;
trend curve(ps(2));
irregular wn;
model y = curve wn / print=smooth;
output out=for press;
run;
data test;
input year y@@;
datalines;
2008 .
2009 .
2010 0.3624
2011 0.3971
2012 0.4366
2013 0.4804
2014 0.5291
2015 0.5859
;
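/* capture the data behind the plot, including the fitted curve and its limits */
ods output sgplot=temp;
proc sgplot data=test;
/* quadratic fit (degree=2) with limits for individual values (cli) and the mean (clm) */
reg x=year y=y / cli clm degree=2;
run;
/* print the captured data set */
proc print data=temp noobs; run;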