11-04-2011 08:20 AM
I have been using regression on a GPLOT to project data into our next fiscal year (how time flies...)
I've seen that the regression equation is noted in the log of the job.
My client would like me to determine the actual data points for each month of next year to "add value" to the line on the graph.
The problem is that we have not licensed any STAT products (basically just Base SAS), so I can't do it that way.
Is there a way that I can capture the values in the regression equation from the GPLOT and use them to make the calculations? I would rather have an automated source because I have a boatload of graphs that are produced. I could just drop the date values into the equation (with a DO loop to span the next year) and get the plot points.
Is this even possible? I would rather not have to run the graphs, pull the data from the logs and then run another job to create the monthly values.
Thanks in advance.
11-04-2011 08:33 AM
Here's a small sample showing how to use gplot's built-in regression line, if anybody wants to use it to experiment and try to find a solution to OS2Rules' question...
symbol1 value=dot cv=red interpol=rl ci=blue;
proc gplot data=sashelp.stocks;
plot close*date=1 / regeqn;
run;
quit;
11-04-2011 09:22 AM
You could see if PROC GPLOT has any ODS output datasets by using the ODS TRACE statement.
Otherwise you could fall back to the old method of redirecting the log to a file and reading it with a DATA step.
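filename templog temp;
* send the log to a temporary file ;
proc printto log=templog;
run;
* ... run the PROC GPLOT step here ... ;
* send the log back to its default destination ;
proc printto log=log;
run;
data _null_;
infile templog ;
input ;
* put a copy of the log back into the real log ;
put _infile_ ;
.... look for the regression results ...
run;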
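A minimal sketch of that log-scanning idea, using the sashelp.stocks sample from earlier in the thread. The dataset name regeqn_log and the variables line and equation are made up for illustration. It assumes the note keeps the wording "Regression equation" and fits on a single log line, which may not hold for long equations or a narrow LINESIZE.

```sas
filename templog temp;

* route the log to the temporary file ;
proc printto log=templog new;
run;

symbol1 value=dot cv=red interpol=rl ci=blue;
proc gplot data=sashelp.stocks;
  plot close*date=1 / regeqn;
run;
quit;

* restore the default log destination ;
proc printto;
run;

* keep only the log lines that carry a regression equation ;
data regeqn_log;
  length line equation $256;
  infile templog truncover;
  input line $char256.;
  if index(line, 'Regression equation') > 0;
  * the text after the second colon is the equation itself ;
  equation = strip(scan(line, 3, ':'));
  keep equation;
run;

proc print data=regeqn_log;
run;
```

The equation is still a character string at this point, so the coefficients would have to be parsed out of it before they can be used in a calculation.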
11-04-2011 01:07 PM
Unfortunately, GPLOT will not give you the regression points through ODS OUTPUT; however, SGPLOT will. The examples below show both a regression fit and a loess fit. Some of the parameters for these fits can also be adjusted in the procedure. You need at least SAS 9.2 to run them.
Hope this helps!
ods output sgplot=regdata;
proc sgplot data=sashelp.stocks;
reg y=close x=date / clm;
run;
ods output sgplot=loessdata;
proc sgplot data=sashelp.stocks;
loess y=close x=date / clm;
run;
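The columns that SGPLOT writes to the output dataset get procedure-generated names, so it is worth inspecting the dataset before building on it. A short sketch, reusing the regdata name from the example above (requires SAS 9.2 or later):

```sas
ods output sgplot=regdata;
proc sgplot data=sashelp.stocks;
  reg y=close x=date / clm;
run;

* list the generated column names ;
proc contents data=regdata;
run;

* look at the first few fitted points ;
proc print data=regdata(obs=10);
run;
```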
11-04-2011 01:41 PM
I'm assuming you need the formula, not the data points, but you can also manually calculate the slope and intercept using PROC MEANS and a bunch of its precalculated statistics without too much work. Since you're applying this to a predicted dataset at some point, you'll need a DATA step anyway.
11-04-2011 01:54 PM
The formula is provided by the GPLOT - or rather by the interpolation method. I would like to "pick up" these values and plug in my future dates to get the data points.
11-04-2011 02:01 PM
I know, but that involves scanning the log; it isn't stored anywhere else. Also, the log doesn't specify which BY group the formula is for when you have multiple groups, or at least mine didn't (SAS 9.2.3).
So since you're going to have to grab it somehow, you can either scan the log as Tom has mentioned above or calculate it elsewhere, i.e. using the manual calculations in PROC CORR (not PROC MEANS like I said earlier).
I'd personally prefer to calculate it elsewhere rather than read the log, because the log doesn't specify which formula goes with which group/stock/BY variable, but that's my preference.
The formulas are below, and all the inputs required (i.e. mean/std/correlation) are output by PROC CORR, which is available in Base SAS.
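For a straight-line fit, the standard least-squares formulas are slope = r * (std of y / std of x) and intercept = mean of y - slope * mean of x, where r is the Pearson correlation. A minimal sketch of that calculation with PROC CORR follows. The IBM subset, the dataset names corrout, coef and projected, and the 01DEC2005 starting month are all illustrative assumptions, and it only covers the linear case, not a quadratic fit.

```sas
* write means, standard deviations and correlations to a dataset ;
proc corr data=sashelp.stocks(where=(stock='IBM')) outp=corrout noprint;
  var date close;
run;

* collect the statistics and derive the line of best fit ;
data coef;
  set corrout end=last;
  retain mean_x mean_y std_x std_y r;
  if _type_ = 'MEAN' then do; mean_x = date; mean_y = close; end;
  else if _type_ = 'STD' then do; std_x = date; std_y = close; end;
  else if _type_ = 'CORR' and upcase(_name_) = 'DATE' then r = close;
  if last then do;
    slope = r * std_y / std_x;
    intercept = mean_y - slope * mean_x;
    output;
  end;
  keep slope intercept;
run;

* project twelve monthly points past the last month in the sample ;
data projected;
  set coef;
  do m = 1 to 12;
    date = intnx('month', '01DEC2005'd, m);
    predicted = intercept + slope * date;
    output;
  end;
  format date date9.;
run;
```

Adding a BY statement to the PROC CORR step should give one set of statistics per group, which avoids the problem of matching log notes to BY groups.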
11-04-2011 01:52 PM
This whole exercise started when SAS Institute told me that I could not use regression on a SGPLOT - which is what I was doing in the first place. They said that the only interpolation that could be used is "JOIN", which is of no help.
I had a SGPLOT with a histogram (VBAR chart) and a "line" overlaid - I wanted to use regression on the line.
Perhaps it's because I'm still on 9.1.3 ....
11-04-2011 01:35 PM
1) Statistically speaking, I'd be careful about using linear regression to extrapolate. In general, use regression to interpolate, and use time series forecasting to extrapolate.
2) If you decide to proceed, there is still some work to do. Dan Heath's method will give you the points in the plot. However, you'll have to use two of those points to find the slope-intercept formula for the line of best fit. Once you have the equation, you can fit new points by using the DO loop technique that you mentioned in your post.
11-04-2011 01:59 PM
As it turns out, the regression is quadratic - here is what I see in the log:
WARNING: Least-square solutions for the parameters are not unique. QUADRATIC equation is used for the interpolation.
NOTE: Regression equation : cost_per_rec = 18263.67 - 1.915098*date + 0.00005*date^2.
As can be seen, the formula is there - I just need to plug in my future dates, and bingo!
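A sketch of that plug-in step, with the coefficients copied by hand from the log note above. The dataset name projected and the 01JAN2012 start of the fiscal year are hypothetical. One caution: the log prints the quadratic coefficient with a single significant digit (0.00005), and for SAS date values around 19,000 (early 2012) a rounding difference of 0.000005 in that coefficient moves the result by roughly 1,800, so values computed from the printed equation may be far from the plotted line.

```sas
data projected;
  * hypothetical first month of the next fiscal year ;
  start = '01JAN2012'd;
  do m = 0 to 11;
    date = intnx('month', start, m);
    * coefficients as printed in the log, so subject to rounding ;
    cost_per_rec = 18263.67 - 1.915098*date + 0.00005*date**2;
    output;
  end;
  format date date9.;
  drop start m;
run;

proc print data=projected;
run;
```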