I see. You might want to overlay the data on the contour plot. If the explanatory variables are correlated (which they probably are), the corners of the plot that you think are optimal might lie outside the range of the data. In that case you are looking at extrapolated predictions. Models have high prediction error and can even predict nonsensical results when you evaluate them outside the range of the data. See the article "Interpolation vs extrapolation: the convex hull of multivariate data."
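One possible way to do the overlay is with the Graph Template Language. The sketch below is only an illustration: it uses sashelp.lake (the data set used elsewhere in this thread) as a stand-in, and the template name ContourWithData is made up. For lake, the observations are the grid nodes themselves, so the markers form a regular lattice. With real data you would typically concatenate the gridded predictions and the raw observations into one data set and use separate variable names for each.

```sas
/* Hypothetical template name; sashelp.lake is a stand-in for your own data */
proc template;
   define statgraph ContourWithData;
      begingraph;
         layout overlay;
            /* Filled contours of the gridded response */
            contourplotparm x=Length y=Width z=Depth /
                            contourtype=fill name="contour";
            /* Observed (x, y) locations drawn on top of the contours */
            scatterplot x=Length y=Width /
                        markerattrs=(symbol=circle size=4);
            continuouslegend "contour";
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.lake template=ContourWithData;
run;
```

Regions of the contour plot that contain no markers are regions where the surface is extrapolated rather than supported by data.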
Anyway, don't use kriging or any other spatial analysis procedure since you do not have spatial data. Good luck.
Here's how to do it with Proc GContour, and let the ODS Style generate the color ramp for you ...
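proc template;
   /* Custom style: inherit from HTMLBlue and override the two-color ramp */
   define style styles.my_grad;
      parent=styles.htmlblue;
      style twocolorramp / startcolor=red endcolor=navy;
   end;
run;

/* Send output to HTML using the custom style */
ods html style=my_grad;

/* Contour plot of lake depth; PATTERN fills the contour levels */
proc gcontour data=sashelp.lake;
   plot width*length=depth / nolegend pattern;
run;
quit;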
Thanks, Robert. Great addition here.
And thanks again, Rick. Your suggestions are always so clear. Terrific.
Below is a contour/gradient chart produced by Proc LOESS, using the same data as above:
Notice how it differs greatly from the contour chart from Proc GCONTOUR (way above).
The difference seems primarily due to two Proc LOESS model parameters:
degree = 2
interp=cubic
Clueless as to what to use, frankly.
Other SAS procedures that appear to hold promise in producing a fitting contour/gradient representation are:
Proc GAM
Proc GAMPL
Researching at present how to use these. Are they even appropriate?
Objective, once again: X1, X2 ---> Y. Graphically. Contour. Gradient.
Any leads greatly appreciated.
Nicholas
Don't use GAM, which is old and slow. Use GAMPL instead.
There is also ADAPTIVEREG, HPSPLIT, spline bases in GLMSELECT, and more.
Every nonparametric regression model (including a contour plot, which is a simple regression model) will yield a different predicted surface. That is the problem with nonparametrics. I wrote a blog post in which I discuss my philosophy on the topic of nonparametric fits, which is "trust, but verify." Unless I see the same features represented in several fits, I remain skeptical that the features are true.
There are, of course, statistical ways to measure the goodness of fit, including AICC, BIC, and other "information criteria" measures. You can also use prediction measures such as MSE.
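As a rough illustration of comparing fits, the sketch below fits the same two-variable surface with LOESS and with GAMPL. It assumes sashelp.lake as stand-in data (Depth as the response, Width and Length as the explanatory variables); substitute your own data set and variables. The DEGREE= and INTERP= values are the ones mentioned earlier in the thread, not recommendations.

```sas
ods graphics on;

/* Local regression; SELECT=AICC picks the smoothing parameter by AICC */
proc loess data=sashelp.lake plots=contourfit;
   model Depth = Width Length / degree=2 interp=cubic select=aicc;
run;

/* Additive model with one bivariate spline term in both regressors */
proc gampl data=sashelp.lake plots=components;
   model Depth = spline(Width Length);
run;
```

If a feature of the surface (a ridge, a peak in one corner) shows up in both fits, that is some evidence it is real. If it appears in only one, treat it with suspicion. Each procedure also reports fit statistics that you can compare alongside the plots.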
I don't usually recommend binning a continuous variable into discrete bins, but to simplify this problem (which, to me, seems very ambitious), you might consider binning each technical indicator into 3 (or 5) bins such as low, medium, and high. Then you can look at a simple measure such as the mean (or median) profit in each of the resulting nine (or 25) bivariate bins. The interpretation becomes very simple, such as "traders can expect a profit (after 3 days) if they buy when Indicator 1 is high and Indicator 2 is low."
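A minimal sketch of that binning idea, again using sashelp.lake as a stand-in (Width and Length play the role of the two indicators and Depth plays the role of profit). The output data set and the bin variable names are made up for the example. Note that PROC RANK creates bins with roughly equal counts (terciles), which is one choice among several; equal-width bins would need a DATA step instead.

```sas
/* Bin each explanatory variable into three rank-based groups:
   0 = low, 1 = medium, 2 = high */
proc rank data=sashelp.lake groups=3 out=work.binned;
   var Width Length;
   ranks WidthBin LengthBin;
run;

/* Count, mean, and median of the response in each of the 3 x 3 bins */
proc means data=work.binned n mean median maxdec=2;
   class WidthBin LengthBin;
   var Depth;
run;
```

The resulting table has one row per bivariate bin, which makes it easy to see which combination of low/medium/high values has the largest average response.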
Rick, your apparent understanding of the matter at hand seems to be spot on.
For the moment I'm curious why you included HPSPLIT as a lead, as you know there aren't contour/gradient plots resulting.
Yet, it popped into your thoroughly experienced mind.
I've yet to actually try HPSPLIT. As with all the Procs, will take some time to digest.
But, from what I see so far, using the model:
Performance = Levels Indicator #1, Levels Indicator #2, ... , Levels Indicator #n.
A regression tree will be created.
In the case at hand, Performance is a continuous variable: how a stock performs over the next three days, in percent. No categorical variables that I can think of.
Yes, I appreciated your article. I'd prefer as much corroboration as possible from the charts and statistics.
Thousands of hedge funds out there, surely with at least one of your counterparts on staff.
Thanks for being here.
The reason I mentioned HPSPLIT is that it is yet another nonparametric regression procedure in SAS. Although you used the language of contour plots to ask your question, your question is really about fitting a response surface to two explanatory variables. GCONTOUR fits one surface, LOESS fits a different surface, GAMPL fits yet another surface, and so forth. A problem with nonparametric regression is that it is often difficult to interpret the model in a way that leads to business actions like "buy here, sell there." The advantage of HPSPLIT is that it translates to business rules that are easy to understand and follow.
Regarding continuous vs categorical, I wasn't suggesting that you bin the performance, which is the response variable. I was suggesting that you bin the explanatory variables. That is essentially what PROC HPSPLIT does, although it uses nonuniform, data-dependent, 2D bins.
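For reference, a regression tree in PROC HPSPLIT can be sketched as follows. This again assumes sashelp.lake as stand-in data, and the SEED= and MAXDEPTH= values are arbitrary choices for illustration, not tuned settings.

```sas
ods graphics on;

/* The response is not listed in a CLASS statement, so it is treated
   as an interval variable and a regression tree is built */
proc hpsplit data=sashelp.lake seed=12345 maxdepth=3;
   model Depth = Width Length;
   prune costcomplexity;   /* cost-complexity pruning */
run;
```

Each leaf of the resulting tree corresponds to a rectangular region of the two explanatory variables with its own predicted mean response, which is what makes the output read like a set of rules.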
Thanks, Rick. I wouldn't have thought of exploring HPSplit without your suggesting it. So far seems like a very powerful package, and potentially quite useful.
I'm still experimenting with different parameters in the available options, of which there are many.
A few thoughts:
1. Is the process really legit? From a Google search, few have used it. Others, in comments on posts you have written about HPSplit, have wondered why anyone would find a use for it, saying that ADAPTIVEREG was the more accepted tool. You tended to agree, saying that HPSplit has "some" uses, and that you would be writing an article comparing the two, showing pros and cons.
2. How can one decide which parameters to use? Goodness. Here are the main ones. Seems typical user has to just guess at each, and see what happens.....
cvmethod=random(?)
intervalbins=?
maxbranch=?
maxdepth=?
mincatsize=?
minleafsize=?
nsurrogates=?
grow ?
prune ?
If anyone out there with significant HPSplit experience can shed light on ANY of the above parameters -- what worked for you, or what's to be avoided, and why -- please speak up.
Thanks!
I don't have strong feelings about whether you should use HPSPLIT or some other procedure. If you have programming questions, I suggest you start a new thread that asks about the HPSPLIT options. I don't have much experience with it myself, but others might have valuable experience to share. Good luck on your project.
Thanks again, Rick. Raises another quick question, though:
Different procedures ought to be coming up with pretty similar results.
So, why re-invent the wheel?
Does it not make sense to focus on one terrific Proc, and master that?