02-07-2016 07:21 PM - edited 02-07-2016 09:04 PM
I would like to fit a segmented multiple regression with fixed effects (with and without interactions). Does SAS have any procedures that can handle this task? I've read that PROC NLIN can fit a segmented regression, but I can only find tutorials for segmented regression on a single predictor. I would be very grateful if you could point me to any examples on this issue. Links to tutorials would be welcome too.
I'm currently working with SAS EG 6.1.
02-10-2016 10:02 AM
Segmented regression is well defined for a single predictor. You can specify a number of breakpoints (single points) that separate the domain into disjoint segments on which to run a regression.
When you have more than one explanatory variable, segmented regression is not well defined. For example, with two variables, you can break up the domain into rectangles and solve the regression problem on each rectangle. But you could equally well use triangles or regions bounded by squiggly curves.
The way to approach this topic for multivariate regression is to switch from segmented regression to local regression. In local regression, the predicted value at each point p is obtained by solving a regression problem that involves the data points that are close to p. Often a kernel function is used so that points close to p carry more weight than points far from p.
This method is called LOESS for LOcal EStimation. In SAS, it is supported by PROC LOESS. The documentation includes a two-dimensional example that you should look at.
There are other nonparametric regression techniques in SAS, but PROC LOESS seems to be most similar to segmented regression.
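The local-regression idea described in this reply (predict at a point p from nearby data, with a kernel that down-weights distant points) can be sketched in a few lines. This is only an illustration in Python, not what PROC LOESS does internally: it assumes a single predictor, a fixed bandwidth rather than a nearest-neighbor fraction, a tricube kernel, and made-up data. The names `tricube` and `local_linear_predict` are hypothetical.

```python
def tricube(u):
    """Tricube kernel: weight 1 at u = 0, decreasing to 0 for |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_predict(xs, ys, p, bandwidth):
    """Predict y at p from a kernel-weighted straight-line fit near p."""
    # Points closer to p get larger weights; points beyond the bandwidth get 0.
    w = [tricube((x - p) / bandwidth) for x in xs]
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Closed-form weighted least squares for intercept and slope.
    denom = sw * swxx - swx ** 2
    if abs(denom) < 1e-12:
        raise ValueError("too few distinct points inside the bandwidth")
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * p


# Hypothetical data: a V-shaped response with a kink at x = 0.
xs = [i * 0.5 for i in range(-10, 11)]
ys = [abs(x) for x in xs]
left = local_linear_predict(xs, ys, p=-3.0, bandwidth=1.5)
right = local_linear_predict(xs, ys, p=3.0, bandwidth=1.5)
```

Away from the kink, each local fit sees only one linear segment and reproduces it. Near the kink the fit blends both sides, which is why local regression smooths over breakpoints instead of estimating them.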
02-10-2016 12:38 PM - edited 02-15-2016 02:37 PM
I'll add on to Rick's reply about segmented multiple regression. Aside from partitioning the m-space of the predictor variables into the appropriate number of subdomains, you must also enforce both continuity and smoothness constraints along all points on the (m-1)*(m-2)/2 intersections (hope I have the right number there). These constraints may not be amenable to optimization.
Consider the loess approach, where "join contours" can be determined by looking at the gradient and hessian of the final solution.
EDIT: Well it wasn't the right number. If there are m predictor variables, the number of intersections is m*(m-1)/2.
02-10-2016 05:00 PM - edited 02-10-2016 05:06 PM
Hi Rick and Steve,
Thank you very much for your posts. I'll study LOESS in more detail. As I have just come across PROC ADAPTIVEREG, I would also like to ask for your comments on it.
According to the SAS guide on ADAPTIVEREG (http://support.sas.com/documentation/cdl/en/statug/65328/HTML/default/viewer.htm#statug_adaptivereg_...), the LOESS and TPSPLINE procedures are limited to low-dimensional problems. My base model includes 5 continuous and 3 categorical predictor variables. I expect only one break in each of 2 continuous variables, and I assume each segment is linear. In addition, there will be linear relationships between the response variable and the remaining 3 continuous variables. There might be categorical-by-continuous interactions, which I need to study for a full model (1 categorical * 5 continuous variable interactions).
In my situation, which approach is most feasible?
02-11-2016 07:52 AM
PROC LOESS does not have a CLASS statement, and as you say it is best for low-dimensional problems. I agree that you should look at PROC ADAPTIVEREG, which is a nonparametric routine that can fit flexible models to data.
Now that you've explained your problem more, there might be another approach. You say that you only want piecewise linear functions for two of the variables. Look at using the EFFECT statement to create linear splines (DEGREE=1) for those two variables. IF YOU KNOW THE LOCATIONS of the breakpoints, you can use the KNOTMETHOD=LIST(..) option to specify the locations at which you want those variables split. You would want to use a truncated power function basis (degree=1) for the splines. Notice, however, that in this approach the knot positions are fixed, not parameters that are chosen by the procedure.
I've never done this before, but it might work.
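To make the linear-spline suggestion concrete, here is a minimal sketch of what a degree-1 truncated power basis with one fixed knot amounts to. It is written in Python purely for illustration and is not SAS output. It assumes a single predictor, a knot known in advance at a made-up location (5.0), and noise-free hypothetical data. The helper names `hinge`, `solve`, and `fit_least_squares` are invented for this example.

```python
def hinge(x, knot):
    """Degree-1 truncated power function: (x - knot)+."""
    return max(0.0, x - knot)


def solve(a, b):
    """Solve the square system a @ coef = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - s) / m[i][i]
    return coef


def fit_least_squares(rows, ys):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


KNOT = 5.0  # assumed known in advance; it is not estimated from the data
xs = [float(i) for i in range(11)]
# Hypothetical response: slope 2 before the knot, slope -1 after it.
ys = [1.0 + 2.0 * x - 3.0 * hinge(x, KNOT) for x in xs]
# Design matrix columns: intercept, x, (x - KNOT)+
rows = [[1.0, x, hinge(x, KNOT)] for x in xs]
intercept, slope_before, slope_change = fit_least_squares(rows, ys)
slope_after = slope_before + slope_change
```

The coefficient on the hinge column is the change in slope at the knot, so the fitted function is continuous there by construction. Because the model is linear in its coefficients once the knot is fixed, ordinary regression machinery applies. That is also why the knot itself cannot be estimated this way.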
02-16-2016 08:20 PM
Hi Rick and all again,
I have tried ADAPTIVEREG. The interpretation of the model without interactions between variables is straightforward. However, I am having a lot of trouble with the model that includes interactions (two-way only). The issue is that SAS automatically selects the model using stepwise selection, and it comes up with many unwanted interactions. I have checked the SAS manual but could not find an option that lets me define my own interactions. Have I missed something? If not, can any of you suggest a way to work around this problem? I have reduced the number of basis functions, but I am not sure that is a good approach, and several unwanted interactions still appear in my final model.
02-17-2016 03:44 PM
Would using the KEEP= option in the MODEL statement enable you to look only at the interactions of interest? That fixes those terms in the model, and I would assume the basis vectors describing them as well. If they were exhaustive of all the model terms, then I think you would have what you need.
02-17-2016 05:03 PM - edited 02-17-2016 05:07 PM
David, KEEP= does not work with interactions. It only deals with variables. If I create a new variable to represent my interaction, I'm not sure the breakpoints would behave correctly.
02-18-2016 07:58 AM
What does your current MODEL statement look like? I think if you specifically add the interactions to the MODEL statement, then the KEEP= option should work, but I am not sure.
02-18-2016 06:26 PM
The model I want is:
Y = continuous1 continuous2 continuous3 continuous4 continuous5 categorical1 categorical2 categorical3 continuous1*categorical1 continuous2*categorical1 continuous3*categorical1 continuous4*categorical1 continuous5*categorical1 continuous1*continuous2
I expect there will be breaks in continuous1 and continuous2. There will be linear dependencies in the remaining 3 continuous variables.
What do you think? Thanks, Mai