Hi all. In a generalized additive model with a continuous predictor, is it possible to include an interaction with a binary or categorical predictor? I know that in PROC GAM, including
spline2(x1, x2)
will run a thin-plate spline with a nice graph showing interaction between two continuous predictors. Is there anything analogous for a binary/categorical interaction predictor? I hope I don't have to do this in R! Thanks.
If you just want to include a constant shift that depends on a CLASS variable, you can use
CLASS A;
model y = param(A) ...;
However, it sounds like you want to include interaction terms between a continuous variable and a classification variable. I don't think PROC GAMPL does that automatically, but you can use a SAS procedure to generate a design matrix that includes the interaction effects, or you can create them manually. You can then include spline terms for those interaction effects. For example, here is some fake data in which the response depends on the class levels:
data MyData;
do x = 1 to 10 by .1;
group="A"; y = 2*x + sin(x) + rand("Normal"); output;
group="B"; y = 10 - x - sin(x) + rand("Normal"); output;
end;
run;
data Have;
set MyData;
x_A = x*(group='A'); /* manually create the interaction terms */
x_B = x*(group='B');
run;
proc gampl data=Have plots=components;
class group;
model Y = param(group | x) spline(x_A) spline(x_B); /* semiparametric */
/* or, with spline terms only: model Y = spline(x_A) spline(x_B); */
output out=GamPLOut pred=p;
id Y X group;
run;
proc sort data=GamPLOut;
by group x;
run;
proc sgplot data=GamPLOut;
scatter x=x y=y / group=group;
series x=x y=p / group=group;
run;
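The other route mentioned above, letting a SAS procedure generate the interaction columns, might look like the sketch below. It assumes PROC GLMSELECT with the OUTDESIGN= option and the MyData data set from this post. The data set name Design is made up for illustration. As far as I recall, GLMSELECT stores the generated column names in the macro variable _GLSMOD, but I have not listed the exact names here, so check them in the log and in the PROC CONTENTS output before you use them in SPLINE() terms.

```sas
/* Sketch: let PROC GLMSELECT build the group-by-x design columns.   */
/* "Design" is a hypothetical output name; MyData is from this post. */
proc glmselect data=MyData outdesign(addinputvars)=Design;
   class group;
   /* SELECTION=NONE: no variable selection, just build the columns. */
   /* NOINT: omit the intercept so only the group*x effect is built. */
   model y = group*x / selection=none noint;
run;

/* Assumption: _GLSMOD holds the names of the generated columns */
%put Design columns: &_GLSMOD;

/* Confirm the column names before using them in SPLINE() terms */
proc contents data=Design varnum;
run;
```

The generated columns should play the same role as x_A and x_B in the DATA step version: each one equals x within one level of group and 0 elsewhere. This approach is mainly useful when the class variable has many levels.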
Calling @Rick_SAS
Thank you for such a complete answer! This works perfectly. Would you mind telling me what the ID statement adds? I'm curious as to what it does.
Yay! Glad it satisfies your needs.
In traditional SAS procedures, the OUTPUT statement automatically copies all variables in the input data to the output data set. This can be expensive with large data, so the newer predictive modeling procedures write only the variables that you request. The predictions, residuals, confidence limits, and so on are specified on the OUTPUT statement. If you want to copy any variables from the input data, you use the ID statement.
I used the ID statement so that I could plot the original data in the SGPLOT call. An alternative is to merge the predicted values (in GamPLOut) with the original data in a separate DATA step.
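For illustration, the merge alternative might look like the following sketch. The data set names PredOnly and Combined are made up. The one-to-one merge assumes that the output data set has the same number of rows, in the same order, as the input data. I would verify that for your environment (for example, by comparing row counts) before relying on it. If the order is not guaranteed, the ID statement is the safer choice.

```sas
proc gampl data=Have;
   class group;
   model Y = param(group | x) spline(x_A) spline(x_B);
   output out=PredOnly pred=p;   /* no ID statement, so only p is written */
run;

/* One-to-one merge (no BY statement): rows are paired by position.  */
/* Assumes PredOnly has the same rows, in the same order, as Have.   */
data Combined;
   merge Have PredOnly;
run;
```

After the merge, Combined contains x, y, group, and p, so it can be sorted by group and x and passed to the same PROC SGPLOT call.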