Hi all -
I have run a linear regression on my students' test score data. The R-squared value is high and the p-value is low, which would be great, except that my residual plots tell a different story. The residuals are not randomly scattered; they show a curved (nonlinear) pattern, and the residual histogram is skewed slightly to the right.
It seems as though I need to transform a variable or fit a nonlinear model, but I need help with how to do this in SAS. Some random sample data is attached that shows the type of data I'm using. I am using the following variables:
Dependent Variable: Mean Test Score
Independent Variable: % Free and Reduced Lunch
I am running separate analyses by test subject (literacy or math).
I have NO idea how to run a procedure in SAS to help me build a nonlinear model. Is anyone able to assist me or point me in the right direction?
I tried PROC NLIN, but I got lost with the parameters...
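For reference, here is a minimal PROC NLIN sketch. It assumes the data set and variable names that appear later in this thread (cmasRegAnalysis, ssAvg, FRLPercent, subject), and the quadratic mean function is only an illustrative choice, not a recommended model. The PARMS statement is where NLIN needs a starting value for every parameter in the MODEL equation.

```sas
/* Sketch only: a quadratic mean function fit by PROC NLIN.         */
/* Assumes cmasRegAnalysis contains ssAvg, FRLPercent, and subject. */
proc sort data=cmasRegAnalysis;
   by subject;                  /* BY-group processing needs sorted data */
run;

proc nlin data=cmasRegAnalysis;
   by subject;                                 /* one fit per test subject            */
   parms b0=0 b1=0 b2=0;                       /* starting value for each parameter   */
   model ssAvg = b0 + b1*FRLPercent + b2*FRLPercent**2;
   output out=nlinOut p=pred r=resid;          /* predicted values and residuals      */
run;
```

Because this particular model is linear in b0, b1, and b2, the starting values hardly matter, and the same fit is available from PROC GLM with a FRLPercent*FRLPercent term. For a model that is genuinely nonlinear in its parameters, poor starting values can keep NLIN from converging, which is usually what makes PARMS the hard part.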
To give us an idea of the type of model required, could you post the graph resulting from a loess fit, something like:
proc sort data=myData; by subject frl; run;
proc sgpanel data=myData;
panelby subject / onepanel;
loess y=score x=frl;
run;
Absolutely! See attached for the Loess graphs.
Those relationships look fairly linear to me. Try exploring the linear effects with something like:
proc glm data=myData plots=(diagnostics residuals);
where subject ne "Spanish Language Art...";
class subject level;
model ssAvg = FRLPercent subject|level FRLPercent*subject FRLPercent*level;
output out=myOutput r=resid p=pred;
run;
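The residuals and predicted values that the OUTPUT statement writes to myOutput can also be checked directly. A minimal sketch, assuming myOutput carries subject along with resid and pred (OUTPUT data sets keep the input variables): a curved LOESS smooth in the residual-versus-predicted panels would point to remaining nonlinearity, and the histogram and Q-Q plot show any skewness like the one described in the original question.

```sas
/* Residual checks on the data set written by the OUTPUT statement above. */
/* Assumes myOutput contains resid, pred, and subject.                    */
proc sgpanel data=myOutput;
   panelby subject / onepanel;
   scatter x=pred y=resid;
   loess x=pred y=resid / nomarkers;   /* a curved smooth suggests leftover nonlinearity */
   refline 0 / axis=y;                 /* residuals should scatter evenly around 0       */
run;

proc univariate data=myOutput;
   class subject;                      /* separate displays per test subject */
   var resid;
   histogram resid / normal;           /* overlay a fitted normal curve      */
   qqplot resid / normal(mu=est sigma=est);
run;
```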
Hmm, yeah, the residuals look pretty good when I run PROC GLM like that (see attached). What do you think?
Very flexible models can be fit by using splines, and PROC GAMPL makes spline models easy to fit. See the examples in the PROC GAMPL chapter of the SAS/STAT documentation.
Hi Dave, thanks so much for responding.
Does this look right?
proc gampl data=cmasRegAnalysis;
class subject level;
model ssAvg = spline(FRLPercent);
run;
However, PROC GAMPL doesn't seem to let me separate the output by subject and level. Is that right?
If you want to fit a separate spline model for each combination of subject and level, specify BY SUBJECT LEVEL; rather than CLASS SUBJECT LEVEL; (and sort the data by SUBJECT and LEVEL first, as BY-group processing requires). If instead you want a single model that includes those variables as predictors, keep the CLASS statement and add PARAM(SUBJECT LEVEL) to your MODEL statement.
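The two alternatives might look like this, as a sketch that reuses the data set and variable names from the PROC GAMPL code above.

```sas
/* Option 1: a separate spline model for each subject/level combination. */
proc sort data=cmasRegAnalysis;
   by subject level;            /* BY-group processing needs sorted data */
run;

proc gampl data=cmasRegAnalysis;
   by subject level;
   model ssAvg = spline(FRLPercent);
run;

/* Option 2: one model, with subject and level as parametric effects */
/* and a spline in FRLPercent shared across all groups.              */
proc gampl data=cmasRegAnalysis;
   class subject level;
   model ssAvg = param(subject level) spline(FRLPercent);
run;
```

Option 1 lets the shape of the FRLPercent curve differ by group; Option 2 shifts a single curve up or down by subject and level.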
Thank you! Attached is my output. Do you have any good resources to help me understand my output from proc gampl?
GAMPL chooses the spline shape that fits your data. Here the spline it chose is a straight line, and its effective degrees of freedom equal 1, which indicates that there is no nonlinearity. In the GAMPL documentation examples, by contrast, the plotted spline shapes are clearly curved and the effective degrees of freedom are correspondingly larger.
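To look at the shape GAMPL chose, you can request the component plots alongside the fit. A sketch, assuming the PLOTS=COMPONENTS option described in the PROC GAMPL documentation and the single-model form with PARAM(SUBJECT LEVEL): compare the plotted component for spline(FRLPercent) with the effective degrees of freedom reported for it. A straight-line component with an effective DF close to 1 means the spline has reduced to a linear term.

```sas
ods graphics on;   /* component plots are produced through ODS Graphics */

proc gampl data=cmasRegAnalysis plots=components;
   class subject level;
   model ssAvg = param(subject level) spline(FRLPercent);
run;
```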