gan024059
Calcite | Level 5

Hi all - 

 

 

I ran a linear regression on my students' test score data. The R-squared value is high and the p-value is low, which would be great, except my residual plot tells a different story. The residuals are not randomly scattered: the data appear curved (nonlinear), and the residual histogram is skewed a bit to the right.

 

It seems I need to either transform a variable or fit a nonlinear model, but I need help doing this in SAS. Some random sample data is attached to show the type of data I'm using. I am using the following variables:

 

Dependent Variable: Mean Test Score

Independent Variable: % Free and Reduced Lunch

 

I am also running separate analyses by test subject (literacy or math).

 

I have NO idea how to run a procedure in SAS to help me build a nonlinear model. Is anyone able to assist me or point me in the right direction?

 

I tried PROC NLIN, but I got lost with the parameters...

 

 

10 REPLIES 10
PGStats
Opal | Level 21

To give us an idea of the type of model required, could you post the graph resulting from a loess fit, something like:

 

proc sort data=myData; by subject frl; run;

proc sgpanel data=myData;
panelby subject / onepanel;
loess y=score x=frl;
run;
PG
gan024059
Calcite | Level 5

Absolutely! See attached for the Loess graphs.

PGStats
Opal | Level 21

Those relationships look fairly linear to me. Try exploring the linear effects with something like:

 

proc glm data=myData plots=(diagnostics residuals);
where subject ne "Spanish Language Art...";
class subject level;
model ssAvg = FRLPercent subject|level FRLPercent*subject FRLPercent*level;
output out=myOutput r=resid p=pred;
run;
PG
gan024059
Calcite | Level 5

Hmmmm, yeah, the residuals look pretty good when I run PROC GLM like that; see attached. What do you think?

StatDave
SAS Super FREQ

Very flexible models can be fit by using splines, and models with splines are easy to fit in PROC GAMPL. See the examples in the GAMPL chapter of the SAS/STAT documentation.

gan024059
Calcite | Level 5

Hi Dave, thanks so much for responding.

 

Does this look right?

 

proc gampl data=cmasRegAnalysis;
class subject level;
model ssAvg = spline(FRLPercent);
run;

gan024059
Calcite | Level 5

However, PROC GAMPL doesn't seem to let me separate the output by subject and level?

StatDave
SAS Super FREQ

If you want to fit separate spline models for each subject and level combination, then specify BY SUBJECT LEVEL; rather than CLASS SUBJECT LEVEL; .  If instead you want a single model including those variables as predictors, then keep the CLASS statement and add PARAM(SUBJECT LEVEL) in your MODEL statement.
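
As a minimal sketch of the two options, assuming the data set and variable names from the earlier post (cmasRegAnalysis, ssAvg, FRLPercent, subject, level). The sorted data set name cmasSorted is made up for illustration:

```sas
/* Option 1: a separate spline model for each subject-level combination. */
/* BY-group processing needs the data sorted by the BY variables.        */
proc sort data=cmasRegAnalysis out=cmasSorted;  /* cmasSorted is a hypothetical name */
   by subject level;
run;

proc gampl data=cmasSorted;
   by subject level;
   model ssAvg = spline(FRLPercent);
run;

/* Option 2: a single model with subject and level as parametric predictors. */
proc gampl data=cmasRegAnalysis;
   class subject level;
   model ssAvg = param(subject level) spline(FRLPercent);
run;
```

Option 1 estimates a different FRLPercent curve in every group. Option 2 estimates one shared FRLPercent curve and lets subject and level shift the predicted score up or down.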

gan024059
Calcite | Level 5

Thank you! Attached is my output. Do you have any good resources to help me understand my output from proc gampl?

StatDave
SAS Super FREQ

GAMPL looks for a form of the spline that fits the shape of your data. The linear shape it settled on, together with effective degrees of freedom equal to 1, indicates that there is no nonlinearity. The examples in the GAMPL documentation show cases where the plotted spline is clearly nonlinear and the effective degrees of freedom are well above 1.
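
If it helps to see the fitted shape directly, here is a rough sketch that requests the spline component plot. It assumes the same data set and variable names as in the earlier post and that ODS Graphics is available in your session:

```sas
/* Turn on ODS Graphics so that GAMPL can produce plots. */
ods graphics on;

/* PLOTS=COMPONENTS requests a plot of each fitted spline term. */
proc gampl data=cmasRegAnalysis plots=components;
   model ssAvg = spline(FRLPercent);
run;
```

A component plot that is essentially a straight line, together with effective degrees of freedom near 1 for the spline term, points to the same conclusion as the linear fit from PROC GLM.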
