I have two continuous variables similar to the two below.
data cars; set sashelp.cars; keep horsepower weight; run;
How can I test if the relationship between the two is linear? As one might do before running a regression that assumes as much.
This analysis tests both of your questions, assuming independent Gaussian errors:
proc glimmix data=sashelp.cars;
   effect extraSplineEffect = spline(weight);
   model horsepower =
      weight             /* Is there a linear relationship? */
      extraSplineEffect  /* Is there more than just noise in the residuals? */
      / htype=1;         /* Test these questions sequentially */
run;
Interpretation, for this example: Yes, there is a linear trend relating horsepower to weight. And yes, more than noise remains in the residuals once the linear trend is removed, so a straight line does not capture the whole relationship.
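A simpler variant of the same sequential idea, offered only as a sketch: fit the linear term first, then a quadratic term, and look at the Type I test for the quadratic term. This assumes that any curvature is roughly quadratic, so a non-significant quadratic term does not rule out other departures from linearity.

```sas
/* Sketch: sequential (Type I) tests of a linear, then a quadratic, term. */
/* Assumes independent Gaussian errors, as in the spline analysis.        */
proc glm data=sashelp.cars;
   model horsepower = weight          /* linear trend */
                      weight*weight   /* curvature beyond the linear trend */
                      / ss1;          /* sequential (Type I) sums of squares */
run;
quit;
```

The spline version is more flexible because it can pick up curvature that is not quadratic.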
The old-school approach is to plot the data, for example:
proc sgplot data=sashelp.cars; scatter x=weight y=horsepower; run;
If there is a strong or even moderate relationship, your eye will tell you what form is likely.
The funnel shape in this data suggests something interesting may be going on. So:
proc sgplot data=sashelp.cars; scatter x=weight y=horsepower / group=type; run;
And it looks like there may be a "better" linear relationship between weight and horsepower when vehicle Type is considered.
@kpberger wrote:
How can I test if the relationship between the two is linear? As one might do before running a regression that assumes as much.
As @ballardw said, you can always plot the raw data.
The linear regression itself is a form of test. There's really no need to perform a formal test for linearity before fitting a linear regression. If you assume the relationship is linear, fit the regression and then plot the residuals. The advantage of plotting the residuals (as compared to plotting the raw data) is that deviations from linearity can be easier to spot in the residuals.
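A minimal sketch of that workflow with PROC REG, using the example data from the question. The SMOOTH suboption, which overlays a loess curve on the residual plot, is my own addition and assumes a reasonably recent SAS/STAT release; drop it if your release does not accept it.

```sas
/* Sketch: fit the straight line, then inspect the fit and the residuals. */
ods graphics on;
proc reg data=sashelp.cars plots(only)=(fitplot residuals(smooth));
   model horsepower = weight;   /* simple linear regression */
run;
quit;
```

If the relationship is linear, the residuals should scatter evenly around zero across the range of weight. A loess curve that bends away from zero suggests curvature the straight line has missed.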
Can I use PROC GAM to test this, by looking at the p-values in the Analysis of Deviance table? For example, if I am trying to predict horsepower using weight:
proc gam data=sashelp.cars plots=components(clm commonaxes); model horsepower=spline(weight); run;
The p-value for spline(weight) is <0.0001. Does this mean the hypothesis that the relationship is linear is rejected, i.e. that the spline is necessary? I cannot tell from the documentation or from a day of internet searching.
Honestly, you have ignored all of our advice to plot the data. Had you done so, I think the answer is relatively clear.
Your assumption is incorrect. I have plotted the data. It is still not clear to me whether the relationships are linear.
Edit: I clearly asked in the question if there was a test for this. If there is no test, the correct answer to my question is just "there is no test".
After looking at the plot, I can't imagine what other model form (quadratic, spline, non-linear, etc.) would be more appropriate for this data. What do you think is a better model if not linear?
The data I am using is not actually from a SAS help file. I offered that as an example to aid responses in being specific with code. The unshareable data I am using involves 30+ comparisons and the relationships are not clear.
Show us a scatterplot, with phony variable names.
I repeat, correlation, specifically PROC CORR.
See this blog post.
https://blogs.sas.com/content/iml/2011/08/26/visualizing-correlations-between-variables-in-sas.html
Visually examine the graphs and/or look at the correlation matrix.
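A minimal sketch of that approach for the two example variables; with the real data you would list all of your variables on the VAR statement. Keep in mind that a Pearson correlation measures the strength of a linear association, so a large value alone does not show that the relationship is linear. The scatter plot matrix is what lets you judge the form.

```sas
/* Sketch: correlation matrix plus a scatter plot matrix with histograms. */
ods graphics on;
proc corr data=sashelp.cars plots=matrix(histogram);
   var horsepower weight;   /* replace with your own variable list */
run;
```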
@kpberger wrote:
The data I am using is not actually from a SAS help file. I offered that as an example to aid responses in being specific with code. The unshareable data I am using involves 30+ comparisons and the relationships are not clear.
"30+ comparisons"?
Are you suggesting 9 or more variables? It would take 9 variables to generate 36 unique correlations (8 to generate 28). That might change things, since you would be doing multiple comparisons. If you intend some formal "statistical" inference of linear relationship for each of the bivariate relations, then I believe you are at risk of a Multiple Comparisons Problem. I leave it to others to provide a more authoritative discussion of that possibility.