I have two continuous variables similar to the two below.
data cars;
  set sashelp.cars;
  keep horsepower weight;
run;
How can I test whether the relationship between the two is linear, as one might do before running a regression that assumes linearity?
Accepted Solutions
This analysis tests both of your questions, assuming independent Gaussian errors:
proc glimmix data=sashelp.cars;
  effect extraSplineEffect = spline(weight);
  model horsepower =
    weight             /* Is there a linear relationship? */
    extraSplineEffect  /* Is there more than just noise in the residuals? */
    / htype=1;         /* Test these questions sequentially (Type I tests) */
run;
Interpretation, for this example: yes, there is a linear trend relating horsepower to weight. And yes, more than noise remains in the residuals once the linear trend is removed, that is, the relationship is not purely linear.
The old-school approach is to plot the data, for example:
proc sgplot data=sashelp.cars;
  scatter x=weight y=horsepower;
run;
If there is a strong or even moderately strong relationship, your eye will tell you what form is likely.
The funnel shape in this data indicates that something interesting may be going on. So:
proc sgplot data=sashelp.cars;
  scatter x=weight y=horsepower / group=type;
run;
And it looks like there may be a "better" linear relationship between weight and horsepower when vehicle Type is considered.
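One way to sharpen the visual check is to overlay a straight-line fit and a loess smooth on the same scatter plot. Where the two curves pull apart, the linear form is in doubt. This is only a sketch: the default loess smoothing and the line styles are assumptions of this example, not part of the suggestion above.

```sas
proc sgplot data=sashelp.cars;
  scatter x=weight y=horsepower;
  /* Least-squares straight line */
  reg   x=weight y=horsepower / nomarkers lineattrs=(color=blue);
  /* Nonparametric loess smooth (default smoothing parameter) for comparison */
  loess x=weight y=horsepower / nomarkers lineattrs=(color=red pattern=dash);
run;
```

Adding group=type to the reg and loess statements would draw separate fits per vehicle type, at the cost of a busier plot.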
@kpberger wrote:
How can I test if the relationship between the two is linear? As one might do before running a regression that assumes as much.
As @ballardw said, you can always plot the raw data.
The linear regression itself is a form of test. There's really no need to perform a formal test for linearity before fitting a linear regression. If you assume the relationship is linear, fit the regression and then plot the residuals. The advantage of plotting the residuals (as compared to plotting the raw data) is that deviations from linearity can be easier to spot in the residuals.
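A minimal sketch of that fit-then-inspect workflow with PROC REG, assuming ODS Graphics is enabled. The choice of plots, including the loess smooth on the residual panel, is an assumption of this example. A smooth that stays near zero across the range of weight is consistent with linearity; a visible curve is not.

```sas
ods graphics on;

proc reg data=sashelp.cars
         plots(only)=(residuals(smooth) residualbypredicted);
  /* Fit the straight-line model first */
  model horsepower = weight;
run;
quit;
```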
Paige Miller
Can I use proc gam to test this, by looking at the p-values in the Analysis of Deviance table? For example, if I am trying to predict horsepower using weight:
proc gam data=sashelp.cars plots=components(clm commonaxes);
  model horsepower = spline(weight);
run;
The p-value for spline(weight) is <0.0001. Does this indicate a rejection of the hypothesis that the relationship is linear, i.e. that the spline is necessary? I cannot tell from the documentation or from a day of internet searching.
Honestly, you have ignored all of our advice to plot the data. Had you done so, I think the answer would be relatively clear.
Paige Miller
Your assumption is incorrect. I have plotted the data. It is still not clear to me whether the relationships are linear.
Edit: I clearly asked in the question if there was a test for this. If there is no test, the correct answer to my question is just "there is no test".
After looking at the plot, I can't imagine what other model form (quadratic, spline, non-linear, etc.) would be more appropriate for this data. What do you think is a better model if not linear?
Paige Miller
The data I am using is not actually from a SAS help file. I offered that as an example to aid responses in being specific with code. The unshareable data I am using involves 30+ comparisons and the relationships are not clear.
Show us a scatterplot, with phony variable names.
Paige Miller
I repeat, correlation, specifically PROC CORR.
See this blog post.
https://blogs.sas.com/content/iml/2011/08/26/visualizing-correlations-between-variables-in-sas.html
Visually examine the graphs and/or look at the correlation matrix.
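A short sketch of that approach for the example data, assuming ODS Graphics is enabled. Requesting Spearman alongside Pearson is an addition of this example: a high Pearson correlation shows strong linear association but does not by itself establish that the relationship is linear, and a Spearman value noticeably larger than the Pearson value can hint at a monotone but curved relationship.

```sas
ods graphics on;

proc corr data=sashelp.cars pearson spearman
          plots=matrix(histogram);
  /* Scatter plot matrix with histograms on the diagonal */
  var horsepower weight;
run;
```

With many variables, listing them all in the var statement gives every pairwise plot in one matrix.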
@kpberger wrote:
The data I am using is not actually from a SAS help file. I offered that as an example to aid responses in being specific with code. The unshareable data I am using involves 30+ comparisons and the relationships are not clear.
"30+ comparisons"?
Are you suggesting 9 or more variables? It would take 9 variables to generate 36 unique correlations (8 to generate 28). That might change things, since you would be doing multiple comparisons. If you intend some formal "statistical" inference of linear relationship for each of the bivariate relations, then I believe you are at risk of a Multiple Comparisons Problem. I leave it to others to provide a more authoritative discussion of that possibility.
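If formal tests are run for each pair, one possible way to adjust the resulting p-values is PROC MULTTEST with the INPVALUES= option. The sketch below is for syntax only: the data set name pvals, the variable pair, and the three p-values are made-up placeholders, not results from any analysis. By default PROC MULTTEST expects the raw p-values in a variable named raw_p.

```sas
/* Hypothetical placeholder p-values, one row per bivariate test */
data pvals;
  input pair $ raw_p;
  datalines;
pair1 0.001
pair2 0.020
pair3 0.300
;
run;

/* Holm step-down and false discovery rate adjustments */
proc multtest inpvalues=pvals holm fdr;
run;
```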