kpberger
Obsidian | Level 7

I have two continuous variables similar to the two below.

 

data cars;
  set sashelp.cars;
  keep horsepower weight;
run;

How can I test if the relationship between the two is linear? As one might do before running a regression that assumes as much.

1 ACCEPTED SOLUTION

PGStats
Opal | Level 21

This analysis tests your questions, assuming independent Gaussian errors:

 

proc glimmix data=sashelp.cars;
effect extraSplineEffect = spline(weight);
model horsepower = 
    weight            /* Is there a linear relationship ? */
    extraSplineEffect /* Is there more than just noise in the residuals ? */
    / htype=1;        /* Test these questions sequentially */
run;


 

Interpretation, for this example: Yes, there is a linear trend relating horsepower to weight. And yes, there remains more than noise in the residuals once the linear trend is removed, so a straight line alone does not fully describe the relationship.

PG


12 REPLIES
ballardw
Super User

The old-school approach is to plot the data, for example:

proc sgplot data=sashelp.cars;
  scatter x=weight y=horsepower;
run;

If there is a strong or even medium-strength relationship, your eye will tell you what form is likely.

The funnel shape from this data indicates something interesting may be going on. So:

proc sgplot data=sashelp.cars;
  scatter x=weight y=horsepower / group=type;
run;

And it looks like there may be a "better" linear relationship between weight and horsepower when vehicle Type is considered.
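
A minimal sketch of one way to check this visually, assuming the same sashelp.cars example. The REG statement with GROUP= overlays a separate least-squares line for each vehicle Type; drawing one line per group is an illustrative choice, not something stated above.

```sas
/* Sketch: scatter plot with one fitted straight line per vehicle Type. */
/* The per-group fit is an illustrative assumption.                     */
proc sgplot data=sashelp.cars;
  reg x=weight y=horsepower / group=type;
run;
```

If the points within each Type hug their own line more tightly than they hug a single overall line, that supports including Type in the model.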

PaigeMiller
Diamond | Level 26

@kpberger wrote:

How can I test if the relationship between the two is linear? As one might do before running a regression that assumes as much.


As @ballardw said, you can always plot the raw data.

 

The linear regression itself is a form of test. There's really no need to perform a formal test for linearity before fitting a linear regression. If you assume the relationship is linear, fit the regression and then plot the residuals. The advantage of plotting the residuals (as compared to plotting the raw data) is that deviations from linearity can be easier to spot in the residuals.
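
A minimal sketch of that workflow, assuming the sashelp.cars example from the question. The output data set name work.resid, the variable names residual and predicted, and the loess overlay are illustrative choices rather than part of the advice above.

```sas
/* Step 1: fit the straight-line model and save the residuals. */
proc reg data=sashelp.cars noprint;
  model horsepower = weight;
  output out=work.resid r=residual p=predicted;  /* hypothetical names */
run;
quit;

/* Step 2: plot residuals against the predictor.                  */
/* The loess curve is an added visual aid: under linearity it     */
/* should stay close to the zero reference line across all of x.  */
proc sgplot data=work.resid;
  scatter x=weight y=residual;
  loess x=weight y=residual / nomarkers;
  refline 0 / axis=y;
run;
```

A curved or trending loess line suggests the straight-line form is missing something; a fan shape points to non-constant variance rather than nonlinearity.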

--
Paige Miller
kpberger
Obsidian | Level 7

Can I use proc gam to test this, by looking at the p-values in the Analysis of Deviance table? For example, if I am trying to predict horsepower using weight:

proc gam data=sashelp.cars plots=components(clm commonaxes);
  model horsepower = spline(weight);
run;

The p-value for spline(weight) is <0.0001. Does this indicate rejection of the hypothesis that the relationship is linear, i.e., that the spline is necessary? I cannot tell from the documentation or from a day of internet searching.

PaigeMiller
Diamond | Level 26

Honestly, you have ignored all of our advice to plot the data. Had you done so, I think the answer is relatively clear.

--
Paige Miller
kpberger
Obsidian | Level 7

Your assumption is incorrect. I have plotted the data. It is still not clear to me whether the relationships are linear.

 

Edit: I clearly asked in the question if there was a test for this. If there is no test, the correct answer to my question is just "there is no test".

PaigeMiller
Diamond | Level 26

After looking at the plot, I can't imagine what other model form (quadratic, spline, non-linear, etc.) would be more appropriate for this data. What do you think is a better model if not linear?

--
Paige Miller
kpberger
Obsidian | Level 7

The data I am using is not actually from a SAS help file. I offered that as an example to aid responses in being specific with code. The unshareable data I am using involves 30+ comparisons and the relationships are not clear.

PaigeMiller
Diamond | Level 26

Show us a scatterplot, with phony variable names.

--
Paige Miller
Reeza
Super User

I repeat, correlation, specifically PROC CORR.


See this blog post.

https://blogs.sas.com/content/iml/2011/08/26/visualizing-correlations-between-variables-in-sas.html

 

Visually examine the graphs and/or look at the correlation matrix.
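
A minimal sketch of that suggestion, assuming the two variables from the question. The PLOTS=MATRIX(HISTOGRAM) request is one possible choice of display, not necessarily the one used in the blog post.

```sas
/* Sketch: correlation table plus a scatter plot matrix. */
ods graphics on;
proc corr data=sashelp.cars plots=matrix(histogram);
  var horsepower weight;
run;
```

With more variables, list them all in the VAR statement to get every pairwise correlation and scatter plot in one pass.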

mkeintz
PROC Star

@kpberger wrote:

The data I am using is not actually from a SAS help file. I offered that as an example to aid responses in being specific with code. The unshareable data I am using involves 30+ comparisons and the relationships are not clear.


"30+ comparisons"?

 

Are you suggesting 9 or more variables? It would take 9 variables to generate 36 unique correlations (8 to generate 28). That might change things, since you would be doing multiple comparisons. If you intend some formal "statistical" inference of linear relationship for each of the bivariate relations, then I believe you are at risk of a Multiple Comparisons Problem. I leave it to others to provide a more authoritative discussion of that possibility.
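
The pair counts above follow from n(n-1)/2 unique pairs among n variables. A small sketch to check the arithmetic, where the variable names n and pairs are illustrative:

```sas
/* Sketch: number of unique variable pairs for n = 8 and n = 9. */
data _null_;
  do n = 8 to 9;
    pairs = comb(n, 2);   /* n choose 2 = n*(n-1)/2 */
    put n= pairs=;        /* writes n=8 pairs=28, then n=9 pairs=36 to the log */
  end;
run;
```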

