Correlation Vs regression

03-15-2016 08:29 AM

Dear Team,

I work for a logistics and supply chain client. I am trying to run a linear regression model with freight as the dependent variable, and there are 8-9 independent variables which add up to the total freight.

This might not be a true SAS procedural question, but I wanted some advice before I start building the analysis.

All of these predictor variables are significantly related to the dependent variable. Does building a linear regression model require that some of the predictor variables be insignificant? Or, if all of them are significant, can we just do a correlation analysis and see how the variables are related, rather than following the entire regression process?

Kindly advise, as this is what I was told by one of the analytics practice leads.

All Replies


03-15-2016 10:36 AM

Shivi82 wrote:

Dear Team,

there are like 8-9 independent variables which add up to the total freight.

What do you mean by that?

Did you have more variables that individually were not significantly related to the dependent variable?

If you have 8 to 9 independent variables, you'll have to be cautious of multicollinearity as well.

But no, you can't just look at correlation, though it's a good thing to do.

When building a model you can use the GLMSELECT procedure to employ variable selection methods.

The first statistics course in SAS is free and covers regression and the analysis process.


03-15-2016 11:55 AM

Hello Rezza, Thanks for your quick reply.

Probably I did not explain correctly.

I do understand that correlation is a powerful measure; however, it cannot do what regression does, i.e. finding the line of best fit and predicting a value from the independent variables.
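To make that distinction concrete, here is a minimal sketch in plain Python (not SAS). The numbers are made up for illustration: `x` stands in for a hypothetical fuel price index and `y` for total freight. The correlation coefficient only says how strongly the two move together, while the fitted line gives a slope, an intercept, and therefore a prediction.

```python
import math

# Hypothetical monthly figures (invented for illustration only):
# x = fuel price index, y = total freight.
x = [2.1, 2.4, 2.2, 2.8, 3.0, 2.9, 3.3, 3.1]
y = [410.0, 455.0, 430.0, 520.0, 548.0, 540.0, 601.0, 575.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Centered sums of squares and cross-products.
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Correlation: a unit-free measure of linear association.
r = sxy / math.sqrt(sxx * syy)

# Simple linear regression: the least-squares line of best fit.
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(new_x):
    """Predicted y for a given x under the fitted line."""
    return intercept + slope * new_x


print("correlation:", r)
print("slope:", slope, "intercept:", intercept)
print("predicted freight at x = 3.5:", predict(3.5))
```

With a single predictor the two are linked (the slope equals `r` times the ratio of the standard deviations of `y` and `x`), but only the regression produces a predicted value.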

My question was: in my case I have 7 independent variables and all have some significance to the dependent (freight) variable, so do we need to have any variable which is insignificant to run a regression? I have gone through the regression course freely available from SAS and it is very useful and apt.

Thank You.


03-15-2016 10:56 AM

Are you saying that you have variables like

x1 = weight of container

x2 = weight of pallet

x3 = weight of contents

...

and that the response is Y = TOTAL weight?

Can you provide more details and maybe some example observations?


03-15-2016 12:07 PM

Hello Rick,

My dependent variable is Total Freight, and this freight has components such as freight, tax, gas price, and the like, so there are 4 other variables which total up to make my total freight. These values keep changing from one month to another.

I have data points for the last 12 months, which make up in excess of 4 million rows of data. When I ran the correlation analysis in SAS, it showed me how these variables were correlated with the outcome variable. Hence I wanted to run a regression model to predict what my outcome value could be at given values of these variables, taking into account all the assumptions of linear regression and also performing the residual analysis.

But I just wanted to understand whether, statistically, we need to have at least one or a few insignificant variables to run a regression. I hope this helps.

regards, Shivi

Solution

03-15-2016 12:37 PM


03-15-2016 12:28 PM

Shivi82 wrote:

But I just wanted to understand whether, statistically, we need to have at least one or a few insignificant variables to run a regression. I hope this helps.

No, you do not require an insignificant variable to run a regression.

A variable that is insignificant on its own in a simple linear relationship may be significant in a model with other variables.
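A small sketch of that point, in plain Python rather than SAS. The data are entirely hypothetical and deliberately constructed so that `x1` has essentially zero correlation with `y` when looked at alone, yet gets a large t-statistic once `x2` is in the model. The two-predictor least-squares formulas are written out by hand to keep the example self-contained.

```python
import math


def centered(values):
    """Subtract the mean from every value."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical, constructed data (not real freight figures).
u = list(range(-10, 11))
mean_sq = sum(v * v for v in u) / len(u)
w = [(v * v - mean_sq) / 10 for v in u]                        # symmetric in u
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(len(u))]   # also symmetric in u

x1 = [float(v) for v in u]
x2 = [-a + b for a, b in zip(x1, w)]           # strongly negatively related to x1
y = [a + b + e for a, b, e in zip(x1, x2, noise)]

n = len(y)
c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y, syy = dot(c1, cy), dot(c2, cy), dot(cy, cy)

# One-at-a-time screening: correlation of x1 with y, ignoring x2.
r_marginal = s1y / math.sqrt(s11 * syy)

# Joint model y ~ x1 + x2 via the 2x2 normal equations on centered data.
det = s11 * s22 - s12 * s12
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
rss = syy - b1 * s1y - b2 * s2y                # residual sum of squares
sigma2 = rss / (n - 3)                         # 3 estimated parameters incl. intercept
t_joint = b1 / math.sqrt(sigma2 * s22 / det)   # t-statistic for x1 in the joint model

print("marginal correlation of x1 with y:", r_marginal)
print("coefficient of x1 in joint model:", b1, "t-statistic:", t_joint)
```

Screening `x1` by itself would discard it, even though it clearly matters once `x2` is accounted for.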

If you're using the analysis method of analyzing variables one by one and then including only those that are significant in a model, that's not a good methodology. If you are using such a methodology, then the 'significance' level is also usually around 0.2, not 0.05.

The reason both Rick and I are questioning your data structure/definition is that, if your data does sum to freight and the components are additive, then your data is not independent and linear regression analysis is not a good method for analysis.
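To illustrate the additive case, here is a minimal Python sketch (again not SAS, and with invented component names and values). When the response is literally the sum of the predictors, least squares just recovers the identity: every coefficient is approximately 1, the intercept is approximately 0, and the residuals are zero up to floating-point error, so the model tells you nothing you did not already know.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


random.seed(0)

# Hypothetical components per shipment: base freight, tax, fuel surcharge.
rows = [
    (random.uniform(100, 500), random.uniform(5, 50), random.uniform(10, 80))
    for _ in range(200)
]
# The "response" is exactly the sum of its components.
total = [sum(r) for r in rows]

coefs = ols(rows, total)
residuals = [
    t - (coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)))
    for r, t in zip(rows, total)
]

print("intercept and coefficients:", coefs)
print("largest absolute residual:", max(abs(e) for e in residuals))
```

A perfect fit like this is a sign that the "model" is an accounting identity rather than a statistical relationship.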


03-15-2016 12:42 PM

Thanks Rezza.

Your explanation is very useful and it makes a lot of sense, as all my values add up to make total freight. Hence I agree with you that linear regression would not be a great measure in this circumstance.

Regards, Shivi