Programming the statistical procedures from SAS

Correlation Vs regression

Frequent Contributor
Posts: 92

Dear Team,

 

I work for a logistics and supply chain client. I am trying to run a linear regression model with freight as the dependent variable, and there are 8-9 independent variables which add up to the total freight.

This might not strictly be a SAS procedure question, but I wanted some advice before I start building the analysis.

 

All of these independent variables are significantly related to the dependent variable. Does building a linear regression model require that some of the predictor variables be insignificant? Or, if all of them are significant, can we just run a correlation analysis to see how the variables are related, rather than going through the entire regression process?

 

Kindly advise, as this is what I heard from one of the analytics practice leads.


Accepted Solutions
Solution
‎03-15-2016 12:37 PM
Grand Advisor
Posts: 16,893

Re: Correlation Vs regression


Shivi82 wrote:

 

But I just wanted to understand: statistically, do we need to have at least one or a few insignificant variables to run a regression? I hope this helps.

 

No, you do not require an insignificant value to run a regression.

 

A variable that is insignificant in a simple linear relationship may be significant in a model.
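A minimal Python sketch (not SAS) of how this can happen, using made-up data. The variables x1, x2, and y and the choice y = x1 - x2 + noise are assumptions for illustration only, not the poster's freight data. Each predictor alone is only weakly correlated with y, yet the two together explain most of its variance.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

random.seed(42)
n = 500
# Hypothetical data: x1 and x2 share a large common component z,
# and y depends only on their difference: y = x1 - x2 + noise.
z = [random.gauss(0, 50) for _ in range(n)]
x2 = z
x1 = [zi + random.gauss(0, 1) for zi in z]
y = [a - b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]

r_y1, r_y2, r_12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)

# R-squared of the two-predictor model, computed from pairwise correlations
r2_joint = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

print(f"corr(y, x1) = {r_y1:.3f}, corr(y, x2) = {r_y2:.3f}")
print(f"R^2 with both predictors = {r2_joint:.3f}")
```

The marginal correlations come out small while the joint R-squared is large, which is why screening variables one at a time can discard predictors that matter in the full model.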

 

If you're analyzing variables one by one and then including only those that are significant in a model, that's not a good methodology. If you are using such a methodology, then the 'significance' level is also usually around 0.2, not 0.05.

 

The reason both Rick and I are questioning your data structure/definition is that, if your data does sum to freight and the components are additive, then your data is not independent and linear regression is not a good method for analysis.
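To see why regressing a total on its own components is uninformative, here is a small Python sketch (not SAS) under stated assumptions: three made-up components named base, tax, and fuel, and a total that is exactly their sum. The helper functions solve and ols are hypothetical, written only for this illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given columns."""
    design = [[1.0] * len(y)] + columns  # intercept column first
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    return solve(xtx, xty)

random.seed(1)
n = 200
# Hypothetical freight components; the total is their exact sum.
base = [random.uniform(100, 500) for _ in range(n)]
tax = [random.uniform(5, 50) for _ in range(n)]
fuel = [random.uniform(10, 80) for _ in range(n)]
total = [b + t + f for b, t, f in zip(base, tax, fuel)]

coefs = ols([base, tax, fuel], total)  # [intercept, base, tax, fuel]
print([round(c, 6) for c in coefs])
```

Up to rounding error, the intercept is 0 and every slope is 1: the fit just recovers the accounting identity total = base + tax + fuel, so it says nothing that the definition of the total did not already say.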

 

 



All Replies
Grand Advisor
Posts: 16,893

Re: Correlation Vs regression


Shivi82 wrote:

Dear Team,

 

 there are like 8-9 independent variables which add up to the total freight.

 


 

What do you mean by that?

Did you have more variables that individually were not significant with the predictor variable?

 

If you have 8 to 9 independent variables, you'll have to be cautious of multicollinearity as well.
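One rough way to check for multicollinearity is the variance inflation factor (VIF). The Python sketch below (not SAS) covers only the two-predictor case, where VIF = 1 / (1 - r^2) with r the correlation between the two predictors. The values for base and fuel are invented for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

# Hypothetical predictors: a fuel surcharge that closely tracks base freight.
base = [100, 200, 300, 400, 500]
fuel = [11, 19, 32, 39, 51]

r = pearson(base, fuel)
vif = 1.0 / (1.0 - r ** 2)  # two-predictor case only

print(f"correlation = {r:.4f}, VIF = {vif:.1f}")
```

A VIF far above the commonly quoted rule-of-thumb value of 10 suggests the two predictors carry nearly the same information, so their individual coefficients would be estimated very imprecisely. With more than two predictors, each VIF comes from regressing that predictor on all the others.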

But no, you can't just look at correlation, though it's a good thing to do. 

 

When building a model you can use the GLMSELECT procedure to employ variable selection methods.

 

The first statistics course in SAS is free and covers regression and the analysis process.

Frequent Contributor
Posts: 92

Re: Correlation Vs regression

Hello Rezza, Thanks for your quick reply.

 

Probably I did not explain correctly.

 

I do understand that correlation is a powerful measure; however, it cannot perform the analysis that regression does, i.e. finding the line of best fit and predicting a value based on the independent variables.

 

My question was: in my case I have 7 independent variables and all have some significance to the dependent (freight) variable, so do we need to have any variable which is insignificant to run a regression? I have gone through the freely available regression course on SAS and it is very useful and apt.

Thank You.

SAS Super FREQ
Posts: 3,309

Re: Correlation Vs regression

Are you saying that you have variables like

x1 = weight of container

x2 = weight of pallet

x3 = weight of contents

...

and that the response is Y = TOTAL weight?

 

Can you provide more details and maybe some example observations?

Frequent Contributor
Posts: 92

Re: Correlation Vs regression

Hello Rick,

 

My dependent variable is Total Freight, and this freight has components such as freight, tax, gas price, and the like, plus 4 other variables, which together add up to my total freight. These values keep changing from one month to the next.

 

I have data points for the last 12 months, which make in excess of 4 million rows of data. When I ran the correlation analysis in SAS, it showed me how these variables were correlated with the outcome variable. Hence I wanted to run a regression model to predict, at given values of these variables, what my outcome value could look like, taking into account all the assumptions of linear regression and also performing the residual analysis.

But I just wanted to understand: statistically, do we need to have at least one or a few insignificant variables to run a regression? I hope this helps.

 

regards, Shivi

Frequent Contributor
Posts: 92

Re: Correlation Vs regression

Thanks Rezza.

Your explanation is very useful and it makes a lot of sense, as all my values add up to make total freight. Hence I agree with you that linear regression would not be a great measure in this circumstance.

 

Regards, Shivi

☑ This topic is SOLVED.
