Shivi82
Quartz | Level 8

Dear Team,

 

I work for a logistics and supply chain client. I am trying to run a linear regression model with freight as the dependent variable, and there are 8-9 independent variables which add up to the total freight.

This might not be a true SAS procedural question, but I wanted some advice before I start building the analysis.

 

All of these predictor variables are significant with respect to the dependent variable. Does building a linear regression model require that some of the predictor variables be insignificant? Or, if all of them are significant, can we just do a correlation analysis and see how the variables are related, rather than following the entire regression process?

 

Kindly advise, as this is what I was told by one of the analytics practice leads.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

@Shivi82 wrote:

 

But I just wanted to understand: statistically, do we need to have at least one or a few insignificant variables to run a regression? I hope this helps.

 

No, you do not require an insignificant variable to run a regression.

 

A variable that is insignificant on its own, in a one-variable linear relationship with the response, may be significant in a model that includes other variables.
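As a rough illustration of that point, here is a minimal Python sketch with made-up numbers (not your freight data). The names `x1`, `x2`, `y` and both helper functions are hypothetical, and the sketch compares fitted slopes rather than computing p-values. On its own, `x1` has a slope of zero against `y`, yet once `x2` is in the model its coefficient is clearly nonzero.

```python
from statistics import mean

def simple_slope(x, y):
    """Slope of the one-predictor least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_predictor_slopes(x1, x2, y):
    """Slopes (b1, b2) of the least-squares fit y = b0 + b1*x1 + b2*x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data, constructed so that y is uncorrelated with x1 alone.
x1 = [-2, -1, 0, 1, 2]
x2 = [5, 0, 2, -4, -3]
y = [2 * a + b for a, b in zip(x1, x2)]

alone = simple_slope(x1, y)                # x1 by itself
b1, b2 = two_predictor_slopes(x1, x2, y)   # x1 together with x2
print("slope of x1 alone:", alone)
print("slopes with x2 in the model:", b1, b2)
```

This is why screening variables one at a time can discard something the full model needs.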

 

If you're analyzing variables one by one and then including only those that are significant in a model, that's not a good methodology. If you are using such a methodology, then the 'significance' level used for screening is also usually around 0.2, not 0.05.

 

The reason both Rick and I are questioning your data structure/definition is that, if your components do sum to freight and are additive, then your response is not independent of the predictors (it is simply their sum), and linear regression is not a good method for the analysis.
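To make that concrete, here is a minimal Python sketch, assuming three hypothetical components (`base`, `tax`, `fuel`) that add up exactly to the total. The numbers are invented and `fit_ols` is an illustrative helper written for this post, not a SAS routine. When the response is exactly the sum of the predictors, least squares just hands back an intercept of 0, coefficients of 1, and residuals of 0.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Augmented system (X'X | X'y).
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

# Hypothetical (base, tax, fuel) components per shipment.
rows = [(100, 5, 12), (150, 9, 20), (90, 14, 15),
        (200, 6, 18), (120, 11, 25), (170, 3, 11)]
total = [sum(r) for r in rows]   # total freight is exactly the sum

coef = fit_ols(rows, total)
fitted = [coef[0] + sum(b * v for b, v in zip(coef[1:], r)) for r in rows]
residuals = [t - f for t, f in zip(total, fitted)]
print("coefficients:", [round(c, 6) for c in coef])
print("largest residual:", max(abs(e) for e in residuals))
```

The fit is perfect, but it tells you nothing you did not already know from how the total is defined.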

 

 


6 REPLIES
Reeza
Super User

@Shivi82 wrote:

Dear Team,

 

 there are like 8-9 independent variables which add up to the total freight.

 


 

What do you mean by that?

Did you have more variables that individually were not significant with the response variable?

 

If you have 8 to 9 independent variables, you'll have to be cautious of multicollinearity as well.
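One common way to check for multicollinearity is the variance inflation factor (VIF). Below is a minimal Python sketch for the two-predictor case only, where VIF reduces to 1 / (1 - r^2) with r the correlation between the two predictors. The data and the names `base`, `tax`, `fuel` are hypothetical, and the "VIF above 10" threshold is only a widely used rule of thumb, not a hard rule.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictors: tax closely tracks base, fuel does not.
base = [100, 150, 90, 200, 120, 170]
tax = [8, 12, 7, 16, 10, 13]
fuel = [12, 20, 15, 18, 25, 11]

vif_base_tax = vif_two_predictors(base, tax)
vif_base_fuel = vif_two_predictors(base, fuel)
print("VIF for base with tax:", round(vif_base_tax, 1))
print("VIF for base with fuel:", round(vif_base_fuel, 3))
```

With more than two predictors, each VIF comes from regressing that predictor on all of the others, so the pairwise shortcut above no longer applies.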

But no, you can't just look at correlation, though it's a good thing to do. 

 

When building a model you can use the GLMSELECT procedure to employ variable selection methods.

 

The first statistics course in SAS is free and covers regression and the analysis process.

Shivi82
Quartz | Level 8

Hello Reeza, thanks for your quick reply.

 

Probably I did not explain correctly.

 

I do understand that correlation is a powerful measure; however, it cannot do what regression does, i.e., find the line of best fit and predict a value from the independent variables.

 

My question was: in my case I have 7 independent variables and all have some significance to the dependent variable (freight), so do we need any insignificant variable in order to run a regression? I have gone through the freely available SAS regression course, and it is very useful and apt.

Thank You.

Rick_SAS
SAS Super FREQ

Are you saying that you have variables like

x1 = weight of container

x2 = weight of pallet

x3 = weight of contents

...

and that the response is Y = TOTAL weight?

 

Can you provide more details and maybe some example observations?

Shivi82
Quartz | Level 8

Hello Rick,

 

My dependent variable is Total Freight, and this freight has components such as freight, tax, gas price, and the like, so there are 4 other variables which add up to make my total freight. These values keep changing from one month to another.

 

I have data points for the last 12 months, which make up in excess of 4 million rows of data. When I ran the correlation analysis in SAS, it showed how these variables were correlated with the outcome variable. Hence I wanted to run a regression model to predict what my outcome value could look like at given values of these variables, taking into account all the assumptions of linear regression and also performing the residual analysis.

But I just wanted to understand: statistically, do we need to have at least one or a few insignificant variables to run a regression? I hope this helps.

 

regards, Shivi

Shivi82
Quartz | Level 8

Thanks Reeza.

Your explanation is very useful and makes a lot of sense, as all my values add up to make total freight. Hence I agree with you that linear regression would not be a great method in this circumstance.

 

Regards, Shivi
