## Help on SAS output

Occasional Contributor
Posts: 5

Hi everyone, new to the community and looking for some help.

I'm currently writing my thesis and having some difficulty interpreting my regression analysis. My mentor has just sent me the results of the regression analysis and I have no idea where I am supposed to begin. I have some idea about p-values and how to interpret collinearity diagnostics. However, I have no idea how to go further with the F-statistic, or whether I really need it. I have attached the results, and I would very much appreciate any feedback and advice you're able to offer. You can also suggest books or papers that are useful for interpretation.

Contributor
Posts: 45

## Re: Help on SAS output

Hi,

You appear to be showing the result of a multiple linear regression using divm1 and ci1 as predictors, with MODEL1 as the dependent (outcome) variable.

The F-statistic is given by the Mean Square Model (average variability explained by the model) divided by the Mean Square Error (average variability left unexplained by the model).
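As a rough illustration of that ratio, here is a minimal Python sketch. The sums of squares are hypothetical (the actual ANOVA table is in the attachment, not in this thread) and were chosen only so that the result matches the reported F of 434. The degrees of freedom assume 384 observations, 2 predictors and an intercept.

```python
def f_statistic(ss_model, ss_error, df_model, df_error):
    """Return (mean square model, mean square error, F)."""
    ms_model = ss_model / df_model   # average variability explained by the model
    ms_error = ss_error / df_error   # average variability left unexplained
    return ms_model, ms_error, ms_model / ms_error


# Hypothetical sums of squares, NOT taken from the real output.
# df_model = 2 predictors; df_error = 384 - 2 - 1 = 381 (assumed).
ms_model, ms_error, f_value = f_statistic(868.0, 381.0, 2, 381)
print(ms_model, ms_error, f_value)
```

The point is only that F grows when the model mean square is large relative to the error mean square.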

If the F-statistic is much larger than 1 (yours is much larger at 434), there is a low probability of this happening by accident, and we can say with more confidence that a model using your predictors (divm1 and ci1) explains the variability in your dependent variable MODEL1 better than just guessing randomly.

As your F-statistic is 434, SAS has calculated that the chances of this happening randomly when there is no actual relation between divm1, ci1 and your dependent variable are less than 0.001%. This will pass pretty much any statistical threshold out there: your inputs do explain at least some of the variability seen in your output, and are probably better than guessing randomly.

The adjusted R-Square is at 69% so we can say that the predictors divm1 and ci1 account for roughly 69% of the variability seen within your dependent variable.
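The F-statistic and R-Square are two views of the same fit, so one can be sanity-checked against the other. The sketch below uses the standard identities for a model with an intercept; the degrees of freedom are an assumption (384 observations, all used, 2 predictors), since the ANOVA table itself is not reproduced in the thread.

```python
def r_squared_from_f(f_value, df_model, df_error):
    """R-Square implied by an overall F-statistic (model with intercept)."""
    return (f_value * df_model) / (f_value * df_model + df_error)


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R-Square penalises R-Square for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


n_obs, n_predictors = 384, 2              # assumed from the thread
df_error = n_obs - n_predictors - 1       # 381 under that assumption
r2 = r_squared_from_f(434.0, n_predictors, df_error)
adj = adjusted_r_squared(r2, n_obs, n_predictors)
print(round(r2, 3), round(adj, 3))
```

Under those assumptions both values come out at roughly 0.69, which agrees with the adjusted R-Square quoted above.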

In the parameter section you can see a breakdown for each component of your model.

Intercept - This is the baseline value predicted for a case when divm1 and ci1 are both zero; the other estimates are added on top of it.

divm1 - The parameter estimate is 0.71799, meaning that when the divm1 characteristic for a case increases by 1 (with ci1 held fixed), our guess for the value of the dependent variable increases by 0.71799. The t-test is similar to the F-test in that it tests whether using the estimate of 0.71799 for divm1 is statistically better than leaving divm1 out (that is, a coefficient of zero). In this case we can see that we can say with more than 99.99% certainty that using divm1 as a predictor with the value 0.71799 is better than nothing.

ci1 - Same as above: for each unit increase in ci1 we can expect a 0.02188 increase in our outcome variable. We can say that this relationship actually exists with a certainty of 99.16%.
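To make the "increase by 1" reading concrete, here is a small sketch of how a prediction is built from the estimates. The two slopes are the ones quoted above; the intercept value and the input values are hypothetical, because the intercept estimate is not quoted in this thread.

```python
def predict(intercept, divm1, ci1, b_divm1=0.71799, b_ci1=0.02188):
    """Predicted outcome = intercept + slope * value for each predictor."""
    return intercept + b_divm1 * divm1 + b_ci1 * ci1


hypothetical_intercept = 1.0   # placeholder, not from the real output
before = predict(hypothetical_intercept, divm1=2.0, ci1=10.0)
after = predict(hypothetical_intercept, divm1=3.0, ci1=10.0)

# Raising divm1 by one unit, with ci1 unchanged, moves the prediction
# by the divm1 parameter estimate regardless of the intercept.
print(after - before)
```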

It is a good idea to set your significance threshold prior to the regression. It is common practice to give SAS a set alpha value so that it knows which predictors are significant. The default is alpha=0.05, meaning that when we are 95%+ sure of something we can keep it in our model.
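The decision rule itself is just a comparison of each p-value with alpha. A minimal sketch follows; the p-values are the ones implied by the percentages above (0.0001 stands in as an upper bound for divm1, and 1 - 0.9916 = 0.0084 for ci1), so treat them as approximations rather than values copied from the output.

```python
def is_significant(p_value, alpha=0.05):
    """A predictor is called significant when its p-value is below alpha."""
    return p_value < alpha


# Approximate p-values implied by the discussion above (assumptions).
p_values = {"divm1": 0.0001, "ci1": 0.0084}

for name, p in p_values.items():
    print(name, is_significant(p, alpha=0.05))
```

Both predictors stay significant even at a stricter alpha of 0.01, which is worth knowing if your mentor used a different threshold.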

Occasional Contributor
Posts: 5

## Re: Help on SAS output

I appreciate you taking the time to answer. Could you please explain how you reach 99.99% certainty when you say "in this case we can see that we can say with more than 99.99% certainty", referring to the t-test? In this case the t-value is 28.39.

Do you have any advice on interpreting scatter plots of residuals versus predicted values?

The analysis was done by my mentor, so I don't know which significance threshold she used. I will ask her before I write my report.

Regular Contributor
Posts: 152

## Re: Help on SAS output

Occasional Contributor
Posts: 5

## Re: Help on SAS output

"In your first model, these plots resemble more a fan opening to the right, indicating that the variance may change with increasing size of the predicted value (that is, heteroskedasticity). Two large predicted values have very large negative studentized residuals, and several other predicted values with studentized residuals either larger or smaller than about two standard errors from zero, indicating potential outliers." Did you mean with these sentences that the plots indicate inconclusive results? The interpretation of residuals is a little bit complicated for me.

Regular Contributor
Posts: 152

## Re: Help on SAS output

No, these plots and findings do NOT indicate inconclusive results. They indicate that the assumptions of ordinary least squares regression may not be fully met with your data. The outlying observations should be checked to find out why they may be outlying (for example, incorrectly recorded values). The influential observations should be checked to determine whether their omission from the analysis greatly affects the regression results. The changing variance of the residuals with the increasing size of the predicted value (heteroskedasticity) may indicate that the dependent variable should be transformed or that the observations should be weighted unequally (two possible "fixes" for heteroskedasticity). Most good regression textbooks describe how to interpret residual plots and how to fix possible problems indicated by such plots; read them.

Occasional Contributor
Posts: 5

## Re: Help on SAS output

Can you give me the names of the books? Today I had a quick look at the "Little SAS Book", but it was more about SAS programming and data analysis. I think I need more on the statistical foundations of econometric modelling.

Regular Contributor
Posts: 152

## Re: Help on SAS output

Look at textbooks on regression in statistics.  You can find a brief discussion of residuals in the SAS documentation for PROC REG.  You can search for SAS books on regression at the SAS publication catalog:  https://support.sas.com/pubscat/complete.jsp.  You can search for regression books from other publishers at amazon.com or other online booksellers.  Finally, search on Google for articles about "regression residuals".

Contributor
Posts: 45

## Re: Help on SAS output

The t-statistic is given by dividing the parameter estimate by the standard error.

- If the parameter estimate is high (strong predictor) and the standard error is low (consistent predictor) then our t-statistic will be large.

- However if the parameter estimate is low (weak predictor) and the standard error is high (inconsistent predictor) then our t-statistic will be small.

We can see how having strong predictors with relatively small inconsistencies will make a predictor more desirable to a statistician.
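Applied to divm1, that division looks like the sketch below. It uses the parameter estimate 0.71799 and the standard error 0.02529 quoted in this thread; the function name is just illustrative.

```python
def t_statistic(estimate, std_error):
    """t = parameter estimate divided by its standard error."""
    return estimate / std_error


t_divm1 = t_statistic(0.71799, 0.02529)
print(round(t_divm1, 2))
```

The result agrees with the t-value of 28.39 reported in the output.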

If we assume there is no actual connection between divm1 and MODEL1, and the value of one has no effect upon the value of the other, there is still some chance that our 384 observations would indicate a strong relation between them (parameter estimate = 0.71799) with relatively few inconsistencies (standard error = 0.02529). We would expect the chance of such a well-fitting model appearing by accident to be very small, and to get smaller if more observations with similar trends were collected.

I have tried to find an explicit equation for how the specific probability is calculated. For the significance decision, though, it does not matter what the actual Pr>|t| value is, so long as it is below 0.05, the value that we set at the beginning of the process. I have found a few sites outlining the procedure for finding confidence intervals for t-values, but I cannot find anything that matches the context of parameter estimates, and my calculations appear inconsistent.

Would any other community members know exactly how the Pr>|t| is calculated or how confidence intervals are set?

Regular Contributor
Posts: 152

## Re: Help on SAS output

The description of the SAS PROBT function in the SAS Language Reference shows how Pr>|t| is calculated from the calculated t-statistic and the number of error degrees of freedom. For a given statistical significance level, alpha, and a given number of error degrees of freedom, one can calculate the corresponding t-statistic using the SAS TINV function. With 384 observations, this two-tailed t-statistic is close to the corresponding z-statistic (~= 1.96). Therefore, you can use 0.71799 +/- 1.96*0.02529 to calculate the two-tailed 95% confidence interval for this regression coefficient.
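For readers without SAS at hand, the same two quantities can be approximated with the Python standard library. This sketch swaps the t distribution for the standard normal, which is the large-sample approximation described above and not what PROBT and TINV compute exactly; with roughly 381 error degrees of freedom the difference is small. The function names are illustrative.

```python
import math
from statistics import NormalDist


def two_sided_p_normal(t_value):
    """Normal approximation to Pr > |t|: 2 * (1 - Phi(|t|))."""
    # erfc keeps precision in the far tail, where 1 - cdf would round to 0.
    return math.erfc(abs(t_value) / math.sqrt(2.0))


def critical_z(alpha=0.05):
    """Normal approximation to the two-tailed critical value."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


print(critical_z(0.05))            # close to 1.96
print(two_sided_p_normal(28.39))   # far below 0.0001
```

This is why the output shows the p-value for divm1 only as a "less than" figure: with a t-value near 28 the tail probability is too small to print.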

Contributor
Posts: 45

## Re: Help on SAS output

Thanks for that 1zmm,

0.71799 +/- 1.96*0.02529 gives the 95% confidence limits for our parameter estimate.

This corresponds to an interval of (0.668422, 0.767558). Because zero is not included in this interval, we can say at the 95% confidence level that the true parameter is not zero. An alternative wording is that if we repeated this study again and again with independently collected observations and built an interval the same way each time, we would expect about 95% of those intervals to contain the true parameter value.
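The interval arithmetic can be sketched as follows. It uses the standard error 0.02529 quoted earlier in the thread, which is the value consistent with the reported t of 28.39, and the 1.96 large-sample critical value; the function name is illustrative.

```python
def confidence_interval(estimate, std_error, critical=1.96):
    """Return (lower, upper) limits: estimate +/- critical * std_error."""
    half_width = critical * std_error
    return estimate - half_width, estimate + half_width


low, high = confidence_interval(0.71799, 0.02529)
print(round(low, 6), round(high, 6))

# Zero lies outside the interval, so the coefficient is significant
# at the 5% level under this approximation.
print(low > 0.0)
```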

Occasional Contributor
Posts: 5