Solved: Re: Wald Chi Square statistics - Logistic Regression

vishal_prof_gmail_com · Posted 07-07-2019 02:35 AM

Hi,

I have a doubt about logistic regression. The significance of variables is tested using the Wald chi-square statistic and the corresponding p-value.

Wald Chi-Square Statistic = (Estimate / Std Error)^2

The null hypothesis is tested using the chi-square distribution. I am not clear why we use chi-square and not the t-statistic as in linear regression. I know that the estimation technique in logistic regression is maximum likelihood and in linear regression it is OLS; how does this affect the choice of distribution for testing the significance of a variable?

Thanks,

Vishal
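
For readers following along, here is a minimal sketch (not from the thread) of the formula in the question. It computes the 1-df Wald chi-square statistic and its p-value using only the Python standard library. The function name and the input numbers are hypothetical; in practice the estimate and standard error would come from the PROC LOGISTIC output.

```python
import math

def wald_chi_square(estimate: float, std_error: float) -> tuple[float, float]:
    """Return the 1-df Wald chi-square statistic and its p-value."""
    if std_error <= 0:
        raise ValueError("std_error must be positive")
    stat = (estimate / std_error) ** 2
    # Upper tail of a chi-square distribution with 1 degree of freedom:
    # P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical estimate and standard error, for illustration only.
stat, p_value = wald_chi_square(estimate=0.8, std_error=0.4)
print(f"Wald chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```

The closed form for the tail probability holds only for 1 degree of freedom, which is the case for a single-coefficient test.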

Rick_SAS · Posted 07-08-2019 10:11 AM

The description of the hypothesis test for the regression coefficients is explained in the documentation.

You can compare it to the hypothesis test for the linear regression.

Briefly, the LOGISTIC procedure tests the quadratic form directly, which is distributed as chi-square.

The REG procedure puts the quadratic form in the numerator and puts a sample variance statistic in the denominator and tests the ratio. The ratio of two independent chi-square RVs, each divided by its degrees of freedom, is an F, which explains the difference.
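
As a rough sketch of the two cases, for a single coefficient and in notation that is assumed here rather than taken from the documentation (X is the design matrix, s^2 the residual variance estimate, n the number of observations, p the number of parameters):

```latex
% Logistic regression (maximum likelihood), one coefficient, H_0: \beta_j = 0
W_j = \left( \frac{\hat\beta_j}{\widehat{\mathrm{se}}(\hat\beta_j)} \right)^2
\;\xrightarrow{d}\; \chi^2_1

% Linear regression with normal errors, n observations, p parameters
t_j^2
= \frac{\hat\beta_j^2 \,/\, \bigl(\sigma^2 [(X^\top X)^{-1}]_{jj}\bigr)}{s^2 / \sigma^2}
\sim F_{1,\,n-p},
\qquad
\frac{(n-p)\, s^2}{\sigma^2} \sim \chi^2_{n-p}
```

In the linear case the unknown error variance cancels in the ratio, so the t (or F) result is exact. In the logistic case there is no separate variance estimate in the denominator, so the statistic is referred to its large-sample chi-square distribution.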

koyelghosh · Posted 07-07-2019 06:49 AM

@vishal_prof_gmail_com

(1) Logistic regression is a case where the outputs are discrete (mostly there are two outcomes, as in binary logistic regression problems). However, in linear regression the outcome is continuous and can take any value. The former involves frequency-dependent comparisons while the latter involves mean-dependent comparisons.

(2) You use Chi-square statistics when the observations are coming from a Chi-square distribution while you use t-statistics when the observations are coming from a t-distribution.

These are probably the reasons why we use different statistics depending upon the problem at hand.

Is this what you wanted to know?

koyelghosh · Posted 07-08-2019 01:31 AM

Thanks koyelghosh,

I agree with your points: the target is continuous in linear regression and discrete in logistic regression. I would like to add a few points here:

1. Chi-square statistic = ((Beta - 0) / Std error)^2, where Beta is the coefficient that we are testing against the null hypothesis that it is 0. The part of the formula (Beta - 0) / Std error is the same as for the t-statistic. I agree that the target variable is discrete; however, Beta comes from a population that is continuous (it can be negative or positive), which is why it is standardized. Why don't we then compare it to a t distribution, rather than squaring it and comparing it to a chi-square distribution (which is the square of a random number)?

What you say in the point 'You use Chi-square statistics when the observations are coming from a Chi-square distribution while you use t-statistics when the observations are coming from a t-distribution.' is true for the target variable; however, we are checking the coefficients, which are not necessarily from a chi-square distribution.

2. Even in logistic regression the target variable is transformed using the logit function into a continuous variable (-infinity to infinity). It is actually a generalized linear model. So why can't a t-distribution be used for checking the significance of a coefficient?

Thanks,

Vishal

PaigeMiller · Posted 07-08-2019 06:43 AM

The square of a statistic that follows a standard normal distribution is distributed as chi-squared with 1 degree of freedom.

--
Paige Miller
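
A quick way to see this numerically is a small simulation. The sketch below (not from the thread; the function name, seed, and number of draws are arbitrary choices) squares standard normal draws and checks what share falls below the 95th percentile of a chi-square distribution with 1 degree of freedom.

```python
import random

# 95th percentile of the chi-square distribution with 1 df (1.959964 squared).
CHI2_1DF_95TH = 3.841458820694124

def fraction_below(threshold: float, n_draws: int = 100_000, seed: int = 0) -> float:
    """Share of squared standard normal draws that are <= threshold."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if rng.gauss(0.0, 1.0) ** 2 <= threshold)
    return hits / n_draws

frac = fraction_below(CHI2_1DF_95TH)
print(f"share of squared normal draws below the cutoff: {frac:.4f}")
```

If the squared draws behave like a chi-square variable with 1 df, the printed share should be close to 0.95.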

koyelghosh · Posted 07-08-2019 12:01 PM

@vishal_prof_gmail_com . Sorry I could not reply earlier. Had a busy day. Rick_SAS and PaigeMiller have already given you the answer. I have nothing new to add. I am answering only because the question was referring to me.

1. Chi-square statistic = ((Beta - 0) / Std error)^2, where Beta is the coefficient that we are testing against the null hypothesis that it is 0. The part of the formula (Beta - 0) / Std error is the same as for the t-statistic. I agree that the target variable is discrete; however, Beta comes from a population that is continuous (it can be negative or positive), which is why it is standardized. Why don't we then compare it to a t distribution, rather than squaring it and comparing it to a chi-square distribution (which is the square of a random number)?

You are right when you say that X^2 is (Beta/Std.Error)^2 and that it looks very much like the t-statistic, except for the square term. So much so that the square root of X^2 is also called the pseudo t-ratio (see here)! But why pseudo when they actually look very similar, and why can't one use the t-statistic for logistic regression? You have an excellent question!

My answer would go like this.

The assumptions about the population should come before we carry out any statistical procedure. I don't think it is a good idea to reverse the flow of logic (that is, to carry out a test first and make assumptions about the population later; I guess that is how statistics works). The assumption with the t-statistic is that the population looks approximately normal (which is actually a t-distribution) and that the population parameters are unknown. However, when you are doing logistic regression, the population parameters are known (actually, it can be shown that the variance and its related mean are known). For example, in binary logistic regression the expected value is E(Y) = n*p and Var(Y) = n*p*(1-p), where n = number of data points and p = probability of success (for a coin flip, for example, it is 0.5, but it can be anything between 0 and 1). This seemingly simple difference puts different constraints on the tests that we can carry out.

What you say in the point 'You use Chi-square statistics when the observations are coming from a Chi-square distribution while you use t-statistics when the observations are coming from a t-distribution.' is true for the target variable; however, we are checking the coefficients, which are not necessarily from a chi-square distribution.

and

2. Even in logistic regression the target variable is transformed using the logit function into a continuous variable (-infinity to infinity). It is actually a generalized linear model.

The transformation that you are talking about helps us to fit and visualize the data. To begin with, we do not have continuous data. In binary logistic regression, it is only 0 or 1 (or dead/alive, cancer/not cancer, loan defaulted/not defaulted, etc.). Transformations help us to understand and predict, but they do not alter the underlying original data and thus the distribution. Coefficients are associated with that transformation.

I tried to keep things simple but I am sorry if I made it confusing rather.

Best wishes,

chatt_deb · Posted 08-15-2020 01:32 PM

Hi

I had a question related to this and I really appreciate your help.

I have a situation where I have 2 equations

Y1 = B1 + B1X + ....

Y2 = B2 + B2X+ .....

As you can see, the dependent variables are different (Y1 and Y2) while the independent variable in question (X) is the same. I have run the regressions and have the respective test statistics and betas for the 2 equations.

Here, I want to claim that B1X is significantly different from B2X in the second equation and after going through this post, I am using the formula

Chi Sq = [(B1X - B2X) / (S.E.1 - S.E.2)] ^2

Is this correct? If you could direct me to a paper or book that talks about this, I would really appreciate that. Thanks a lot!
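
A note on the formula above, offered as a sketch rather than a definitive answer: the usual Wald form for a difference divides by the standard error of the difference, and variances add rather than standard errors subtract. The Python snippet below assumes the two estimates are independent unless a covariance is supplied. If both equations are fitted to the same sample, the estimates are typically correlated and the covariance term matters. The function name and the numbers are hypothetical.

```python
import math

def wald_difference_test(b1: float, se1: float, b2: float, se2: float,
                         cov12: float = 0.0) -> tuple[float, float]:
    """1-df Wald chi-square test of H0: b1 - b2 = 0.

    cov12 is the covariance between the two estimates; the default of 0
    is an assumption (independent estimates), not a general fact.
    """
    var_diff = se1 ** 2 + se2 ** 2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    stat = (b1 - b2) ** 2 / var_diff
    # Upper tail of chi-square with 1 df: erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical coefficients and standard errors, for illustration only.
stat, p_value = wald_difference_test(b1=0.50, se1=0.10, b2=0.20, se2=0.12)
print(f"chi-square = {stat:.3f}, p-value = {p_value:.4f}")
```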

StatDave · Posted 07-08-2019 04:11 PM

You could use the square root of the Wald statistic and produce a t test if desired. The result is the same for the 1 df parameter tests. In fact, PROC HPLOGISTIC uses t tests instead of the equivalent Wald chi-square test. The DDFM= option in the MODEL statement of that procedure can be used to control the degrees of freedom for the t test.
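
The equivalence for a 1-df test can be written out as follows, where w is the observed Wald chi-square value and the standard normal is used as the reference distribution (an assumption of this sketch):

```latex
P\bigl(\chi^2_1 > w\bigr) = P\bigl(|Z| > \sqrt{w}\bigr) = 2\,\bigl[1 - \Phi(\sqrt{w})\bigr],
\qquad Z \sim N(0, 1)
```

If a t distribution with finite degrees of freedom is used as the reference instead, the p-value is somewhat larger, and it approaches the value above as the degrees of freedom grow.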

JacobSimonsen · Posted 07-09-2019 08:02 AM

You can also use Wald statistics and likelihood ratio tests, which have asymptotically chi-squared distributions, in linear regression. But when the data are normally distributed, it is possible to use the exact distributions (not relying on asymptotic results). Therefore, you use t-statistics and F-tests in linear regression, as they are more exact. Actually, if you use PROC GENMOD instead of PROC GLM/PROC MIXED for normally distributed data, then you will get the Wald chi-square statistics.

In logistic regression it is not possible (or at best very difficult) to find test statistics with a known exact distribution; therefore you use chi-square and Wald statistics, because then you at least know their asymptotic distribution. And actually, n doesn't need to be very large before the distribution of the chi-square statistic is practically indistinguishable from a χ² distribution.
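
The asymptotic claim can be checked with a small simulation. The sketch below (not from the thread) uses the simplest possible case, an intercept-only logistic model, because its maximum likelihood estimate and standard error have closed forms. The sample size, number of replications, and seed are arbitrary assumptions. It simulates data under the null hypothesis (true intercept 0, so p = 0.5) and counts how often the Wald chi-square exceeds the 5% critical value.

```python
import math
import random

# 95th percentile of the chi-square distribution with 1 df.
CHI2_1DF_95TH = 3.841458820694124

def wald_rejection_rate(n: int = 200, reps: int = 5000, seed: int = 1) -> float:
    """Share of simulated samples in which the Wald test rejects H0: intercept = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Number of events among n Bernoulli(0.5) outcomes.
        events = sum(1 for _ in range(n) if rng.random() < 0.5)
        if events == 0 or events == n:
            continue  # MLE is infinite; counted as no rejection (not expected at this n)
        # Intercept-only logistic MLE: the log odds of the observed event rate.
        beta_hat = math.log(events / (n - events))
        # Standard error from the inverse information, 1 / (n * p_hat * (1 - p_hat)).
        std_error = math.sqrt(1.0 / events + 1.0 / (n - events))
        if (beta_hat / std_error) ** 2 > CHI2_1DF_95TH:
            rejections += 1
    return rejections / reps

rate = wald_rejection_rate()
print(f"rejection rate under H0: {rate:.3f}")
```

If the chi-square approximation is reasonable at this sample size, the rejection rate should be in the neighborhood of the nominal 5%, allowing for the discreteness of the binomial counts.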

vishal_prof_gmail_com · Posted 07-10-2019 12:36 AM

Thanks @JacobSimonsen @koyelghosh @Rick_SAS @StatDave @PaigeMiller

Agree to the point that in Linear regression the target variable is continuous and we use F statistics while in Logistic Regression the target variable is binary and we use Chi Square distribution. This is about testing the significance of the model, wherein we compare a Null model and a model with covariates.

However my question is about the significance of model coefficients. Let me put it this way,

Consider a logistic regression model:

Log[p/(1-p)] = Intercept + B1X1 + B2X2

In this model p is the probability of an event (say, loan default). The event takes binary values. For testing the significance of the model we use the chi-square statistic from the likelihood ratio test. The coefficient of X1 can take any value from -infinity to +infinity, so we can say it comes from a continuous population; then why can't I use a t-distribution for testing it? To make my question clearer, consider a linear model:

Y = Intercept +B1X1 + B2X2 + ERROR

The difference between these two models is the target variable and the method of finding the coefficients: OLS for linear, MLE for logistic. The values of B1 and B2 have the same distribution for linear and logistic, so why the t distribution for linear and chi-square for logistic?

I think the answer lies in the distribution of the target variable that we are talking about, but we need to frame it more objectively. In fact, how the distribution of the model coefficients is impacted by the target variable or the approach (OLS/MLE) needs to be answered.

Regards,

Vishal

PaigeMiller · Posted 07-10-2019 06:32 AM

@vishal_prof_gmail_com wrote:

The coefficient of X1 can take any value from -infinity to +infinity, so we can say it comes from a continuous population; then why can't I use a t-distribution for testing it?

Because it is not a t-distribution when the response is binary. As stated by @StatDave, if you take the square root of the statistic, then you have a t-distribution, if that's what you really want.

There are probably text books and web sites that go through the mathematics of this whole thing, those are better places to look for answers.

--
Paige Miller

vishal_prof_gmail_com · Posted 07-14-2019 11:28 AM

"Because it is not a t-distribution when the response is binary.."

In the real world, a variable never follows a t-distribution. The variable is standardized and then compared to a t-distribution. I don't think that is the right justification.

" As stated by @StatDave_sas, if you take the square root of the statistic, then you have a t-distribution, if that's what you really want". This is not what I am asking. My question is: why are we using the square of [(x-0)/Std error]? How is this derived?

"There are probably text books and web sites that go through the mathematics of this whole thing, those are better places to look for answers."

I have been trying to find the answer to this question for quite some time. What you have mentioned above is the answer I have mostly received. People generally answer by using complicated terms like asymptotic chi-square distribution, pseudo statistics, etc.

If you can refer me to a book or website that may help in finding this answer, it would be very helpful.

Thanks.

vishal_prof_gmail_com · Posted 07-10-2019 12:41 AM

To make the question more objective.

In logistic regression the estimates (coefficients) should be such that they maximize the likelihood of observing the data. Then how do we conclude that they come from a chi-square distribution and that the chi-square statistic is ((Beta - 0) / Std Error)^2?
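
A sketch of the standard large-sample argument, in notation assumed here (X is the design matrix, p_i the fitted event probability for observation i, and the information matrix is evaluated at the estimates):

```latex
\hat\beta \;\overset{\text{approx}}{\sim}\; N\!\bigl(\beta,\; \mathcal{I}(\beta)^{-1}\bigr),
\qquad
\mathcal{I}(\beta) = X^\top V X,
\quad
V = \mathrm{diag}\bigl(p_i (1 - p_i)\bigr)

\widehat{\mathrm{se}}(\hat\beta_j) = \sqrt{\bigl[(X^\top \hat V X)^{-1}\bigr]_{jj}}

H_0:\ \beta_j = 0
\;\Longrightarrow\;
z_j = \frac{\hat\beta_j - 0}{\widehat{\mathrm{se}}(\hat\beta_j)} \xrightarrow{d} N(0, 1),
\qquad
z_j^2 \xrightarrow{d} \chi^2_1
```

So it is the standardized estimate, not the coefficient itself, that is approximately standard normal in large samples, and its square is therefore approximately chi-square with 1 degree of freedom.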

RosieSAS · Posted 07-10-2019 10:25 AM

The estimates and tests of the coefficients depend on the distribution of the response variable, not on the distribution of the coefficients themselves. Put differently, the distribution of the coefficients depends on the distribution of the response variable. We can't say that because the coefficient is continuous, a t-test can be used. If the response variable is binary, then after the logit transformation the Wald chi-square statistic of the coefficient follows an asymptotic chi-square distribution, and we use this to test the significance of the coefficient or to build its CI. This is my understanding.
