Solved: Help: why there are different results (Logistic Regression) from PROC DMREG vs PROC LOGISTIC?

Pre-app · Posted 11-24-2017 10:57 AM

Hi,

I am using SAS Enterprise Miner and SAS Enterprise Guide to run logistic regression on the same dataset.

However, I get a different set of output statistics from PROC DMREG (for data mining needs) in SAS EM than from PROC LOGISTIC in SAS EG.

Which set of results is better? Is there any reference I can use to identify the better results?

DougWielenga · Posted 12-05-2017 12:21 PM

Please note that the direct use of the DMREG procedure is not supported by SAS Technical Support. There is, however, documentation available on request to licensed users of SAS Enterprise Miner. An excerpt from the documentation for DMREG explains these potential differences:

/*** BEGIN EXCERPT ***/

The DMREG and LOGISTIC procedures fit the same models for a categorical target. Both procedures have the CLASS statement to specify categorical input variables and both use the deviation from the mean coding as the default parameterization for a CLASS input variable. However, there are many differences between the two procedures, both in syntax and in features. For example, to specify the GLM parameterization of CLASS variables, you specify the MODEL statement option CODING=GLM in the DMREG procedure. But, in the LOGISTIC procedure, you specify the CLASS statement option PARAM=GLM. You are required to specify a DMDB catalog of input data in the DMREG procedure, but not in the LOGISTIC procedure. The DMREG procedure produces DATA step scoring code, but the LOGISTIC procedure does not.

In terms of training a model, you might expect the estimates from both procedures to be identical. Often the estimates between the two procedures are very close but not necessarily identical for a number of reasons. The DMREG and LOGISTIC procedures do not use the same routines to carry out the optimization, and the convergence criterion and optimization technique used might not be the same. However, discrepancies of the parameter estimates between the two procedures would not make any difference in prediction.

/*** END EXCERPT ***/
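
To illustrate the excerpt's point about optimization routines, here is a minimal Python sketch. It is not SAS code and says nothing about how either procedure is implemented. It fits the same one-input logistic model to a small made-up dataset with two optimizers, Newton-Raphson and fixed-step gradient ascent, each with its own stopping tolerance. The data, function names, step size and tolerances are all assumptions chosen for illustration.

```python
import math

# Made-up data (assumption): one binary input x and a binary target y.
X = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [0, 0, 0, 1, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(a, b):
    """Gradient of the log-likelihood for logit(p) = a + b*x."""
    resid = [y - sigmoid(a + b * x) for x, y in zip(X, Y)]
    return sum(resid), sum(r * x for r, x in zip(resid, X))

def fit_newton(tol=1e-10, max_iter=50):
    """Newton-Raphson: solve the 2x2 system (information matrix) * step = gradient."""
    a = b = 0.0
    for _ in range(max_iter):
        ga, gb = gradient(a, b)
        if max(abs(ga), abs(gb)) < tol:
            break
        w = [sigmoid(a + b * x) * (1.0 - sigmoid(a + b * x)) for x in X]
        haa = sum(w)
        hab = sum(wi * x for wi, x in zip(w, X))
        hbb = sum(wi * x * x for wi, x in zip(w, X))
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

def fit_gradient_ascent(step=0.1, tol=1e-6, max_iter=100000):
    """Fixed-step gradient ascent with a looser stopping tolerance."""
    a = b = 0.0
    for _ in range(max_iter):
        ga, gb = gradient(a, b)
        if max(abs(ga), abs(gb)) < tol:
            break
        a += step * ga
        b += step * gb
    return a, b

newton = fit_newton()
ascent = fit_gradient_ascent()
print("Newton-Raphson :", newton)
print("Gradient ascent:", ascent)

# Predicted probabilities at x = 0 and x = 1 under each fit.
for x in (0, 1):
    print(x, sigmoid(newton[0] + newton[1] * x), sigmoid(ascent[0] + ascent[1] * x))
```

Both fits approach the same maximum likelihood solution (for this data the intercept is log(1/3), because a quarter of the x = 0 rows have y = 1), but they stop at slightly different points. The predicted probabilities agree to several decimal places, which is the sense in which such discrepancies do not matter for prediction.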

In short, three things might lead to slightly different parameter estimates: differences in how categorical effects are coded, differences in the optimization algorithms, and collinearity among the predictors. These should result in minimal differences in the predicted values. The GLM coding scheme makes the exponentiated parameter a meaningful value, but this is not true for the default deviation coding used by DMREG, which compares each level to the average rather than to a 'base' level. One other thing: the Regression node in SAS Enterprise Miner will use the overall average as the predicted value for any observation with missing values, while the LOGISTIC procedure drops these observations completely.
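
The coding point can be sketched in a few lines of Python (again, not SAS code). The example takes a hypothetical three-level CLASS input with made-up deviation-coded estimates, rewrites the same model in reference-style coding with the last level as the base, and compares the two. The level names, the numbers, and the choice of the last level as the base are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

levels = ["A", "B", "C"]  # hypothetical CLASS input; C is the last level

# Hypothetical deviation-coded fit: parameters are reported for A and B only.
# The effect of the last level is implied, because the effects sum to zero.
intercept_dev = -1.0
effect = {"A": 0.5, "B": -0.2}
effect["C"] = -(effect["A"] + effect["B"])

# The same model in reference-style coding, with C as the base level.
intercept_ref = intercept_dev + effect["C"]
contrast = {lvl: effect[lvl] - effect["C"] for lvl in levels}  # contrast["C"] is 0

def p_dev(lvl):
    return sigmoid(intercept_dev + effect[lvl])

def p_ref(lvl):
    return sigmoid(intercept_ref + contrast[lvl])

# Predicted probabilities are the same under both codings.
for lvl in levels:
    print(lvl, round(p_dev(lvl), 6), round(p_ref(lvl), 6))

# Exponentiating a reference-coded parameter gives an odds ratio against the base.
odds_ratio_A_vs_C = math.exp(contrast["A"])
# Exponentiating a deviation-coded parameter compares A with the average level,
# so it is not the A-vs-C odds ratio.
exp_of_dev_param_A = math.exp(effect["A"])
print(odds_ratio_A_vs_C, exp_of_dev_param_A)
```

The two parameterizations describe one and the same fitted model, so only the interpretation of the individual estimates changes, not the predictions.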

Hope this helps!

Doug


Astounding · Posted 11-25-2017 08:32 PM

I confess, I'm going by a hazy recollection of a general impression here.

Doesn't Enterprise Miner automatically sample the data and perform its analysis on the sample? That's at least an issue to verify and would explain why the results might be different.

Pre-app · Posted 11-27-2017 11:33 AM

Thank you Astounding.

The 2 procedures use the same dataset for comparison. No sampling is applied.

For example,

* For PROC LOGISTIC (the [SAS Code] node in SAS EM can also be used and generates the same results as SAS EG), I get:

Analysis of Maximum Likelihood Estimates

                            Standard        Wald
Parameter   DF   Estimate      Error  Chi-Square   Pr > ChiSq
Intercept    1    -4.2280     0.1005   1768.1503       <.0001
F_HV         1    -0.4514     0.0447    102.1930       <.0001
F_REV        1     1.1634     0.0399    851.3617       <.0001
F_ACTIVE0    1     0.6207     0.0395    246.4798       <.0001

(The results for the other 15 parameters almost match those from PROC DMREG.)

Odds Ratio Estimates

              Point            95% Wald
Effect     Estimate   Confidence Limits
F_HV          0.637     0.583     0.695
F_REV         3.201     2.960     3.461
F_ACTIVE0     1.860     1.721     2.010

* For PROC DMREG (using the [Regression] node in SAS EM), I get:

Analysis of Maximum Likelihood Estimates

                               Standard        Wald               Standardized
Parameter      DF   Estimate      Error  Chi-Square   Pr > ChiSq      Estimate  Exp(Est)
Intercept       1    -3.5620     0.1009     1247.02       <.0001                   0.028
F_ACTIVE0   0   1    -0.3103     0.0198      246.48       <.0001                   0.733
F_REV       0   1    -0.5817     0.0199      851.36       <.0001                   0.559
F_HV        0   1     0.2257     0.0223      102.19       <.0001                   1.253

(The results for the other 15 parameters almost match those from PROC LOGISTIC.)

Odds Ratio Estimates

                       Point
Effect              Estimate
F_ACTIVE0 0 vs 1       0.538
F_REV 0 vs 1           0.312
F_HV 0 vs 1            1.570

The other 15 parameters have almost the same output statistics in both PROC LOGISTIC and PROC DMREG. However, the 4 parameters above (including the Intercept) have different results.

One more thing: with PROC DMREG, why are the Exp(Est) values for these 3 parameters not equal to the odds ratio Point Estimates? (For the other 15 parameters they are equal.)
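
The posted numbers can be checked against each other with a short Python sketch. It rests on two assumptions that the figures are consistent with but that the thread does not confirm: that DMREG coded each two-level input with deviation coding (level 0 as +1, level 1 as -1), and that PROC LOGISTIC used the same three inputs as numeric 0/1 variables. Under those assumptions the deviation-coded logit is a + sum(b_k * (1 - 2*x_k)), so the 0/1 slope is -2*b_k, the 0/1 intercept is a + sum(b_k), and the "0 vs 1" odds ratio is exp(2*b_k) rather than exp(b_k). The dictionary and function names are hypothetical.

```python
import math

# Estimates copied from the two outputs posted above.
dmreg = {"Intercept": -3.5620, "F_ACTIVE0": -0.3103, "F_REV": -0.5817, "F_HV": 0.2257}
logistic = {"Intercept": -4.2280, "F_ACTIVE0": 0.6207, "F_REV": 1.1634, "F_HV": -0.4514}
dmreg_odds_ratio_0_vs_1 = {"F_ACTIVE0": 0.538, "F_REV": 0.312, "F_HV": 1.570}
dmreg_exp_est = {"F_ACTIVE0": 0.733, "F_REV": 0.559, "F_HV": 1.253}
inputs = ["F_ACTIVE0", "F_REV", "F_HV"]

def to_numeric_01(dev):
    """Rewrite deviation-coded estimates (assumed: level 0 = +1, level 1 = -1)
    as an intercept and slopes for 0/1 numeric inputs."""
    out = {"Intercept": dev["Intercept"] + sum(dev[k] for k in inputs)}
    for k in inputs:
        out[k] = -2.0 * dev[k]
    return out

converted = to_numeric_01(dmreg)

for name in ["Intercept"] + inputs:
    print(f"{name:<10} converted={converted[name]:8.4f}  logistic={logistic[name]:8.4f}")

for k in inputs:
    exp_est = math.exp(dmreg[k])             # what the Exp(Est) column shows
    odds_ratio = math.exp(2.0 * dmreg[k])    # level 0 vs level 1
    print(f"{k:<10} exp(b)={exp_est:.3f}  exp(2b)={odds_ratio:.3f}")
```

Up to rounding in the printed estimates, the converted DMREG values line up with the PROC LOGISTIC estimates, and exp(2b) lines up with the DMREG "0 vs 1" point estimates while exp(b) lines up with Exp(Est). The PROC LOGISTIC odds ratios (1 vs 0) are then the reciprocals of the DMREG ones (0 vs 1).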

