Pre-app
Calcite | Level 5

Hi,

 

I am using SAS Enterprise Miner and SAS Enterprise Guide to perform Logistic Regression upon the same dataset.

 

However, I get a different set of output statistics from PROC DMREG (the data mining regression procedure) in SAS EM than from PROC LOGISTIC in SAS EG.

 

Which set of results is better? Is there any reference I can use to identify better results?

 

 

1 ACCEPTED SOLUTION
DougWielenga
SAS Employee

Please note that the direct use of the DMREG procedure is not supported by SAS Technical Support. There is, however, documentation available on request to licensed users of SAS Enterprise Miner. An excerpt from the documentation for DMREG explains these potential differences:

 

/*** BEGIN EXCERPT ***/

 

The DMREG and LOGISTIC procedures fit the same models for a categorical target. Both procedures have the CLASS statement to specify categorical input variables and both use the deviation from the mean coding as the default parameterization for a CLASS input variable. However, there are many differences between the two procedures, both in syntax and in features. For example, to specify the GLM parameterization of CLASS variables, you specify the MODEL statement option CODING=GLM in the DMREG procedure. But, in the LOGISTIC procedure, you specify the CLASS statement option PARAM=GLM. You are required to specify a DMDB catalog of input data in the DMREG procedure, but not in the LOGISTIC procedure. The DMREG procedure produces DATA step scoring code, but the LOGISTIC procedure does not. In terms of training a model, you might expect the estimates from both procedures to be identical. Often the estimates between the two procedures are very close but not necessarily identical for a number of reasons. The DMREG and LOGISTIC procedures do not use the same routines to carry out the optimization, and the convergence criterion and optimization technique used might not be the same. However, discrepancies of the parameter estimates between the two procedures would not make any difference in prediction.

 

/*** END EXCERPT ***/

 

In short, differences in how categorical effects are coded, differences in the optimization algorithms, and collinearity among the predictors can all lead to slightly different parameter estimates, but these should produce only minimal differences in the predicted values. Under the GLM coding scheme, exponentiating a parameter estimate gives a meaningful value. That is not true under the default deviation coding used by DMREG, because deviation coding compares each level to the average rather than to a 'base' level. One other thing: for any observation with missing values, the Regression node in SAS Enterprise Miner uses the overall average as the predicted value, whereas the LOGISTIC procedure drops these observations completely.
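The coding difference can be checked by hand against the estimates posted in the replies to this thread. The sketch below is only an arithmetic illustration, not anything either procedure runs. It assumes that the three inputs are two-level (0/1) variables, that DMREG reports the deviation-coded effect for level 0, and that LOGISTIC reports a single 1-versus-0 effect per input. Under those assumptions, level 0 contributes +b and level 1 contributes -b to the logit, so the 1-versus-0 effect is -2b, and the logit when all three inputs are 0 is the DMREG intercept plus the three level-0 effects. The intercept comparison also assumes the other 15 parameters are parameterized identically in both procedures. All variable and function names are made up for the example.

```python
# Estimates copied from the output posted in this thread.
# PROC DMREG: deviation (effect) coding, estimate reported for level 0.
dmreg_intercept = -3.5620
dmreg = {"F_ACTIVE0": -0.3103, "F_REV": -0.5817, "F_HV": 0.2257}

# PROC LOGISTIC: one estimate per input, read here as the 1-vs-0 effect
# (an assumption inferred from the posted output).
logistic_intercept = -4.2280
logistic = {"F_ACTIVE0": 0.6207, "F_REV": 1.1634, "F_HV": -0.4514}


def deviation_to_one_vs_zero(beta_level0):
    """1-vs-0 logit difference implied by a two-level deviation-coded estimate.

    Level 0 adds +beta and level 1 adds -beta, so the difference is -2 * beta.
    """
    return -2.0 * beta_level0


converted = {name: deviation_to_one_vs_zero(b) for name, b in dmreg.items()}

# Logit when all three inputs equal 0: intercept plus each level-0 effect.
converted_intercept = dmreg_intercept + sum(dmreg.values())

for name in dmreg:
    print(f"{name}: converted DMREG {converted[name]:.4f} vs LOGISTIC {logistic[name]:.4f}")
print(f"Intercept: converted DMREG {converted_intercept:.4f} vs LOGISTIC {logistic_intercept:.4f}")
```

If the assumptions hold, the converted DMREG values agree with the LOGISTIC values to within the rounding of the printed output, which suggests the two fits describe the same model under different parameterizations. The identical Wald chi-square values in the two listings point the same way.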


Hope this helps!

Doug 


3 REPLIES
Astounding
PROC Star

I confess, I'm going by a hazy recollection of a general impression here.

 

Doesn't Enterprise Miner automatically sample the data and perform its analysis on the sample?  That's at least an issue to verify and would explain why the results might be different.

Pre-app
Calcite | Level 5

Thank you Astounding.

 

The 2 procedures use the same dataset for comparison. No sampling is applied.

 

For example,

* For PROC LOGISTIC (the [SAS Code] node in SAS EM generates the same results as SAS EG), I get:

                 Analysis of Maximum Likelihood Estimates

                                  Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq
Intercept       1     -4.2280      0.1005     1768.1503        <.0001
F_HV            1     -0.4514      0.0447      102.1930        <.0001
F_REV           1      1.1634      0.0399      851.3617        <.0001
F_ACTIVE0       1      0.6207      0.0395      246.4798        <.0001

(The results for the other 15 parameters almost match those from PROC DMREG.)

                 Odds Ratio Estimates

                  Point          95% Wald
Effect         Estimate      Confidence Limits
F_HV              0.637       0.583      0.695
F_REV             3.201       2.960      3.461
F_ACTIVE0         1.860       1.721      2.010

 

 

 

* For PROC DMREG (the [Regression] node in SAS EM), I get:

                 Analysis of Maximum Likelihood Estimates

                                        Standard          Wald                  Standardized
Parameter            DF    Estimate       Error    Chi-Square    Pr > ChiSq        Estimate    Exp(Est)
Intercept             1     -3.5620      0.1009       1247.02        <.0001                       0.028
F_ACTIVE0    0        1     -0.3103      0.0198        246.48        <.0001                       0.733
F_REV        0        1     -0.5817      0.0199        851.36        <.0001                       0.559
F_HV         0        1      0.2257      0.0223        102.19        <.0001                       1.253

(The results for the other 15 parameters almost match those from PROC LOGISTIC.)

                 Odds Ratio Estimates

                             Point
Effect                    Estimate
F_ACTIVE0    0 vs 1          0.538
F_REV        0 vs 1          0.312
F_HV         0 vs 1          1.570

 

 

The other 15 parameters have almost the same output statistics in both PROC LOGISTIC and PROC DMREG. However, the 4 parameters above (including the Intercept) differ.

 

 

One more thing: with PROC DMREG, why are the Exp(Est) values for these 3 parameters not equal to their odds ratio Point Estimates? (For the other 15 parameters, the two are equal.)
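A quick arithmetic check on the DMREG numbers above suggests one explanation. This is a sketch under an assumption, not a statement of how DMREG computes its output: it assumes the three inputs are two-level class variables under deviation coding. In that case Exp(Est) is exp(b), while the "0 vs 1" odds ratio compares level 0 (+b) with level 1 (-b) and so equals exp(2b). The helper names are invented for the illustration.

```python
import math

# DMREG estimates for level 0 of each input, copied from the output above.
dmreg_estimates = {"F_ACTIVE0": -0.3103, "F_REV": -0.5817, "F_HV": 0.2257}

# Values DMREG printed, used only for comparison.
reported_exp_est = {"F_ACTIVE0": 0.733, "F_REV": 0.559, "F_HV": 1.253}
reported_odds_ratio = {"F_ACTIVE0": 0.538, "F_REV": 0.312, "F_HV": 1.570}


def exp_est(beta):
    """Exponentiated estimate: level 0 relative to the average of the levels."""
    return math.exp(beta)


def odds_ratio_0_vs_1(beta):
    """Odds ratio of level 0 vs level 1 for a two-level deviation-coded input.

    The logit difference is (+beta) - (-beta) = 2 * beta.
    """
    return math.exp(2.0 * beta)


for name, beta in dmreg_estimates.items():
    print(
        f"{name}: exp(b) = {exp_est(beta):.3f} (reported {reported_exp_est[name]}), "
        f"exp(2b) = {odds_ratio_0_vs_1(beta):.3f} (reported {reported_odds_ratio[name]})"
    )
```

The two columns would coincide for any input that enters the model as a single numeric term, since the odds ratio per unit is then exp(b). That may be why the other 15 parameters agree, although the thread does not say how those inputs are defined.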

