krizA
Calcite | Level 5

 

Hi All,

 

I have a few questions about some Enterprise Miner nodes. I hope you can help.

 

(1) Regression Node - Logistic Regression

- I noticed that it doesn't generate R-squared values and the usual goodness-of-fit statistics the way PROC LOGISTIC does. I thought of using the SAS Code node instead, to get the complete set of statistics through PROC LOGISTIC. But before I switch to that, do you have any ideas on how to find them in the EM results pages? Maybe I missed them.

- Another option I considered is the HP Regression node, which generates the Hosmer-Lemeshow statistic, three R-square values, and Somers' D by default. However, its results are not as user-friendly as the Regression node's (e.g., beta estimates, odds ratios). I could still use it just to get those statistics. I have yet to check whether it gives the same results as the Regression node in terms of coefficients and significant variables. Does anybody have any thoughts on this?

 

(2) HP Regression Node - Logistic Regression

- I just want to confirm that the value reported under "PARTITION FIT STATISTICS" for Hosmer-Lemeshow is the test statistic itself and not the p-value. I searched the technical documentation in EM Help and on the internet, but it only gives the formula, not the specifics.
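One way to sanity-check which quantity is being reported is to recompute both from scored data outside EM. The sketch below is a minimal illustration in Python, not EM's or PROC HPLOGISTIC's implementation: the function names, the equal-size grouping rule, and the toy data are assumptions, and it expects predicted probabilities strictly between 0 and 1. It shows that the Hosmer-Lemeshow chi-square statistic and its p-value are distinct numbers; in particular, a reported value greater than 1 can only be the statistic.

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Return (chi_square, degrees_of_freedom) for the Hosmer-Lemeshow test.

    y: 0/1 outcomes; p: predicted event probabilities (strictly between 0 and 1).
    Records are sorted by predicted probability and split into equal-size groups.
    """
    if len(y) != len(p) or len(y) < groups:
        raise ValueError("need matching inputs with at least one record per group")
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(pairs)
    chi_square = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(actual for _, actual in chunk)
        expected = sum(prob for prob, _ in chunk)
        mean_p = expected / size
        # Squared gap between observed and expected events, scaled by the
        # binomial variance of the group.
        chi_square += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return chi_square, groups - 2


def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical scored records, for illustration only.
y_demo = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]
p_demo = [0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
statistic, df = hosmer_lemeshow(y_demo, p_demo, groups=4)
print("statistic:", statistic, "df:", df)
print("p-value:", chi_square_sf_even_df(statistic, df))
```

Comparing the recomputed statistic and p-value against the number shown in the results should make clear which of the two the node reports, bearing in mind that the grouping rule here may not match the one used by the procedure.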

 

(3) Cutoff Node 

- After developing my logistic regression model, I wanted to experiment with and analyze different cutoffs. But whenever I try, the node fails with a runtime error. The log shows: "ERROR: Undeclared array referenced: symputx".

- The only thing I did differently in this model development is that I oversampled because the event is rare (0.78%), and then added a Decisions node to adjust the priors so that my predicted probabilities are corrected accordingly. I don't see why that would cause an error, though.
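For context, the prior adjustment described above is commonly written as a reweighting of the posterior probabilities by the ratio of true to sample class proportions. The sketch below is a minimal Python illustration of that standard formula, not the Decisions node's actual code; the function name is made up, and the 50% event rate in the oversampled data is an assumption, since only the 0.78% true rate is given here.

```python
def adjust_for_priors(p_sample, sample_event_rate, true_event_rate):
    """Rescale a probability fitted on oversampled data to the true event rate.

    p_sample: predicted event probability from the model trained on the sample.
    sample_event_rate: proportion of events in the oversampled training data.
    true_event_rate: proportion of events in the population (the prior).
    """
    event_weight = true_event_rate / sample_event_rate
    nonevent_weight = (1.0 - true_event_rate) / (1.0 - sample_event_rate)
    numerator = p_sample * event_weight
    return numerator / (numerator + (1.0 - p_sample) * nonevent_weight)


TRUE_RATE = 0.0078   # rare-event rate quoted in the post
SAMPLE_RATE = 0.5    # assumed oversampled event rate, for illustration only

for p in (0.2, 0.5, 0.8):
    print(p, "->", adjust_for_priors(p, SAMPLE_RATE, TRUE_RATE))
```

One practical consequence for cutoff analysis: a cutoff chosen on the oversampled scale corresponds to a much smaller value on the adjusted scale, so the cutoffs being compared need to be on the same scale as the probabilities being scored.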

 

Would appreciate any thoughts and suggestions. Thanks in advance!

 

Funda_SAS
SAS Employee

Hi,

 

The HP Regression node is built on top of PROC HPLOGISTIC and PROC HPREG. Unfortunately, the high-performance procedures do not support the full functionality of their non-high-performance counterparts (such as PROC LOGISTIC, PROC REG, and PROC GLMSELECT). For more information about their capabilities, please see the documentation for PROC HPLOGISTIC and PROC HPREG.

 

PROC HPLOGISTIC supports the generalized R-square through the RSQUARE option in the MODEL statement; however, this option is currently not supported in EM. One workaround is to run PROC HPLOGISTIC through the SAS Code node.

 

Hope this helps!

 

Funda

krizA
Calcite | Level 5

Hi Funda, 

 

Thanks for your reply. Yes, I have already checked the HPLOGISTIC and HPREG code. I was tempted to use the SAS Code node, but I wanted to exhaust all possible EM functionality before resorting to that. It is unfortunate that the option isn't supported in EM, but noted, thanks.

 

Any thoughts on items #1 and #3? The Regression node does not generate the pseudo (McFadden's) R-square. Should I use the SAS Code node for that as well? I wonder why it is not included, since it is one of the most basic statistics for checking model results.
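If the log-likelihoods of the intercept-only and fitted models are available (for example, as the -2 Log L values that PROC LOGISTIC prints, divided by -2), the common pseudo R-squares can be computed by hand. The sketch below is a minimal Python illustration under that assumption; the function name and the example numbers are hypothetical, and it is not how EM or any SAS procedure computes them internally. It returns McFadden's R-square, the generalized (Cox-Snell) R-square, and the max-rescaled (Nagelkerke) R-square.

```python
import math


def pseudo_r_squares(ll_null, ll_model, n):
    """Compute three pseudo R-squares for a logistic regression.

    ll_null: log-likelihood of the intercept-only model.
    ll_model: log-likelihood of the fitted model.
    n: number of observations used in the fit.
    """
    mcfadden = 1.0 - ll_model / ll_null
    generalized = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Largest value the generalized R-square can reach for this data set.
    upper_bound = 1.0 - math.exp(2.0 * ll_null / n)
    return {
        "mcfadden": mcfadden,
        "generalized": generalized,
        "max_rescaled": generalized / upper_bound,
    }


# Hypothetical values: -2 Log L of 200 (intercept only) and 160 (fitted), n = 200.
result = pseudo_r_squares(ll_null=-200.0 / 2, ll_model=-160.0 / 2, n=200)
for name, value in result.items():
    print(name, round(value, 4))
```

Note that on oversampled data these values describe the fit to the oversampled sample, not to the population.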

 

As for the Cutoff node, any thoughts on that as well? I cannot use the node at all because of the persistent error.
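Until the node error is resolved, the kind of cutoff comparison described above can be approximated on exported scores. The sketch below is a minimal Python illustration, not the Cutoff node's logic: the function names and the toy data are hypothetical, and it assumes the actual 0/1 target and the predicted event probability have been exported for each record.

```python
def confusion_counts(y, p, cutoff):
    """Return (tp, fp, tn, fn) when records with p >= cutoff are classified as events."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y, p):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif actual == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sweep_cutoffs(y, p, cutoffs):
    """Tabulate sensitivity, specificity and precision for each candidate cutoff."""
    rows = []
    for cutoff in cutoffs:
        tp, fp, tn, fn = confusion_counts(y, p, cutoff)
        rows.append({
            "cutoff": cutoff,
            # None marks a rate whose denominator is zero at this cutoff.
            "sensitivity": tp / (tp + fn) if tp + fn else None,
            "specificity": tn / (tn + fp) if tn + fp else None,
            "precision": tp / (tp + fp) if tp + fp else None,
        })
    return rows


# Hypothetical scored records, for illustration only.
y_scored = [1, 0, 1, 0, 0]
p_scored = [0.9, 0.4, 0.3, 0.2, 0.05]
for row in sweep_cutoffs(y_scored, p_scored, [0.25, 0.5]):
    print(row)
```

With a rare event, the candidate cutoffs would typically be much lower than 0.5 once the probabilities have been adjusted back to the true priors.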

 

Thanks! Appreciate your time to answer.
