xian
Calcite | Level 5

I'm getting different variable clusters from EM and EG. I think it's due to the INITIAL= option. In EG, I can use INITIAL=RANDOM and RANDOM=<seed> to set a random seed so that I get the same clusters each time. If I don't set INITIAL=, the clusters seem to depend on the order of the variables in the VAR statement.

 

But I couldn't find the INIT option in EM. Which INIT method is EM using? Is it possible to set a random seed in EM?

Accepted Solutions
DougWielenga
SAS Employee

Although the Variable Clustering node in SAS Enterprise Miner uses the VARCLUS procedure, its implementation in SAS Enterprise Miner is somewhat different. In general, I would not expect the two to produce exactly the same results. For example, in SAS Enterprise Miner you can include class variables in your analysis.

 

If you add the following lines to your Project Start Code

 

    options MPRINT;

 

and force the node to rerun (e.g., set the Rerun property for the Input Data Source node to Yes and rerun the flow), then you will be able to view the actual code that is being run. Here is an excerpt of the code run against the data set SAMPSIO.HMEQ, which is installed with SAS Enterprise Miner:

 

/*** BEGIN LOG EXCERPT ***/

 

MPRINT(VARCLUS): proc varclus data = EMWS21.Ids2_DATA outstat= EMWS21.VarClus3_OUTSTAT hi short ;
MPRINT(VARCLUS): var
MPRINT(EM_INTERVAL_INPUT): CLAGE CLNO DEBTINC DELINQ DEROG LOAN MORTDUE NINQ VALUE YOJ
MPRINT(VARCLUS): ;
MPRINT(VARCLUS): ;
MPRINT(VARCLUS): run;

 

/*** END LOG EXCERPT ***/

 

You can see from the log above that the INIT= and RANDOM= options are not specified in this call to VARCLUS, since those options are not available in the SAS Enterprise Miner interface. You will also notice in the log that a great deal more processing occurs in SAS Enterprise Miner. Having said that, there are additional options that you can specify in SAS Enterprise Miner that would likely allow you to get answers close to those produced by VARCLUS with the same options, as long as grouping variables are not included in your analysis.
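For comparison, here is a minimal sketch of two standalone calls you could run outside the node (for example in EG). The first repeats the options and variable list from the log excerpt, but reads SAMPSIO.HMEQ directly rather than the EMWS21.Ids2_DATA table that the flow creates, so it skips whatever extra processing SAS Enterprise Miner performs. The second adds the INITIAL=RANDOM and RANDOM= options from the question. The output data set names, the seed 12345, and the cluster count of 3 are illustrative assumptions, not values taken from the node, and the sketch assumes the SAMPSIO library is assigned in your session.

```sas
/* Call 1: same options and variables as in the log excerpt,       */
/* run directly on SAMPSIO.HMEQ (assumption: library is assigned). */
/* WORK.VC_DEFAULT is a hypothetical output name.                  */
proc varclus data=sampsio.hmeq outstat=work.vc_default hi short;
   var CLAGE CLNO DEBTINC DELINQ DEROG LOAN MORTDUE NINQ VALUE YOJ;
run;

/* Call 2: random initial assignment with a fixed seed so that     */
/* repeated runs start from the same assignment. The seed and the  */
/* number of clusters are arbitrary choices for illustration.      */
proc varclus data=sampsio.hmeq outstat=work.vc_seeded
             initial=random random=12345
             minclusters=3 maxclusters=3 short;
   var CLAGE CLNO DEBTINC DELINQ DEROG LOAN MORTDUE NINQ VALUE YOJ;
run;
```

Comparing the cluster assignments from the first call with the node's results should show how much of the difference comes from the node's additional processing rather than from initialization.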

 

Hope this helps!

Doug
