Hi guys,
I was hoping one of you could help me with this. I am familiar with Random Forests and how they operate (well, theoretically), but I'm struggling with the forests in E-miner. I want to model a certain dataset, but there are a few variables in the dataset that I don't want the model to use (i.e. after running some exploratory nodes, I've discovered that some cause high correlations whilst others are unstable). I have dragged a metadata node into the diagram, which runs before the random forest node. In this metadata node, I set the role of these variables to "rejected". Additionally, just to be safe, when I right-click on the random forest node and select the "edit variables" option, I also set the "Use" column for these variables to "No" in the node itself. When I view the results, I see that these variables are not listed under "Variable importance" and they are also not listed when you select "view -> properties -> variables". Awesome! However, when I extract the scoring code to run the forest in Base SAS, I see that a lot of these variables that I don't want in the model still appear in the SAS code. Is this just the way the random forest node executes? Are these variables used in the final model anyway (which is not ideal)?
Thanks for the help!
J
Hi Jakes,
When you say you see the extra variables in the score code, are you sure that you are looking at the score code from HPForest?
Random Forests in Enterprise Miner is one of the few procedures that do not produce SAS Score code. The score code would be quite large, so instead your HPForest node produces a file that another procedure (proc hp4score) uses to score.
If you go to the results to see the SAS code, you will see something like the following:
proc hp4score data=&hpfst_score_input;
id &hpfst_id_vars;
%if %symexist(EM_USER_OUTMDLFILE)=0 %then %do;
score file="D:\EM\EM_Projects\EM14.1\miguel\demo\Workspaces\EMWS1\HPDMForest2\OUTMDLFILE.bin" out=&hpfst_score_output;
%end;
%else %do;
score file="&EM_USER_OUTMDLFILE" out=&hpfst_score_output;
%end;
PERFORMANCE DETAILS;
run;
Kinda puzzled here. Do you have a screenshot that suggests that HPForest is not honoring your metadata selections?
Thanks,
Miguel
Hi Jason,
Thanks for the info! I will definitely be trying the variable selection in the RF node, since my company is running the latest SAS packages. I am actually not a fan of the more complex models such as the RF node, simply because they are difficult to explain (especially to my technical committee 🙂 ) and because they do not score in a simple data step like the other "layman" models (regression, neural net, etc.). However, in some cases - like the problem I have been dealing with now - it outperforms the other models by a significant margin.
The most important part I took away from your message is to go and look at the optimized code. I did manage to get the model to run in Base SAS by using a score node. The main reason for me posting this question was that I saw some of those variables I rejected coming into the variable transformations bit of the code - something you addressed in the reply.
Thanks for the help!
Hi Miguel,
Thanks for the interest in my question. I did manage to get the code running in Base SAS / SAS EG by copying the OUTMDLFILE.bin to a certain location on our company's grid processor and referencing it in my code.
The reason why I asked the question that I did was: I applied some transformations to the variables before running variable selection / model nodes. When I extracted the code, I saw that some of these variables that I did not want in the model still appeared in the Base SAS code when applying the transformations. I am going to have a look at the Optimized code like Jason suggested, as I admit that might be where my problem is.
Even though I know exactly how RF models work from a theoretical/academic point of view and I have been involved with some of them in R, I have no idea how the hp4score procedure in SAS works 🙂
I am doing what no analyst should do: trusting the results of the Base SAS code that I got from the E-miner Score node.
Thanks for the help!
Dear All,
I went through the whole conversation and understood that I can't simply extract the RF scoring code and run it directly through EG the way it's done with DT or regressions etc.
All I need to know now is how I can extract the RF scoring code to run it through EG, or is this impossible?
Thanks in advance.
Mohammad ElSofany
Data Scientist
Hi Mohammad
One of the SAS consultants at our company helped me to run the Random Forest in EG. When you extract the RF scoring code and open it in EG, look for the place in the code where it defines a macro called %em_hpfst_score. Inside this macro, the code expects a value for the macro variable &em_score_output, but if we just extract the code from EM, that variable is not populated by default. So all I did was, on the line above where the macro is defined, set the macro variable em_score_output by typing %let em_score_output = scoreset; where scoreset is the dataset I'm scoring the RF on. Just remember to put the part where the code actually scores the data with the RF within a data step. I hope this makes sense - let me know if it worked!
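As a rough sketch of that step (not tested here), the only addition is a single %let placed before the exported macro definition. The data set name scoreset is just the placeholder used in this post; substitute whatever table you want to score.

```sas
/* Assumption: scoreset is the data set you want to score (placeholder name). */
%let em_score_output = scoreset;

/* The exported code follows unchanged from here:      */
/*   %macro em_hpfst_score; ... %mend;                 */
/*   %em_hpfst_score;                                  */
```

Because the exported macro falls back to &em_score_output for both hpfst_score_input and hpfst_score_output when those are not defined, this one assignment supplies both the input and the output table of the scoring step.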
Thanks JakesVenter,
The thing is that I can only see a small piece of code when I open the model and get the code from it, or when I run the Score Code Export node (in this node in particular there are 2 pieces of code: one very long one with all the variables, and the small one that I'm putting below).
Please note that this small code that I'm displaying below is the same code I can find from the Model node itself.
Appreciate your advice.
Data work.myoutput;
Set scoreset;
*------------------------------------------------------------*;
* EM SCORE CODE;
* EM Version: 13.2;
* SAS Release: 9.04.01M2P072314;
* Host: SC-172-20-150-203;
* Encoding: utf-8;
* Locale: en_US;
* Project Path: /sasdata2/SAS-USERS/melsofany/MNP/MNP_Test;
* Project Name: MNP_Test;
* Diagram Id: EMWS1;
* Diagram Name: Imp_vars;
* Generated by: unxsrv;
* Date: 30AUG2016:16:13:19;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* TOOL: Input Data Source;
* TYPE: SAMPLE;
* NODE: Ids;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* TOOL: Partition Class;
* TYPE: SAMPLE;
* NODE: Part;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* TOOL: Extension Class;
* TYPE: MODEL;
* NODE: HPDMForest5;
*------------------------------------------------------------*;
%macro em_hpfst_score;
%if %symexist(hpfst_score_input)=0 %then %let hpfst_score_input=&em_score_output;
%if %symexist(hpfst_score_output)=0 %then %let hpfst_score_output=&em_score_output;
%if %symexist(hpfst_id_vars)=0 %then %let hpfst_id_vars = _ALL_;
%let hpvvn= %sysfunc(getoption(VALIDVARNAME));
options validvarname=V7;
proc hp4score data=&hpfst_score_input;
id &hpfst_id_vars;
%if %symexist(EM_USER_OUTMDLFILE)=0 %then %do;
score file="/sasdata2/SAS-USERS/melsofany/MNP/MNP_Test/MNP_Test/Workspaces/EMWS1/HPDMForest5/OUTMDLFILE.bin" out=&hpfst_score_output;
%end;
%else %do;
score file="&EM_USER_OUTMDLFILE" out=&hpfst_score_output;
%end;
PERFORMANCE DETAILS;
run;
options validvarname=&hpvvn;
data &hpfst_score_output;
set &hpfst_score_output;
%mend;
%em_hpfst_score;
*------------------------------------------------------------*;
*Computing Classification Vars: Target;
*------------------------------------------------------------*;
length _format200 $200;
drop _format200;
_format200= ' ' ;
_p_= 0 ;
drop _p_ ;
if P_Target1 - _p_ > 1e-8 then do ;
_p_= P_Target1 ;
_format200='1';
end;
if P_Target0 - _p_ > 1e-8 then do ;
_p_= P_Target0 ;
_format200='0';
end;
I_Target=dmnorm(_format200,32); ;
label U_Target = 'Unnormalized Into: Target';
if I_Target='1' then
U_Target=1;
if I_Target='0' then
U_Target=0;
data &em_score_output;
set &em_score_output;
*------------------------------------------------------------*;
* TOOL: Score Node;
* TYPE: ASSESS;
* NODE: Score;
*------------------------------------------------------------*;
*------------------------------------------------------------*;
* Score: Creating Fixed Names;
*------------------------------------------------------------*;
LABEL EM_EVENTPROBABILITY = 'Probability for level 1 of Target';
EM_EVENTPROBABILITY = P_Target1;
LABEL EM_PROBABILITY = 'Probability of Classification';
EM_PROBABILITY =
max(
P_Target1
,
P_Target0
);
LENGTH EM_CLASSIFICATION $%dmnorlen;
LABEL EM_CLASSIFICATION = "Prediction for Target";
EM_CLASSIFICATION = I_Target;
run;
Hi Mohammad,
In addition to the step in my previous message, there is something else that you must do in the code for the RF to work in EG. There should be a file called "OUTMDLFILE.bin" that is generated by the HPDM RF node in E-miner and saved in the folder where your EM project is saved. You will find it in the "HPDMForest" folder, which sits inside your "EMWS" folder under the Workspaces folder. Now, in the SAS code, create a macro variable called path and let that macro variable reference this OUTMDLFILE.bin file. For example, something like %let path = "C:\Users\Desktop\OUTMDLFILE.bin" (the path would look something like this if I copied this file to my desktop - the location doesn't matter, as long as you reference this file). Then go again to the place in the code where the macro em_hpfst_score is defined. Within that macro, reference the path macro variable at the place in the code that I highlighted in the picture attached to this message.
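A minimal sketch of what this boils down to, assuming the model file was copied to the desktop path from the example above. It is a stripped-down version of the PROC HP4SCORE call that appears inside the exported macro, not the full exported code; scoreset and work.scored are placeholder data set names, and here the quotes are placed around &path in the SCORE statement rather than inside the %let.

```sas
/* Hypothetical location of the copied model file; use your own path. */
%let path = C:\Users\Desktop\OUTMDLFILE.bin;

/* Stripped-down version of the scoring call from the exported macro.   */
/* scoreset (input) and work.scored (output) are placeholder names.     */
proc hp4score data=scoreset;
    id _ALL_;                             /* keep all input columns, the default in the exported code */
    score file="&path" out=work.scored;   /* read the forest from the .bin file */
run;
```

The exported macro shown earlier in the thread also checks whether a macro variable named EM_USER_OUTMDLFILE exists and, if it does, uses its value as the file path. Setting that variable before calling the macro may therefore be an alternative to editing the macro body; this is read off the generated code and not verified here.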