Xamius32
Calcite | Level 5

I am trying to score data for each group selected in the Start Groups/End Groups nodes in Enterprise Miner. The goal is to get a score for each group instead of just one score.

 

See the example below. This example seems to end with one score, and I am not sure how to adjust it to get a score for each group for each observation in the validation data.

 

Capture.PNG

1 ACCEPTED SOLUTION

DougWielenga
SAS Employee

For many people viewing this thread, I suspect the answer provided by @WendyCzika will be most useful:

 

I think you need to move the Model Comparison node before the End Groups node, so it will pick the best model for each group. Then you should see in the Score Code in the End Groups node, there is (or could be) a different model used for each group.   

See this tip for more info: https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-Build-Stratified-Models-using-the-...

 

For those who wish to fit a model to each subgroup but want to score all subgroups on all models, a different approach is necessary. As you have already found, Group Processing will not help you accomplish this. The only way to get the individual subgroup models is to fit them in separate paths, each filtered down to the subgroup of interest. You can then score the entire training data set using the model from each subgroup path. You would then need to merge those data sets together by the ID variable in order to obtain scores for all observations on each subgroup model.


You have to be careful because the default predicted target variable will be the same for all subset models.  When you score the whole data table on each model, you will want to create a new predicted target variable for that model so that you can tell them apart.  For example, if you have a binary target BAD which takes on values 1 and 0 where 1 is the event of interest, SAS Enterprise Miner would create the prediction variable P_BAD1 which is of the form  P_<target variable name><target variable level> to store the predictions from each subset model.  After scoring all the observations, create a new variable which equals the prediction variable of interest.  For example, 

 

       P_Grp1_Reg = P_BAD1;

 

and then export just the ID information and the new prediction variable.  Once you have done this for each subset, you can merge the resulting data sets (containing only the ID and the new prediction variable for each observation) by the ID variable you are using.  You can later merge in any additional information from the original data set such as the actual target value and any key predictors. 


If you are using partitioning, do the Data Partition node first, but be sure to stratify on both the target (if categorical) and the subgroup variable. Then create a separate path for each subgroup, using a Filter node to keep only the observations for the category of interest. After fitting the model, attach a Score node and connect a new Input Data Source node that contains the complete Training data (all subsets) with its role set to Score, so that the full data can be scored. You can then create the new prediction variable in a SAS Code node following the Score node for each subgroup, using something like the following. The code assumes the flow has a single binary target, that MyID is the ID variable for each observation, that P_Grp1_Reg is the new prediction variable (denoting Group 1 and a Regression model), and that you are writing to the path defined by the MyLib library:

 

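/*** BEGIN SAS CODE ***/

    libname mylib " <path to the location where you are writing out the newly scored data> ";

    /* Keep only the ID and a uniquely named copy of this subgroup model's prediction */
    data mylib.grp1scores;
          set &EM_IMPORT_SCORE;
          /* Note: assumes a single binary target is used. If %EM_BINARY_TARGET resolves */
          /* to the target name itself (for example BAD) rather than to the prediction,  */
          /* reference the prediction variable directly instead (for example P_BAD1).    */
          P_Grp1_Reg=%EM_BINARY_TARGET;
          keep MyID P_Grp1_Reg;
    run;

    /* Prepare the data to be merged by MyID with the other subgroup scores */
    proc sort data=mylib.grp1scores;
          by MyID;
    run;

/*** END SAS CODE ***/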

 

You could then easily merge all of the subgroup scores for the whole training data since they would have unique prediction variable names and common ID values sorted and ready for merging using something like the following:

 

/*** BEGIN SAS CODE ***/

    libname mylib " <path to the location where you are writing out the newly scored data> ";

    data mylib.allscores;
          merge mylib.grp1scores
                mylib.grp2scores
                mylib.grp3scores;
          by MyID;
    run;

/*** END SAS CODE ***/
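
The earlier remark about merging additional information back in from the original data set could be sketched roughly as follows. This is only an illustration under stated assumptions: mylib.train is a hypothetical copy of the full training data, BAD is the binary target from the example above, Segment is a hypothetical name for the subgroup variable, and P_Grp2_Reg and P_Grp3_Reg are assumed to have been created in the other subgroup paths following the same naming pattern as P_Grp1_Reg.

/*** BEGIN SAS CODE ***/

    /* Assumption: mylib.train (hypothetical name) holds the full training data */
    /* with MyID, the target BAD, and the subgroup variable Segment.            */
    proc sort data=mylib.train out=work.train_sorted;
          by MyID;
    run;

    /* Attach the actual target and the subgroup to the merged scores */
    data mylib.allscores_full;
          merge mylib.allscores (in=inscores)
                work.train_sorted (keep=MyID BAD Segment);
          by MyID;
          if inscores;   /* keep only observations that were scored */
    run;

    /* Compare the average prediction from each subgroup model within each actual segment */
    proc means data=mylib.allscores_full n mean;
          class Segment;
          var P_Grp1_Reg P_Grp2_Reg P_Grp3_Reg;
    run;

/*** END SAS CODE ***/

Sorting into a WORK copy leaves the original training table untouched, and the IN= flag guards against carrying along any observations that were never scored.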

 

Hope this helps!

Doug

 


5 REPLIES
WendyCzika
SAS Employee

I think you need to move the Model Comparison node before the End Groups node, so it will pick the best model for each group. Then you should see in the Score Code in the End Groups node, there is (or could be) a different model used for each group.   

See this tip for more info: https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-Build-Stratified-Models-using-the-...

 

Xamius32
Calcite | Level 5

I appreciate the response. I had tried that originally, and when I put it before, the score node still only ends up with one score. I only see one prediction in the exported data under the score node as seen below.

 

scored.PNG

 

 

image.png

Xamius32
Calcite | Level 5

I see in the Score node that it scores each group separately, i.e., if segment=1 it scores one way, and if segment=2 it scores another way. My goal is to have it score all observations X times (X = number of segments).

 

So if an observation in the score data is in segment 1, it will be scored by the models for segments 1, 2, 3, and 4. Thus I will have 4 different scores to compare.

WendyCzika
SAS Employee

Oh, I see now.  I can't think of a way to do that in Enterprise Miner, but maybe someone else will have an idea.

