Using start/end groups in Enterprise Miner to obtain scores for each g...

Posted 10-26-2016 06:50 PM

I am trying to score data for each group selected in the Start Groups/End Groups nodes in SAS Enterprise Miner. The goal is to obtain a score for each group instead of just one score.

See the example below. This example seems to end with one score, and I am not sure how to adjust it to obtain a score for each group for each observation in the validation data.

1 ACCEPTED SOLUTION


For many people viewing this thread, I suspect the answer provided by @WendyCzika will be most useful:

I think you need to move the Model Comparison node before the End Groups node, so it will pick the best model for each group. Then you should see in the Score Code in the End Groups node, there is (or could be) a different model used for each group.

See this tip for more info: https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-Build-Stratified-Models-using-the-...

For those who wish to fit a model to each subgroup but want to score all subgroups on all models, a different approach is necessary. As you have already found, Group Processing will not help you accomplish this. The only way to get the individual subgroup models is to fit them in separate paths, each filtered down to the subgroup of interest. You can then score the entire training data set using the model from each subgroup path. You would then need to merge those data sets together by the ID variable in order to obtain scores for all observations on each subgroup model.

Be careful, because the default name of the predicted target variable will be the same for all subset models. When you score the whole data table on each model, create a new predicted target variable for that model so that you can tell the predictions apart. For example, if you have a binary target BAD which takes the values 1 and 0, where 1 is the event of interest, SAS Enterprise Miner creates the prediction variable P_BAD1, which has the form P_<target variable name><target variable level>, to store the predictions from each subset model. After scoring all the observations, create a new variable which equals the prediction variable of interest. For example,

P_Group1_Reg = P_BAD1;

and then export just the ID information and the new prediction variable. Once you have done this for each subset, you can merge the resulting data sets (containing only the ID and the new prediction variable for each observation) by the ID variable you are using. You can later merge in any additional information from the original data set such as the actual target value and any key predictors.
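As a minimal sketch of why the rename matters (separate from the Enterprise Miner flow itself), the following stand-alone SAS program builds two tiny made-up tables that both use the default name P_BAD1 and renames the column as each table is read into a MERGE. The table names scored_grp1 and scored_grp2, the values, and the use of the RENAME= data set option in place of an assignment statement are assumptions for illustration only. Without the rename, both tables would feed a single P_BAD1 column in the merged output, and the value read last would overwrite the other.

```sas
/* Hypothetical scored tables: both carry the default name P_BAD1. */
data scored_grp1;
  input MyID P_BAD1;
  datalines;
1 0.10
2 0.80
;
run;

data scored_grp2;
  input MyID P_BAD1;
  datalines;
1 0.30
2 0.60
;
run;

/* Rename on input so each model's prediction keeps its own column. */
/* Both tables are already in MyID order, as MERGE with BY requires. */
data allscores_demo;
  merge scored_grp1 (rename=(P_BAD1=P_Grp1_Reg))
        scored_grp2 (rename=(P_BAD1=P_Grp2_Reg));
  by MyID;
run;

proc print data=allscores_demo noobs;
run;
```

The printed table should have one row per MyID with a separate column for each subgroup model's prediction.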

If you are using partitioning, run the Data Partition node first, but be sure to stratify on both the target (if categorical) and the subgroup variable. Then create a separate path for each subgroup, using a Filter node to keep only the observations for the category of interest. After fitting the model, attach a Score node and connect a new Input Data Source node that holds the complete training data (all subsets), with its role set to Score so the full data can be scored. You can then create the new prediction variable in a SAS Code node following the Score node in each subgroup path, using something like the following. The code assumes that the flow has a single binary target, that MyID is the ID variable for each observation, that P_Grp1_Reg is the new prediction variable (denoting Group 1 and a Regression model), and that you are writing to the path defined by the MyLib library:

/*** BEGIN SAS CODE ***/

libname mylib " <path to the location where you are writing out the newly scored data> ";

data mylib.grp1scores;
  set &EM_IMPORT_SCORE;
  P_Grp1_Reg=%EM_BINARY_TARGET; * Note: assumes a single binary target is used;
  keep MyID P_Grp1_Reg;
run;

proc sort data=mylib.grp1scores;
  by MyID; * prepare the data to be merged by MyID with the other subgroup scores;
run;

/*** END SAS CODE ***/

Because the resulting data sets have unique prediction variable names and a common ID variable that is already sorted, you could then merge all of the subgroup scores for the whole training data using something like the following:

/*** BEGIN SAS CODE ***/

libname mylib " <path to the location where you are writing out the newly scored data> ";

data mylib.allscores;
  merge mylib.grp1scores
        mylib.grp2scores
        mylib.grp3scores;
  by MyID;
run;

/*** END SAS CODE ***/
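The earlier remark about merging additional information back in could be sketched roughly as follows. This is only an illustration under stated assumptions: work.original is a hypothetical copy of the full data containing MyID, a subgroup variable named Segment, and the binary target BAD; the mylib library is already assigned as in the code above; and P_Grp2_Reg and P_Grp3_Reg are assumed names for the other subgroup predictions, chosen by analogy with P_Grp1_Reg.

```sas
/* Assumption: work.original is a hypothetical table with MyID, Segment, BAD. */
proc sort data=work.original out=work.original_sorted;
  by MyID;
run;

/* Attach the actual target and segment to the merged scores. */
data mylib.scores_with_target;
  merge mylib.allscores (in=inScores)
        work.original_sorted (keep=MyID Segment BAD in=inOrig);
  by MyID;
  if inScores and inOrig;   /* keep only IDs present in both tables */
run;

/* Average prediction from each subgroup model, by actual segment. */
proc means data=mylib.scores_with_target mean n;
  class Segment;
  var P_Grp1_Reg P_Grp2_Reg P_Grp3_Reg;
run;
```

Comparing the average prediction of each subgroup model within each actual segment is one simple way to see how differently the models score the same observations.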

Hope this helps!

Doug

5 REPLIES 5


I think you need to move the Model Comparison node before the End Groups node, so it will pick the best model for each group. Then you should see in the Score Code in the End Groups node, there is (or could be) a different model used for each group.

See this tip for more info: https://communities.sas.com/t5/SAS-Communities-Library/Tip-How-to-Build-Stratified-Models-using-the-...


I see in the Score node that it scores each group separately, i.e., if segment=1 then it scores one way, and if segment=2 it scores another way. My goal is to have it score all observations X times (X = number of segments).

So, if an observation in the score data is in segment 1, it will be scored by the models for segments 1, 2, 3, and 4. Thus I will have 4 different scores to compare.

