## Regression with a By variable

02-09-2018 11:03 AM

PROC REG lets you specify a BY variable, which produces a separate regression model and set of coefficients for each value of the BY variable.

How does one achieve the same result in Enterprise Miner?
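For context, the PROC REG behavior that the question refers to looks roughly like the sketch below. The data set (sashelp.class), the BY variable (Sex), the model (Weight on Height), and the output table names are all stand-ins chosen for illustration, not details from the original post.

```sas
/* Sketch only: sashelp.class stands in for the real data, */
/* with Sex playing the role of the BY variable.            */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;                      /* BY processing expects sorted input */
run;

/* One model is fit per BY level; OUTEST= collects the coefficients. */
proc reg data=work.class_sorted outest=work.est_by_sex;
  by sex;
  model weight = height;
run;
quit;

/* One row of parameter estimates per level of Sex. */
proc print data=work.est_by_sex;
run;
```

The goal in Enterprise Miner is an equivalent table: one set of estimates per level of the grouping variable.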

Accepted Solutions

Solution

02-12-2018 02:43 PM


Posted in reply to BrianLoe

02-12-2018 02:19 PM

Hello BrianLoe-

In Enterprise Miner, you can use the Start Groups / End Groups node pair to perform by-group processing. Set the Mode property to Stratify. For details from within Enterprise Miner, select Help -> Contents -> Node Reference -> Utility Nodes -> Start Groups Node.

Not all modeling nodes produce the same output with group processing that they produce without group processing. In the case of Start Groups -> Regression -> End Groups, you still need to perform some coding in order to see the parameter estimates at each iteration. One method is to add a SAS Code node after the End Groups node. Select the Code Editor property, and enter code like this (it assumes that this is a new diagram that contains only one Regression node):

/* Print the parameter estimates collected across the group-processing loop. */
proc print data=&EM_LIB..reg_effects_loop;
run;

Close the window, run the node, and view the results. The PROC PRINT output shows the coefficient value for each variable at each level of the BY group.
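If the printed table is large, it may help to inspect its structure before printing everything. The sketch below uses the same table name as the code above, so it carries the same assumption of a new diagram with only one Regression node. The column layout of that table is not described in this thread, so the code lists the columns rather than assuming any names, and the row limit of 20 is an arbitrary choice.

```sas
/* List the columns of the table saved by the group-processing loop. */
proc contents data=&EM_LIB..reg_effects_loop;
run;

/* Preview only the first rows; 20 is an arbitrary limit. */
proc print data=&EM_LIB..reg_effects_loop(obs=20);
run;
```

Once the column names are known, a VAR statement on PROC PRINT could narrow the output to the columns of interest.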

There is an alternative approach that involves no coding. It works well if your BY variable has only a handful of levels. With this approach, use one Filter node and one Regression node for every level of the BY variable. For each level, use the Filter node to filter out the unwanted levels, and you get the usual Regression node results. You can copy and paste the Filter / Regression pair and manually modify each Filter node. For example, if your BY variable has 5 levels, then you will have 5 Filter / Regression pairs that run in parallel. If you want to run everything from one place, connect each Regression node to a single empty SAS Code node and run that node.
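For readers who think in Base SAS terms, the Filter / Regression pairs are conceptually similar to running one filtered regression per level. The sketch below is only an analogy, not the code that the Enterprise Miner nodes generate. The macro name reg_by_level is hypothetical, and sashelp.class with Sex as the grouping variable is a stand-in for real data.

```sas
/* Hypothetical macro: fit the same model on one level at a time. */
%macro reg_by_level(level);
  proc reg data=sashelp.class;
    where sex = "&level";      /* keep only the requested level */
    model weight = height;
  run;
  quit;
%mend reg_by_level;

/* One call per level, like one Filter / Regression pair per level. */
%reg_by_level(F)
%reg_by_level(M)
```

As with the node pairs, this scales poorly: every additional level means another call to maintain.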

Which approach to consider depends on the overall goal of your flow.

If you have a very large number of BY variable levels, then you might want to consider using SAS Factory Miner, a product that is designed for analyzing data that contains a large number of segments (BY variable levels).

Thank you for your interest.

All Replies



Posted in reply to MikeStockstill

02-23-2018 12:24 PM

Would SAS ever consider just allowing the user to define the variable role to be "BY" in the model variable editor? Seems like that would be a way easier user experience. Defining BY variables is so easy in SAS EG, but quite cumbersome in SAS EM.

I'm reading the start/end group documentation now, still haven't quite figured it out...

## Re: Regression with a By variable


Posted in reply to BrianRexing

02-23-2018 02:51 PM

Hello BrianRexing -

BY-variable processing (using a BY statement on a procedure) is a special case of group processing, and Enterprise Miner offers additional group-processing methods as well. For what you want, take these steps:

- Add a Start Groups node at the point where you want the group processing to begin.

- Click the Variables property (or right-click the node and select Edit Variables).

- In the Variables window, change the Grouping Role value to Stratification for the variable that you want to use to define your groups (your BY variable). You can have more than one. Close the Variables window.

- Change the Start Groups Mode property to Stratify. Use the Stratify mode to perform standard group processing. When you use the Stratify mode, the Start Groups node loops through each level of the group variable when you run the process flow diagram. When you select the Stratify mode, the Minimum Group Size and Target Group properties are enabled.

- Add the nodes that you want to process repeatedly.

- Add an End Groups node to close the loop.

For an example, see Help -> Contents -> Node Reference -> Utility Nodes -> Start Groups Node -> Start Groups Node Example.

For details about all of the group processing modes that are available, see the Start Groups Node Train Properties: General section of that same chapter.
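Because Stratify mode enables the Minimum Group Size property, it can be useful to check how many observations each level has before choosing a stratification variable. The sketch below is a plain Base SAS check, with sashelp.class standing in for the training data and Sex for the grouping variable. How the node treats groups below the minimum size is covered in the Help chapter mentioned above, not here.

```sas
/* Sketch only: sashelp.class stands in for the training data, */
/* and Sex for the stratification variable.                     */
proc freq data=sashelp.class;
  tables sex / nocum nopercent;   /* raw counts per level only */
run;
```

Levels with very few observations are the ones to compare against whatever Minimum Group Size value you set.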

Have a nice weekend.