# Regression with a By variable

2 weeks ago

PROC REG allows one to specify a BY variable, which generates a separate regression model and set of coefficients for each value of the BY variable.

How does one achieve the same result in Enterprise Miner?

Accepted Solutions

Solution

a week ago

Posted in reply to BrianLoe

Hello BrianLoe-

In Enterprise Miner, you can use the Start Groups / End Groups node pair to perform by-group processing. Set the Mode property to Stratify. For details from within Enterprise Miner, select Help -> Contents -> Node Reference -> Utility Nodes -> Start Groups Node.

Not all modeling nodes produce the same output with group processing that they produce without group processing. In the case of Start Groups -> Regression -> End Groups, you still need to perform some coding in order to see the parameter estimates at each iteration. One method is to add a SAS Code node after the End Groups node. Select the Code Editor property, and enter code like this (it assumes that this is a new diagram that contains only one Regression node):

```sas
proc print data=&EM_LIB..reg_effects_loop;
run;
```

Close the window, run the node, and view the results. The PROC PRINT output shows the coefficient value for each variable at each level of the BY group.
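As a cross-check outside Enterprise Miner, the PROC REG BY-group behavior mentioned in the original question can be reproduced in Base SAS. The following is only a minimal sketch, assuming the SASHELP.CLASS sample table with Sex as the BY variable; the data set names work.class_sorted and work.est_by_sex are illustrative and do not come from the original post.

```sas
/* BY-group processing expects the input to be sorted by the BY variable. */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

/* Fit one regression per level of Sex and save the estimates.          */
/* OUTEST= writes the parameter estimates to a data set; NOPRINT        */
/* suppresses the usual printed output.                                 */
proc reg data=work.class_sorted outest=work.est_by_sex noprint;
  by sex;
  model weight = height;
run;
quit;

/* One row of estimates per BY level. */
proc print data=work.est_by_sex;
run;
```

The OUTEST= table holds one row of estimates per BY level, which should be comparable with the per-level coefficients printed by the SAS Code node, provided the same inputs and target are used.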

There is an alternative approach that involves no coding. It works well if your BY variable has only a handful of levels. With this approach, use one Filter node and one Regression node for every level of the BY variable. For each level, configure the Filter node to exclude all of the other levels, so that the Regression node produces its usual results for that level alone. You can copy and paste the Filter / Regression pair and manually modify each Filter node. For example, if your BY variable has 5 levels, then you will have 5 Filter / Regression pairs that run in parallel. If you want, connect every Regression node to a single empty SAS Code node so that you can run the whole flow from that one node.

Which approach to consider depends on the overall goal of your flow.

If you have a very large number of BY variable levels, then you might want to consider using SAS Factory Miner, a product that is designed for analyzing data that contains a large number of segments (BY variable levels).

Thank you for your interest.
