Solved: Re: Regression with a By variable

BrianLoe · Posted 02-09-2018 11:03 AM

PROC REG allows one to specify a BY variable, which generates a separate regression model and set of coefficients for each value of the BY variable.
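For reference, the PROC REG behavior described above can be sketched roughly as follows. This is only an illustration: the SASHELP.CLASS sample table, the choice of SEX as the BY variable, the model itself, and the output table names (work.class_sorted, work.reg_est) are assumptions made for the example, not part of the original question.

```sas
/* BY-group processing expects the input sorted by the BY variable. */
proc sort data=sashelp.class out=work.class_sorted;
  by sex;
run;

/* Fit one regression per level of SEX.                      */
/* OUTEST= collects the parameter estimates for each group.  */
proc reg data=work.class_sorted outest=work.reg_est;
  by sex;
  model weight = height;
run;
quit;

/* One row of estimates per BY group. */
proc print data=work.reg_est;
run;
```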

How does one achieve the same result in Enterprise Miner?

MikeStockstill · Posted 02-12-2018 02:19 PM

Hello BrianLoe-

In Enterprise Miner, you can use the Start Groups / End Groups node pair to perform by-group processing. Set the Mode property to Stratify. For details from within Enterprise Miner, select Help -> Contents -> Node Reference -> Utility Nodes -> Start Groups Node.

Not all modeling nodes produce the same output with group processing that they produce without group processing. In the case of Start Groups -> Regression -> End Groups, you still need to perform some coding in order to see the parameter estimates at each iteration. One method is to add a SAS Code node after the End Groups node. Select the Code Editor property, and enter code like this (it assumes that this is a new diagram that contains only one Regression node):

proc print data=&EM_LIB..reg_effects_loop;
run;

Close the window, run the node, and view the results. The PROC PRINT output shows the coefficient value for each variable at each level of the BY group.
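If you want to see which columns the table contains before printing it, a step like the following could go in the same SAS Code node. It is a sketch that relies on the same assumption as the code above (a new diagram with a single Regression node, so that the table is named reg_effects_loop); the column names are not listed here because they depend on your model.

```sas
/* List the columns of the per-group estimates table in creation order. */
proc contents data=&EM_LIB..reg_effects_loop varnum;
run;
```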

There is an alternative approach that involves no coding. The alternative works well if your BY variable has only a handful of levels. With this approach, use one Filter node and one Regression node for every level of the BY variable. In each pair, use the Filter node to exclude every level except the one of interest, and you get the usual Regression node results for that level. You can copy & paste the Filter / Regression pair, and manually modify each Filter node. For example, if your BY variable has 5 levels, then you will have 5 Filter / Regression pairs that run in parallel. If you want, connect each Regression node to a single empty SAS Code node so that you can run everything from that one node.
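For readers who think in Base SAS terms, each Filter / Regression pair is loosely analogous to a PROC REG step restricted to one level with a WHERE statement. The sketch below is not Enterprise Miner code, and the SASHELP.CLASS table, the SEX variable, and the model are assumptions chosen only to illustrate the idea.

```sas
/* Analogue of the first Filter / Regression pair: keep one level, then fit. */
proc reg data=sashelp.class;
  where sex = 'F';
  model weight = height;
run;
quit;

/* Analogue of the second pair: same model, the other level. */
proc reg data=sashelp.class;
  where sex = 'M';
  model weight = height;
run;
quit;
```

As with the Filter nodes, the number of steps grows with the number of levels, which is why this pattern only suits a handful of levels.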

Which approach to consider depends on the overall goal of your flow.

If you have a very large number of BY variable levels, then you might want to consider using SAS Factory Miner, a product that is designed for analyzing data that contains a large number of segments (BY variable levels).

Thank you for your interest.

BrianRexing · Posted 02-23-2018 12:24 PM

Would SAS ever consider just allowing the user to define the variable role to be "BY" in the model variable editor? Seems like that would be a way easier user experience. Defining BY variables is so easy in SAS EG, but quite cumbersome in SAS EM.

I'm reading the start/end group documentation now, still haven't quite figured it out...

MikeStockstill · Posted 02-23-2018 02:51 PM

Hello BrianRexing -

BY-variable processing (using a BY statement on a procedure) is a special case of group processing, whereas Enterprise Miner has additional methods of group processing available. For what you want, take these steps:

- Add a Start Groups node at the point where you want the group processing to begin.

- Click the Variables property (or right-click the node and select Edit Variables).

- In the Variables window, change the Grouping Role value to Stratification for the variable that you want to use to define your groups (your BY variable). You can have more than one. Close the Variables window.

- Change the Start Groups Mode property to Stratify.

  Use the Stratify mode to perform standard group processing. When you use the Stratify mode, the Start Groups node loops through each level of the group variable when you run the process flow diagram. When you select the Stratify mode, the Minimum Group Size and Target Group properties are enabled.

- Add the nodes that you want to process repeatedly.

- Add an End Groups node to close the loop.

For an example, see Help -> Contents -> Node Reference -> Utility Nodes -> Start Groups Node -> Start Groups Node Example.

For details about all of the group processing modes that are available, see the Start Groups Node Train Properties: General section of that same chapter.

Have a nice weekend.