Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Regression with a By variable

Occasional Contributor
Posts: 9

PROC REG allows one to specify a BY variable generating a regression model and coefficients for each value of the BY-variable.
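For reference, the Base SAS pattern described above can be sketched as follows. This is only an illustration: it uses the SASHELP.CLASS sample table, and the choice of Sex as the BY variable and of Weight, Height, and Age as model variables is an assumption, not part of the original question.

```sas
/* BY-group processing expects the input to be sorted by the BY variable */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

/* Fit one regression per level of Sex; OUTEST= stores the estimates,
   one row per BY level, and NOPRINT suppresses the printed output */
proc reg data=work.class_sorted outest=work.coefs noprint;
    by sex;
    model weight = height age;
run;
quit;

/* Show the coefficients for each BY level */
proc print data=work.coefs;
run;
```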

 

How does one achieve the same result in Enterprise Miner?


Accepted Solutions
Solution
2 weeks ago
SAS Employee
Posts: 42

Re: Regression with a By variable

Hello BrianLoe-

 

In Enterprise Miner, you can use the Start Groups / End Groups node pair to perform by-group processing.  Set the Mode property to Stratify.  For details from within Enterprise Miner, select Help -> Contents -> Node Reference -> Utility Nodes -> Start Groups Node.

 

Not all modeling nodes produce the same output with group processing that they produce without group processing.  In the case of Start Groups -> Regression -> End Groups, you still need to perform some coding in order to see the parameter estimates at each iteration.  One method is to add a SAS Code node after the End Groups node.  Select the Code Editor property, and enter code like this (it assumes that this is a new diagram that contains only one Regression node):

 

    /* Print the regression coefficients for each level of the BY group */
    proc print data=&EM_LIB..reg_effects_loop;
    run;

 

Close the Code Editor window, run the node, and view the results.  The PROC PRINT output shows the coefficient value for each variable at each level of the BY group.
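If the diagram is not new, or you are unsure that the table exists, a slightly more defensive variant might look like the sketch below. The table name is the one given in this post. The macro name show_loop_effects is hypothetical, and the PROC CONTENTS step is added only so that you can see which columns identify the BY levels before reading the printout.

```sas
/* Sketch only: guard the PROC PRINT so the node reports a clear message
   when the table is missing. show_loop_effects is a hypothetical name. */
%macro show_loop_effects(ds=&EM_LIB..reg_effects_loop);
    %if %sysfunc(exist(&ds)) %then %do;
        /* List the columns in their stored order */
        proc contents data=&ds varnum;
        run;

        /* Print the coefficients for each BY level */
        proc print data=&ds;
        run;
    %end;
    %else %do;
        %put WARNING: &ds was not found.;
    %end;
%mend show_loop_effects;

%show_loop_effects()
```

As with the code above, this goes in the Code Editor of a SAS Code node placed after the End Groups node, where the &EM_LIB macro variable is available.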

 

 

There is an alternative approach that involves no coding.  It works well if your BY variable has only a handful of levels.  Use one Filter node and one Regression node for every level of the BY variable.  In each pair, use the Filter node to exclude the other levels, and you get the usual Regression node results for that level.  You can copy and paste the Filter / Regression pair and manually modify each Filter node.  For example, if your BY variable has 5 levels, then you will have 5 Filter / Regression pairs that run in parallel.  If you want, connect each Regression node to a single empty SAS Code node so that you can run everything from that one node.
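For readers who think in code, each Filter / Regression pair corresponds roughly to subsetting the data to one level and fitting a separate model. The sketch below is a Base SAS analogy only, not something you need to enter in Enterprise Miner. The SASHELP.CLASS table and the variables used are illustrative assumptions.

```sas
/* One Filter / Regression pair, expressed as code for a single level.
   Data set and variable names are illustrative assumptions. */
proc reg data=sashelp.class;
    where sex = 'F';            /* keep only this level, like the Filter node */
    model weight = height age;
run;
quit;
```

Repeating this step with a different WHERE condition for each level mirrors the parallel pairs in the diagram.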

 

 

Which approach to choose depends on the overall goal of your flow.

 

 

If you have a very large number of BY variable levels, then you might want to consider using SAS Factory Miner, a product that is designed for analyzing data that contains a large number of segments (BY variable levels).

 

Thank you for your interest.

 

 


☑ This topic is solved.
