<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: building regression trees by groups in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880878#M43575</link>
    <description>Just to clarify.&lt;BR /&gt;1. You want one regression tree for each unit (zip code);&lt;BR /&gt;2. You want to impute missing values for variables/columns that are not the zip code, based on the regression tree results.</description>
    <pubDate>Thu, 15 Jun 2023 09:19:35 GMT</pubDate>
    <dc:creator>JosvanderVelden</dc:creator>
    <dc:date>2023-06-15T09:19:35Z</dc:date>
    <item>
      <title>building regression trees by groups</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880855#M43572</link>
      <description>&lt;P&gt;I have a large dataset that combines multiple rows for each of multiple units (zip codes). I need to build one regression tree for each unit and use the trees to impute missing values in the DV. I understand that hpsplit does not implement "by" processing, so I thought of using a macro to split, build tree, score missing values, and append to an output data set.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is how I found&amp;nbsp;this question&amp;nbsp;&lt;LI-MESSAGE title="Splitting the dataset using macros" uid="212367" url="https://communities.sas.com/t5/SAS-Programming/Splitting-the-dataset-using-macros/m-p/212367#U212367" discussion_style_icon_css="lia-mention-container-editor-message lia-img-icon-forum-thread lia-fa-icon lia-fa-forum lia-fa-thread lia-fa"&gt;&lt;/LI-MESSAGE&gt; and the recommendation not to use macros. Can anyone suggest a better approach?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The version of SAS is 9.4, SAS/STAT 15.2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2023 02:19:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880855#M43572</guid>
      <dc:creator>RBA</dc:creator>
      <dc:date>2023-06-15T02:19:10Z</dc:date>
    </item>
    <item>
      <title>Re: building regression trees by groups</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880878#M43575</link>
      <description>Just to clarify.&lt;BR /&gt;1. You want one regression tree for each unit (zip code);&lt;BR /&gt;2. You want to impute missing values for variables/columns that are not the zip code, based on the regression tree results.</description>
      <pubDate>Thu, 15 Jun 2023 09:19:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880878#M43575</guid>
      <dc:creator>JosvanderVelden</dc:creator>
      <dc:date>2023-06-15T09:19:35Z</dc:date>
    </item>
    <item>
      <title>Re: building regression trees by groups</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880896#M43578</link>
      <description>&lt;P&gt;You don't say what PROC or other functions in SAS you are going to use to create regression trees.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, if you are going to use PROC HPSPLIT, there is no BY statement, so some sort of macro is probably unavoidable. Other methods of creating regression trees may have the BY statement, so we definitely need to know how you are planning to do this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Macros have advantages and disadvantages. Without a clearer explanation of what you are planning to do, there's really no way to discuss the advantages and disadvantages of macros.&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2023 11:15:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880896#M43578</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-06-15T11:15:53Z</dc:date>
    </item>
    <item>
      <title>Re: building regression trees by groups</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880899#M43579</link>
      <description>&lt;P&gt;Thanks for the prompt reply. We are clear on your point 1.&lt;BR /&gt;&lt;BR /&gt;About point 2, I meant I want to impute the dependent variable, so I would use the trees for scoring.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There are missing values among the explanatory variables but that is precisely the reason I am considering trees for this task (so I do not need to throw observations away).&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2023 11:38:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880899#M43579</guid>
      <dc:creator>RBA</dc:creator>
      <dc:date>2023-06-15T11:38:19Z</dc:date>
    </item>
    <item>
      <title>Re: building regression trees by groups</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880901#M43580</link>
      <description>&lt;P&gt;Thanks for the reply. I did not specify a specific tool precisely because I am open to suggestions. I am only aware of HPSPLIT, which, as you mentioned, does not allow processing by group. This is precisely the reason I thought I may have to use macros.&lt;BR /&gt;&lt;BR /&gt;If needed, I could try more complex methods (e.g., random forests), but I am afraid it may not be feasible because the dataset has over 100,000,000 observations and a few hundred groups.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Would you happen to be aware of alternatives to HPSPLIT that could handle this volume of information by group?&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2023 11:46:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880901#M43580</guid>
      <dc:creator>RBA</dc:creator>
      <dc:date>2023-06-15T11:46:34Z</dc:date>
    </item>
    <item>
      <title>Re: building regression trees by groups</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880903#M43581</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;There are missing values among the explanatory variables but that is precisely the reason I am considering trees for this task (so I do not need to throw observations away).&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;SAS has some built-in methods of imputing missing values, such as PROC MI. Whether or not this would work for you, I can't say.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;I am only aware of HPSPLIT, which, as you mentioned, does not allow processing by group. This is precisely the reason I thought I may have to use macros.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Yes, I don't personally have qualms about creating a macro to do BY group processing in this case where the PROC you want to use does not have a BY statement. I think the disadvantages of using macros here would be very few and minor.&lt;/P&gt;
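&lt;P&gt;A minimal sketch of such a macro, under stated assumptions: the data set HAVE, the character variable ZIP, the target Y, the inputs X1-X3, and the output data set IMPUTED are all hypothetical names, and the score code written by the CODE statement is assumed to create the prediction as P_Y. It loops over the zip codes, grows one tree per zip code from the rows where Y is present, scores all rows of that zip code, fills in only the missing Y values, and appends the result.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch only: HAVE, ZIP, Y, X1-X3 and IMPUTED are hypothetical names */
%macro tree_by_zip;
  %local i;

  /* One macro variable per distinct zip code: ZIP1, ZIP2, ... */
  proc sql noprint;
    select distinct zip into :zip1- from have;
  quit;

  /* SQLOBS holds the number of distinct zip codes selected above */
  %do i = 1 %to &amp;amp;sqlobs;

    /* Grow one regression tree from this zip code's rows with a known Y */
    proc hpsplit data=have(where=(zip = "&amp;amp;&amp;amp;zip&amp;amp;i" and not missing(y)));
      model y = x1 x2 x3;
      code file="%sysfunc(pathname(work))/tree.sas";
    run;

    /* Score every row of this zip code; replace Y only where it is missing */
    data scored;
      set have(where=(zip = "&amp;amp;&amp;amp;zip&amp;amp;i"));
      %include "%sysfunc(pathname(work))/tree.sas";
      if missing(y) then y = P_y;  /* P_Y: assumed name of the predicted value */
    run;

    /* Stack the per-zip results into one output data set */
    proc append base=imputed data=scored force;
    run;
  %end;
%mend tree_by_zip;

%tree_by_zip&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With a few hundred groups this runs PROC HPSPLIT a few hundred times, each on a subset, so the WHERE= filters would likely benefit from an index on ZIP or from sorting HAVE by ZIP beforehand.&lt;/P&gt;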
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;If needed, I could try more complex methods (e.g., random forests), but I am afraid it may not be feasible because the dataset has over 100,000,000 observations and a few hundred groups.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;SPAN&gt;Perhaps SAS Viya, which has the ability to distribute the task between many different machines and also has PROC TREESPLIT, might be a good way to get the results you want in a short amount of time.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 15 Jun 2023 11:52:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/building-regression-trees-by-groups/m-p/880903#M43581</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-06-15T11:52:15Z</dc:date>
    </item>
  </channel>
</rss>

