<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: PROC HPGENSELECT: Categorical Variable: Use the Level with the Highest Exposure as the Base in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/846456#M41910</link>
    <description>See the description of the CLASS statement options in the GENMOD documentation. You can specify the ORDER=FREQ and DESCENDING options as global options (following a slash in the CLASS statement) to order the levels by ascending frequency, so that the most frequent level comes last and serves as the reference level.</description>
    <pubDate>Sun, 27 Nov 2022 02:38:01 GMT</pubDate>
    <dc:creator>StatDave</dc:creator>
    <dc:date>2022-11-27T02:38:01Z</dc:date>
    <item>
      <title>PROC HPGENSELECT: Categorical Variable: Use the Level with the Highest Exposure as the Base</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/843701#M41907</link>
      <description>&lt;P&gt;Suppose I have an insurance dataset with 2 categorical predictive variables: &lt;EM&gt;Gender (F/M)&lt;/EM&gt; and &lt;EM&gt;Credit (A/B/C/D/E)&lt;/EM&gt;.&lt;BR /&gt;I also have an exposure variable &lt;EM&gt;Days&lt;/EM&gt; that I will use as the weight in PROC HPGENSELECT.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;PROC HPGENSELECT data=InputData FCONV=1E-8 MAXITER=100 ITSUMMARY;
     CLASS Gender Credit;  
     MODEL Loss = Gender Credit / dist= Tweedie (p=1.6) link=log;
     WEIGHT Days;
     ODS OUTPUT ParameterEstimates= PEs;
RUN;
PROC PRINT DATA = PEs; RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;From &lt;A href="https://support.sas.com/kb/37/108.html" target="_blank"&gt;37108 - Setting reference levels for CLASS predictor variables (sas.com)&lt;/A&gt; I know that by default the levels are arranged in ascending alphanumeric order and the last level is used as the base, so M will become the base level for &lt;EM&gt;Gender&lt;/EM&gt;, and E will become the base level for &lt;EM&gt;Credit&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;However, measured by the exposure variable &lt;EM&gt;Days&lt;/EM&gt;, the prevalent classes are &lt;EM&gt;Gender = F&lt;/EM&gt; and &lt;EM&gt;Credit = B&lt;/EM&gt;.&lt;/P&gt;&lt;P&gt;For example, I can use PROC SUMMARY to determine the prevalent class for each predictive variable:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;PROC SUMMARY data=InputData SUM PRINT MISSING;
     CLASS Gender;
     VAR Days;
RUN; &lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;... and then specify the preferred reference levels in the CLASS statement:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;PROC HPGENSELECT data=InputData FCONV=1E-8 MAXITER=100 ITSUMMARY;
     CLASS Gender&lt;STRONG&gt;(ref = "F")&lt;/STRONG&gt; Credit&lt;STRONG&gt;(ref = "B")&lt;/STRONG&gt;;  
     MODEL Loss = Gender Credit / dist= Tweedie (p=1.6) link=log;
     WEIGHT Days;
     ODS OUTPUT ParameterEstimates= PEs;
RUN;
PROC PRINT DATA = PEs; RUN;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;If I have 10 more categorical predictive variables, is there an elegant way to avoid PROC SUMMARY, pass the exposure variable &lt;EM&gt;Days&lt;/EM&gt; to PROC HPGENSELECT, and have it use, for each categorical predictive variable, the level with the highest exposure as the base?&lt;/P&gt;&lt;P&gt;Thanks for the insights!&lt;/P&gt;</description>
      <pubDate>Thu, 10 Nov 2022 23:57:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/843701#M41907</guid>
      <dc:creator>Bear85</dc:creator>
      <dc:date>2022-11-10T23:57:52Z</dc:date>
    </item>
    <item>
      <title>Re: PROC HPGENSELECT: Categorical Variable: Use the Level with the Highest Exposure as the Base</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/846456#M41910</link>
      <description>See the description of the CLASS statement options in the GENMOD documentation. You can specify the ORDER=FREQ and DESCENDING options as global options (following a slash in the CLASS statement) to order the levels by ascending frequency, so that the most frequent level comes last and serves as the reference level.</description>
      <pubDate>Sun, 27 Nov 2022 02:38:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/846456#M41910</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2022-11-27T02:38:01Z</dc:date>
    </item>
    <item>
      <title>Re: PROC HPGENSELECT: Categorical Variable: Use the Level with the Highest Exposure as the Base</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/851447#M42144</link>
      <description>&lt;P&gt;Thanks for your response, StatDave! Yes, the ORDER=FREQ and DESCENDING options in the CLASS statement (&lt;A href="https://support.sas.com/documentation/cdl/en/stathpug/66410/HTML/default/viewer.htm#stathpug_introcom_stat_sect003.htm" target="_blank" rel="noopener"&gt;CLASS Statement :: SAS/STAT(R) 12.3 User's Guide: High-Performance Procedures&lt;/A&gt;) would work if I wanted to select the base level by the highest frequency of &lt;EM&gt;Gender&lt;/EM&gt;. However, I need to consider a second variable, &lt;EM&gt;Days&lt;/EM&gt;, to determine the prevalent class. For example, I'd like "F" to be the base class for Gender because it has the higher sum(Days), even though "M" has the higher _FREQ_:&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;&lt;STRONG&gt;Obs&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Gender&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;_FREQ_&lt;/STRONG&gt;&lt;/TD&gt;&lt;TD&gt;&lt;STRONG&gt;Days&lt;/STRONG&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;F&lt;/TD&gt;&lt;TD&gt;4,000&lt;/TD&gt;&lt;TD&gt;810,000&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;M&lt;/TD&gt;&lt;TD&gt;4,821&lt;/TD&gt;&lt;TD&gt;790,560&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;BR /&gt;In my case, I decided to keep the approach from the original post: use PROC SUMMARY to determine the prevalent class for each predictive variable, and then specify the preferred reference levels in the CLASS statement.&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2022 01:04:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/851447#M42144</guid>
      <dc:creator>Bear85</dc:creator>
      <dc:date>2022-12-29T01:04:41Z</dc:date>
    </item>
    <item>
      <title>Re: PROC HPGENSELECT: Categorical Variable: Use the Level with the Highest Exposure as the Base</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/851503#M42145</link>
      <description>&lt;P&gt;Hello &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/434340"&gt;@Bear85&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I see ...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that you can do all that&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; PROC SUMMARY + PROC HPGENSELECT&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; with proper base levels for CLASS variables&lt;/P&gt;
&lt;P&gt;in ONE GO (without any manual intervention)!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can do that with some macro coding or with data-driven code generation in a data step.&lt;/P&gt;
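&lt;P&gt;A minimal sketch of the data-step route, assuming character CLASS variables without formats and the dataset and variable names from the original post (InputData, Gender, Credit, Days, Loss). The work dataset expo and the macro variable class_text are made-up names for illustration:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=""&gt;/* One-way exposure totals for every CLASS variable in a single pass */
PROC SUMMARY data=InputData;
     CLASS Gender Credit;
     WAYS 1;                        /* one-way tables only, no crossings */
     VAR Days;
     OUTPUT out=expo sum=Days;
RUN;

/* Within each variable (_TYPE_), put the level with the most exposure first */
PROC SORT data=expo;
     BY _TYPE_ descending Days;
RUN;

/* Build the CLASS-statement text, e.g. Credit(ref="...") Gender(ref="...") */
DATA _NULL_;
     SET expo end=last;
     BY _TYPE_;
     ARRAY c{*} Gender Credit;      /* assumption: all character variables */
     LENGTH txt $1000;
     RETAIN txt;
     IF first._TYPE_ THEN DO i = 1 TO dim(c);
          /* in a one-way row only the variable being summarized is non-missing */
          IF not missing(c{i}) THEN
               txt = catx(' ', txt, cats(vname(c{i}), '(ref="', c{i}, '")'));
     END;
     IF last THEN CALL SYMPUTX('class_text', txt);
RUN;

PROC HPGENSELECT data=InputData FCONV=1E-8 MAXITER=100 ITSUMMARY;
     CLASS &amp;amp;class_text;
     MODEL Loss = Gender Credit / dist= Tweedie (p=1.6) link=log;
     WEIGHT Days;
RUN;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With the Gender totals posted above (F = 810,000 days, M = 790,560 days), this sketch should pick F as the base level for Gender. Adding more predictors only means extending the CLASS, ARRAY and MODEL lists. Numeric or formatted CLASS variables would need extra handling that is not shown here.&lt;/P&gt;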
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck,&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Thu, 29 Dec 2022 09:14:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/PROC-HPGENSELECT-Categorical-Variable-Use-the-Level-with-the/m-p/851503#M42145</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2022-12-29T09:14:38Z</dc:date>
    </item>
  </channel>
</rss>

