<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Preparing categorical variable for logistic regression in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448759#M23455</link>
    <description>&lt;P&gt;There's no need to create dummy variables for PROC LOGISTIC.&amp;nbsp;The CLASS statement in PROC LOGISTIC will handle that for you.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
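&lt;P&gt;If you have 100+ product types, this could be a problem for the analysis: each level adds a parameter to the model, and levels with few customers tend to give unstable estimates. If there is some logical way to group some of these product types together, I would give that a try.&lt;/P&gt;</description>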
    <pubDate>Mon, 26 Mar 2018 17:57:17 GMT</pubDate>
    <dc:creator>PaigeMiller</dc:creator>
    <dc:date>2018-03-26T17:57:17Z</dc:date>
    <item>
      <title>Preparing categorical variable for logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448755#M23454</link>
      <description>&lt;P&gt;I am trying to build a logistic regression model for campaign scoring in retail grocery.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have two types of data:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Target event: customer_id, response (Y/N)&lt;/P&gt;&lt;P&gt;Customer transaction data: customer_id, transaction_Date, channel, product_type, price&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;where channel and product_type are categorical variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Usually I convert categorical variables into dummy variables, but in this case there are 20+ channels and 100+ product types, so I am not sure what to do. Should I run a cluster analysis on the categorical variables before I aggregate them to the customer level, or is there a better way? Please help. Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 17:45:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448755#M23454</guid>
      <dc:creator>Fae</dc:creator>
      <dc:date>2018-03-26T17:45:08Z</dc:date>
    </item>
    <item>
      <title>Re: Preparing categorical variable for logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448759#M23455</link>
      <description>&lt;P&gt;There's no need to create dummy variables for PROC LOGISTIC.&amp;nbsp;The CLASS statement in PROC LOGISTIC will handle that for you.&lt;/P&gt;
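&lt;P&gt;A minimal sketch of what that looks like. It assumes a hypothetical one-row-per-customer data set WORK.CUSTOMER_LEVEL with a Y/N variable RESPONSE, a hypothetical categorical variable MAIN_CHANNEL, and a numeric TOTAL_SPEND. These names are illustrative, not taken from your data. PROC LOGISTIC builds the design (dummy) columns for MAIN_CHANNEL internally:&lt;/P&gt;
&lt;PRE&gt;/* Hypothetical data set and variable names, for illustration only */
proc logistic data=work.customer_level;
   /* CLASS creates the design variables; PARAM=REF requests 0/1
      reference coding instead of the default effect coding */
   class main_channel / param=ref;
   /* EVENT='Y' models the probability that RESPONSE = 'Y' */
   model response(event='Y') = main_channel total_spend;
run;&lt;/PRE&gt;
&lt;P&gt;With PARAM=REF, each non-reference level gets one 0/1 column. A variable with k levels therefore still adds k-1 parameters to the model.&lt;/P&gt;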
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
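&lt;P&gt;If you have 100+ product types, this could be a problem for the analysis: each level adds a parameter to the model, and levels with few customers tend to give unstable estimates. If there is some logical way to group some of these product types together, I would give that a try.&lt;/P&gt;</description>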
      <pubDate>Mon, 26 Mar 2018 17:57:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448759#M23455</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2018-03-26T17:57:17Z</dc:date>
    </item>
    <item>
      <title>Re: Preparing categorical variable for logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448776#M23456</link>
      <description>&lt;P&gt;Thanks very much for your help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To make sure I am not making a mistake: for PROC LOGISTIC, I can't have multiple data records for each customer, so I need to prepare the data as below, right?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Customer1: Response, Predictor variable 1, ..., Predictor variable N&lt;/P&gt;&lt;P&gt;Customer2: Response, Predictor variable 1, ..., Predictor variable N&lt;/P&gt;&lt;P&gt;...&lt;/P&gt;&lt;P&gt;CustomerN: Response, Predictor variable 1, ..., Predictor variable N&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The biggest issue I am facing is that a customer can have multiple transactions, and I don't know how to aggregate them to the customer level without losing behavioral information such as channel and product type.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I can easily aggregate amount_spent by summing it, but what should I do with a categorical variable?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks very much.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 18:30:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448776#M23456</guid>
      <dc:creator>Fae</dc:creator>
      <dc:date>2018-03-26T18:30:07Z</dc:date>
    </item>
    <item>
      <title>Re: Preparing categorical variable for logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448791#M23459</link>
      <description>&lt;P&gt;How big is your data set? One option would be to summarize spending by category and then create one variable per category for each customer, for example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;TotSpend_Sports, TotSpend_Food, TotSpend_Kids, TotSpend_Housewares, etc.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 19:49:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448791#M23459</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-03-26T19:49:18Z</dc:date>
    </item>
    <item>
      <title>Re: Preparing categorical variable for logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448797#M23461</link>
      <description>&lt;P&gt;You could either&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) generate your predictor matrix with &lt;STRONG&gt;proc transpose&lt;/STRONG&gt;, which requires replacing the resulting missing values with zeros,&lt;/P&gt;
&lt;P&gt;or&lt;/P&gt;
&lt;P&gt;2) generate your predictor matrix with &lt;STRONG&gt;proc logistic outdesign= outdesignonly&lt;/STRONG&gt;, which requires that you sum the design-variable values for each customer.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Mar 2018 20:12:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/448797#M23461</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2018-03-26T20:12:52Z</dc:date>
    </item>
    <item>
      <title>Re: Preparing categorical variable for logistic regression</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/449079#M23476</link>
      <description>&lt;P&gt;If you have repeated measurements on your customers, you can use PROC GENMOD to fit the logistic model (use the DIST=BIN option) together with the REPEATED statement, specifying your customer ID variable in the SUBJECT= option. That fits a GEE model that adjusts for the correlation among observations within the same subject.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have 100+ levels of a categorical predictor in the CLASS statement, this will probably cause model-fitting problems. If the levels can be grouped logically into a smaller set, that is more likely to work. You could either use the DATA step with IF-THEN/ELSE statements to create a new grouping variable from the old one, or use PROC FORMAT to create a format that groups the levels as desired.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Mar 2018 17:51:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Preparing-categorical-variable-for-logistic-regression/m-p/449079#M23476</guid>
      <dc:creator>StatDave</dc:creator>
      <dc:date>2018-03-27T17:51:10Z</dc:date>
    </item>
  </channel>
</rss>

