<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using Proc Reg with categorical variables in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56028#M15645</link>
    <description>It is not a concern if the rows are all of the dummy variables (in PROC REG) or levels (in PROC GLM) representing a discrete variable. It is, in fact, an expected outcome when dummy variables (in PROC REG) or discrete variables (in PROC GLM) are used.</description>
    <pubDate>Tue, 21 Jul 2009 20:28:52 GMT</pubDate>
    <dc:creator>Paige</dc:creator>
    <dc:date>2009-07-21T20:28:52Z</dc:date>
    <item>
      <title>Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56020#M15637</link>
      <description>My goal is to develop a model with proc reg using both categorical (such as gender) and continuous variables (such as age) to predict a continuous outcome (such as profit).  Can proc reg do this?  If not, what can?  If it can, do I specify which ones are categorical?  Lastly, I also have a zipcode predictor which can take on 10 values; how do I change this into a binary predictor for regression?&lt;BR /&gt;
&lt;BR /&gt;
Sorry if these questions are obvious, I'm still learning how to navigate SAS documentation.&lt;BR /&gt;
&lt;BR /&gt;
-Thanks</description>
      <pubDate>Fri, 17 Jul 2009 18:56:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56020#M15637</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-17T18:56:38Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56021#M15638</link>
      <description>&lt;P&gt;PROC REG does not support categorical predictors directly. You have to recode them into a series of 0-1 variables and use those in the model. A two-level categorical variable (like gender) becomes a single 0-1 variable, which is then treated as continuous. A three-level categorical variable becomes two variables, etc. &lt;BR /&gt; &lt;BR /&gt; This is analogous to the reference cell coding that can be used in PROC GLM for categorical variables. Where it falls down is variable selection: if you use the variable selection tools in REG, you can end up with only part of a categorical variable in the model.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13573"&gt;@Paige&lt;/a&gt;&amp;nbsp;agrees:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;SPAN&gt;I would use PROC GLM instead of PROC REG. Your predictor variables that are categories (gender, zip) are placed in the CLASS statement.&lt;/SPAN&gt;&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Also consider the GLMSELECT procedure. It fills the gap of allowing variable selection with CLASS variables. It also produces output that allows further analyses with REG and/or GLM. GLMSELECT treats a class variable as a single multi-degree-of-freedom test for inclusion/exclusion. Either all levels are in or all levels are out; it's not a piecemeal process.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13573"&gt;@Paige&lt;/a&gt;&amp;nbsp;points out:&lt;/SPAN&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;SPAN&gt;One thing that I feel needs to be pointed out here is that, despite the introduction of PROC GLMSELECT by SAS, many statisticians feel that STEPWISE (including forward and backward) model selection procedures are dangerous and misleading, and advise against using them. (Yes, I know there are other selection procedures in GLMSELECT, such as LAR and LASSO, which I have no knowledge of.)&lt;/SPAN&gt;&lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;Editor's note: this response consolidates several of the helpful replies in this thread. &amp;nbsp;Read through the entire topic to see the conversation.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 28 Mar 2017 11:59:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56021#M15638</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2017-03-28T11:59:49Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56022#M15639</link>
      <description>I would use PROC GLM instead of PROC REG. Your predictor variables that are categories (gender, zip) are placed in the CLASS statement.</description>
      <pubDate>Mon, 20 Jul 2009 13:52:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56022#M15639</guid>
      <dc:creator>Paige</dc:creator>
      <dc:date>2009-07-20T13:52:34Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56023#M15640</link>
      <description>Thanks for the help.&lt;BR /&gt;
&lt;BR /&gt;
I ran the regression with both PROC REG (created dummy variables) and PROC GLM.  For the 10 values of the discrete variable, I created 9 dummy variables.&lt;BR /&gt;
&lt;BR /&gt;
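A minimal sketch of this kind of dummy coding, assuming a data set named work.sales with a numeric variable zip_level coded 1 to 10 (these names, and the output names zip_d1-zip_d9, are hypothetical placeholders and not taken from the actual data):&lt;BR /&gt;
&lt;BR /&gt;
```sas
/* Hypothetical input: work.sales with a numeric zip_level coded 1-10. */
data work.sales_dummies;
  set work.sales;
  /* Nine 0/1 indicators; level 10 gets no indicator, so it is the reference. */
  array zip_d{9} zip_d1-zip_d9;
  do i = 1 to 9;
    zip_d{i} = (zip_level = i);  /* comparison evaluates to 1 or 0 */
  end;
  drop i;
run;
```
&lt;BR /&gt;
Note that in this sketch a row with a missing zip_level gets zeros in all nine indicators and would be silently treated as the reference level, so such rows may need separate handling.&lt;BR /&gt;
&lt;BR /&gt;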
Also, using PROC REG I noticed that one of the coefficients for my 9 dummy variables wasn't significant, so I dropped it from my model. Does PROC GLM do this?&lt;BR /&gt;
&lt;BR /&gt;
Lastly,&lt;BR /&gt;
PROC GLM doesn't give me parameter estimates; what is the syntax to get them?</description>
      <pubDate>Tue, 21 Jul 2009 15:29:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56023#M15640</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-21T15:29:03Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56024#M15641</link>
      <description>&amp;gt; Thanks for the help.&lt;BR /&gt;
&amp;gt; &lt;BR /&gt;
&amp;gt; I ran the regression with both PROC REG (created&lt;BR /&gt;
&amp;gt; dummy variables) and PROC GLM.  For the 10 values of&lt;BR /&gt;
&amp;gt; the discrete variable, I created 9 dummy variables.&lt;BR /&gt;
&amp;gt; &lt;BR /&gt;
&amp;gt; Also I noticed using proc reg that out of my 9&lt;BR /&gt;
&amp;gt; categorical variables coefficients, that one of them&lt;BR /&gt;
&amp;gt; wasn't significant so I dropped it out of my model,&lt;BR /&gt;
&amp;gt; does PROC GLM do this?&lt;BR /&gt;
&lt;BR /&gt;
This is a problem, as it doesn't make sense to delete the coefficient of just one of the 9 dummy variables that together represent a single discrete variable. I would recommend that you don't do this.&lt;BR /&gt;
&lt;BR /&gt;
Since you said that you yourself dropped the term out of the model ... yes, PROC GLM does this as well.&lt;BR /&gt;
 &lt;BR /&gt;
&amp;gt; Lastly,&lt;BR /&gt;
&amp;gt; The Proc Glm doesn't give me estimation parameters,&lt;BR /&gt;
&amp;gt; what is the syntax to get that?&lt;BR /&gt;
&lt;BR /&gt;
In the MODEL statement, use the SOLUTION option.</description>
      <pubDate>Tue, 21 Jul 2009 17:05:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56024#M15641</guid>
      <dc:creator>Paige</dc:creator>
      <dc:date>2009-07-21T17:05:48Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56025#M15642</link>
      <description>Paige, thank you for all your help.&lt;BR /&gt;
&lt;BR /&gt;
Last question, should I be alarmed when I get the following note:&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the&lt;BR /&gt;
      normal equations.  Terms whose estimates are followed by the letter 'B' are not uniquely&lt;BR /&gt;
      estimable.</description>
      <pubDate>Tue, 21 Jul 2009 17:51:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56025#M15642</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-21T17:51:57Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56026#M15643</link>
      <description>The X'X message is generally benign if you have a CLASS statement in GLM.  It is used to show all levels of the CLASS variable in the analysis.  Do be concerned if there are TWO rows labeled "B" for one variable, as that represents an unplanned linear dependency.</description>
      <pubDate>Tue, 21 Jul 2009 18:33:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56026#M15643</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2009-07-21T18:33:46Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56027#M15644</link>
      <description>I've got 9 rows with the B in it.&lt;BR /&gt;
&lt;BR /&gt;
'Do be concerned if there are TWO rows labeled "B" for one variable, as that represents an unplanned linear dependency. '&lt;BR /&gt;
&lt;BR /&gt;
Are you saying I should only be worried if there are two rows labeled 'B' and I don't have a class statement?

Message was edited by: Jrb599</description>
      <pubDate>Tue, 21 Jul 2009 18:47:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56027#M15644</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-21T18:47:48Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56028#M15645</link>
      <description>It is not a concern if the rows are all of the dummy variables (in PROC REG) or levels (in PROC GLM) representing a discrete variable. It is, in fact, an expected outcome when dummy variables (in PROC REG) or discrete variables (in PROC GLM) are used.</description>
      <pubDate>Tue, 21 Jul 2009 20:28:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56028#M15645</guid>
      <dc:creator>Paige</dc:creator>
      <dc:date>2009-07-21T20:28:52Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56029#M15646</link>
      <description>After talking to my professor, I have decided I am going to drop just one of the dummy variables; how do I go about doing this in PROC GLM?  Thanks.</description>
      <pubDate>Sat, 25 Jul 2009 17:28:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56029#M15646</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-25T17:28:32Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56030#M15647</link>
      <description>You can't drop just one dummy variable in PROC GLM.&lt;BR /&gt;
&lt;BR /&gt;
The choice of dummy variables is done internally, so you have no control over it.&lt;BR /&gt;
&lt;BR /&gt;
Furthermore, the results you get from the PROC GLM way of doing things produce the exact same predictions, exact same sum of squares, exact same model, etc. as any other way of creating dummy variables. &lt;BR /&gt;
&lt;BR /&gt;
In other words, if you could drop one of the dummy variables, so that instead of &lt;I&gt;n&lt;/I&gt; dummy variables representing &lt;I&gt;n&lt;/I&gt; levels you had &lt;I&gt;n&lt;/I&gt;–1 dummy variables, the results would be the same.&lt;BR /&gt;
&lt;BR /&gt;
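As a rough sketch of that equivalence, the two calls below fit the same model in both procedures. The names are hypothetical placeholders, not from this thread: a data set work.sales, a response profit, a covariate age, a numeric zip_level coded 1 to 10, and hand-built 0/1 dummies zip_d1-zip_d9 with level 10 as the reference.&lt;BR /&gt;
&lt;BR /&gt;
```sas
/* Hypothetical names throughout; zip_d1-zip_d9 are assumed to exist already. */

/* PROC REG: the nine 0/1 dummies are listed explicitly. */
proc reg data=work.sales;
  model profit = age zip_d1-zip_d9;
run;
quit;

/* PROC GLM: the CLASS statement builds the coding internally;
   SOLUTION prints the parameter estimates. */
proc glm data=work.sales;
  class zip_level;
  model profit = age zip_level / solution;
run;
quit;
```
&lt;BR /&gt;
Under these assumptions the two fits should report the same error sum of squares and the same predicted values, with PROC GLM flagging the CLASS estimates with the letter B.&lt;BR /&gt;
&lt;BR /&gt;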
I am surprised your professor does not know this.</description>
      <pubDate>Mon, 27 Jul 2009 12:55:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56030#M15647</guid>
      <dc:creator>Paige</dc:creator>
      <dc:date>2009-07-27T12:55:51Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56031#M15648</link>
      <description>We do get it. The point is that Cat9 and Cat10 are not significantly different, so there is no need for a term with such a high p-value.  Both can be estimated by the same parameter without producing a poor model.</description>
      <pubDate>Mon, 27 Jul 2009 13:08:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56031#M15648</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-27T13:08:59Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56032#M15649</link>
      <description>Hi Jrb599,&lt;BR /&gt;
&lt;BR /&gt;
A point to remember: the dummy variable that is not in the model represents the reference level for the categorical variable represented by the dummy variables in the model. This was mentioned by Doc@Duke at the beginning of this thread. The reference level is the one to which all other levels are compared. Therefore, if more than one dummy variable is left out of the model, all other levels are compared to the combination of the levels those dummy variables represent.</description>
      <pubDate>Mon, 27 Jul 2009 16:37:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56032#M15649</guid>
      <dc:creator>statsplank</dc:creator>
      <dc:date>2009-07-27T16:37:27Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56033#M15650</link>
      <description>Jrb599,&lt;BR /&gt;
&lt;BR /&gt;
One thing that I had forgotten, as it is so new to SAS, is the SAS 9.2 procedure GLMSELECT.  It fills the gap of allowing variable selection with CLASS variables.  It also produces output that allows further analyses with REG and/or GLM.&lt;BR /&gt;
&lt;BR /&gt;
I haven't tried it, but it may help address some of the questions that you have posed here.&lt;BR /&gt;
&lt;BR /&gt;
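An untested sketch of what such a call might look like, assuming a data set work.sales with a response profit, a covariate age, and categorical predictors gender and zip_level (all hypothetical names, not from this thread):&lt;BR /&gt;
&lt;BR /&gt;
```sas
/* Hypothetical names: work.sales, profit, age, gender, zip_level. */
proc glmselect data=work.sales;
  class gender zip_level;
  /* SELECTION= chooses the selection method; STEPWISE is one option. */
  model profit = age gender zip_level / selection=stepwise;
run;
```
&lt;BR /&gt;
Whether stepwise selection is appropriate for a given analysis is a separate question; the sketch only shows where the CLASS variables go.&lt;BR /&gt;
&lt;BR /&gt;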
Doc Muhlbaier&lt;BR /&gt;
Duke</description>
      <pubDate>Mon, 27 Jul 2009 16:44:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56033#M15650</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2009-07-27T16:44:30Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56034#M15651</link>
      <description>"Hi Jrb599,&lt;BR /&gt;
&lt;BR /&gt;
A point to remember: the dummy variable that is not in the model represents the reference level for the categorical variable represented by the dummy variables in the model. This was mentioned by Doc@Duke at the beginning of this thread. The reference level is the one to which all other levels are compared. Therefore, if more than one dummy variable is left out of the model, all other levels are compared to the combination of the levels those dummy variables represent."&lt;BR /&gt;
&lt;BR /&gt;
Yes, I get this; it is what I'm trying to do, since there is no difference between Cat9 and Cat10 (my reference level).</description>
      <pubDate>Mon, 27 Jul 2009 17:02:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56034#M15651</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2009-07-27T17:02:25Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56035#M15652</link>
      <description>&amp;gt; One thing that I had forgotten, as it is so new to&lt;BR /&gt;
&amp;gt; SAS, is the SAS 9.2 procedure GLMSELECT.  It fills&lt;BR /&gt;
&amp;gt; the gap of allowing variable selection with CLASS&lt;BR /&gt;
&amp;gt; variables.  It also produces output that allows&lt;BR /&gt;
&amp;gt; further analyses with REG and/or GLM.&lt;BR /&gt;
&amp;gt; &lt;BR /&gt;
&amp;gt; I haven't tried it, but it may help address some of&lt;BR /&gt;
&amp;gt; the questions that you have posed here.&lt;BR /&gt;
&amp;gt; &lt;BR /&gt;
&amp;gt; Doc Muhlbaier&lt;BR /&gt;
&lt;BR /&gt;
One thing that I feel needs to be pointed out here is that, despite the introduction of PROC GLMSELECT by SAS, many statisticians feel that STEPWISE (including forward and backward) model selection procedures are dangerous and misleading, and advise against using them. (Yes, I know there are other selection procedures in GLMSELECT, such as LAR and LASSO, which I have no knowledge of.)&lt;BR /&gt;
&lt;BR /&gt;
Further, it's not obvious to me that model selection procedures used on dummy variables representing a single categorical variable make any sense.</description>
      <pubDate>Mon, 27 Jul 2009 19:13:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56035#M15652</guid>
      <dc:creator>Paige</dc:creator>
      <dc:date>2009-07-27T19:13:11Z</dc:date>
    </item>
    <item>
      <title>Re: Using Proc Reg with categorical variables</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56036#M15653</link>
      <description>Paige's point about the "value" of any stepwise process is valid.  It's another tool that may or may not be appropriate in a given situation.&lt;BR /&gt;
&lt;BR /&gt;
GLMSELECT treats a class variable as a single multi-degree of freedom test for inclusion/exclusion.  Either all levels are in or all levels are out; it's not a piecemeal process.</description>
      <pubDate>Mon, 27 Jul 2009 19:22:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Using-Proc-Reg-with-categorical-variables/m-p/56036#M15653</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2009-07-27T19:22:37Z</dc:date>
    </item>
  </channel>
</rss>

