<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Creating Dummy Variables from Categorical Variable for large dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670874#M201432</link>
    <description>&lt;P&gt;My first guess, especially for the GLM modelling, is that the problem may not be the number of observations, but the number of unique values of STATE (or other class variables).&amp;nbsp; But given that you have not provided the GLM model you are estimating (or the logistic one), that's only a conjecture.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you know how many values of STATE you have?&amp;nbsp; Your program assumes it has only two values: 1 and 2.&amp;nbsp; But SAS is telling you there is at least one other value.&amp;nbsp; It could be a missing value, or a valid value.&lt;/P&gt;</description>
    <pubDate>Tue, 21 Jul 2020 04:39:47 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2020-07-21T04:39:47Z</dc:date>
    <item>
      <title>Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670539#M201316</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to make state indicator dummy variables for a large dataset (n &amp;gt; 600,000). I have found documentation suggesting to run a proc glm or logistic. This method works for me if I reduce the sample size, but it will not run on my computer with the entire sample.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have also hard coded it before, but am hoping to find something more efficient as it is making my code very long and difficult to sift through.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have also seen the following code, but I am receiving an error saying "Array Subscript out of range." I have yet to decipher what that means.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is the code I have most recently tried:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data indicators;
set miss_states1;
array dummys {*} st1 - st2;
do i=1 to DIM(dummys);
	dummys(i) = 0;
end;
dummys(state) = 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Thank you in advance to anyone who can help!&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 00:29:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670539#M201316</guid>
      <dc:creator>csessa3</dc:creator>
      <dc:date>2020-07-20T00:29:33Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670545#M201318</link>
      <description>&lt;P&gt;The error is in this line&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;dummys(state) = 1;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 02:14:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670545#M201318</guid>
      <dc:creator>ghosh</dc:creator>
      <dc:date>2020-07-20T02:14:37Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670549#M201322</link>
      <description>Something shorter and more efficient?  Here's an idea.  Do nothing.  Leave your data just as it is.&lt;BR /&gt;&lt;BR /&gt;Both GLM and LOGISTIC support the CLASS statement, which lets the procedure create the dummy variables for you.</description>
      <pubDate>Mon, 20 Jul 2020 02:57:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670549#M201322</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2020-07-20T02:57:43Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670550#M201323</link>
      <description>Have you tried the approaches outlined here, specifically GLMSELECT?&lt;BR /&gt;My concern is if that doesn't run, your actual model won't run either?&lt;BR /&gt;&lt;BR /&gt;Could you create a small sample that has all the values using PROC SURVEYSELECT and STRATA to ensure your categorical variables are included and then use one of the shown methods?</description>
      <pubDate>Mon, 20 Jul 2020 03:00:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670550#M201323</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-07-20T03:00:12Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670561#M201326</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/332467"&gt;@csessa3&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hello,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to make state indicator dummy variables for a large dataset (n &amp;gt; 600,000). I have found documentation suggesting to run a proc glm or logistic. This method works for me if I reduce the sample size, but it will not run on my computer with the entire sample.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have also hard coded it before, but am hoping to find something more efficient as it is making my code very long and difficult to sift through.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have also seen the following code, but I am receiving an error saying "Array Subscript out of range." I have yet to decipher what that means.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is the code I have most recently tried:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data indicators;
set miss_states1;
array dummys {*} st1 - st2;
do i=1 to DIM(dummys);
	dummys(i) = 0;
end;
dummys(state) = 1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Thank you in advance to anyone who can help!&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;What is the value of the variable State? Your array dummys contains exactly two elements and as defined would allow use of index values of 1 and 2. If STATE contains any value other than 1 or 2 then that is the cause of the "out of range" error.&lt;/P&gt;
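&lt;P&gt;As a minimal sketch of how to answer that question, the steps below report the type of STATE and list every distinct value it takes, including missing. The dataset name miss_states1 is taken from the code above; nothing else is assumed.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Show whether STATE is numeric or character */
proc contents data=miss_states1;
run;

/* List every distinct value of STATE, counting missing values too */
proc freq data=miss_states1;
   tables state / missing;
run;&lt;/CODE&gt;&lt;/PRE&gt;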
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And if you are having issues with Proc Logistic or Proc GLM, I strongly suggest that you show the LOG for the entire procedure: code, warnings, errors and notes. Copy the entire proc from the log and paste the text into a code box opened with the &amp;lt;/&amp;gt; icon.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 04:44:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670561#M201326</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-07-20T04:44:57Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670589#M201339</link>
      <description>&lt;P&gt;Although this has been mentioned earlier in this thread, I mention it again for emphasis.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You usually do not need to create your own dummy variables in SAS. Many SAS analysis procedures allow you to use a CLASS statement, so the SAS PROC creates the dummy variables for you, and you know they are correct, with no fumbling around in a data step to get the correct dummy variables.&lt;/P&gt;
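&lt;P&gt;For illustration only, a CLASS statement might be used as in the sketch below. The response variable outcome is hypothetical (the model has not been shown in this thread), and param=ref is just one of the available parameterizations.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc logistic data=miss_states1;
   class state / param=ref;   /* the procedure builds the design columns for STATE */
   model outcome = state;     /* outcome is a hypothetical binary response */
run;&lt;/CODE&gt;&lt;/PRE&gt;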
&lt;P&gt;&lt;BR /&gt;So, what do you plan to do with these dummy variables after you create them?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 10:31:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670589#M201339</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-07-20T10:31:18Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670620#M201346</link>
      <description>&lt;P&gt;Hello. I would be fine using the logistic class statement, but it will not run for my large sample, which is why I am looking for an alternate solution.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 12:42:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670620#M201346</guid>
      <dc:creator>csessa3</dc:creator>
      <dc:date>2020-07-20T12:42:57Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670621#M201347</link>
      <description>&lt;P&gt;Hello, thank you for identifying the issue. Do you know how to correct it? What is wrong with it?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 12:43:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670621#M201347</guid>
      <dc:creator>csessa3</dc:creator>
      <dc:date>2020-07-20T12:43:51Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670623#M201348</link>
      <description>&lt;P&gt;The GLMSELECT and the proc logistic work for creating the categorical variables when the sample size is reduced. I have previously hard coded the state indicators and run my final regression model with no issue, so I am not worried about my final model not working.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am not familiar with the PROC SURVEYSELECT and STRATA method you have suggested. Would you be able to give me a little more clarification?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 12:48:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670623#M201348</guid>
      <dc:creator>csessa3</dc:creator>
      <dc:date>2020-07-20T12:48:01Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670627#M201351</link>
      <description>&lt;P&gt;It's a long story, but the gist is that I have a dataset of medical providers who are listed in multiple states within the dataset. I am going to use the dummy variables to calculate which state they are listed in the most to use this as their "primary" state.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So for example, if doctor A is listed in Florida 2 times and Georgia 1 time, I want to say he is a Florida doctor.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 12:55:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670627#M201351</guid>
      <dc:creator>csessa3</dc:creator>
      <dc:date>2020-07-20T12:55:20Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670628#M201352</link>
      <description>&lt;P&gt;State is a string variable with the state abbreviation (FL, GA, AL, etc.)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The error is that my machine runs out of memory. It will spin for 20-30 minutes and then say insufficient memory. When I open the file it will have categorical variables for 10 of the 50 states. So it works, but it can't complete the task for the entire file.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 12:58:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670628#M201352</guid>
      <dc:creator>csessa3</dc:creator>
      <dc:date>2020-07-20T12:58:56Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670630#M201354</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/332467"&gt;@csessa3&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;It's a long story, but the gist is that I have a dataset of medical providers who are listed in multiple states within the dataset. I am going to use the dummy variables to calculate which state they are listed in the most to use this as their "primary" state.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So for example, if doctor A is listed in Florida 2 times and Georgia 1 time, I want to say he is a Florida doctor.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;PROC SUMMARY or PROC FREQ, then you don't need to create dummy variables yourself.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 13:01:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670630#M201354</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2020-07-20T13:01:08Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670695#M201370</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/332467"&gt;@csessa3&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;State is a string variable with the state abbreviation (FL, GA, AL, etc.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The error is that my machine runs out of memory. It will spin for 20-30 minutes and then say insufficient memory. When I open the file it will have categorical variables for 10 of the 50 states. So it works, but it can't complete the task for the entire file.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;So not only are you attempting to use a variable with more than 2 values as the index of the array, you are also attempting to use a Character variable where a Numeric is required.&lt;/P&gt;
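&lt;P&gt;To make that concrete, here is a rough sketch of a data step that works with a character STATE by comparing it against a list of codes instead of using it as a subscript. The three codes and the names st_AL, st_FL and st_GA are only illustrative assumptions; the list would have to cover every value present in the data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data indicators;
   set miss_states1;
   /* hypothetical short list of codes; extend it to all states in the data */
   array codes {3} $ 2 _temporary_ ('AL' 'FL' 'GA');
   /* one indicator per code, in the same order as the list above */
   array dummys {3} st_AL st_FL st_GA;
   do i = 1 to dim(dummys);
      /* the comparison evaluates to 1 when STATE matches, otherwise 0 */
      dummys{i} = (state = codes{i});
   end;
   drop i;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A row whose STATE is missing or not in the list simply gets 0 in every indicator, so no subscript can go out of range.&lt;/P&gt;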
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And again, &lt;STRONG&gt;show the proc logistic/glm code you are attempting.&lt;/STRONG&gt; 600,000 records is not really a "large" data set. Search this forum and you will find references to folks using way more records than that.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The most likely cause for running out of computational resources is the number of variables. If you read the documentation for Proc Logistic, the details section has a topic labeled "computational resources" that shows the memory needed.&lt;/P&gt;
&lt;P&gt;And additional memory is needed if the SELECTION option is used.&lt;/P&gt;
&lt;P&gt;So, SHOW the code, from the log.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Adding additional variables, i.e. your hand-made indicators, is likely to increase memory problems, not reduce them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
</description>
      <pubDate>Mon, 20 Jul 2020 15:12:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670695#M201370</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-07-20T15:12:18Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670696#M201371</link>
      <description>&lt;P&gt;SAS can easily track data for 50 states without running out of memory.&amp;nbsp; So your post tells me that you are doing something different, such as trying to track the states for all providers at the same time.&amp;nbsp; Bottom line:&amp;nbsp; you will need to post the log from the program that is running out of memory if you want helpful feedback.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 15:13:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670696#M201371</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2020-07-20T15:13:22Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670722#M201383</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/332467"&gt;@csessa3&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;It's a long story, but the gist is that I have a dataset of medical providers who are listed in multiple states within the dataset. I am going to use the dummy variables to calculate which state they are listed in the most to use this as their "primary" state.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So for example, if doctor A is listed in Florida 2 times and Georgia 1 time, I want to say he is a Florida doctor.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Here is an example of one way involving simple steps to accomplish the part of identifying the "primary" state involved. Note that if there are two or more states with the same count there is nothing supplying any rule for tie breaking.&lt;/P&gt;
&lt;PRE&gt;data have;
   input id $ state $;
datalines;
1 AL
1 AL
1 NM
2 AZ
2 NM
2 NM
;
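/* Count the rows for each id/state combination */
proc summary data=have nway;
   class id state;
   output out=work.freq (drop=_type_);
run;
/* Within each id, put the most frequent state first */
proc sort data=work.freq;
   by id descending _freq_;
run;
/* Keep only the top row for each id */
data work.temp;
   set work.freq;
   by id;
   if first.id;
run;

/* Attach the primary state to every original record */
proc sql;
   create table want as
   select a.*, b.state as PrimaryState
   from have as a
        left join
        work.temp as b
        on a.id = b.id
   ;
quit;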
&lt;/PRE&gt;
&lt;P&gt;The first data step is just to have something to code against. The Proc summary gets a count; the sort orders the count data to get the most common state first, the temp data step gets only one record for each id and the proc SQL adds a new variable of "PrimaryState" back to your original data.&lt;/P&gt;
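&lt;P&gt;If a tie-breaking rule is wanted, one possibility (purely an assumption, not something stated in the question) is to let the alphabetically first state win. That only requires adding STATE as a final sort key in the sort of work.freq:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Highest count first; among equal counts, alphabetically first state first */
proc sort data=work.freq;
   by id descending _freq_ state;
run;&lt;/CODE&gt;&lt;/PRE&gt;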
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, you still have not described how this is going to impact your Proc Logistic/GLM issue. And if you mean to add 50 more variables for "dummies" then that is not the correct approach.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jul 2020 16:57:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670722#M201383</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2020-07-20T16:57:03Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670874#M201432</link>
      <description>&lt;P&gt;My first guess, especially for the GLM modelling, is that the problem may not be the number of observations, but the number of unique values of STATE (or other class variables).&amp;nbsp; But given that you have not provided the GLM model you are estimating (or the logistic one), that's only a conjecture.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you know how many values of STATE you have?&amp;nbsp; Your program assumes it has only two values: 1 and 2.&amp;nbsp; But SAS is telling you there is at least one other value.&amp;nbsp; It could be a missing value, or a valid value.&lt;/P&gt;</description>
      <pubDate>Tue, 21 Jul 2020 04:39:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/670874#M201432</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-07-21T04:39:47Z</dc:date>
    </item>
    <item>
      <title>Re: Creating Dummy Variables from Categorical Variable for large dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/671115#M201491</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/332467"&gt;@csessa3&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;It's a long story, but the gist is that I have a dataset of medical providers who are listed in multiple states within the dataset. I am going to use the dummy variables to calculate which state they are listed in the most to use this as their "primary" state.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So for example, if doctor A is listed in Florida 2 times and Georgia 1 time, I want to say he is a Florida doctor.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In a case like this, you shouldn't be solving it with dummy variables. Run a proc freq and get the top count per doctor.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
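&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Count listings for each doctor/state combination */
proc freq data=have noprint;
   tables doctorID*state / out=doc_statecounts;
run;

/* Put each doctor's most frequent state first */
proc sort data=doc_statecounts;
   by doctorID descending count;
run;

/* Keep the first row per doctor, relying on the order from the previous sort */
proc sort data=doc_statecounts nodupkey out=doc_primarystate;
   by doctorID;
run;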
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 21 Jul 2020 16:48:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-Dummy-Variables-from-Categorical-Variable-for-large/m-p/671115#M201491</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2020-07-21T16:48:56Z</dc:date>
    </item>
  </channel>
</rss>

