<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to keep the same amount of people based on a column? in SAS Enterprise Guide</title>
    <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755674#M39163</link>
    <description>Yes, keep the same number of IDs.&lt;BR /&gt;&lt;BR /&gt;In this example there are 2 IDs with target 1 and 5 IDs with target 0, so the dataset is unbalanced with respect to the target variable. My original dataset has 1142 IDs with target 1 and 8395 IDs with target 0.&lt;BR /&gt;&lt;BR /&gt;I want to keep the dataset as large as possible, so, to have the same number of IDs for each value of the target variable, the output would be, for example, 2 IDs with target 1 (the minority class) and 2 IDs with target 0.&lt;BR /&gt;I said "randomly" because there are no further rules for choosing which IDs with target 0 are kept.</description>
    <pubDate>Wed, 21 Jul 2021 15:47:14 GMT</pubDate>
    <dc:creator>fcf</dc:creator>
    <dc:date>2021-07-21T15:47:14Z</dc:date>
    <item>
      <title>How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755661#M39161</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Imagine this is what I have:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;id  x   y    z     target
1   a   b    c          1
1   a   b    c          1
1   a   b    c          1
2   a   b    c          0
2   a   b    c          0
3   a   b    c          0
4   a   b    c          0
4   a   b    c          0
5   a   b    c          1
5   a   b    c          1 
6   a   b    c          0
6   a   b    c          0
6   a   b    c          0
7   a   b    c          0
7   a   b    c          0&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What I need is to randomly keep the same number of IDs (with all of their rows) for each value of the target variable, so that I have a balanced dataset for building a predictive model.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have been searching but can't find anything similar. Thank you for the help.&lt;/P&gt;</description>
      <pubDate>Wed, 21 Jul 2021 15:16:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755661#M39161</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-07-21T15:16:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755668#M39162</link>
      <description>&lt;P&gt;Is this supposed to be a random selection?&lt;/P&gt;
&lt;P&gt;Exactly how is "based on the target variable" to be used? It is not obvious, because you do not show a result, either desired or possible.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By "amount of people" do you mean the same number of unique IDs? How many people do you want in the final result?&lt;/P&gt;
&lt;P&gt;One of each would also be "the same amount", so there must be a bit more going on here.&lt;/P&gt;
&lt;P&gt;Do you know how many people are in each target?&lt;/P&gt;
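&lt;P&gt;Here is a minimal sketch for answering that question. It assumes the data set is named HAVE and has the ID and TARGET variables shown in the post. PROC SQL can count distinct IDs (not rows) for each TARGET value. For the posted example it should report 2 IDs with target 1 and 5 IDs with target 0.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Count distinct IDs (not rows) per TARGET value.        */
/* HAVE is an assumed data set name for the posted data.  */
proc sql;
  select target, count(distinct id) as n_ids
    from have
    group by target;
quit;&lt;/CODE&gt;&lt;/PRE&gt;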
</description>
      <pubDate>Wed, 21 Jul 2021 15:36:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755668#M39162</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-07-21T15:36:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755674#M39163</link>
      <description>Yes, keep the same number of IDs.&lt;BR /&gt;&lt;BR /&gt;In this example there are 2 IDs with target 1 and 5 IDs with target 0, so the dataset is unbalanced with respect to the target variable. My original dataset has 1142 IDs with target 1 and 8395 IDs with target 0.&lt;BR /&gt;&lt;BR /&gt;I want to keep the dataset as large as possible, so, to have the same number of IDs for each value of the target variable, the output would be, for example, 2 IDs with target 1 (the minority class) and 2 IDs with target 0.&lt;BR /&gt;I said "randomly" because there are no further rules for choosing which IDs with target 0 are kept.</description>
      <pubDate>Wed, 21 Jul 2021 15:47:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755674#M39163</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-07-21T15:47:14Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755696#M39164</link>
      <description>&lt;P&gt;You should be able to just use PROC SURVEYSELECT with the N= (sample size) option and a STRATA statement.&lt;/P&gt;
&lt;P&gt;Calculate the size of the smallest group and use that as the N= value.&lt;/P&gt;
&lt;P&gt;Here is an example using SASHELP.CLASS as the dataset and SEX as the stratifying variable.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;proc sort data=sashelp.class out=have;
  by sex;
run;

proc sql noprint;
select min(count) into :size 
  from (select sex,count(*) as count from have group by sex)
;
quit;
%put &amp;amp;=size;


proc surveyselect data=have n=&amp;amp;size seed=47279 out=want;
  strata sex;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 21 Jul 2021 16:29:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/755696#M39164</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-07-21T16:29:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760580#M39280</link>
      <description>Sorry for the delay, thank you, but it didn't work</description>
      <pubDate>Tue, 10 Aug 2021 11:45:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760580#M39280</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-08-10T11:45:18Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760611#M39281</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/355489"&gt;@fcf&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Sorry for the delay, thank you, but it didn't work&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;IMPORTANT CONCEPT: if you tell us it didn't work, and provide no other information, we can't help you. You need to explain and provide information about what you did and what happened.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Show us exactly the code you used. If there is an ERROR in the log, show us the ENTIRE log (that's 100% of the log, every single character; do not chop anything out). If the results are wrong, show us the wrong output, explain why it's wrong, and describe what you want to see instead.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Aug 2021 12:35:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760611#M39281</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2021-08-10T12:35:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760629#M39282</link>
      <description>I can't fully understand what happened, but I think the output contains 50% of rows with target 0 and 50% of rows with target 1, and that is not what I need.&lt;BR /&gt;&lt;BR /&gt;I need an output in which 50% of the IDs have target 0 and 50% have target 1, keeping all rows for those IDs.</description>
      <pubDate>Tue, 10 Aug 2021 13:10:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760629#M39282</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-08-10T13:10:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760636#M39284</link>
      <description>&lt;P&gt;Do you have repeated observations for the same ID in your original dataset?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It sounds like you want to sample from just the unique set of ID values and then pull all observations for those ids.&lt;/P&gt;
&lt;P&gt;So first make the unique list of ids (and grouping variable).&amp;nbsp; Then sample from that.&amp;nbsp; Then use that list of sampled ids to get all observations for those ids from the original dataset.&lt;/P&gt;
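&lt;P&gt;Here is one possible sketch of the last step. It assumes the sampled IDs end up in a data set named SAMPLE and the original data is HAVE; both names are assumptions here, and HAVE_SORTED and SAMPLE_IDS are hypothetical intermediate names. A DATA step MERGE with IN= flags keeps every row of HAVE whose ID was sampled:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* MERGE with a BY statement requires both inputs sorted by ID */
proc sort data=have out=have_sorted;
  by id;
run;

/* One row per sampled ID (SAMPLE is an assumed name) */
proc sort data=sample(keep=id) out=sample_ids nodupkey;
  by id;
run;

/* Keep all rows of HAVE whose ID appears in the sample */
data want;
  merge have_sorted(in=in_have) sample_ids(in=in_sample);
  by id;
  if in_have and in_sample;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;A PROC SQL join or a WHERE ID IN (subquery) filter would do the same job without explicit sort steps.&lt;/P&gt;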
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Let us know if you need help coding that.&lt;/P&gt;</description>
      <pubDate>Tue, 10 Aug 2021 13:19:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760636#M39284</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-08-10T13:19:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760637#M39285</link>
      <description>&lt;P&gt;According to the input I posted, this is an example of the output I need:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;id  x   y    z     target
1   a   b    c          1
1   a   b    c          1
1   a   b    c          1
2   a   b    c          0
2   a   b    c          0
3   a   b    c          0
5   a   b    c          1
5   a   b    c          1&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 10 Aug 2021 13:20:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760637#M39285</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-08-10T13:20:24Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760638#M39286</link>
      <description>No, I don't have repeated observations. I just posted "a", "b", "c" as placeholders: regardless of the information there, I'll focus on the variables "id" and "target" to determine which IDs stay in the output. I want 50% of the IDs to have target 1 (with all the rows associated with those IDs) and 50% to have target 0 (also with all of their rows).</description>
      <pubDate>Tue, 10 Aug 2021 13:22:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760638#M39286</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-08-10T13:22:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760645#M39287</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/355489"&gt;@fcf&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;According to the input I posted, this is an example of the output I need:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="lang-py s-code-block"&gt;&lt;CODE class="hljs language-python"&gt;&lt;SPAN class="hljs-built_in"&gt;id&lt;/SPAN&gt;  x   y    z     target
&lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;
&lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;
&lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;
&lt;SPAN class="hljs-number"&gt;2&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;
&lt;SPAN class="hljs-number"&gt;2&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;
&lt;SPAN class="hljs-number"&gt;3&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;0&lt;/SPAN&gt;
&lt;SPAN class="hljs-number"&gt;5&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt;
&lt;SPAN class="hljs-number"&gt;5&lt;/SPAN&gt;   a   b    c          &lt;SPAN class="hljs-number"&gt;1&lt;/SPAN&gt; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;So you DO have repeats.&amp;nbsp; For ID=1 there are 3 observations.&lt;/P&gt;
&lt;P&gt;Here is one way to create a dataset that has only one observation per ID.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=have(keep=id target) out=unique nodupkey;
  by id target;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 10 Aug 2021 13:32:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760645#M39287</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-08-10T13:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760649#M39288</link>
      <description>Yes, but that's not where the problem is. As I said, I wrote "a", "b" and "c" just as examples; in my dataset there are no duplicate rows. The point is to focus on the ID and its associated target; I could even keep only the ID and TARGET variables. What I need is a way to keep IDs so that 50% of them have target 0 and 50% have target 1. Then I can perform a join or something similar to gather all the rows associated with those IDs.&lt;BR /&gt;&lt;BR /&gt;I posted it that way because it would be faster to get the output I want.</description>
      <pubDate>Tue, 10 Aug 2021 13:38:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760649#M39288</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-08-10T13:38:12Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760659#M39289</link>
      <description>&lt;P&gt;So let's make some sample data that has a different number of distinct ID values per TARGET value.&lt;/P&gt;
&lt;P&gt;This has 2 IDs with TARGET=1 and 4 IDs with TARGET=0.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input id x $ y $ z $ target;
cards;
1 a b c 1
1 a b c 1
1 a b c 1
2 a b c 0
2 a b c 0
3 a b c 0
4 a b c 0
5 a b c 1
5 a b c 1 
6 a b c 0
6 a b c 0
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Now let's get the distinct list of IDs and the number of IDs in the smaller target group.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql noprint;
  create table ids as 
    select distinct id,target 
    from have
    order by target,id
  ;
  select min(n) into :size trimmed
  from (select target,count(*) as n from ids group by target)
  ;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Then let's sample the IDs from the two groups.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc surveyselect data=ids  n=&amp;amp;size /*seed=47279*/ out=sample;
  strata target;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
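&lt;P&gt;PROC SURVEYSELECT is part of SAS/STAT. If that product is not licensed, a plain DATA step can draw a simple random sample of IDs within each TARGET group instead. The sketch below gives each ID a random key, sorts by TARGET and the key, then keeps the first SIZE IDs per group. It assumes the IDS data set and the SIZE macro variable created above; the IDS_RAND name and the seed value are hypothetical.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Attach a uniform random sort key to every ID */
data ids_rand;
  set ids;
  if _n_ = 1 then call streaminit(47279); /* hypothetical seed, for reproducibility */
  u = rand('uniform');
run;

/* Random order within each TARGET group */
proc sort data=ids_rand;
  by target u;
run;

/* Keep the first SIZE IDs of each TARGET group */
data sample;
  set ids_rand;
  by target;
  if first.target then k = 0;
  k + 1;                  /* sum statement: k is retained across rows */
  if k le &amp;amp;size;
  drop u k;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With either method, SAMPLE has one row per sampled ID, so running PROC FREQ on SAMPLE with a TABLES TARGET statement should show the same frequency for each TARGET value.&lt;/P&gt;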
&lt;P&gt;And finally use the sampled ID values to subset the original data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql noprint;
  create table want as 
    select * from have
    where id in (select id from sample)
  ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Results:&lt;/P&gt;
&lt;PRE&gt;Obs    id    x    y    z    target

 1      1    a    b    c       1
 2      1    a    b    c       1
 3      1    a    b    c       1
 4      2    a    b    c       0
 5      2    a    b    c       0
 6      3    a    b    c       0
 7      5    a    b    c       1
 8      5    a    b    c       1
&lt;/PRE&gt;</description>
      <pubDate>Tue, 10 Aug 2021 14:10:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760659#M39289</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-08-10T14:10:43Z</dc:date>
    </item>
    <item>
      <title>Re: How to keep the same amount of people based on a column?</title>
      <link>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760664#M39290</link>
      <description>Thank you so much, that's exactly what I needed!</description>
      <pubDate>Tue, 10 Aug 2021 14:06:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Enterprise-Guide/How-to-keep-the-same-amount-of-people-based-on-a-column/m-p/760664#M39290</guid>
      <dc:creator>fcf</dc:creator>
      <dc:date>2021-08-10T14:06:16Z</dc:date>
    </item>
  </channel>
</rss>

