<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Creating an unbiased dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731977#M228075</link>
    <description>&lt;P&gt;You are looking for a &lt;EM&gt;balanced&lt;/EM&gt; dataset.&lt;/P&gt;
&lt;P&gt;Start with:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
select min(sum(depvar=0), sum(depvar=1)) into :sampSize from myData;
quit;

proc sort data=myData; by depvar; run;
 
proc surveyselect data=myData out=mySamples method=srs sampSize=&amp;amp;sampSize.;
strata depvar;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;(untested)&lt;/P&gt;</description>
    <pubDate>Wed, 07 Apr 2021 16:33:30 GMT</pubDate>
    <dc:creator>PGStats</dc:creator>
    <dc:date>2021-04-07T16:33:30Z</dc:date>
    <item>
      <title>Creating an unbiased dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731929#M228041</link>
      <description>&lt;P&gt;I have a dataset where the dependent variable (depvar) consists of 0s and 1s. I want to modify that dataset so that it contains 50% 0s and 50% 1s, i.e. an unbiased dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How can I do that?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Apr 2021 14:24:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731929#M228041</guid>
      <dc:creator>aalluru</dc:creator>
      <dc:date>2021-04-07T14:24:08Z</dc:date>
    </item>
    <item>
      <title>Re: Creating an unbiased dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731933#M228045</link>
      <description>Have you tried PROC SURVEYSELECT?</description>
      <pubDate>Wed, 07 Apr 2021 14:36:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731933#M228045</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-04-07T14:36:23Z</dc:date>
    </item>
    <item>
      <title>Re: Creating an unbiased dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731937#M228048</link>
      <description>&lt;P&gt;I had a look at it, but I'm not sure how I can use it to get what I need here.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Apr 2021 14:46:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731937#M228048</guid>
      <dc:creator>aalluru</dc:creator>
      <dc:date>2021-04-07T14:46:08Z</dc:date>
    </item>
    <item>
      <title>Re: Creating an unbiased dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731970#M228071</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/355241"&gt;@aalluru&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I have a dataset where the dependent variable (depvar) consists of 0s and 1s. I want to modify that dataset so that it contains 50% 0s and 50% 1s, i.e. an unbiased dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How can I do that?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;What other constraints might be involved? You don't mention how many records are involved, how many records should be in the resulting data set or if any other variables are involved or need to be considered.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC SURVEYSELECT, with your data stratified by that variable, should select the desired subset:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;/* STRATA requires the data to be sorted by the strata variable */
proc sort data=have;
   by dependentvar;
run;

proc surveyselect data=have out=selected
   sampsize=(1234 1234); /* number of records wanted from each stratum, not a RATE */
   strata dependentvar;
run;
&lt;/PRE&gt;
&lt;P&gt;Replace 1234 with the number of records of each that you want.&lt;/P&gt;
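&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a quick check (a minimal sketch, assuming the output data set is named selected as in the step above), a frequency table should show the same count for each value of dependentvar:&lt;/P&gt;
&lt;PRE&gt;/* Sketch only: count the sampled records in each stratum */
proc freq data=selected;
   tables dependentvar / nocum;
run;
&lt;/PRE&gt;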
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My feeling, though, is that by sampling on your "outcome" variable this way you are very likely creating a bias that did not exist in the original data.&lt;/P&gt;
&lt;P&gt;Consider an outcome such as "had an adverse reaction to medication", with demographics as the independent variables, where the original data had a reaction in perhaps 25% of records. Your subset of the data makes the overall "adverse rate" much higher and might obscure the common elements in the independent variables that were actually associated with the adverse reaction.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What specific types of analysis are you planning for this data?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Apr 2021 16:13:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731970#M228071</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2021-04-07T16:13:06Z</dc:date>
    </item>
    <item>
      <title>Re: Creating an unbiased dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731977#M228075</link>
      <description>&lt;P&gt;You are looking for a &lt;EM&gt;balanced&lt;/EM&gt; dataset.&lt;/P&gt;
&lt;P&gt;Start with:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
select min(sum(depvar=0), sum(depvar=1)) into :sampSize from myData;
quit;

proc sort data=myData; by depvar; run;
 
proc surveyselect data=myData out=mySamples method=srs sampSize=&amp;amp;sampSize.;
strata depvar;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;(untested)&lt;/P&gt;</description>
      <pubDate>Wed, 07 Apr 2021 16:33:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Creating-an-unbiased-dataset/m-p/731977#M228075</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2021-04-07T16:33:30Z</dc:date>
    </item>
  </channel>
</rss>

