<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: random selecting cases by stratifying 3 variables (2 binary and one continuous) in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295655#M61807</link>
    <description>&lt;P&gt;Surveyselect with Strata variables selects the indicated sample size from each stratum, that is, from each combination of levels of the strata variables. So a truly continuous variable is a poor choice for strata. What is the TOTAL number of records you are trying to select? I can't tell if you want 9000, 15000 or 3000 or something different.&lt;/P&gt;
&lt;P&gt;I am not sure what you mean by "in the selected 3000 and the total sample".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might try leaving out the strata and see if the summary of the resulting data is close enough for your purpose.&lt;/P&gt;</description>
    <pubDate>Wed, 31 Aug 2016 20:48:47 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2016-08-31T20:48:47Z</dc:date>
    <item>
      <title>random selecting cases by stratifying 3 variables (2 binary and one continuous)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295650#M61806</link>
      <description>&lt;P&gt;I need to select 3000 cases from about 63000 cases, stratifying by 3 variables:&lt;/P&gt;
&lt;P&gt;age: continuous. I want to have a similar mean/std in the selected 3000 and in the total sample.&lt;/P&gt;
&lt;P&gt;gender: binary. I want to have similar percentages in the selected 3000 and in the total sample.&lt;/P&gt;
&lt;P&gt;location: binary, two levels. I want to have similar percentages in the selected 3000 and in the total sample.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am not sure whether the following code is right. Can I use both categorical and numeric variables for strata?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;proc surveyselect data = total_sample out = selected_3000 
    method = srs n=3000 seed = 9876;
    strata gender age location;
run;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 20:22:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295650#M61806</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-08-31T20:22:00Z</dc:date>
    </item>
    <item>
      <title>Re: random selecting cases by stratifying 3 variables (2 binary and one continuous)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295655#M61807</link>
      <description>&lt;P&gt;Surveyselect with Strata variables selects the indicated sample size from each stratum, that is, from each combination of levels of the strata variables. So a truly continuous variable is a poor choice for strata. What is the TOTAL number of records you are trying to select? I can't tell if you want 9000, 15000 or 3000 or something different.&lt;/P&gt;
&lt;P&gt;I am not sure what you mean by "in the selected 3000 and the total sample".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
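&lt;P&gt;To illustrate the point about strata: a minimal sketch, not a tested solution, that bins age into a few groups and requests proportional allocation so that N= is read as the total sample size rather than a per-stratum size. The agegrp variable, the cut points at 30, 45 and 60, and the total_binned data set name are assumptions made for illustration only, and the ALLOC=PROP option on the STRATA statement requires a reasonably recent SAS/STAT release.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical age groups; the cut points are illustrative assumptions */
data total_binned;
    set total_sample;
    if missing(age) then agegrp = .;
    else if age lt 30 then agegrp = 1;
    else if age lt 45 then agegrp = 2;
    else if age lt 60 then agegrp = 3;
    else agegrp = 4;
run;

/* Surveyselect expects the input sorted by the strata variables */
proc sort data = total_binned;
    by gender location agegrp;
run;

/* With alloc=prop, n=3000 is the total, split across strata in
proportion to stratum size */
proc surveyselect data = total_binned out = selected_3000
    method = srs n=3000 seed = 9876;
    strata gender location agegrp / alloc=prop;
run;&lt;/CODE&gt;&lt;/PRE&gt;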
&lt;P&gt;You might try leaving out the strata and see if the summary of the resulting data is close enough for your purpose.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 20:48:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295655#M61807</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-08-31T20:48:47Z</dc:date>
    </item>
    <item>
      <title>Re: random selecting cases by stratifying 3 variables (2 binary and one continuous)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295657#M61808</link>
      <description>&lt;P&gt;Simple random sampling will give you approximately the correct proportions (gender, location) and average (age) without stratification, and with 3000 cases out of 63000 the differences due to chance should be small. Stratification is for when you want your selected sample to represent a population that is not necessarily well represented in your sampling set. Try this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Option outall creates a variable named Selected = 1 when selected in the
sample and = 0 otherwise. */
proc surveyselect data = total_sample out = selected_3000 
    method = srs n=3000 seed = 9876 outall;
run;

/* Check proportions */
proc freq data=selected_3000;
tables selected*(gender location) / chisq;
run;

/* Check age averages */
proc glm data=selected_3000;
class selected;
model age =  selected;
run;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
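&lt;P&gt;Since the question also asks about the standard deviation of age, a possible extra check is sketched here. It assumes the same selected_3000 data set created with outall in the code above, and lists the mean and standard deviation of age by Selected, then lets PROC TTEST report its test for equality of variances:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Mean and standard deviation of age for selected (1) and non-selected (0) cases */
proc means data=selected_3000 n mean std;
class selected;
var age;
run;

/* Two-sample comparison of age; the output includes the folded F test
for equality of variances */
proc ttest data=selected_3000;
class selected;
var age;
run;&lt;/CODE&gt;&lt;/PRE&gt;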
&lt;P&gt;(untested)&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2016 21:21:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295657#M61808</guid>
      <dc:creator>PGStats</dc:creator>
      <dc:date>2016-08-31T21:21:05Z</dc:date>
    </item>
    <item>
      <title>Re: random selecting cases by stratifying 3 variables (2 binary and one continuous)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295847#M61868</link>
      <description>The "outall" option is really great. I was thinking of merging back to create such an indicator variable.</description>
      <pubDate>Thu, 01 Sep 2016 14:19:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/random-selecting-cases-by-stratifying-3-variables-2-binary-and/m-p/295847#M61868</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-09-01T14:19:25Z</dc:date>
    </item>
  </channel>
</rss>

