<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Stratified Sampling based on multiple variables in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119829#M6292</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't know exactly how it works, but I do have a suspicion.&amp;nbsp; Perhaps the procedure requires every combination of strata variables to be represented in the sample.&amp;nbsp; If only 5 observations fell into a particular strata combination, the software would still have to select one of them into the sample.&amp;nbsp; If that applied to every strata combination, you would end up with a 20% sample.&amp;nbsp; You could check the strata sizes with this sort of program:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* count the observations in each combination of the strata variables */&lt;/P&gt;&lt;P&gt;proc freq data=have noprint;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; tables three*strata*variables / out=counts (keep=count rename=(count=n_observations));&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* tabulate how many strata combinations have each size */&lt;/P&gt;&lt;P&gt;proc freq data=counts;&lt;/P&gt;&lt;P&gt;&amp;nbsp; tables n_observations;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The final table would tell you how many strata combinations have 1 observation in the original data set, how many have 2 observations, etc.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 07 Nov 2012 18:57:16 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2012-11-07T18:57:16Z</dc:date>
    <item>
      <title>Stratified Sampling based on multiple variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119826#M6289</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;Hi All,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;I have a dataset that contains account number, balance, limit and apr. I need to split off 10% of the population from this dataset. That 10% should be a (stratified) random sample, and the distributions of balance, limit and apr should be approximately equal between the 10% sample and the remaining 90%.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;I have used proc surveyselect to sample the dataset based on one variable:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;proc surveyselect data = dataset out=new_dsn samprate=.1 outall;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;strata cust_flag;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;run;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;Can someone help me do the same thing for multiple variables?&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;Thanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: calibri, verdana, arial, sans-serif; font-size: 12pt;"&gt;Dhana&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 07 Nov 2012 17:51:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119826#M6289</guid>
      <dc:creator>dhana</dc:creator>
      <dc:date>2012-11-07T17:51:58Z</dc:date>
    </item>
    <item>
      <title>Re: Stratified Sampling based on multiple variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119827#M6290</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Why can't you add more variables to the strata statement?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;strata balance limit apr;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 07 Nov 2012 17:57:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119827#M6290</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2012-11-07T17:57:36Z</dc:date>
    </item>
    <item>
      <title>Re: Stratified Sampling based on multiple variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119828#M6291</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I tried that, but instead of a 10% sample I got 19% of the population. Now I am a little confused about how this proc works.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 07 Nov 2012 18:08:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119828#M6291</guid>
      <dc:creator>dhana</dc:creator>
      <dc:date>2012-11-07T18:08:10Z</dc:date>
    </item>
    <item>
      <title>Re: Stratified Sampling based on multiple variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119829#M6292</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't know exactly how it works, but I do have a suspicion.&amp;nbsp; Perhaps the procedure requires every combination of strata variables to be represented in the sample.&amp;nbsp; If only 5 observations fell into a particular strata combination, the software would still have to select one of them into the sample.&amp;nbsp; If that applied to every strata combination, you would end up with a 20% sample.&amp;nbsp; You could check the strata sizes with this sort of program:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* count the observations in each combination of the strata variables */&lt;/P&gt;&lt;P&gt;proc freq data=have noprint;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; tables three*strata*variables / out=counts (keep=count rename=(count=n_observations));&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;/* tabulate how many strata combinations have each size */&lt;/P&gt;&lt;P&gt;proc freq data=counts;&lt;/P&gt;&lt;P&gt;&amp;nbsp; tables n_observations;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The final table would tell you how many strata combinations have 1 observation in the original data set, how many have 2 observations, etc.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 07 Nov 2012 18:57:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119829#M6292</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-11-07T18:57:16Z</dc:date>
    </item>
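The suspicion in the reply above can be checked with a little arithmetic. The following is only a sketch: it assumes the strata variables are balance, limit and apr (as suggested earlier in the thread), that the input data set is named have, and that each stratum's sample size is 10% of its size rounded up to a whole observation. That rounding rule is an assumption made for illustration, not something the thread confirms.

```sas
/* Count the observations in each combination of the strata variables. */
proc freq data=have noprint;
   tables balance*limit*apr / out=counts (keep=balance limit apr count);
run;

/* Assumption: each stratum contributes ceil(0.1 * stratum size) observations. */
data expected;
   set counts end=last;
   n_sampled = ceil(0.1 * count);   /* at least 1 for any non-empty stratum */
   total_obs + count;               /* running total of observations */
   total_sampled + n_sampled;       /* running total of sampled observations */
   if last then do;
      overall_rate = total_sampled / total_obs;
      put overall_rate= percent8.1;
      output;
   end;
   keep total_obs total_sampled overall_rate;
run;
```

If overall_rate comes out well above 10%, small strata are a plausible explanation for the inflated sample.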
    <item>
      <title>Re: Stratified Sampling based on multiple variables</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119830#M6293</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Could you post the code that generated the 19% sample? I did some experimenting and I get 10% within each combination of strata variables, but my trial data is probably too nice.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 07 Nov 2012 21:11:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Stratified-Sampling-based-on-multiple-variables/m-p/119830#M6293</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2012-11-07T21:11:03Z</dc:date>
    </item>
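Because balance, limit and apr are presumably continuous, stratifying on their raw values may produce many strata with only a handful of accounts each. One possible workaround, which nobody in the thread proposed, is to bin each variable into quantile groups first and stratify on the bins. This is a minimal sketch: the data set name binned, the variables balance_bin, limit_bin and apr_bin, the choice of 4 groups and the seed value are all hypothetical.

```sas
/* Bin each continuous variable into 4 quantile groups (coded 0 to 3). */
proc rank data=dataset out=binned groups=4;
   var balance limit apr;
   ranks balance_bin limit_bin apr_bin;
run;

/* PROC SURVEYSELECT expects the input sorted by the STRATA variables. */
proc sort data=binned;
   by balance_bin limit_bin apr_bin;
run;

/* Draw 10% within each combination of bins; OUTALL keeps every row
   and adds a Selected flag (1 = sampled, 0 = not sampled). */
proc surveyselect data=binned out=new_dsn samprate=0.1 outall seed=12345;
   strata balance_bin limit_bin apr_bin;
run;

/* Compare the distributions between the sample and the remainder. */
proc means data=new_dsn mean std median;
   class Selected;
   var balance limit apr;
run;
```

With 4 groups per variable there are at most 64 strata, so each stratum should be large enough for a 10% rate to hold approximately. Fewer groups give larger strata but a coarser match on the distributions.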
  </channel>
</rss>

