<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic selecting sample based on probability with surveyselect in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/selecting-sample-based-on-probability-with-surveyselect/m-p/821785#M40658</link>
    <description>&lt;P&gt;I'm trying to generate some bootstrapped samples and hoping to confirm whether the approach I've taken is sound. The idea is to have 250 samples that consist of 169,301 randomly selected rows, with selection based on a variable that represents the rate of an event occurring in the real world (the variable death_rate in the code below, which ranges from 0.0003 to 0.15). The higher the value of this variable the more likely selection will be, and selection will occur from the dataset of about 15 million rows until 169,301 rows are selected (with no row selected more than once). However, I'm unsure if I've interpreted the documentation correctly to achieve this. Currently my code looks like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc surveyselect data=merged_data out=BootSamples noprint seed=123 sampsize=169301 out=OUTHITS&lt;BR /&gt;method=PPS&lt;BR /&gt;reps=250;&lt;BR /&gt;Size death_rate;&lt;BR /&gt;run;&lt;/P&gt;</description>
    <pubDate>Wed, 06 Jul 2022 07:48:43 GMT</pubDate>
    <dc:creator>sas1990</dc:creator>
    <dc:date>2022-07-06T07:48:43Z</dc:date>
    <item>
      <title>selecting sample based on probability with surveyselect</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/selecting-sample-based-on-probability-with-surveyselect/m-p/821785#M40658</link>
      <description>&lt;P&gt;I'm trying to generate some bootstrapped samples and hoping to confirm whether the approach I've taken is sound. The idea is to have 250 samples that consist of 169,301 randomly selected rows, with selection based on a variable that represents the rate of an event occurring in the real world (the variable death_rate in the code below, which ranges from 0.0003 to 0.15). The higher the value of this variable the more likely selection will be, and selection will occur from the dataset of about 15 million rows until 169,301 rows are selected (with no row selected more than once). However, I'm unsure if I've interpreted the documentation correctly to achieve this. Currently my code looks like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc surveyselect data=merged_data out=BootSamples noprint seed=123 sampsize=169301 out=OUTHITS&lt;BR /&gt;method=PPS&lt;BR /&gt;reps=250;&lt;BR /&gt;Size death_rate;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Wed, 06 Jul 2022 07:48:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/selecting-sample-based-on-probability-with-surveyselect/m-p/821785#M40658</guid>
      <dc:creator>sas1990</dc:creator>
      <dc:date>2022-07-06T07:48:43Z</dc:date>
    </item>
    <item>
      <title>Re: selecting sample based on probability with surveyselect</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/selecting-sample-based-on-probability-with-surveyselect/m-p/821845#M40661</link>
      <description>&lt;P&gt;Typically, bootstrap samples are obtained by sampling with replacement, so perhaps you want to use the PPS_WR option? See&amp;nbsp;&lt;A href="https://blogs.sas.com/content/iml/2016/02/10/sample-with-replacement-and-unequal-probability-in-sas.html" target="_blank"&gt;https://blogs.sas.com/content/iml/2016/02/10/sample-with-replacement-and-unequal-probability-in-sas.html&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
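&lt;P&gt;Also, your code specifies OUT= twice (OUT=BootSamples and OUT=OUTHITS). I suspect you meant to use the OUTHITS option in the second case; OUTHITS writes a separate output observation for each selection (hit) when the same unit is selected more than once.&lt;/P&gt;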
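&lt;P&gt;If you really do want each row selected at most once, as in your original description, a minimal sketch (assuming your full MERGED_DATA table and the death_rate variable; the output name PPSSamples is hypothetical) keeps METHOD=PPS and a single OUT= data set. OUTHITS is not needed in that case, because no unit can be selected more than once within a sample. Keep in mind that METHOD=PPS sets each row's selection probability to the sample size times its relative size, which must not exceed 1, so check that no death_rate value is too large relative to the total. Also, samples drawn without replacement are not bootstrap samples in the usual sense.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sketch: PPS sampling WITHOUT replacement (no row appears twice within a replicate) */
proc surveyselect data=merged_data out=PPSSamples noprint seed=123
     method=PPS         /* without replacement, probability proportional to size */
     sampsize=169301    /* rows per sample */
     reps=250;          /* number of samples; the output includes a Replicate variable */
   size death_rate;     /* larger death_rate means a higher selection probability */
run;&lt;/CODE&gt;&lt;/PRE&gt;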
&lt;P&gt;I created some small data so you can inspect the results. I hope the following answers your question or at least points you in the correct direction:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data merged_data;
input x death_rate;
datalines;
1 .1
2 .2
3 .3
4 .4
5 .5
6 .6
8 .8
;

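%let SampSize=100;  /* small demo size; use 169301 for the full data */
proc surveyselect data=merged_data out=BootSamples noprint seed=123 
     sampsize=&amp;amp;SampSize OUTHITS  /* OUTHITS: one output row per selection (hit) */
     method=PPS_WR  /* PPS sampling WITH replacement */
     reps=250;      /* 250 replicate samples */
Size death_rate;    /* each draw selects a row with probability proportional to death_rate */
run;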
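
/* Sketch (assumes the demo MERGED_DATA above; ExpectedProp is a hypothetical name):
   under PPS_WR, each draw picks row i with probability death_rate(i) / sum(death_rate),
   so these are the expected relative frequencies to compare with the PROC FREQ percentages.
   PROC SQL remerges the SUM statistic with each row (a NOTE appears in the log). */
proc sql;
   select x, death_rate / sum(death_rate) as ExpectedProp format=percent8.2
      from merged_data;
quit;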

/* check that the relative frequencies of x are proportional to death_rate */
proc freq data=BootSamples;
tables x;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 06 Jul 2022 14:00:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/selecting-sample-based-on-probability-with-surveyselect/m-p/821845#M40661</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2022-07-06T14:00:32Z</dc:date>
    </item>
    <item>
      <title>Re: selecting sample based on probability with surveyselect</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/selecting-sample-based-on-probability-with-surveyselect/m-p/821959#M40668</link>
      <description>Thanks, Rick! It seems to work as intended.</description>
      <pubDate>Thu, 07 Jul 2022 02:20:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/selecting-sample-based-on-probability-with-surveyselect/m-p/821959#M40668</guid>
      <dc:creator>sas1990</dc:creator>
      <dc:date>2022-07-07T02:20:39Z</dc:date>
    </item>
  </channel>
</rss>

