<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sample Selection in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289344#M59757</link>
    <description>Hi Ballardw....what I mean by "true representation" is that product IDs with a higher customer volume should have a higher probability (maybe proportionate to total volume) of being selected into the sample. For example, if there were only 2 product IDs, where Product_ID1 had 100 sales and Product_ID2 had 200 sales, and I wanted to select a sample size of 50, then I would want 1/3 of the sample (100/300) to be selected from Product_ID1 and 2/3 of the sample (200/300) to be selected from Product_ID2. Hope this helps.</description>
    <pubDate>Wed, 03 Aug 2016 19:25:11 GMT</pubDate>
    <dc:creator>twildone</dc:creator>
    <dc:date>2016-08-03T19:25:11Z</dc:date>
    <item>
      <title>Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289273#M59742</link>
      <description>&lt;P&gt;Hi....I would like to select a sample of records from a dataset; the total number of records in the dataset can vary and is not fixed. I want to select the sample in such a way that it is a true representation of all records when it comes to the frequency (number of records) of each Product ID. Sales by different Product ID will vary. Do I need to cluster the data first and then sample within each cluster? Any suggestions would be greatly appreciated.....Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 16:14:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289273#M59742</guid>
      <dc:creator>twildone</dc:creator>
      <dc:date>2016-08-03T16:14:43Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289277#M59744</link>
      <description>&lt;P&gt;If you have SAS/STAT, then check PROC SURVEYSELECT with the STRATA statement.&lt;/P&gt;
&lt;P&gt;If you don't, there are still ways of doing the random sampling. Let us know if that is the case, and someone from this forum can start working on it.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 16:28:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289277#M59744</guid>
      <dc:creator>Haikuo</dc:creator>
      <dc:date>2016-08-03T16:28:04Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289287#M59746</link>
      <description>data sample;&lt;BR /&gt;set whatever;&lt;BR /&gt;ran = uniform(0);&lt;BR /&gt;if ran &amp;lt;= 0.10; /* fraction you would like to sample; 0.10 is only an example value */&lt;BR /&gt;run;</description>
      <pubDate>Wed, 03 Aug 2016 17:04:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289287#M59746</guid>
      <dc:creator>TMiles</dc:creator>
      <dc:date>2016-08-03T17:04:59Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289299#M59751</link>
      <description>&lt;P&gt;You may need to clarify what you mean when you say "a true representation of all records when it comes to frequency (number of records) of the Product ID".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you mean that for any arbitrary subgroup, say on geography or customer volume, the percent of each product ID will be exactly the same as for the sample overall?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;How are you using the idea of "cluster" for this project?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Clusters are not normally created/assigned in such a way as to guarantee the same distribution of values as the whole sample, but are identified as something that contributes to overall variability in the sample. Think of a study involving school students from many schools. A natural cluster would be the school, as generally there are differences in the makeup of student bodies from school to school.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 17:28:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289299#M59751</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-08-03T17:28:53Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289344#M59757</link>
      <description>Hi Ballardw....what I mean by "true representation" is that product IDs with a higher customer volume should have a higher probability (maybe proportionate to total volume) of being selected into the sample. For example, if there were only 2 product IDs, where Product_ID1 had 100 sales and Product_ID2 had 200 sales, and I wanted to select a sample size of 50, then I would want 1/3 of the sample (100/300) to be selected from Product_ID1 and 2/3 of the sample (200/300) to be selected from Product_ID2. Hope this helps.</description>
      <pubDate>Wed, 03 Aug 2016 19:25:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289344#M59757</guid>
      <dc:creator>twildone</dc:creator>
      <dc:date>2016-08-03T19:25:11Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289357#M59758</link>
      <description>&lt;P&gt;Assuming the data is such that you have a Product_Id variable, and the Product_Id1 and Product_Id2 you mention are different values of it, you could use Product_Id as a STRATA variable and specify the sample size for each level of the variable. If you have more than a few products involved then you would likely want to use a SAMPSIZE data set. Look at the documentation for its contents, as it needs specifically named variables to work properly.&lt;/P&gt;
&lt;P&gt;You could easily make a sampsize dataset by using proc freq (or Proc SQL) to generate a data set with the product_id values and their overall percentages. Multiply each percentage by the desired overall sample size to get the number for each product. The number would have to be rounded to an integer.&lt;/P&gt;
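&lt;P&gt;As a rough sketch of the proc freq route described above (not tested against your data): the data set name SALES, the output names COUNTS, SAMPSIZE and SAMPLE, the seed, and the overall sample size of 50 are all assumptions made up for illustration. The per-stratum sizes go into a variable named _NSIZE_, as in the documentation.&lt;/P&gt;
&lt;PRE&gt;
/* Assumed names: SALES (input), PRODUCT_ID (strata variable), 50 (total sample size) */
proc sort data=sales;
  by product_id;
run;

/* One row per product_id; the OUT= data set contains COUNT and PERCENT */
proc freq data=sales noprint;
  tables product_id / out=counts;
run;

/* Per-stratum size = share of records * total sample size, rounded to an integer */
data sampsize;
  set counts;
  _NSIZE_ = max(1, round(percent / 100 * 50));
  keep product_id _NSIZE_;
run;

/* Stratified simple random sample using the per-stratum sizes */
proc surveyselect data=sales n=sampsize out=sample seed=12345;
  strata product_id;
run;
&lt;/PRE&gt;
&lt;P&gt;Because of the rounding (and the floor of 1 per product), the per-product sizes may not add up to exactly 50, so it is worth summing _NSIZE_ before sampling. Recent SAS/STAT releases also document an ALLOC=PROP option on the STRATA statement that may do this allocation for you; check the documentation for your version.&lt;/P&gt;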
&lt;P&gt;To use STRATA, the input data set would need to be sorted by the strata variable. The sampsize dataset would need to contain every value of the strata variable, match its attributes such as length and type, and be sorted in the same order.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Aug 2016 20:06:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289357#M59758</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-08-03T20:06:41Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289598#M59821</link>
      <description>&lt;P&gt;Hi Ballardw....I tried your suggestions and they did work. What I would also like to do is have each ID_Number appear only once in the overall sample, without any duplicates of the ID_Number.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE width="101"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD width="45"&gt;Group&lt;/TD&gt;
&lt;TD width="56"&gt;_NSIZE_&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;1&lt;/TD&gt;
&lt;TD&gt;2&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;2&lt;/TD&gt;
&lt;TD&gt;3&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;3&lt;/TD&gt;
&lt;TD&gt;9&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;4&lt;/TD&gt;
&lt;TD&gt;21&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;5&lt;/TD&gt;
&lt;TD&gt;10&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;6&lt;/TD&gt;
&lt;TD&gt;25&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;PROC&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;SORT&lt;/FONT&gt;&lt;/STRONG&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATA&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=SUMMARY94;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;BY&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; GROUP;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;RUN&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;PROC&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;SURVEYSELECT&lt;/FONT&gt;&lt;/STRONG&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATA&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=SUMMARY94 &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;N&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=SUMMARY99C &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;OUT&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=hsbs3;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;STRATA&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; GROUP;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;RUN&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2016 17:42:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289598#M59821</guid>
      <dc:creator>twildone</dc:creator>
      <dc:date>2016-08-04T17:42:56Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289672#M59836</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4061"&gt;@twildone&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Hi Ballardw....I tried your suggestions and they did work. What I would also like to do is have each ID_Number appear only once in the overall sample, without any duplicates of the ID_Number.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I think that using your Id_number as a SAMPLINGUNIT variable may help if the id_number does not exist in different groups.&lt;/P&gt;
&lt;P&gt;Another option, if you are seeing many of these, is to slightly oversample and discard the "extras". You haven't said whether the selection probabilities are important for this.&lt;/P&gt;
&lt;P&gt;There may be some options in SAS 9.4 that allow more fiddling with selections but I don't have 9.4 for testing behavior.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Aug 2016 21:52:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289672#M59836</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-08-04T21:52:38Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289782#M59872</link>
      <description>&lt;P&gt;Hi...I did try using the SAMPLINGUNIT statement but got unusual results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;PROC&lt;/STRONG&gt;&lt;/FONT&gt; &lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;SURVEYSELECT&lt;/FONT&gt;&lt;/STRONG&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATA&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=SUMMARY94 &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;N&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=SUMMARY98 &lt;/FONT&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;OUT&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;=hsbs3;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;SAMPLINGUNIT ID_NUMBER;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;STRATA&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; PRECLUS;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;RUN&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Aug 2016 12:51:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289782#M59872</guid>
      <dc:creator>twildone</dc:creator>
      <dc:date>2016-08-05T12:51:28Z</dc:date>
    </item>
    <item>
      <title>Re: Sample Selection</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289849#M59891</link>
      <description>&lt;P&gt;I thought about that and remembered it's likely to duplicate the Id. I was kind of hoping that if the duplicated id values were a small percentage overall it might allow skipping them.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you need the probability of selection / weighting information? If not, you may be able to oversample by a bit and remove duplicate ids.&lt;/P&gt;
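&lt;P&gt;A minimal sketch of the oversample-then-remove-duplicates idea, assuming HSBS3 is a deliberately oversized PROC SURVEYSELECT output like the one earlier in the thread. The names SHUFFLED, SAMPLE_UNIQUE and _R, and the seed, are made up for illustration. A random key decides which record of a repeated ID_NUMBER is kept, so the choice does not depend on the order of the groups.&lt;/P&gt;
&lt;PRE&gt;
/* Attach a random key to every selected record (seed is an arbitrary example) */
data shuffled;
  set hsbs3;
  call streaminit(12345);
  _r = rand('uniform');
run;

/* Within each ID_NUMBER, records are now in random order */
proc sort data=shuffled;
  by id_number _r;
run;

/* Keep exactly one record per ID_NUMBER */
data sample_unique;
  set shuffled;
  by id_number;
  if first.id_number;
  drop _r;
run;
&lt;/PRE&gt;
&lt;P&gt;After this step the per-group counts will no longer match the requested _NSIZE_ values exactly, and the selection probabilities written by PROC SURVEYSELECT no longer describe the final sample, so check the group counts (for example with proc freq) and trim any extras if needed.&lt;/P&gt;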
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another option is to decide which is more important: no duplicate ids, or an "exact" match of the distribution. Perhaps create a data set with a single record for each id and see if the overall distribution remains similar. If so, then select a sample from that subset.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Aug 2016 17:27:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sample-Selection/m-p/289849#M59891</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2016-08-05T17:27:46Z</dc:date>
    </item>
  </channel>
</rss>

