<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to randomly select X no. of obs from a dataset? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108529#M22561</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you don't have SAS/STAT, you can also do it quite easily with PROC SQL, e.g.:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql OUTOBS=5125 ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table abc as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select A.*&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from sample as A&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by RANUNI(0)&lt;/P&gt;&lt;P&gt; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 26 Dec 2012 17:31:06 GMT</pubDate>
    <dc:creator>art297</dc:creator>
    <dc:date>2012-12-26T17:31:06Z</dc:date>
    <item>
      <title>How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108526#M22558</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a dataset with X observations and I want to select Y records&lt;STRONG&gt; randomly&lt;/STRONG&gt;. I know that FIRSTOBS=/OBS= or RANUNI can be used, but I am wondering if there is a better way to do it, because FIRSTOBS=/OBS= doesn't give a true random sample and RANUNI requires manually entering cutoffs such as random &amp;lt; 0.2. In the example below, I want to select 5125 records randomly from the dataset sample.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;%Let obs = 5125;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Data abc;&lt;/P&gt;&lt;P&gt; Set sample(firstobs = 1 obs = &amp;amp;obs.);&lt;/P&gt;&lt;P&gt;Run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance for your help!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 16:57:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108526#M22558</guid>
      <dc:creator>vicky07</dc:creator>
      <dc:date>2012-12-26T16:57:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108527#M22559</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you have SAS/STAT, you may find PROC SURVEYSELECT handy.&lt;/P&gt;&lt;P&gt;For your case, sampling without replacement:&lt;/P&gt;&lt;P&gt;proc surveyselect data=sample method=srs n=5125&lt;/P&gt;&lt;P&gt;&amp;nbsp; out=abc;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Haikuo&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 17:13:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108527#M22559</guid>
      <dc:creator>Haikuo</dc:creator>
      <dc:date>2012-12-26T17:13:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108528#M22560</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;As was suggested, if you have PROC SURVEYSELECT available, that's what it was built for.&amp;nbsp; But if not, there are a few ways to go about it (all of which require entering no more than the number 5125).&amp;nbsp; Here are a few questions that would become important:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1. Is your data set large so that speed becomes important?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2. Do you need exactly 5,125 observations, or would approximately 5,125 be acceptable?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3. Can the same observation be selected more than once?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;4. What should happen if the data set contains fewer than 5,125 observations?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Even if you need exactly 5,125 randomly selected unique observations, using one pass through the data, this can be done.&amp;nbsp; The programming would involve one short but somewhat complex DATA step.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 17:20:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108528#M22560</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-12-26T17:20:32Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108529#M22561</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;If you don't have SAS/STAT, you can also do it quite easily with PROC SQL, e.g.:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql OUTOBS=5125 ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table abc as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select A.*&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from sample as A&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by RANUNI(0)&lt;/P&gt;&lt;P&gt; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 17:31:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108529#M22561</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-12-26T17:31:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108530#M22562</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This is a common question. The most efficient method I have seen uses a DATA step with the POINT= option on a SET statement, so that only the selected observations are read from the source dataset.&amp;nbsp; For an example of this code, see this posting from John Whittington in 1998.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://listserv.uga.edu/cgi-bin/wa?A2=ind9810A&amp;amp;L=sas-l&amp;amp;D=0&amp;amp;P=2569" title="SAS-L archives"&gt;http://listserv.uga.edu/cgi-bin/wa?A2=ind9810A&amp;amp;L=sas-l&amp;amp;D=0&amp;amp;P=2569&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 18:13:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108530#M22562</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2012-12-26T18:13:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108531#M22563</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A __default_attr="2431" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt;: I like John's approach but, from my limited tests, John's code appears to either introduce a bias of some kind or at least doesn't select the first N records based on the random seed used.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't recommend the 3rd approach, below, but included it to compare with the other two methods.&amp;nbsp; The 2nd and 3rd methods appear to consistently provide the same results regardless of seed or sample size specified.&amp;nbsp; The first method, though, appears to always deviate slightly:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;%let ssize=10;&lt;/P&gt;&lt;P&gt;%let seed=5;&lt;/P&gt;&lt;P&gt;data bigdata;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set sashelp.class;&lt;/P&gt;&lt;P&gt;&amp;nbsp; do _n_=1 to 10;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; recnum+1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; randnum=ranuni(&amp;amp;seed.);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;data sample (drop = k);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; k = &amp;amp;ssize. ;&amp;nbsp;&amp;nbsp; /* specify sample size required */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if 0 then set bigdata nobs = n ; /*&amp;nbsp; get nobs, without reading anything */&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; do i = 1 to n while (k &amp;gt; 0) ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if ranuni(&amp;amp;seed.) &amp;lt; k/n then do;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; k = k-1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set bigdata point = i ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; n=n-1 ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; stop ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; run ;&lt;/P&gt;&lt;P&gt;proc sql OUTOBS=&amp;amp;ssize. ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; create table sample2 as&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select *&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from bigdata&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; order by RANUNI(&amp;amp;seed.)&lt;/P&gt;&lt;P&gt;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;proc sort data=bigdata&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out=sample3;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by randnum;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;data sample3;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set sample3;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if _n_ le &amp;amp;ssize.;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 20:21:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108531#M22563</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-12-26T20:21:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108532#M22564</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You will get a different list of observations using the K/N method than with the assign-a-random-number-and-then-sort method.&amp;nbsp; But it has been proven mathematically that each observation has the same probability of selection.&lt;/P&gt;&lt;P&gt;The big advantage comes when N is large relative to K, as the POINT= operation loads only the needed observations. The other methods require reading all of the observations from disk at least once.&amp;nbsp; I have used datasets with millions of observations that take hours to process sequentially, but I can take a sample of a couple of thousand observations in seconds.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 22:23:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108532#M22564</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2012-12-26T22:23:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108533#M22565</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;A __default_attr="2431" __jive_macro_name="user" class="jive_macro jive_macro_user" data-objecttype="3" href="https://communities.sas.com/"&gt;&lt;/A&gt;: I ran some further tests and, while I've never seen the proof you mentioned, my own tests weren't able to identify any specific bias.&amp;nbsp; John's method definitely runs faster than any of the other methods.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 23:29:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108533#M22565</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2012-12-26T23:29:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108534#M22566</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;See a proof reposted in 2001 by Dr. John W. : &lt;A href="http://listserv.uga.edu/cgi-bin/wa?A2=ind0105B&amp;amp;L=sas-l&amp;amp;P=R20114&amp;amp;D=0"&gt;http://listserv.uga.edu/cgi-bin/wa?A2=ind0105B&amp;amp;L=sas-l&amp;amp;P=R20114&amp;amp;D=0&lt;/A&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 26 Dec 2012 23:51:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108534#M22566</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2012-12-26T23:51:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to randomly select X no. of obs from a dataset?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108535#M22567</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you all for your replies. I have a dataset with 3 million observations and tested all 3 methods listed by Arthur; the K/N method runs much faster than the other two.&amp;nbsp; Thanks again!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 27 Dec 2012 04:22:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-randomly-select-X-no-of-obs-randonly-from-a-dataset/m-p/108535#M22567</guid>
      <dc:creator>vicky07</dc:creator>
      <dc:date>2012-12-27T04:22:42Z</dc:date>
    </item>
  </channel>
</rss>

