<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Is there a faster way to get a random stratified sample? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782236#M249379</link>
    <description>&lt;P&gt;Doesn't sound like it should be taking that long.&amp;nbsp; But if the size of the dataset (mainly number of variables) is an issue then perhaps run the SURVEYSELECT on a smaller dataset.&lt;/P&gt;
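&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Keep only the strata variable and record each row's position */
data rows;
  set bigdataset(keep=year);
  point=_n_;
run;
/* Sample from the narrow dataset; with STRATA, sampsize= applies per year */
proc surveyselect data=rows out=sample_rows sampsize=1000;
  strata year;
run;
/* Fetch the sampled rows from the full dataset by observation number */
data sampledataset ;
  set sample_rows(keep=point);
  set bigdataset point=point;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>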
    <pubDate>Wed, 24 Nov 2021 15:18:21 GMT</pubDate>
    <dc:creator>Tom</dc:creator>
    <dc:date>2021-11-24T15:18:21Z</dc:date>
    <item>
      <title>Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782055#M249292</link>
      <description>&lt;P&gt;I'm working on a dataset that has over 7 million records and a few hundred variables. I want a small sample, around 1,000 will do, to mess around with graphs and tables and things so every experiment doesn't have to be run on the large data set. Random records are fine, but I do need a few from each year I have data for (55 years). Here's what I have, but it's been running all day and I'm wondering if there is a more efficient way to do this, or if this will even work.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc surveyselect data=bigdataset out=sampledataset sampsize=1000;&lt;/P&gt;&lt;P&gt;strata year;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 23 Nov 2021 19:40:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782055#M249292</guid>
      <dc:creator>smg3141</dc:creator>
      <dc:date>2021-11-23T19:40:28Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782057#M249294</link>
      <description>What are your computer specs? I would expect that to run in under 15 minutes on a standard desktop with 8GB of RAM. &lt;BR /&gt;Is the data sorted by year ahead of time?&lt;BR /&gt;&lt;BR /&gt;Are you familiar with the obs= option in SAS to help you test programs? Though I suspect you need a random sample to be able to test it all, since the data spans multiple years. &lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 23 Nov 2021 19:47:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782057#M249294</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-11-23T19:47:20Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782058#M249295</link>
      <description>Yes, the data is sorted by year. My computer is good, 16GB of RAM. Everything is being done over our network though, so maybe that's the reason it's so slow.</description>
      <pubDate>Tue, 23 Nov 2021 19:53:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782058#M249295</guid>
      <dc:creator>smg3141</dc:creator>
      <dc:date>2021-11-23T19:53:52Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782060#M249296</link>
      <description>&lt;P&gt;If you have enough memory, try loading the file into RAM first with SASFILE, which should make it very fast. But check with your administrator first: are you using a SAS server? It's possible they've limited resources. This isn't a data-intensive SURVEYSELECT, so I suspect any other solution will be less efficient, though it's trivial to do manually.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/doc/en/vdmmlcdc/1.0/lestmtsref/n0osyhi338pfaan1plin9ioilduk.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/vdmmlcdc/1.0/lestmtsref/n0osyhi338pfaan1plin9ioilduk.htm&lt;/A&gt;&lt;/P&gt;
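&lt;P&gt;A minimal sketch of the SASFILE approach, assuming the data lives in a library called mylib (the libref and path here are hypothetical) and that the whole dataset fits in available memory. With 7 million records and a few hundred variables it may not, in which case this is unlikely to help.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical libref and path; adjust to your environment */
libname mylib "/path/to/data";

/* Read the dataset into memory and hold it there */
sasfile mylib.bigdataset load;

/* Same call as in the original question, now reading from memory */
proc surveyselect data=mylib.bigdataset out=sampledataset sampsize=1000;
  strata year;
run;

/* Release the memory when done */
sasfile mylib.bigdataset close;&lt;/CODE&gt;&lt;/PRE&gt;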
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Nov 2021 20:07:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782060#M249296</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2021-11-23T20:07:30Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782080#M249304</link>
      <description>&lt;P&gt;Agree with &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;.&amp;nbsp; "Running all day" is surprising for this amount of data.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are you using local PC SAS, with data stored on a network?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would try something like:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  set mylib.bigdata(obs=10000); *increase obs and see how it slows down;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;And keep increasing the value for obs unless/until it becomes painful.&amp;nbsp; If you can pull all 7M records down to your work library in a reasonable amount of time, maybe do that and run SURVEYSELECT against the local dataset.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
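&lt;P&gt;Roughly, the copy-locally idea could look like the sketch below. It assumes mylib is the libref pointing at the network location, as in the snippet above, and that the WORK library sits on a local disk; the name bigdata_local is made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* One full pass over the network to make a local copy */
data work.bigdata_local;
  set mylib.bigdata;
run;

/* Sample from the local copy; with STRATA, sampsize= applies per year */
proc surveyselect data=work.bigdata_local out=sampledataset sampsize=1000;
  strata year;
run;&lt;/CODE&gt;&lt;/PRE&gt;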
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But this seems suspicious.&amp;nbsp; You might want to have a chat with your local support about network speeds.&amp;nbsp; If you're at home on WiFi, try plugging into the router.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Nov 2021 21:32:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782080#M249304</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2021-11-23T21:32:25Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782141#M249323</link>
      <description>&lt;P&gt;Your proc surveyselect will select 1000 obs per year. I am not a stats guy, so I can't suggest a fix for the code.&lt;/P&gt;
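&lt;P&gt;For reference, one possible adjustment, offered as a sketch rather than a vetted fix: if the goal is roughly 1,000 records in total, request a fixed number per stratum. With 55 years, 18 per year gives 990 records. This assumes every year has at least 18 records; the seed value is arbitrary and only there to make the sample reproducible.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* sampsize= is the number selected within each stratum (year) */
proc surveyselect data=bigdataset out=sampledataset sampsize=18 seed=12345;
  strata year;
run;&lt;/CODE&gt;&lt;/PRE&gt;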
&lt;P&gt;Maybe posting proc contents of the dataset is helpful, so that we actually know how big "bigdataset" is.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Nov 2021 06:46:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782141#M249323</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2021-11-24T06:46:58Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782193#M249357</link>
      <description>&lt;P&gt;I doubt that PROC SURVEYSELECT itself is running that slowly.&lt;/P&gt;
&lt;P&gt;Maybe it is because you have hundreds of variables in the big table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
call streaminit(123456789);
do year=1 to 55;
  do i=1 to 1E6;
      var=rand('uniform');
      output;
  end;
end;
drop i ;
run;

data temp;
 set have;
 id+1;
run;
proc surveyselect noprint data=temp seed=123 out=key(keep=id) sampsize=1000;
strata year;
run;
data want;
 if _n_=1 then do;
  if 0 then set key;
  declare hash h(dataset:'key',hashexp:20);
  h.definekey('id');
  h.definedone();
 end;
 set temp;
 if h.check()=0;
 drop id;
 run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 24 Nov 2021 11:50:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782193#M249357</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-11-24T11:50:52Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782212#M249371</link>
      <description>Or maybe &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt; has a good idea.</description>
      <pubDate>Wed, 24 Nov 2021 12:48:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782212#M249371</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-11-24T12:48:21Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782236#M249379</link>
      <description>&lt;P&gt;Doesn't sound like it should be taking that long.&amp;nbsp; But if the size of the dataset (mainly number of variables) is an issue then perhaps run the SURVEYSELECT on a smaller dataset.&lt;/P&gt;
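&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Keep only the strata variable and record each row's position */
data rows;
  set bigdataset(keep=year);
  point=_n_;
run;
/* Sample from the narrow dataset; with STRATA, sampsize= applies per year */
proc surveyselect data=rows out=sample_rows sampsize=1000;
  strata year;
run;
/* Fetch the sampled rows from the full dataset by observation number */
data sampledataset ;
  set sample_rows(keep=point);
  set bigdataset point=point;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>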
      <pubDate>Wed, 24 Nov 2021 15:18:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782236#M249379</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2021-11-24T15:18:21Z</dc:date>
    </item>
    <item>
      <title>Re: Is there a faster way to get a random stratified sample?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782428#M249425</link>
      <description>Tom,&lt;BR /&gt;This code is awesome!</description>
      <pubDate>Thu, 25 Nov 2021 12:24:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Is-there-a-faster-way-to-get-a-random-stratified-sample/m-p/782428#M249425</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2021-11-25T12:24:22Z</dc:date>
    </item>
  </channel>
</rss>

