<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Select Subset of a Group and Return All Rows in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854477#M337691</link>
    <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt; Thanks a lot. This is exactly what I was looking for!!</description>
    <pubDate>Thu, 19 Jan 2023 01:56:00 GMT</pubDate>
    <dc:creator>sasmaverick</dc:creator>
    <dc:date>2023-01-19T01:56:00Z</dc:date>
    <item>
      <title>Select Subset of a Group and Return All Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854237#M337595</link>
      <description>&lt;P&gt;I have a dataset with 10,000 rows and 1000 unique account ids. I need to select 300 random account ids and return all rows for those 300 ids (without specifying the ids in a where clause). How do I do it? Please see the below example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Input dataset&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;account_id Amount&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;ABC&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 100&lt;/P&gt;
&lt;P&gt;ABC&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 150&lt;/P&gt;
&lt;P&gt;ABC&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 200&lt;/P&gt;
&lt;P&gt;DEF&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 90&lt;/P&gt;
&lt;P&gt;DEF&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 80&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;Output dataset&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;e.g., If I select 1 random account id from the above data, I expect the following result:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;account_id Amount&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;DEF&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;90&lt;/P&gt;
&lt;P&gt;DEF&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;80&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;Thanks a bunch for the help!&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jan 2023 23:59:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854237#M337595</guid>
      <dc:creator>sasmaverick</dc:creator>
      <dc:date>2023-01-17T23:59:49Z</dc:date>
    </item>
    <item>
      <title>Re: Select Subset of a Group and Return All Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854238#M337596</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/27005"&gt;@sasmaverick&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I have a dataset with 10,000 rows and 1000 unique account ids. I need to select 300 random account ids and return all rows for those 300 ids &lt;STRONG&gt;(without specifying the ids in a where clause). How do I do it? Please see the below example:&lt;/STRONG&gt;&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;What do you mean without specifying the ids in a where clause? A join is ok?&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2023 00:01:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854238#M337596</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2023-01-18T00:01:17Z</dc:date>
    </item>
    <item>
      <title>Re: Select Subset of a Group and Return All Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854239#M337597</link>
      <description>&lt;P&gt;I mean is there a way to do it in a single step (like proc surveyselect). I can surely think of doing it in multiple steps.&lt;/P&gt;
&lt;P&gt;1. Select unique IDS from dataset A into dataset B&lt;/P&gt;
&lt;P&gt;2. Pick random sample (x%) from dataset B into dataset C&lt;/P&gt;
&lt;P&gt;3. Inner join dataset C to dataset A&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2023 00:24:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854239#M337597</guid>
      <dc:creator>sasmaverick</dc:creator>
      <dc:date>2023-01-18T00:24:24Z</dc:date>
    </item>
    <item>
      <title>Re: Select Subset of a Group and Return All Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854250#M337600</link>
      <description>&lt;P&gt;I can think of one way to cover all of it in a single data step. But it would include a hash table for the unique id's (=still two passes through the data) and code would be rather complicated. What I have in mind would be an extension of method 3 from &lt;A href="https://support.sas.com/kb/24/722.html" target="_self"&gt;Sample 24722: Simple random sample without replacement&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But.... code would become so more complicated to understand as compared to doing this via multiple steps so really not worth it. You've got only 100000 source rows so also performance can't be a reason to go for a single data step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Update: Just for fun below a single data step approach.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  do i=1 to 10000;
    id=rand('integer',1,1000);
    output;
  end;
run;

%let n_distinct_id=300;
data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have', multidata:'n');
      h1.defineKey('id');
      h1.defineDone();

      /* Initialize _K to the number of sample obs needed and _N to the */
      /*  total number of obs in the data set.                        */
      retain _k &amp;amp;n_distinct_id _n;
      _n=h1.num_items;
    end;

    set have; 

    if h1.check()=0 then
      do;

        /* To randomly select the first observation for the sample, use the */
        /* fact that each obs in the data set has an equal chance of being  */
        /* selected: _k/_n. If a random number between 0 and 1 is less than   */
        /* or equal to _k/_n, we select that the first obs for our sample     */
        /* and also adjust _k and the number of obs needed to complete the   */
        /* sample.                                                          */

         if ranuni(1230498) &amp;lt;= _k/_n then
          do;
            _rc=h1.remove();
            _k=_k-1;
          end;

        /* At every iteration, adjust _N, the number of obs left to */
        /* sample from.                                            */
        _n=_n-1;

      end;

    if h1.check() ne 0 then output;

run;

proc sql;
  select count(distinct id) as cnt_dist_id, count(*) as n_obs from want;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1674016063351.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/79472i20703D29DECEC680/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Patrick_0-1674016063351.png" alt="Patrick_0-1674016063351.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Jan 2023 04:32:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854250#M337600</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2023-01-18T04:32:19Z</dc:date>
    </item>
    <item>
      <title>Re: Select Subset of a Group and Return All Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854278#M337613</link>
      <description>&lt;P&gt;Here is a solution in a single data step. It assumes that the number of unique IDs is not known when you start, and that the input data is sorted by ID:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  retain _noofIDs 0 _IDsToExtract 300;
  /* count the IDs */
  if _N_=1 then do until(done);
    set have(keep=id) end=done;
    by id;
    _noofIDs+first.id;
    end;
  /* are we going to extract this ID? */
  _Extract=(_IDsToExtract)/(_noofIDs-_N_+1)&amp;gt;rand('UNIFORM');
  do until(last.id);
    set have;
    by id;
    if _Extract then output;
    end;
  _IDsToExtract+(-_Extract);
  if _IDsToExtract=0 then stop;
  drop _:;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 18 Jan 2023 07:59:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854278#M337613</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2023-01-18T07:59:37Z</dc:date>
    </item>
    <item>
      <title>Re: Select Subset of a Group and Return All Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854302#M337620</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Sample;    
set Sashelp.Cars(keep=Type MPG_City);
run;


%let n=1 ;

proc surveyselect data=Sample out=want noprint seed=12345  sampsize=&amp;amp;n.;   
cluster type; 
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 18 Jan 2023 11:45:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854302#M337620</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2023-01-18T11:45:11Z</dc:date>
    </item>
    <item>
      <title>Re: Select Subset of a Group and Return All Rows</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854477#M337691</link>
      <description>&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt; Thanks a lot. This is exactly what I was looking for!!</description>
      <pubDate>Thu, 19 Jan 2023 01:56:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-Subset-of-a-Group-and-Return-All-Rows/m-p/854477#M337691</guid>
      <dc:creator>sasmaverick</dc:creator>
      <dc:date>2023-01-19T01:56:00Z</dc:date>
    </item>
  </channel>
</rss>

