<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extracting subset of observations without merging in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247348#M6549</link>
    <description>&lt;P&gt;Sounds like the ideal candidate for a hash table solution:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Want;
  attrib ID length = $8; * Define key variable.;
  drop rc;
* Load the small dataset as a hash table.;
  if _n_ = 1 then do;
    declare hash h(dataset: "WORK.Small");
    h.defineKey("ID");
    h.defineDone();
    call missing(ID);
  end;
* Read in large table;
	set Large;  
  rc = h.find(); * Search in data for match.; 
  if rc = 0; * Keep only rows where ID matches.;  
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 02 Feb 2016 01:27:56 GMT</pubDate>
    <dc:creator>SASKiwi</dc:creator>
    <dc:date>2016-02-02T01:27:56Z</dc:date>
    <item>
      <title>Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247346#M6548</link>
      <description>&lt;P&gt;Hi, so I have two SAS datasets, one of which is approximately 780GB and the other with approximately 5700 observations. In the smaller dataset, the 5700 observations have a unique identifier also present in the 780GB dataset. &amp;nbsp;I was wondering if there was a way to extract all the observations in the large dataset&amp;nbsp;that match the identifiers from the smaller dataset (there are multiple observations for each identifier in this dataset). &amp;nbsp;I have tried merging the two using an inner join; however, it took SAS over 72 hours to complete this. &amp;nbsp;Is there a way to reference the variable in the smaller dataset in a where statement or any other way to trim down the length of time it takes to perform this function? Thank you in advance.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 00:00:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247346#M6548</guid>
      <dc:creator>Timmay</dc:creator>
      <dc:date>2016-02-02T00:00:30Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247348#M6549</link>
      <description>&lt;P&gt;Sounds like the ideal candidate for a hash table solution:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Want;
  attrib ID length = $8; * Define key variable.;
  drop rc;
* Load the small dataset as a hash table.;
  if _n_ = 1 then do;
    declare hash h(dataset: "WORK.Small");
    h.defineKey("ID");
    h.defineDone();
    call missing(ID);
  end;
* Read in large table;
	set Large;  
  rc = h.find(); * Search in data for match.; 
  if rc = 0; * Keep only rows where ID matches.;  
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 01:27:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247348#M6549</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2016-02-02T01:27:56Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247373#M6550</link>
      <description>If this type of query will occur frequently it would be wise to index on the id column, and use SQL join. This will prevent a full table scan and sorting.</description>
      <pubDate>Tue, 02 Feb 2016 05:45:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247373#M6550</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-02-02T05:45:49Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247387#M6551</link>
      <description>&lt;P&gt;With 5700 obs in the smaller dataset it should be possible to create a format that yields "yes" for any of the 5700 identifiers, and "no" for all other values.&lt;/P&gt;
&lt;P&gt;Then you only need to do a single sequential pass through the large dataset and look if put(identifier,format.) = "yes".&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 09:29:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247387#M6551</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-02T09:29:31Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247395#M6552</link>
      <description>&lt;P&gt;Would this not also yeild one pass through the data&amp;amp;colon;&lt;/P&gt;
&lt;PRE&gt;data _null_;
  set small_dataset end=last;
  if _n_=1 then call execute('data want;  set big_dataset; select(id);');
  call execute(' when ('||strip(id)||') output;');
  if last then call execute(' otherwise; end; run;');
run;&lt;/PRE&gt;
&lt;P&gt;I.e. a big datastep is generated from the small one, that code is then run on the dataset once. &amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 10:12:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247395#M6552</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-02-02T10:12:09Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247408#M6553</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/45151"&gt;@RW9&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Would this not also yeild one pass through the data&amp;amp;colon;&lt;/P&gt;
&lt;PRE&gt;data _null_;
  set small_dataset end=last;
  if _n_=1 then call execute('data want;  set big_dataset; select(id);');
  call execute(' when ('||strip(id)||') output;');
  if last then call execute(' otherwise; end; run;');
run;&lt;/PRE&gt;
&lt;P&gt;I.e. a big datastep is generated from the small one, that code is then run on the dataset once. &amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Nice. Still another method to avoid the merge.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 11:22:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247408#M6553</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-02T11:22:49Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247568#M6554</link>
      <description>&lt;P&gt;Love the way there are so many methods for processing big data in SAS. It would be interesting to see some of these compared.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Feb 2016 18:54:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247568#M6554</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2016-02-02T18:54:23Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247667#M6558</link>
      <description>&lt;P&gt;I guess that anything that allows you to only perform one sequential read through a big dataset will only be bottlenecked by the I/O throughput, therefore being equal in terms of performance. What then guides the decision will be personal programming preferences and which method is most easily unterstood by the next one to maintain the code.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 07:37:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247667#M6558</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-02-03T07:37:02Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247681#M6559</link>
      <description>&lt;P&gt;Just to note, in my opinion from:&lt;/P&gt;
&lt;P&gt;"&lt;SPAN&gt;What then guides the decision will be personal programming preferences and which method is most easily unterstood by the next one to maintain the code."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;The personal programming preference is very low on the list. &amp;nbsp;In terms of importance:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1) Documentation - Functional Design Spec, Testing document, User Guides etc.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;2) Good programming practice&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;3) Company standards&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;4) Personal preference&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;A programmers afterlife consists of meeting all those programmers who have had to use your code after you, be prepared &lt;span class="lia-unicode-emoji" title=":monkey_face:"&gt;🐵&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 03 Feb 2016 09:32:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/247681#M6559</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-02-03T09:32:08Z</dc:date>
    </item>
    <item>
      <title>Re: Extracting subset of observations without merging</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/248125#M6581</link>
      <description>&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/45151"&gt;@RW9﻿&lt;/a&gt;&amp;nbsp;said:&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;A programmers afterlife consists of meeting all those programmers who have had to use your code after you, be prepared &lt;IMG class="emoticon emoticon-smileysurprised" src="https://communities.sas.com/i/smilies/16x16_smiley-surprised.png" border="0" alt="Smiley Surprised" title="Smiley Surprised" /&gt;)&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Ahahaha, I am laughing so hard I am crying. Or maybe I am just crying... The struggle&amp;nbsp;is real.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Feb 2016 22:55:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/Extracting-subset-of-observations-without-merging/m-p/248125#M6581</guid>
      <dc:creator>Michelle</dc:creator>
      <dc:date>2016-02-04T22:55:28Z</dc:date>
    </item>
  </channel>
</rss>

