<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Efficient ways to extract data in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766538#M80993</link>
    <description>&lt;P&gt;My code will include ALL variables from the big dataset B and filter on IDs found in dataset A.&lt;/P&gt;
&lt;P&gt;The DEFINEDATA() method declares the data variables that are stored in the hash object so they can be retrieved into the PDV; to actually retrieve them, you must use FIND() instead of CHECK().&lt;/P&gt;
&lt;P&gt;The code you posted would only be valid if dataset A contained the variables b and c, and you wanted those variables added to those read from dataset B. You must also define those variables in the PDV (usually done with a LENGTH statement) and replace CHECK() with FIND().&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So please be more specific: which variables are contained in which dataset?&lt;/P&gt;</description>
    <pubDate>Wed, 08 Sep 2021 10:01:48 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2021-09-08T10:01:48Z</dc:date>
    <item>
      <title>Efficient ways to extract data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766527#M80990</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a dataset A that contains about 25,000 records.&lt;/P&gt;
&lt;P&gt;I have a dataset B that contains about 14 million records.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I want some data from dataset B where A.ID = B.ID, but it takes ages to pull the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So far I have tried:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. PROC SQL with: where B.ID in (select ID from A)&lt;/P&gt;
&lt;P&gt;(where A only contains distinct ID values)&lt;/P&gt;
&lt;P&gt;2. A macro that writes out the smallest A.ID, pulls the B.IDs that are greater than that one into a temp file, and uses that file for the extraction.&lt;/P&gt;
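&lt;P&gt;Written out in full, approach 1 looks roughly like the sketch below. The output table name want is only a placeholder, and select * stands in for whichever columns are actually needed:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* want is a placeholder name for the result table */
  create table want as
  select *
  from b /* big dataset */
  where id in (select id from a); /* small dataset with distinct IDs */
quit;&lt;/CODE&gt;&lt;/PRE&gt;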
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I wish there were a way to write all the A.IDs to a macro and use that, as it looks like my SAS server is having trouble with two tables at the same time&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2021 09:06:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766527#M80990</guid>
      <dc:creator>Kiteulf</dc:creator>
      <dc:date>2021-09-08T09:06:14Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient ways to extract data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766534#M80991</link>
      <description>&lt;P&gt;This is THE task for which you use a hash object:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;No sorting is done (except in-memory for the small table), and the hash object discards duplicates in A by default.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2021 09:25:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766534#M80991</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-09-08T09:25:39Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient ways to extract data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766535#M80992</link>
      <description>&lt;P&gt;Could I use this to choose the columns b and c from the big dataset?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set b; /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedata ("id", "b", "c") ; 
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2021 09:47:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766535#M80992</guid>
      <dc:creator>Kiteulf</dc:creator>
      <dc:date>2021-09-08T09:47:13Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient ways to extract data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766538#M80993</link>
      <description>&lt;P&gt;My code will include ALL variables from the big dataset B and filter on IDs found in dataset A.&lt;/P&gt;
&lt;P&gt;The DEFINEDATA() method declares the data variables that are stored in the hash object so they can be retrieved into the PDV; to actually retrieve them, you must use FIND() instead of CHECK().&lt;/P&gt;
&lt;P&gt;The code you posted would only be valid if dataset A contained the variables b and c, and you wanted those variables added to those read from dataset B. You must also define those variables in the PDV (usually done with a LENGTH statement) and replace CHECK() with FIND().&lt;/P&gt;
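&lt;P&gt;A minimal sketch of that variant, assuming dataset A really does contain id plus two variables b and c, and assuming both are numeric (the types and lengths are guesses; adjust them to your data):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set b; /* big dataset */
length b c 8; /* assumption: b and c are numeric in A; this defines them in the PDV */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset, assumed to hold id, b and c */
  a.definekey("id");
  a.definedata("b", "c"); /* data items stored in the hash for retrieval */
  a.definedone();
end;
if a.find() = 0; /* found an entry: b and c are copied from the hash into the PDV */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If A contains only the ID, none of this is needed and the CHECK() version is sufficient.&lt;/P&gt;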
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So please be more specific: which variables are contained in which dataset?&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2021 10:01:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766538#M80993</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-09-08T10:01:48Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient ways to extract data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766539#M80994</link>
      <description>&lt;P&gt;A contains ID&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;B contains&lt;/P&gt;
&lt;P&gt;ID, DATE, AMOUNT&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;plus a lot more that I don't care about.&lt;/P&gt;</description>
      <pubDate>Wed, 08 Sep 2021 10:05:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766539#M80994</guid>
      <dc:creator>Kiteulf</dc:creator>
      <dc:date>2021-09-08T10:05:01Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient ways to extract data</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766540#M80995</link>
      <description>&lt;P&gt;So if you only want those three variables and filter on A, my code would need an additional KEEP= dataset option:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
set b (keep=id date amount); /* big dataset */
if _n_ = 1
then do;
  declare hash a (dataset:"a"); /* small dataset */
  a.definekey("id");
  a.definedone();
end;
if a.check() = 0; /* found an entry */
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 08 Sep 2021 10:08:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Efficient-ways-to-extract-data/m-p/766540#M80995</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2021-09-08T10:08:35Z</dc:date>
    </item>
  </channel>
</rss>

