<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Fastest method to find some records in a dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25637#M4476</link>
    <description>Hi all, &lt;BR /&gt;
Can anybody tell me the fastest way to find the records of one dataset in another dataset?&lt;BR /&gt;
To explain: I have two datasets. The first has 6000 records and several columns; the second has 500 records and just one column, the ID.&lt;BR /&gt;
I need to select from the first dataset the records whose ID appears in the second dataset. I have tried a merge and an inner join (a where in (,,,) list would be crazy), but the program runs too slowly, around 4 minutes, so I'd like to know whether there is a faster method.&lt;BR /&gt;
&lt;BR /&gt;
Thanks in advance, &lt;BR /&gt;
Elena</description>
    <pubDate>Fri, 21 May 2010 15:18:32 GMT</pubDate>
    <dc:creator>deleted_user</dc:creator>
    <dc:date>2010-05-21T15:18:32Z</dc:date>
    <item>
      <title>Fastest method to find some records in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25637#M4476</link>
      <description>Hi all, &lt;BR /&gt;
Can anybody tell me the fastest way to find the records of one dataset in another dataset?&lt;BR /&gt;
To explain: I have two datasets. The first has 6000 records and several columns; the second has 500 records and just one column, the ID.&lt;BR /&gt;
I need to select from the first dataset the records whose ID appears in the second dataset. I have tried a merge and an inner join (a where in (,,,) list would be crazy), but the program runs too slowly, around 4 minutes, so I'd like to know whether there is a faster method.&lt;BR /&gt;
&lt;BR /&gt;
Thanks in advance, &lt;BR /&gt;
Elena</description>
      <pubDate>Fri, 21 May 2010 15:18:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25637#M4476</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-05-21T15:18:32Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest method to find some records in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25638#M4477</link>
      <description>Try something like:&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
create table want as&lt;BR /&gt;
select * from have1&lt;BR /&gt;
where id in (select id from have2);&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
I have no idea whether that will be faster, but unless you have hundreds of text columns I'm not sure why it would take 4 minutes.</description>
      <pubDate>Fri, 21 May 2010 16:04:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25638#M4477</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2010-05-21T16:04:59Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest method to find some records in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25639#M4478</link>
      <description>For datasets of your size (assuming a reasonable number of columns), the SQL step ought to do the trick. If the SQL lookup is too slow, there are alternatives that generally require more coding but usually run faster. DATA step hash objects are about the fastest. An overview of lookup techniques can be found at:&lt;BR /&gt;
&lt;A href="http://caloxy.com/papers/43-i_how_table_lookups_from_ift.pdf" target="_blank"&gt;http://caloxy.com/papers/43-i_how_table_lookups_from_ift.pdf&lt;/A&gt;</description>
      <pubDate>Fri, 21 May 2010 23:09:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25639#M4478</guid>
      <dc:creator>ArtC</dc:creator>
      <dc:date>2010-05-21T23:09:50Z</dc:date>
    </item>
    <item>
      <title>Re: Fastest method to find some records in a dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25640#M4479</link>
      <description>Hi.&lt;BR /&gt;
I think you should first sort both datasets by ID using proc sort; the merge will then be faster.&lt;BR /&gt;
If you do not want to sort, here is some code I copied from somewhere.&lt;BR /&gt;
I hope it helps.&lt;BR /&gt;
&lt;BR /&gt;
[pre]&lt;BR /&gt;
proc sort data=small(keep=id) nodupkey force;*small is your second dataset;&lt;BR /&gt;
  by id;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
data fmt(rename=(id=start));&lt;BR /&gt;
  /* type 'N' builds a numeric format; if id is character, use type 'C' and $key. */&lt;BR /&gt;
  retain fmtname 'key'&lt;BR /&gt;
         type 'N'&lt;BR /&gt;
         label 'Y';&lt;BR /&gt;
  set small end=eof;&lt;BR /&gt;
  output;&lt;BR /&gt;
  if eof then do;&lt;BR /&gt;
    /* HLO='O' adds the OTHER range: every id not in small maps to 'N' */&lt;BR /&gt;
    HLO='O';&lt;BR /&gt;
    label='N';&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
proc format cntlin=fmt;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
data matched;&lt;BR /&gt;
  set test;   *test is your first dataset;&lt;BR /&gt;
  where put(id,key.)='Y';&lt;BR /&gt;
run;&lt;BR /&gt;
[/pre]&lt;BR /&gt;
&lt;BR /&gt;
Ksharp</description>
      <pubDate>Sat, 22 May 2010 06:18:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Fastest-method-to-find-some-records-in-a-dataset/m-p/25640#M4479</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2010-05-22T06:18:29Z</dc:date>
    </item>
  </channel>
</rss>

