<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Efficient one to many join to a huge dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555799#M154728</link>
    <description>&lt;P&gt;Recreating a large table when only a few hundred records are modified is not the most efficient approach.&lt;/P&gt;
&lt;P&gt;Updating via an index is probably the best way.&lt;/P&gt;</description>
    <pubDate>Thu, 02 May 2019 22:46:41 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2019-05-02T22:46:41Z</dc:date>
    <item>
      <title>Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555444#M154573</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;One to many match of a very small input file (&amp;amp;sfile._&amp;amp;dbase) to a very large file (sdCumV.VIN_VEH_OPTNS_&amp;amp;dbase)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In one particular example the small input has 2 unique records, and the very large file has 419M records (where each input record will typically find ~200 matches).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This particular example ran for 44 minutes. Obviously, we can do better.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;What I'm not sure of is how best to utilize SAS for such a task.&amp;nbsp; Any ideas would be welcome.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;* ------------------------ ;&lt;BR /&gt;* one to MANY matches ;&lt;BR /&gt;* ------------------------ ;&lt;/P&gt;&lt;P&gt;* match to VIN_VEH_OPTNS file to get OPTN_CD ;&lt;BR /&gt;proc sql ;&lt;BR /&gt;create table squish_optns_&amp;amp;dbase as&lt;BR /&gt;select * from&lt;BR /&gt;&amp;amp;sfile._&amp;amp;dbase as G left join sdCumV.VIN_VEH_OPTNS_&amp;amp;dbase as S on G.VIN = S.GMC_VEH_IDENT_NBR ;&lt;BR /&gt;quit ;&lt;/P&gt;</description>
      <pubDate>Wed, 01 May 2019 18:47:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555444#M154573</guid>
      <dc:creator>ChuckC</dc:creator>
      <dc:date>2019-05-01T18:47:41Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555445#M154574</link>
      <description>Instead of a join, how about a filter?&lt;BR /&gt;&lt;BR /&gt;select * from bigTable where ID in (select distinct ID from smallTable);&lt;BR /&gt;&lt;BR /&gt;And consider adding an index to your big table, which will speed up that query a lot.</description>
      <pubDate>Wed, 01 May 2019 18:52:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555445#M154574</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2019-05-01T18:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555492#M154587</link>
      <description>&lt;P&gt;I second&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;'s recommendation. For such a low percentage of the table being retrieved, an index is the way to go.&lt;/P&gt;</description>
      <pubDate>Wed, 01 May 2019 21:24:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555492#M154587</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-01T21:24:25Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555508#M154600</link>
      <description>&lt;P&gt;A dynamic filtering clause does not use indexes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data B(index=(I) compress=no) S(compress=no);
  do I=1 to 2e6;
    do J=1 to 20;
      output B;
    end;
    if I in (333333,999999) then output S;
  end;
run;

proc sql;     * 0.3 seconds ;
  select B.* from S left join B on B.I=S.I;
quit;

proc sql;     * 8.6 seconds ;
  select B.* from B where I in (select I from S);
quit;

proc sql;     * 0.3 seconds ;
  select distinct I into :values separated by ',' from S;
  select B.* from B where I in (&amp;amp;values);
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 01 May 2019 23:10:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555508#M154600</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-01T23:10:03Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555646#M154640</link>
      <description>&lt;P&gt;There is an alternative approach, using a &lt;A href="https://go.documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.4&amp;amp;docsetId=proc&amp;amp;docsetTarget=p1xidhqypi0fnwn1if8opjpqpbmn.htm&amp;amp;locale=en" target="_self"&gt;custom format&lt;/A&gt; &amp;amp; then &lt;A href="https://go.documentation.sas.com/?cdcId=pgmsascdc&amp;amp;cdcVersion=9.4_3.4&amp;amp;docsetId=lefunctionsref&amp;amp;docsetTarget=n1en5ed71v1ai3n1pxwx2rskgz7g.htm&amp;amp;locale=en" target="_self"&gt;PUTC function.&lt;/A&gt;&amp;nbsp;Here's a simple example:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data large ;
	do obs=1 to 1000 ;
		match=substr("ABC",int(ranuni(0)*3)+1,1) ;
		output ;
	end ;
run ;

data small ;
	fmtname="$myFmt" ;
	do start="A", "B","C" ;
		label=repeat(start,3) ;
		output ;
	end ;
run ;
	
proc format cntlin=small ;
run ;

data join ;
	set large ;
	matchvalue=putc(match,"$myFmt.") ;
run ;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 02 May 2019 16:02:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555646#M154640</guid>
      <dc:creator>AMSAS</dc:creator>
      <dc:date>2019-05-02T16:02:39Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555799#M154728</link>
      <description>&lt;P&gt;Recreating a large table when only a few hundred records are modified is not the most efficient approach.&lt;/P&gt;
&lt;P&gt;Updating via an index is probably the best way.&lt;/P&gt;</description>
      <pubDate>Thu, 02 May 2019 22:46:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555799#M154728</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-02T22:46:41Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555837#M154738</link>
      <description>&lt;P&gt;Here's some example code.&amp;nbsp; See the examples in the doc for the SET statement.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data small;
   do x=1 to 1E4;
      key=int(ranuni(0)*1E4);
      output;
   end;
   drop x;
run;

data large (index=(key));
   do x=1 to 1E8;
      key=int(ranuni(0)*1E8);
      * other columns go here ;
      foo=x;
      output;
   end;
   drop x;
run;

* first match only (one-to-one matching);
data want1;
   if 0 then set large;
   call missing(foo);  * implied retain on data set variables ;
   set small;
   set large key=key;
   _error_=0;           * SAS treats non-matches as an error ;
   * if _iorc_=0;       * to keep matches only ;
run;

* multiple matches (one-to-many matching) ;
data want2;
   if 0 then set large;
   call missing(foo);  * implied retain on data set variables ;
   set small;
   do until (_iorc_ ne 0);
      set large key=key;
      if _iorc_=0 then output;
   end;
   _error_=0;           * SAS treats non-matches as an error ;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 03 May 2019 04:39:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/555837#M154738</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2019-05-03T04:39:04Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557477#M155413</link>
      <description>&lt;P&gt;Good morning all,&lt;BR /&gt;Wanted to take a quick minute and post a word of thanks for all of the EXCELLENT ideas and insights shared.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the end, what I found was that moving from the use of Index to COMPRESS and adding a select Distinct, in my case, provided the best result (see snip below).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This change miraculously reduced the cycle time from 40+ minutes to less than 4 minutes!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;HOWEVER, when the full production job runs, which runs this code over a series of datasets, it takes much longer. In production it runs 16-20 mins per cycle. So still a huge improvement over the original, but it leaves me believing I have a memory issue to address next.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks again all. Until next time &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Chuck&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;* ------------------------ ;&lt;BR /&gt;* one to MANY matches ;&lt;BR /&gt;* ------------------------ ;&lt;/P&gt;&lt;P&gt;/* * ORIGINAL WAY - match to VIN_VEH_OPTNS file to get OPTN_CD ;&lt;BR /&gt;proc sql ;&lt;BR /&gt;create table TEST_Selectto2017 as&lt;BR /&gt;select * from&lt;BR /&gt;sdresp.grabvin_o2017 as G left join sdCumV.VIN_VEH_OPTNS_o2017 as S on G.VIN = S.GMC_VEH_IDENT_NBR ;&lt;BR /&gt;quit ;&lt;BR /&gt;*/&lt;BR /&gt;/* * BY WAY OF SELECT DISTINCT ;&lt;BR /&gt;proc sql ;&lt;BR /&gt;create table sdresp.TEST_Selecto2017 as&lt;BR /&gt;select * from sdCumV.VIN_VEH_OPTNS_o2017&lt;BR /&gt;where GMC_VEH_IDENT_NBR in (select distinct VIN from sdresp.grabvin_o2017) ;&lt;BR /&gt;quit ;&lt;BR /&gt;*/&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 09 May 2019 15:10:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557477#M155413</guid>
      <dc:creator>ChuckC</dc:creator>
      <dc:date>2019-05-09T15:10:38Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557716#M155526</link>
      <description>&lt;P&gt;Set the FULLSTIMER option to monitor resource consumption.&lt;/P&gt;
&lt;P&gt;MSGLEVEL=I will give information about sorting and index usage.&lt;/P&gt;
&lt;P&gt;PROC SQL _method; will give the SQL planner evaluation.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It could be a good idea to monitor the overall server resource consumption at the same time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on the result from these inputs, adjust MEMSIZE and SORTSIZE global options.&lt;/P&gt;
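&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal sketch of these settings together (table and column names are placeholders, and the memory values are examples to adjust for your site):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer msglevel=i memsize=4G sortsize=1G;

proc sql _method;   * _method prints the chosen join plan in the log ;
  create table matches as
  select b.* from big b inner join small s on b.key=s.key;
quit;&lt;/CODE&gt;&lt;/PRE&gt;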
&lt;P&gt;Potentially, if the SQL planner triggers a hash join, you might want to increase the BUFFERSIZE (specified as a PROC SQL option).&lt;/P&gt;</description>
      <pubDate>Fri, 10 May 2019 09:28:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557716#M155526</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2019-05-10T09:28:44Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557736#M155531</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Fine by me, but do you realize that the results you get from your two SQL snippets are completely different?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the first snippet you get all records from sdresp.grabvin_o2017, with all columns from both tables,&lt;/P&gt;&lt;P&gt;but in the second you get only the matching records, with the columns from sdCumV.VIN_VEH_OPTNS_o2017 alone.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If this is what you want and performance is what you seek, consider an INNER JOIN combined with DISTINCT on indexed tables:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
   CREATE TABLE sdresp.TEST_Selecto2017 AS
      SELECT DISTINCT S.*
      FROM sdCumV.VIN_VEH_OPTNS_o2017 S
      INNER JOIN sdresp.grabvin_o2017 G
      ON S.GMC_VEH_IDENT_NBR EQ G.VIN
   ;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 10 May 2019 11:27:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557736#M155531</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2019-05-10T11:27:10Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557752#M155537</link>
      <description>&lt;P&gt;Do you have some control over the production process?&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You may be able to combine the data sets in your "series of data sets" (possibly adding an identifier to indicate the source of the observation).&amp;nbsp; Then run with this combined data set against your large data set.&amp;nbsp; That way you only need to hit the large data set once.&amp;nbsp; You're left with the task of slicing and dicing many small data sets afterwards, but the CPU time it takes to do that ought to be small.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;.......................................................&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another idea along similar lines but with fewer changes to the production process:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Add a step to find all the required records based on the "series of data sets".&lt;/P&gt;
&lt;P&gt;2. Extract those records from the large data set.&lt;/P&gt;
&lt;P&gt;3. Continue with the current production process, but using the extract instead of the original large data set.&lt;/P&gt;</description>
      <pubDate>Fri, 10 May 2019 13:20:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/557752#M155537</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-05-10T13:20:18Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/558160#M155745</link>
      <description>&lt;P&gt;If you decide to use indexes, just to let you know that you can load the small indexed table in memory using the SASFILE statement.&lt;BR /&gt;Indexes are loaded too and the random reads are &lt;U&gt;much&lt;/U&gt; faster.&lt;/P&gt;</description>
      <pubDate>Sun, 12 May 2019 21:41:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/558160#M155745</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-12T21:41:13Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/558166#M155750</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;If you decide to use indexes, just to let you know that you can load the small indexed table in memory using the SASFILE statement.&lt;BR /&gt;Indexes are loaded too and the random reads are &lt;U&gt;much&lt;/U&gt; faster.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I like to think of indexed datasets as allowing random access based on the index, but with the data remaining on disk.&amp;nbsp; So each read requires disk I/O.&amp;nbsp; In that respect, they are analogous to hash objects, but without loading the data into memory.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But, I don't know the internals of SAS indexes vs. the hash object "index" (search algorithm).&amp;nbsp; I don't know if they are similar, or completely different.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I find if my source table is small, but lookup table is huge, I can get better elapsed time by using index key lookup (esp. if the index already exists, say via overnight ETL processing).&amp;nbsp; This approach saves the overhead of loading a hash object for relatively few lookups.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, if my source table is huge, and the lookup table is small (or huge, but still fits in memory), then I can get better elapsed time by "suffering" the overhead of loading the hash object, but then having "blinding speed" for the lookups, where the lookups span the majority of the keys in the lookup table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Question:&lt;/P&gt;
&lt;P&gt;With the caveats of the above, if the lookup table is small (and fits in memory), would it be better to use a hash object, even if the table is indexed?&amp;nbsp; Or, would the performance be similar, since as you say, the index is also read into memory, and therefore the lookups are similar to a hash object?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;(I guess the OP can try both approaches and see which works best...)&lt;/P&gt;
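&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For reference, a sketch of the hash-object variant of the same lookup (SMALL and LARGE with a KEY column, mirroring the indexed examples above):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want3;
  if _n_=1 then do;
    declare hash h(dataset:"small");  * load the small lookup table into memory once ;
    h.defineKey("key");
    h.defineDone();
  end;
  set large;
  if h.check()=0;   * keep rows whose KEY also exists in SMALL ;
run;&lt;/CODE&gt;&lt;/PRE&gt;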
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 13 May 2019 01:03:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/558166#M155750</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2019-05-13T01:03:03Z</dc:date>
    </item>
    <item>
      <title>Re: Efficient one to many join to a huge dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/558188#M155763</link>
      <description>&lt;P&gt;&lt;EM&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/15043"&gt;@ScottBass&lt;/a&gt;&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&amp;gt;&amp;nbsp;if the lookup table is small (and fits in memory), would it be better to use a hash object, even if the table is indexed?&amp;nbsp; Or, would the performance be similar, since as you say, the index is also read into memory&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You're in luck Scott. Look at pages 162+ of my book, since you had the superior wisdom to procure it. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You'll see this very question benchmarked against other lookup methods. In broad strokes:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- Index is faster if few rows (say 1%) are retrieved, &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- Hash gets better as a larger proportion of the rows is fetched, &lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- SASFILE+index sits somewhere in the middle and is best if not too many columns are needed and if more than 1% (say) of the rows are needed, due to the initial overhead of loading the data in memory.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 13 May 2019 05:39:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Efficient-one-to-many-join-to-a-huge-dataset/m-p/558188#M155763</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2019-05-13T05:39:34Z</dc:date>
    </item>
  </channel>
</rss>