<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merge large datasets using sql or harsh objects (need help!!) in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585355#M166913</link>
    <description>&lt;P&gt;Hey&amp;nbsp;&lt;SPAN class="login-bold"&gt;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205" target="_self"&gt;novinosrin&lt;/A&gt;! Thank you so much for the reply. I guess you can tell that I don't know much about SAS while I am struggling to learn by working... Do you have any suggestions? I am trying the following code:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="login-bold"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture3.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32143iE2B1B6D3FCD1EC6F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Capture3.PNG" alt="Capture3.PNG" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 30 Aug 2019 20:37:56 GMT</pubDate>
    <dc:creator>constancezeng</dc:creator>
    <dc:date>2019-08-30T20:37:56Z</dc:date>
    <item>
      <title>Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585346#M166909</link>
      <description>&lt;P&gt;Hey guys. New sas user here. I am having trouble merging two super large datasets named ibes and combine (around 30g). I've tried using proc sql but it took me forever and still I cannot have the results.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So I am thinking about merging those two datasets based on sedol=cusip (same company id but different var names in ibes and combine) and ANNDATS&amp;lt;=date&amp;lt;=ANNDATS+365. So each date will correspond to a PRC in combine; and&amp;nbsp;each price in ibes will have 200 PRC (from combine). At the end I need to calculate the number of PRCs for each price in ibes dateset.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I read some posts that suggested a use of hush objects. Here is my attempt:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture.PNG" style="width: 600px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32136i1943B14F40600E6F/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture.PNG" alt="Capture.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;But somehow sas has returned me an error:&amp;nbsp;: Variable cusip has been defined as both character and numeric..&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Please let me know if you have better ideas in terms of how to deal with those two datasets! Appreciate for your help!!&lt;BR /&gt;&amp;nbsp;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture2.png.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32140iBF6AE229373DB94D/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Capture2.png.PNG" alt="Capture2.png.PNG" /&gt;&lt;/span&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture1.png.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32139iA7A5A7CE13D296C8/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Capture1.png.PNG" alt="Capture1.png.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 19:43:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585346#M166909</guid>
      <dc:creator>constancezeng</dc:creator>
      <dc:date>2019-08-30T19:43:09Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585347#M166910</link>
      <description>&lt;P&gt;1. Why do you need a length or attrib statement when at compilation SAS will read the descriptor portion of the dataset and creates a PDV to hold an observation(record) at execution?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. Hash object variables inherit their type/length from PDV host variables. In your case&amp;nbsp; I can't see any variable(like that of assignment /explicit retain)outside of the PDV host&amp;nbsp; that is loaded/retrieved in/from hash object.&amp;nbsp; --Very important fundamental !!!!!!!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3. prc=.; date=.; *Why play with representation when call missing will assign missing values appropriately pertaining to type inherited from PDV host variable&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt; rc=group.find(key:sedol);

&amp;nbsp;if rc ne 0 then call missing(prc,date);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 21:11:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585347#M166910</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-08-30T21:11:46Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585355#M166913</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;SPAN class="login-bold"&gt;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205" target="_self"&gt;novinosrin&lt;/A&gt;! Thank you so much for the reply. I guess you can tell that I don't know much about SAS while I am struggling to learn by working... Do you have any suggestions? I am trying the following code:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN class="login-bold"&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture3.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32143iE2B1B6D3FCD1EC6F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Capture3.PNG" alt="Capture3.PNG" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 20:37:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585355#M166913</guid>
      <dc:creator>constancezeng</dc:creator>
      <dc:date>2019-08-30T20:37:56Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585358#M166914</link>
      <description>&lt;P&gt;Fair enough. This time your approach &lt;EM&gt;&lt;STRONG&gt;syntactically&lt;/STRONG&gt; &lt;/EM&gt;is fine. Did it meet your logical requirement i.e did you get the results what you need?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You are the best judge to assess the results.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 20:43:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585358#M166914</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-08-30T20:43:59Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585359#M166915</link>
      <description>&lt;P&gt;If you don't define the variables before naming them in the HASH object method calls then the data step compiler will define them as numeric.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I usually use a line like this to define the variables from the existing dataset before creating the hash object without actually reading in any of the data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if 0 then set dsname;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 30 Aug 2019 20:47:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585359#M166915</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-08-30T20:47:34Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585361#M166916</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;If you don't define the variables before naming them in the HASH object method calls then the data step compiler will define them as numeric.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I usually use a line like this to define the variables from the existing dataset before creating the hash object without actually reading in any of the data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;if 0 then set dsname;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Thank you Sir . I am afraid PDV host gets the precedence whether or not variables are defined before naming in Hash obj&lt;/P&gt;
&lt;P&gt;&amp;nbsp;For example,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_ ;
   dcl hash H () ;
   h.definekey  ("name") ;
   h.definedata ("name","height","weight") ;
   h.definedone () ;
name="xxx";
height=6868;
weight=5676;
rc=h.add();
h.output(dataset:'test');
stop;
run;

proc contents data=test;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Or a better example&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_ ;
   dcl hash H () ;
   h.definekey  ("name") ;
   h.definedata ("name","height","weight") ;
   h.definedone () ;
do until(z);
 set sashelp.class end=z;
 rc=h.add();
end;
h.output(dataset:'test');
stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 20:59:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585361#M166916</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-08-30T20:59:05Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585363#M166917</link>
      <description>&lt;P&gt;How many records are in each of your data sets, as you begin?&amp;nbsp; (Ballpark number is OK.)&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 22:29:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585363#M166917</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-08-30T22:29:22Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585364#M166918</link>
      <description>&lt;P&gt;Have you tried a simple DATA step MERGE? This requires sorted or indexed keys though. If you are essentially joining all rows from both source tables, performance will be limited by IO anyway, regardless of what joining technique you use.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You should also explore other options like using COMPRESS to shrink the size of your SAS datasets to begin with. If your datasets are wide and contain a lot of text then compressing can really speed things up.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 22:36:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585364#M166918</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2019-08-30T22:36:35Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585366#M166919</link>
      <description>&lt;P&gt;I have around 300,000,000 rows in combine and 16,000,000 rows in ibes. Thanks for asking!&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 23:19:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585366#M166919</guid>
      <dc:creator>constancezeng</dc:creator>
      <dc:date>2019-08-30T23:19:51Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585367#M166920</link>
      <description>Yes I tried. Both merge and sql gave me an error says im running out of space. And I used the option compress already. Thank you for your suggestions!</description>
      <pubDate>Fri, 30 Aug 2019 23:22:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585367#M166920</guid>
      <dc:creator>constancezeng</dc:creator>
      <dc:date>2019-08-30T23:22:02Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585368#M166921</link>
      <description>&lt;P&gt;In my experience COMPRESS = BINARY is best. Are you running SAS on a PC or a remote server? You need to fix the out of space issue first as any join technique is going to need a similar amount of free space. Your SAS administrator should be able to help if you are using a remote SAS server.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 23:31:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585368#M166921</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2019-08-30T23:31:27Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585370#M166923</link>
      <description>&lt;P&gt;You want to compare 300M records to each of 16M records?&amp;nbsp; No wonder your query never finished.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Are your source datasets already sorted?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What are the min/max number of records per BY group for both of the files?&amp;nbsp; Perhaps one of those will be small enough that you can load each BY group from one of the files into memory/hash.&lt;/P&gt;</description>
      <pubDate>Fri, 30 Aug 2019 23:35:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585370#M166923</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2019-08-30T23:35:42Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585371#M166924</link>
      <description>&lt;P&gt;&lt;SPAN&gt;A technique that can be used to reduce the hash memory footprint and still avoid sorting the entire file is to index it first by the splitting variable and then use by-processing, For example:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
proc sql noprint;                                                              
  create index cusip on ibes (cusip) ;   
  create index cusip on combine (cusip) ;  
quit 

data want;
if _n_=1 then do;
 dcl hash H (multidata:'y') ;
 h.definekey  ("cusip") ;
 h.definedata ("ANNDATS", "value") ;
 h.definedone () ;
 end;
 do until(last.cusip);
  set ibes(in=a) combine(in=b);
  by cusip;
  if a then rc=h.add();	/*load ibes by group in hash*/
  else do;
   /*Here when b do your find,find_next retrieve and
   whatever logic want*/
  end;
 end;
 h.clear();
 drop rc;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;This way, the hash memory footprint is confined to the extent of the largest CUSIP&amp;nbsp; by-group, at the expense of the overhead of creating the index and indexed data retrieval..HTH&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 00:11:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585371#M166924</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2019-08-31T00:11:22Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585374#M166925</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/287397"&gt;@constancezeng&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;1. From what you've shown, you have a data type conflict between SEDOL defined as numeric in the LENGTH statement and its character type in the data sets you're processing (it contains alpha data, so it can't be numeric). First the compiler sees the LENGTH statement and stores SEDOL in the compiler symbol table as numeric; then it sees COMBINE referenced in the first SET and reads its descriptor. But in the latter, SEDOL is defined as character, hence the conflict. If you haven't seen a related error before, you will. &lt;EM&gt;Solution&lt;/EM&gt;: Eliminate the LENGTH statement. (In the docs, it's used to create a PDV host variable for the corresponding hash variable, but you don't need it here since the compiler creates it using its attributes stored in the descriptor of COMBINE.)&amp;nbsp; &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. I don't see from your code how the CUSIP data type conflict is possible. It cannot have anything to do with the hash table, as no such hash variable is defined. Judging from your code, CUSIP can only come from IBES since it's not kept in COMBINE and not defined elsewhere in the step. Hence, the code that resulted in this error cannot be the code you showed.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3. As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205"&gt;@novinosrin&lt;/a&gt;&amp;nbsp;said, it's safer to use CALL MISSING (PRC, DATE), though in your case assigning standard numeric missing values to these variables explicitly doesn't hurt, as both are numeric. However, in your case what you're effectively executing is an inner join: If SEDOL isn't found in IBES, you don't output the record. Therefore, you don't need to set PRC and DATE to missing at all; so, just remove the corresponding statements.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;4. Memory shortage is what you may encounter once your step has compiled successfully and commences on loading the table GROUP. If you're on a 64-bit system, its item length is 64 if SEDOL is $16 or shorter and 80 if it's longer. With 16 million items loaded, you're looking at about 1.25 GB of RAM footprint. If your SAS session is allotted 4 GB (which is not much to ask), you should be okay.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;5. The reason your shot at doing it with SQL has bombed is most likely an attempt by SQL to sort COMBINE behind the scenes. Since you want an inner join, you can change this default behavior by coding the MAGIC=103 option with the proc SQL statement. It will force the SQL optimizer to select the SQXJHSH access method - in other words, to store IBES in an internal hash table and look it up for every record in COMBINE without sorting the latter; i.e. to do implicitly what you're trying to do by using the hash explicitly. Note that it will only work for an inner join (not left, right, etc.), but since that's what you need, you're in luck here.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;6. Yet another option is: Index the file IBES and use the index to look it up for every record in COMBINE using the SET statement with the KEY= option. You can find a gazillion SAS articles on how to do that (including handling duplicate keys on the driver side) or just read the definitive opus on SAS indexes by Michael Raithel:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.amazon.com/Complete-Guide-SAS-Indexes/dp/1590478495" target="_blank"&gt;https://www.amazon.com/Complete-Guide-SAS-Indexes/dp/1590478495&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The method is slower than a hash (or implicitly hashed SQL), but it's advantage is that you don't have to worry about the memory footprint.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 03:05:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585374#M166925</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-08-31T03:05:15Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585380#M166928</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;&amp;gt;&lt;SPAN&gt;If you don't define the variables before naming them in the HASH object method calls then the data step compiler will define them as numeric.&amp;lt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Nope. The compiler doesn't see anything inside the DefineKey or DefineData methods' parentheses, as they're executed after it has already done its job, i.e. at the run time. As far as the hash object directives go, the compiler isn't concerned with anything but (a) the correctness of the object.dot syntax, (b) making sure that the non-scalar variable (used as the hash object name, such as H) is defined only once, and (c) checking that if a hash object is referenced by a method, operator, or attribute, it's been defined.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In other words, if you do this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;1 data _null_ ;                                                                 
2   dcl hash h () ;                                                             
3   h.definekey ("name") ;                                                      
4   h.definedata ("weight") ;                                                   
5   h.definedone () ;                                                           
6 run ;                                                                         
                                                                                
ERROR: Undeclared key symbol name for hash object at line 5 column 3.           
ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase. 
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;the step &lt;EM&gt;compiles just fine&lt;/EM&gt; and doesn't define any non-auto PDV variables at all; the compiler sees neither NAME nor WEIGHT. It is &lt;EM&gt;during the run time&lt;/EM&gt; (aka EXECUTION phase in the hash log speak) that DefineDone checks for the presence of NAME in the PDV, and having not found it, aborts the step (it doesn't even bother to go on to check for WEIGHT). But if you code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_ ;                  
  if 0 then set sashelp.class ;
  dcl hash h () ;              
  h.definekey ("name") ;       
  h.definedata ("weight") ;    
  h.definedone () ;            
  stop ;                       
run ;                          
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;everything is hunky-dory. However, if you coded:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_ ;              
  dcl hash h () ;          
  h.definekey ("name") ;   
  h.definedata ("weight") ;
  h.definedone () ;        
  stop ;                   
  set sashelp.class ;      
run ;                      
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;everything is fine, too. DefineDone (at run time) couldn't care less where in the code - before it or after it - the compiler has seen NAME and WEIGHT at compile time; all it cares about is that the &lt;EM&gt;host variables&lt;/EM&gt;&amp;nbsp;&lt;EM&gt;called&lt;/EM&gt; NAME and WEIGHT are present in the PDV when DefineDone is called at run time. Note that if your assertion that these variables are defined as numeric if the compiler sees them after the hash definition cluster were true, the compiler would abort the step at compile time, as there would be a data type conflict between NAME coming from SASHELP.CLASS and DefineKey upstream.&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I more than understand that from the standpoint of an experienced pre-hash SAS programmer, this kind of behavior looks odd. Indeed, before the advent of the hash object, we had accustomed that if a variable is named anywhere in the step, the compiler sees it and compiles it according to its set of (fairly convoluted) rules. When I got my hands on the early version 9.0 circa 2002, I was struck by this behavior, just like anyone in my shoes would be. Luckily, for a length of time I was in direct email contact with the developers, who helped me absorb the concept of parameter type matching rather quickly - in fact, one of them wrote, verbatim, "the compiler doesn't see anything within those parentheses", at which point if kind of instantly clicked.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is the very reason&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13569"&gt;@DonH&lt;/a&gt;&amp;nbsp;and I dedicated the whole section 2.2 (CREATE Operation) to illuminating the things of this nature from different angles in:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.amazon.com/Management-Solutions-Using-Table-Operations/dp/1629601438" target="_blank"&gt;https://www.amazon.com/Management-Solutions-Using-Table-Operations/dp/1629601438&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&amp;nbsp; &amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 04:20:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585380#M166928</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-08-31T04:20:53Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585382#M166929</link>
      <description>&lt;P&gt;Hey Paul. I'd like to first thank you and&amp;nbsp;&lt;A href="https://communities.sas.com/t5/user/viewprofilepage/user-id/138205" target="_blank"&gt;@novinosrin&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;for your answers. I just tried the magic option that you suggested, unfortunately I do not think that I have sufficient space according to SAS.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture4.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32145i7FFA43E8DD0CD0EE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Capture4.PNG" alt="Capture4.PNG" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;In addition to that, using what novinosrin suggested, SAS has produced an output. But you can see that a lot of observations are dropped - I am still trying to figuring the reason, maybe I coded it in a wrong way again (I have indexed cusip in both datasets).&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture5.PNG" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/32146i5426362CD986D4CC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Capture5.PNG" alt="Capture5.PNG" /&gt;&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Again, thanks to all your comments! I appreciate for your work and looking forward to your reply!&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 04:21:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585382#M166929</guid>
      <dc:creator>constancezeng</dc:creator>
      <dc:date>2019-08-31T04:21:50Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585385#M166930</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/287397"&gt;@constancezeng&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;You just don't have enough space in your WORK library to write the output data set you are writing.&lt;/P&gt;
&lt;P&gt;As long as this is the case, it doesn't matter what kind of file matching method you're using.&lt;/P&gt;
&lt;P&gt;Either clean your WORK library of the junk you don't need or work with your admin to give you more disk space in WORK (or write to a different, permanent, library where you do have enough space).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From your log, SQL reacts as I've expected: Note the &lt;EM&gt;"merge join has been transformed into a hash join"&lt;/EM&gt; message.&lt;/P&gt;
&lt;P&gt;And yes, you're creating an inner join: (a) because of how your SQL query is structured and (b) otherwise, you wouldn't receive the above message from the optimizer; no join other than an inner join can be prodded into hashing by MAGIC=103.&lt;/P&gt;
&lt;P&gt;If you indeed want a left join, you have to code it explicitly. But remember that MAGIC=103 won't work in this case (it makes no sense to me and a number of other big data guys I know, but the fact is, it won't).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Two more suggestions:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Turn on the system options FULLSTIMER to see memory usage.&lt;/LI&gt;
&lt;LI&gt;Turn on the system option MSGLEVEL=I and add the option _METHOD (in addition to MAGIC=103) to the SQL statement to see the SQL access methods reported in the log.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In fact, since you are dealing with fairly large data, I'd recommend keeping these system options on at all times, i.e. just code:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;option fullstimer msglevel=i ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The more &lt;EM&gt;relevant&lt;/EM&gt; info in the SAS log, the better.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 04:53:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585385#M166930</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2019-08-31T04:53:49Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585390#M166932</link>
      <description>&lt;P&gt;Do you get ERRORs or WARNINGs? Is the result not what you expected?&lt;/P&gt;
&lt;P&gt;Please post the complete log of the step (copy/paste into a window opened with the {i} button).&lt;/P&gt;
&lt;P&gt;For posting code, just copy/paste it into a "little running man" window. Don't post pictures.&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 05:15:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585390#M166932</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2019-08-31T05:15:37Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585408#M166938</link>
      <description>&lt;P&gt;A corollary to&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;'s suggestion about WORK, if you have access to a location with more space, define a libname to it (e.g., a libname like MYWORK) and then use the &lt;A href="https://documentation.sas.com/?docsetId=lesysoptsref&amp;amp;docsetTarget=p1tl8adik7ypwun1utdwxaauf9wq.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en" target="_blank" rel="noopener"&gt;USER&lt;/A&gt; option to have SAS use that location for any data sets that have a one level name. That allows you to use a different location for your temporary data sets without having to update each reference.&lt;BR /&gt;&lt;BR /&gt;Just make sure to delete all the SAS datasets you created. If the location only contains your &lt;EM&gt;WORK&lt;/EM&gt; data sets (e.g., by creating a sub-folder in that location with more space), you can simply run something like to delete them all:&lt;BR /&gt;&lt;BR /&gt;proc datasets lib=mywork nolist kill;&lt;BR /&gt;run;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 12:19:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585408#M166938</guid>
      <dc:creator>DonH</dc:creator>
      <dc:date>2019-08-31T12:19:21Z</dc:date>
    </item>
    <item>
      <title>Re: Merge large datasets using sql or harsh objects (need help!!)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585409#M166939</link>
      <description>&lt;P&gt;As a followup to&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;'s spot-on comments, you might want to review the communities article&amp;nbsp;&lt;SPAN&gt;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/The-SET-Statement-s-Compile-Time-Functions/ta-p/485895" target="_blank" rel="noopener"&gt;The SET Statement's Compile Time Functions&lt;/A&gt; that provides some additional context for why you sometimes see a conflict about numeric for character conflicts when defining hash objects.&lt;BR /&gt;&lt;BR /&gt;The key point is that the compiler does not recognize quoted strings as variable names (as&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;&amp;nbsp;said in his comment). If you never reference the variable that is defined to the hash object that is a quoted string in your DATA step, you get the execution time message &lt;EM&gt;Undeclared key symbo&lt;/EM&gt;l . . . If you do reference it elsewhere in your program, the compiler will add it to the PDV at compile time and typically will define the variable to the PDV as a numeric variable. The article above provides some examples.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 31 Aug 2019 12:29:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-large-datasets-using-sql-or-harsh-objects-need-help/m-p/585409#M166939</guid>
      <dc:creator>DonH</dc:creator>
      <dc:date>2019-08-31T12:29:14Z</dc:date>
    </item>
  </channel>
</rss>

