<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Hash join with compress(upcase) function in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262651#M51342</link>
    <description>In my experience, things seldom "has" to be in a certain way. Most resolutions are compromises. If you are worried about disk rewrite in the extra data step, just define it as a view. Should pose very little overhead if you got memory enough. &lt;BR /&gt;&lt;BR /&gt;If you worry about the performance of your database, you should concern more about how you store your data. Like having join keys that never needs a function call to work. That should have been taken care of during data load.</description>
    <pubDate>Sat, 09 Apr 2016 09:54:04 GMT</pubDate>
    <dc:creator>LinusH</dc:creator>
    <dc:date>2016-04-09T09:54:04Z</dc:date>
    <item>
      <title>Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262548#M51283</link>
      <description>&lt;P&gt;I would like to do a join using sas hash object, and get the same result as 4 lines of proc sql follows:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset_A;
input key:$4.;
datalines;
AAA
BBB
CCC
DDD
run;
data dataset_B;
input key:$4.;
infile datalines dlm = ",";
datalines;
AAa
B BB
CcC
DDD
run;
proc sql;
create table sqljoin_dataset as
select A.*
from dataset_A A
join dataset_B B 
on compress(upcase(A.key)) = compress(upcase(B.key));
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;As in the clause "ON" are present the functions "compress" and “upcase", the following hash procedure does not allow to obtain the same result (1 row rather than 4):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data  hashjoin_dataset ;
set dataset_A;key=compress(upcase(key));
if  _N_ = 1 then do;
	declare hash T(dataset: 'dataset_B');
		    T.definekey(compress(upcase('key')));
		    T.definedata (all:'yes');
		    T.definedone();
			 key=compress(upcase(key));
	end;
	key=compress(upcase(key));
if T.find() = 0; /* join */
drop rc;
rc = t.find(); 
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Does anyone know if it is possible, through &lt;U&gt;&lt;STRONG&gt;a single data step&lt;/STRONG&gt;&lt;/U&gt;, to obtain with hash object the same result of proc sql? Thank you&lt;/P&gt;</description>
      <pubDate>Fri, 08 Apr 2016 17:52:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262548#M51283</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2016-04-08T17:52:45Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262552#M51284</link>
      <description>&lt;P&gt;May I ask what requirement drives this inquiry?&lt;/P&gt;
&lt;P&gt;Just for learning and fun? &lt;img id="smileyvery-happy" class="emoticon emoticon-smileyvery-happy" src="https://communities.sas.com/i/smilies/16x16_smiley-very-happy.png" alt="Smiley Very Happy" title="Smiley Very Happy" /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since I can't see any other reason to switch from the&amp;nbsp;simplicity of SQL unless you have severe&amp;nbsp;performance problems.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Apr 2016 18:13:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262552#M51284</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-04-08T18:13:33Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262602#M51308</link>
      <description>The reason is of course the improvement in processing time on large datasets</description>
      <pubDate>Fri, 08 Apr 2016 21:40:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262602#M51308</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2016-04-08T21:40:09Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262605#M51310</link>
      <description>&lt;P&gt;Assuming dataset_B is the smaller I would preprocess it and replace key with conpress(uocase(key)). This way it gets the proper value in the has table without issue.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then find() will find it as you do the same with key from dataset_A.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset_b2;&lt;BR /&gt;   set dataset_b;&lt;BR /&gt;   key=compress(upcase(key));&lt;BR /&gt;run;&lt;BR /&gt;data  hashjoin_dataset ;
set dataset_A;key=compress(upcase(key));
if  _N_ = 1 then do;
	declare hash T(dataset: 'dataset_B');
		    T.definekey(compress(upcase('key')));
		    T.definedata (all:'yes');
		    T.definedone();
		/*	 key=compress(upcase(key)); */
	end;
	key=compress(upcase(key));
if T.find() = 0; /* join */
drop rc;
rc = t.find(); 
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The compress(upcase()) doesn't do anything when loading the hash table so you can take that out.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps,&lt;/P&gt;
&lt;P&gt;- Jan.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;** edited for completenesss of code **&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Apr 2016 22:00:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262605#M51310</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2016-04-08T22:00:44Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262606#M51311</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* modify key for hash below with view;

data v_dataset_B/view=v_dataset_B;
  set dataset_B;
  key=compress(upcase(key));
run;

data  hashjoin_dataset ;
if 0 then set v_dataset_B;
if _N_ = 1 then do; 
  declare hash T(dataset: 'v_dataset_B'); 
    T.definekey('key'); 
    T.definedata (all:'yes'); 
    T.definedone(); 
end; 
set dataset_A;
key=compress(upcase(key));
if T.find() = 0; /* join */
drop rc;
rc = t.find(); 
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 08 Apr 2016 22:03:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262606#M51311</guid>
      <dc:creator>FriedEgg</dc:creator>
      <dc:date>2016-04-08T22:03:23Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262635#M51329</link>
      <description>&lt;PRE&gt;
Why do you have to use a single data step? Does FriedEgg's code make some sense ?


data dataset_A;
input key:$4.;
datalines;
AAA
BBB
CCC
DDD
EEe
;
run;
data dataset_B;
input key:$4.;
infile datalines dlm = ",";
datalines;
AAa
B BB
CcC
DDD
;
run;
data want;
 if _n_=1 then do;
  if 0 then set dataset_B;
  declare hash h();
  h.definekey('key');
  h.definedone();
  
  do until(last);
   set dataset_B end=last;
   key=compress(upcase(key));
   h.replace();
  end;
 
 end;
set dataset_B;
key=compress(upcase(key));
if h.check()=0;
run;


&lt;/PRE&gt;</description>
      <pubDate>Sat, 09 Apr 2016 03:57:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262635#M51329</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-04-09T03:57:40Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262646#M51338</link>
      <description>&lt;P&gt;But it is never used dataset A&lt;/P&gt;</description>
      <pubDate>Sat, 09 Apr 2016 07:03:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262646#M51338</guid>
      <dc:creator>mario_pellegrin</dc:creator>
      <dc:date>2016-04-09T07:03:14Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262647#M51339</link>
      <description>&lt;P&gt;Thanks, but I have the need to use only a data step&lt;/P&gt;</description>
      <pubDate>Sat, 09 Apr 2016 07:13:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262647#M51339</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2016-04-09T07:13:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262648#M51340</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/80481"&gt;@mario_pellegrin&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;But it is never used dataset A&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;That's undoubtedly a typo.&lt;/P&gt;</description>
      <pubDate>Sat, 09 Apr 2016 08:19:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262648#M51340</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2016-04-09T08:19:10Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262649#M51341</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/80481"&gt;@mario_pellegrin&lt;/a&gt;i wrote:&lt;BR /&gt;
&lt;P&gt;Thanks, but I have the need to use only a data step&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I am curious as to why that need exists. Can you explain? People are coding for you and this way you present a moving target.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards Jan&lt;/P&gt;</description>
      <pubDate>Sat, 09 Apr 2016 08:22:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262649#M51341</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2016-04-09T08:22:25Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262651#M51342</link>
      <description>In my experience, things seldom "has" to be in a certain way. Most resolutions are compromises. If you are worried about disk rewrite in the extra data step, just define it as a view. Should pose very little overhead if you got memory enough. &lt;BR /&gt;&lt;BR /&gt;If you worry about the performance of your database, you should concern more about how you store your data. Like having join keys that never needs a function call to work. That should have been taken care of during data load.</description>
      <pubDate>Sat, 09 Apr 2016 09:54:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262651#M51342</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-04-09T09:54:04Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262655#M51343</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt; wrote:&lt;BR /&gt;In my experience, things seldom "has" to be in a certain way. Most resolutions are compromises. If you are worried about disk rewrite in the extra data step, just define it as a view. Should pose very little overhead if you got memory enough. &lt;BR /&gt;&lt;BR /&gt;If you worry about the performance of your database, you should concern more about how you store your data. Like having join keys that never needs a function call to work. That should have been taken care of during data load.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I totally agree &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;. But we can only guess. It is utlimately up to &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/3574"&gt;@mariopellegrini&lt;/a&gt; to explain.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;- Jan.&lt;/P&gt;</description>
      <pubDate>Sat, 09 Apr 2016 10:43:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/262655#M51343</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2016-04-09T10:43:28Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263016#M51464</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset_A;
input key:$4.;
datalines;
AAA
BBB
CCC
DDD
;
run;
data dataset_B;
input key:$4.;
infile datalines dlm = ",";
datalines;
AAa
B BB
CcC
DDD
;
run;

proc ds2;

data hashjoin_dataset(overwrite=yes);
dcl package hash h ([key],[key],0,'{select upcase(compress(key)) as key from dataset_b}');
method run();
set dataset_A;
key = upcase(compress(key));
if h.find()=0 then output;
end;
enddata;
run;

quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 11 Apr 2016 20:13:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263016#M51464</guid>
      <dc:creator>FriedEgg</dc:creator>
      <dc:date>2016-04-11T20:13:41Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263869#M51665</link>
      <description>&lt;P&gt;I think they are two different problems:&lt;BR /&gt;1) performance of your database&lt;BR /&gt;2) how you store your data&lt;BR /&gt;in this thread I'm looking at the first point&lt;/P&gt;</description>
      <pubDate>Thu, 14 Apr 2016 13:35:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263869#M51665</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2016-04-14T13:35:55Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263891#M51676</link>
      <description>&lt;P&gt;Thank you for this excellent solution. A deepening wonder: if I wanted to add a field to join?&lt;/P&gt;
&lt;P&gt;&lt;SPAN class=""&gt;I would get&lt;/SPAN&gt; &lt;SPAN&gt;this result&lt;/SPAN&gt;&lt;SPAN class=""&gt;:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset_A;
input key:$4.;
datalines;
AAA
BBB
CCC
DDD
run;
data dataset_B;
input key:$4. val1;
infile datalines dlm = ",";
datalines;
AAa,10
B BB,20
CcC,30
run;
proc sql;
create table sqljoin_dataset as
select A.*, B.val1
from dataset_A A
join dataset_B B 
on compress(upcase(A.key)) = compress(upcase(B.key));
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 14 Apr 2016 14:45:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263891#M51676</guid>
      <dc:creator>mariopellegrini</dc:creator>
      <dc:date>2016-04-14T14:45:39Z</dc:date>
    </item>
    <item>
      <title>Re: Hash join with compress(upcase) function</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263894#M51677</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset_A;
input key:$4.;
datalines;
AAA
BBB
CCC
DDD
;
run;
data dataset_B;
input key:$4. val1;
infile datalines dlm = ",";
datalines;
AAa,10
B BB,20
CcC,30
;
run;

proc ds2;

data hashjoin_dataset(overwrite=yes);
dcl double val1;
dcl package hash h ([key],[key,val1],0,'{select upcase(compress(key)) as key,val1 from dataset_b}');
method run();
set dataset_A;
key = upcase(compress(key));
if h.find()=0 then output;
end;
enddata;
run;

quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 14 Apr 2016 14:53:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Hash-join-with-compress-upcase-function/m-p/263894#M51677</guid>
      <dc:creator>FriedEgg</dc:creator>
      <dc:date>2016-04-14T14:53:29Z</dc:date>
    </item>
  </channel>
</rss>

