<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: PROC SQL Joining on Substrings? in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/682009#M79492</link>
    <description>&lt;P&gt;Function COMPBL compress the text to a single space between words&lt;/P&gt;
&lt;P&gt;Function UPCASE eliminates case sensitivity ("A" in not equal to 'a').&lt;/P&gt;
&lt;P&gt;Function Compress with delimiters '()'&amp;nbsp; - i/e/ both parenthesis - will remove them from the text.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The only issue that I did not deal with is row # 5 - order of sub-strings (a.o. smith).&lt;/P&gt;</description>
    <pubDate>Mon, 07 Sep 2020 12:56:37 GMT</pubDate>
    <dc:creator>Shmuel</dc:creator>
    <dc:date>2020-09-07T12:56:37Z</dc:date>
    <item>
      <title>PROC SQL Joining on Substrings?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/681996#M79489</link>
      <description>&lt;P&gt;I am trying to merge two datasets, but the problem is that my data is not in such a great format across either table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My primary identifier in both tables is under "Company_Name".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The issues I am having can be seen if we place the tables side by side.&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Row#&lt;/TD&gt;&lt;TD&gt;Table 1 Companies&lt;/TD&gt;&lt;TD&gt;Table 2 Companies&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;1ST SOURCE CORPORATION&lt;/TD&gt;&lt;TD&gt;1ST SOURCE CORP&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;3D SYSTEMS&lt;/TD&gt;&lt;TD&gt;3D SYSTEMS CORP&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;3M COMPANY&lt;/TD&gt;&lt;TD&gt;3M CO&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;A.H. BELO&lt;/TD&gt;&lt;TD&gt;A. H. BELO CORP&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;A.O. SMITH&lt;/TD&gt;&lt;TD&gt;SMITH (A.O.)&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;ACADIA HEALTHCARE CO&lt;/TD&gt;&lt;TD&gt;ACADIA HEALTHCARE&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;ACADIA PHARMACEUTICALS&amp;nbsp;&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;If I use the code:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
	CREATE TABLE WANT AS
	SELECT l.*, r.*
	FROM TABLE1 AS l LEFT JOIN TABLE2 AS r
	ON	(l.company_name = r.company_name) and l.year = r.year;
QUIT;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Then the above values will not find a match, even though they should be matched.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Since these 5 rows are the most common examples of a match failing in my data, I was wondering if&amp;nbsp;&lt;/P&gt;&lt;P&gt;there were a way to join tables contingent on matching substrings match?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In Row 1 &amp;amp; 3, Table 2 is a substring of Table 1, in perfect order&lt;/P&gt;&lt;P&gt;In Row 2 &amp;amp; 4, Table 1 is a substring of Table 2, in perfect order (although in Row 4 there is a space after the first period, in table 2)&lt;/P&gt;&lt;P&gt;In Row 5, Table 1 is substring of Table 2, but the values are out of order.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In Row 6 and 7 - we have two companies with similar names - "Acadia" is present in both. In that case, I want to create the match between the entries which have more matching substrings. In this case, Row 6 has two matching words, whereas row 7 would only have had one.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I understand this will probably take a bit of guess work and fiddling around, as I can imagine that matching on substrings may lead to incorrect matches. In any case, I would really appreciate some help to get started.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Sep 2020 11:29:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/681996#M79489</guid>
      <dc:creator>UniversitySas</dc:creator>
      <dc:date>2020-09-07T11:29:52Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SQL Joining on Substrings?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/682006#M79490</link>
      <description>&lt;P&gt;Next code may give you a preparation to the join :&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data table1;
   input company $CHAR50.;
cards4;
1ST SOURCE CORPORATION
3D SYSTEMS
3M COMPANY
A.H. BELO
A.O. SMITH
ACADIA HEALTHCARE CO
ACADIA PHARMACEUTICALS 
;;;;
run;
data table2;
   input company $CHAR50.;
cards4;
1ST SOURCE CORP
3D SYSTEMS CORP
3M CO
A. H. BELO CORP
SMITH (A.O.)
ACADIA HEALTHCARE
;;;;
run;

%macro short(text);
   &amp;amp;TEXT = COMPRESS(UPCASE(COMPBL(&amp;amp;TEXT)),'()');
   &amp;amp;text = tranwrd(&amp;amp;text,'CORPORATION','CORP');
   &amp;amp;text = tranwrd(&amp;amp;text,'COMPANY','CO');
%MEND SHORT;

DATA TABLE1A; SET TABLE1; %SHORT(COMPANY);RUN;
DATA TABLE2A; SET TABLE2; %SHORT(COMPANY);RUN;

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Check tables TABLE1A TABLE2A;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Sep 2020 12:03:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/682006#M79490</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-09-07T12:03:04Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SQL Joining on Substrings?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/682007#M79491</link>
      <description>Ah, I see what you are doing.&lt;BR /&gt;&lt;BR /&gt;Could I confirm what the line:&lt;BR /&gt;&amp;amp;TEXT = COMPRESS(UPCASE(COMPBL(&amp;amp;TEXT)),'()');&lt;BR /&gt;&lt;BR /&gt;accomplishes?&lt;BR /&gt;</description>
      <pubDate>Mon, 07 Sep 2020 12:12:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/682007#M79491</guid>
      <dc:creator>UniversitySas</dc:creator>
      <dc:date>2020-09-07T12:12:41Z</dc:date>
    </item>
    <item>
      <title>Re: PROC SQL Joining on Substrings?</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/682009#M79492</link>
      <description>&lt;P&gt;Function COMPBL compress the text to a single space between words&lt;/P&gt;
&lt;P&gt;Function UPCASE eliminates case sensitivity ("A" in not equal to 'a').&lt;/P&gt;
&lt;P&gt;Function Compress with delimiters '()'&amp;nbsp; - i/e/ both parenthesis - will remove them from the text.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The only issue that I did not deal with is row # 5 - order of sub-strings (a.o. smith).&lt;/P&gt;</description>
      <pubDate>Mon, 07 Sep 2020 12:56:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/PROC-SQL-Joining-on-Substrings/m-p/682009#M79492</guid>
      <dc:creator>Shmuel</dc:creator>
      <dc:date>2020-09-07T12:56:37Z</dc:date>
    </item>
  </channel>
</rss>

