<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Joining/Matching on URL in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460933#M117191</link>
    <description>&lt;P&gt;Hi Ody,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I recommend to use regular expressions,if you want to search/substitue complex patterns in strings (like url's).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SAS provides the prxmatch function for such tasks&amp;nbsp;(for further informations&amp;nbsp;I recommend the&amp;nbsp;SAS help)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table URL_Match as&lt;BR /&gt;select&lt;BR /&gt;a.url,&lt;BR /&gt;b.match,&lt;BR /&gt;b.field1,&lt;BR /&gt;b.field2&lt;BR /&gt;from test1 as a&lt;BR /&gt;inner join test as b&lt;BR /&gt;on&amp;nbsp; prxmatch("/\/"!!strip(b.match)!!"\//",a.url)&amp;gt;0;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regular expressions are&amp;nbsp;very powerful/standarized and available&amp;nbsp;in&amp;nbsp;a lot of modern programming languages.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can find a lot of tutorials on the web,&amp;nbsp;I strongly recommend to take a look.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 09 May 2018 08:08:18 GMT</pubDate>
    <dc:creator>Guardian</dc:creator>
    <dc:date>2018-05-09T08:08:18Z</dc:date>
    <item>
      <title>Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460849#M117169</link>
      <description>&lt;P&gt;Hello folks,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have a large table of URL info (and associated fields) I would like to categorize using a partial match to info found in the URL. Instead of writing a case statement to account for this, what would be the best method?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've come across a few cases where people are using index in the where clause to match but that's not working 100% for me.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Here is some example data.&lt;/P&gt;
&lt;PRE&gt;data test;
infile datalines dsd truncover;
input
match:$200.
field1:$200.
field2:$200.
;

datalines;
'test', 'val1', 'val2'
'testing', 'val3', 'val4'
'tested', 'val5', 'val6'
;;;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;data test1;&lt;BR /&gt;infile datalines dsd truncover;&lt;BR /&gt;input&lt;BR /&gt;url:$200.&lt;BR /&gt;;&lt;BR /&gt;&lt;BR /&gt;datalines;&lt;BR /&gt;'www.facebook.com/test/fsdfne'&lt;BR /&gt;'www.facebook.com/testing/deeheheksjbd'&lt;BR /&gt;'www.facebook.com/tested/5wnf4wljn/4rwef'&lt;BR /&gt;;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If I try some code like what's below Im getting a cartesian product. I only want to match on the exact characters from 'test.match'. How can I accomplish this?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As always, appreciate any help!&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;create table URL_Match as&lt;/P&gt;
&lt;P&gt;select&amp;nbsp;&lt;/P&gt;
&lt;P&gt;a.url,&lt;/P&gt;
&lt;P&gt;b.match,&lt;/P&gt;
&lt;P&gt;b.field1,&lt;/P&gt;
&lt;P&gt;b.field2&lt;/P&gt;
&lt;P&gt;from test1 a, test b&lt;/P&gt;
&lt;P&gt;where index(upper(a.url),strip(upper(b.match))) &amp;gt; 0;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 May 2018 20:36:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460849#M117169</guid>
      <dc:creator>Ody</dc:creator>
      <dc:date>2018-05-08T20:36:51Z</dc:date>
    </item>
    <item>
      <title>Re: Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460855#M117170</link>
      <description>&lt;P&gt;Can you show what you want the output to look like.&lt;/P&gt;
&lt;P&gt;What actual role do field1 and field2 play in the output?&lt;/P&gt;
&lt;P&gt;How many sets of these values are you going to need to look for?&lt;/P&gt;
&lt;P&gt;Is there a particular reason you are defining your variables match, field1 and field2 as 200 characters?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You might look into FINDW instead of Index in this case as "test" will match "testing" and "tested". Findw allows setting&amp;nbsp;one or more&amp;nbsp;delimiters so only words that match for the entire length&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;proc sql;
   create table URL_Match as
   select 
   a.url,
   b.match,
   b.field1,
   b.field2
   from test1 a , 
         test as b
   where findw(upcase(a.url),strip(upcase(b.match)),'/') &amp;gt; 0;
quit;&lt;/PRE&gt;</description>
      <pubDate>Tue, 08 May 2018 20:55:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460855#M117170</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2018-05-08T20:55:38Z</dc:date>
    </item>
    <item>
      <title>Re: Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460862#M117171</link>
      <description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="results.jpg" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/20422iDAC1AC86F5DB647F/image-size/medium?v=v2&amp;amp;px=400" role="button" title="results.jpg" alt="results.jpg" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Results look like this.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 May 2018 21:02:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460862#M117171</guid>
      <dc:creator>Ody</dc:creator>
      <dc:date>2018-05-08T21:02:18Z</dc:date>
    </item>
    <item>
      <title>Re: Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460933#M117191</link>
      <description>&lt;P&gt;Hi Ody,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I recommend to use regular expressions,if you want to search/substitue complex patterns in strings (like url's).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;SAS provides the prxmatch function for such tasks&amp;nbsp;(for further informations&amp;nbsp;I recommend the&amp;nbsp;SAS help)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table URL_Match as&lt;BR /&gt;select&lt;BR /&gt;a.url,&lt;BR /&gt;b.match,&lt;BR /&gt;b.field1,&lt;BR /&gt;b.field2&lt;BR /&gt;from test1 as a&lt;BR /&gt;inner join test as b&lt;BR /&gt;on&amp;nbsp; prxmatch("/\/"!!strip(b.match)!!"\//",a.url)&amp;gt;0;&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regular expressions are&amp;nbsp;very powerful/standarized and available&amp;nbsp;in&amp;nbsp;a lot of modern programming languages.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You can find a lot of tutorials on the web,&amp;nbsp;I strongly recommend to take a look.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 08:08:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/460933#M117191</guid>
      <dc:creator>Guardian</dc:creator>
      <dc:date>2018-05-09T08:08:18Z</dc:date>
    </item>
    <item>
      <title>Re: Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461365#M117358</link>
      <description>&lt;P&gt;Thanks for the feedback and potential solutions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Ultimately, I need to rethink my approach to this problem. The URLs I'm trying to match against often contain multiple values (keywords) from my lookup list so I need to rethink how those lookup values get prioritized, otherwise I'm still gonna end up with multiple matches for one URL.&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 17:03:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461365#M117358</guid>
      <dc:creator>Ody</dc:creator>
      <dc:date>2018-05-10T17:03:36Z</dc:date>
    </item>
    <item>
      <title>Re: Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461433#M117389</link>
      <description>&lt;P&gt;So, after some more research today I think that PRXMATCH is the way to go however I'm not sure how to account for metacharacters in my URL strings.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example, when I try to match on this URL i get an error: prxmatch('/home.synchrony.com/summer/')&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have tried changing this to &lt;SPAN&gt;prxmatch('/home\.synchrony\.com\/summer/') to count the period and forward slash as regular characters but it's not working well for me. I'm not sure what I'm doing wrong. Thoughts?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 22:27:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461433#M117389</guid>
      <dc:creator>Ody</dc:creator>
      <dc:date>2018-05-10T22:27:55Z</dc:date>
    </item>
    <item>
      <title>Re: Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461446#M117392</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/29074"&gt;@Ody&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;If you want to use characters with special meaning as normal characters then mask them with a backward slash. For the example you've posted that would look as below:&lt;/P&gt;
&lt;PRE&gt;  prxmatch('/home\.synchrony\.com\/summer/',&amp;lt;your variable&amp;gt;)&lt;/PRE&gt;
&lt;P&gt;If you don't need to define patterns like in your sample string (which is just a constant string) then I'd use index() of find() as these functions are much less resource hungry.&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2018 23:05:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461446#M117392</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2018-05-10T23:05:45Z</dc:date>
    </item>
    <item>
      <title>Re: Joining/Matching on URL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461455#M117399</link>
      <description>&lt;P&gt;Thanks for the feedback.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Unfortunately I need explicit definitions for my my string search. I've tried find/index, and their variations, to no avail. I had some success with PRXMatch but kept running into issues with the metacharacters.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'll give this a go tomorrow, thanks again.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;edit: Thanks for the help. Solution verified.&lt;/P&gt;</description>
      <pubDate>Tue, 15 May 2018 02:30:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-Matching-on-URL/m-p/461455#M117399</guid>
      <dc:creator>Ody</dc:creator>
      <dc:date>2018-05-15T02:30:25Z</dc:date>
    </item>
  </channel>
</rss>

