<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Regex to find position of string in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229562#M41557</link>
    <description>&lt;P&gt;spot on...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thank you very much.&lt;/P&gt;</description>
    <pubDate>Mon, 12 Oct 2015 16:33:54 GMT</pubDate>
    <dc:creator>UMAnalyst</dc:creator>
    <dc:date>2015-10-12T16:33:54Z</dc:date>
    <item>
      <title>Regex to find position of string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229524#M41548</link>
      <description>&lt;P&gt;I have a huge file (&amp;gt;200,000 obs) containing street addresses. I need to clean these data as well as possible. SAS states that one should remove the unit, apt number, etc from the street address before geocoding. I plan on using the position and length output from PRXSUBSTR to extract the apartment number from the street address.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, I can get this to work:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data _null_;&lt;BR /&gt;&amp;nbsp; patternID = prxparse('/(\d+)$/');&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt; /* Use PRXSUBSTR to find the position and length of the string. */&lt;BR /&gt; call prxsubstr(patternID, '12345&amp;nbsp;CONFUSED ST 1100', position, length);&lt;/P&gt;
&lt;P&gt;put position= length=;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But, when I apply this code to the&amp;nbsp;data (on 'ADDRESSVAR' $50):&lt;/P&gt;
&lt;P&gt;data OUT;&lt;/P&gt;
&lt;P&gt;set IN;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; patternID = prxparse('/(\d+)$/');&lt;/P&gt;
&lt;P&gt;call prxsubstr(patternID, ADDRESSVAR, position, length);&lt;/P&gt;
&lt;P&gt;put position= length=;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I get position = 0 and length = 0 for each obs. What am I missing?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks for your help.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Oct 2015 12:09:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229524#M41548</guid>
      <dc:creator>UMAnalyst</dc:creator>
      <dc:date>2015-10-12T12:09:38Z</dc:date>
    </item>
    <item>
      <title>Re: Regex to find position of string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229555#M41553</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
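&lt;P&gt;I think it's due to the length of the ADDRESSVAR variable in the IN data set. ADDRESSVAR is a $50 character variable, so shorter values are padded with trailing blanks. The $ anchor matches only at the end of the string, and in the padded value the digits are followed by blanks, so the pattern never matches. The string literal in your first step has no padding, which is why that one works.&lt;/P&gt;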
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you wrap ADDRESSVAR in the TRIM function within the PRXSUBSTR, it will work.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Try this:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data OUT;&lt;BR /&gt;set IN;&lt;BR /&gt; patternID = prxparse('/(\d+)$/');&lt;BR /&gt;call prxsubstr(patternID, trim(ADDRESSVAR), position, length);&lt;BR /&gt;put position= length=;&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;- John&lt;/P&gt;</description>
      <pubDate>Mon, 12 Oct 2015 16:03:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229555#M41553</guid>
      <dc:creator>jnvickery</dc:creator>
      <dc:date>2015-10-12T16:03:12Z</dc:date>
    </item>
    <item>
      <title>Re: Regex to find position of string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229561#M41556</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The apartment number may not always sit at the very end of the address, but it might follow "ST". In that case the task is to extract the number that follows "ST". Please try the regular expression below.&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input address$50.;
id=prxparse('/(\w\s\d+)/');
call prxsubstr(id,address,start,length);
put start= length=;
/* the match is one word character, one space, then the digits: skip the first two characters */
if start then new2=substr(address,start+2,length-2);
cards;
12345 CONFUSED ST 1100
;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The expression \w matches one word character (a letter, digit, or underscore), \s matches the space after it, and \d+ matches the digits that follow. This picks up the apartment number as an alternative to the code above.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To extract the apartment number, use the SUBSTR function with the start and length variables. Add 2 to start and subtract 2 from length so that the leading character and the space are skipped and only the digits are extracted.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Jag&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 12 Oct 2015 16:33:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229561#M41556</guid>
      <dc:creator>Jagadishkatam</dc:creator>
      <dc:date>2015-10-12T16:33:48Z</dc:date>
    </item>
    <item>
      <title>Re: Regex to find position of string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229562#M41557</link>
      <description>&lt;P&gt;spot on...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;thank you very much.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Oct 2015 16:33:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229562#M41557</guid>
      <dc:creator>UMAnalyst</dc:creator>
      <dc:date>2015-10-12T16:33:54Z</dc:date>
    </item>
    <item>
      <title>Re: Regex to find position of string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229614#M41564</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12503"&gt;@UMAnalyst&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Here is another way to extract a substring from a string using RegEx, this time through a PROC FORMAT informat.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
  invalue $street_num
    's/^[^\d]*(\d+)\s*$/\1/oi' (regexpe) = _same_
    other=' '
    ;
run;

data sample;
  infile datalines truncover;
  input addressvar $char50.;
  length street_num $5.;
  street_num=input(addressvar,$street_num.);
  datalines;
AAAAAAAAAAAAAaAA aaa 40 
BBBB40
CCC40xx
40
  40
40xx
;
run; 
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;A href="https://support.sas.com/resources/papers/proceedings12/245-2012.pdf" target="_blank"&gt;https://support.sas.com/resources/papers/proceedings12/245-2012.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 13 Oct 2015 00:08:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229614#M41564</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2015-10-13T00:08:39Z</dc:date>
    </item>
    <item>
      <title>Re: Regex to find position of string</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229615#M41565</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/3156"&gt;@jnvickery&lt;/a&gt;, &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12503"&gt;@UMAnalyst&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;You should change the code as below, using a RETAIN statement for "patternID" and calling PRXPARSE only when _n_=1, so that the pattern is parsed once instead of being requested in every single iteration of the data step. SAS compiles a constant pattern only once, so the saving here is small, but the guard matters as soon as the pattern is built from a variable: then every iteration would compile a new version of the RegEx, which is unnecessary, very inefficient, and clutters memory.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data OUT;
set IN;
retain patternID ;
if _n_=1 then patternID = prxparse('/(\d+)$/');
call prxsubstr(patternID, trim(ADDRESSVAR), position, length);
put position= length=;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 13 Oct 2015 00:12:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Regex-to-find-position-of-string/m-p/229615#M41565</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2015-10-13T00:12:49Z</dc:date>
    </item>
  </channel>
</rss>

