<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Extract Information from text in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345684#M273162</link>
    <description>&lt;P&gt;The way I would approach this task:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Data exploration and profiling&lt;/P&gt;
&lt;P&gt;Scan through your data and collect as many different phone number patterns as you can, e.g. a string of 10 digits whose first two digits are 04.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. Define the business rules&lt;/P&gt;
&lt;P&gt;Define the extraction rules for your phone numbers, e.g. a string of 10 digits whose first two digits are 04 -&amp;gt; mobile number.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3. Define the sequence in which to apply the business rules&lt;/P&gt;
&lt;P&gt;Example: if a substring of digits points to a landline number but the string also contains the word "Mobile", which rule wins? Will the number be classified as a landline or a mobile number? Also consider adding a field that holds a data quality score, which in such a case would be less than 1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;4. Implement&lt;/P&gt;
&lt;P&gt;Now that you have the business rules and the order in which to apply them, you can implement them in a data step with a set of IF ... THEN ... ELSE statements.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;5. Test&lt;/P&gt;
&lt;P&gt;A: Run your code against the sample data containing all the cases. Verify that the result matches what the business rules define.&lt;/P&gt;
&lt;P&gt;B: Run your code against the full data set. Check the log for any signs of issues, and check that a phone number was extracted from every source string (or that there is a clear explanation why it was not).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you get stuck in step 4 on how to implement a business rule technically, come back to this forum and provide: a data step that creates sample data with the source string, the business rule, the code you have developed so far (even if it does not work yet), and the expected result for the sample data you provided.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 30 Mar 2017 08:33:23 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2017-03-30T08:33:23Z</dc:date>
    <item>
      <title>Extract Information from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345642#M273160</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to extract the phone numbers from a text field to create a new, cleaned contact mobile number variable and a contact landline variable. The phone numbers are not written in any set order in the text field. Attached is a sample file. Can anyone help?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Sally&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 04:04:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345642#M273160</guid>
      <dc:creator>Selli5</dc:creator>
      <dc:date>2017-03-30T04:04:35Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Information from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345647#M273161</link>
      <description>&lt;P&gt;&lt;SPAN&gt;As I understand it, there are several ways to keep only the digits. One is to use COMPRESS with the 'kd' (keep digits) modifiers, as shown below:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;num = compress(long_string, , 'kd');&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 05:07:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345647#M273161</guid>
      <dc:creator>Madansas7b</dc:creator>
      <dc:date>2017-03-30T05:07:39Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Information from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345684#M273162</link>
      <description>&lt;P&gt;The way I would approach this task:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Data exploration and profiling&lt;/P&gt;
&lt;P&gt;Scan through your data and collect as many different phone number patterns as you can, e.g. a string of 10 digits whose first two digits are 04.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2. Define the business rules&lt;/P&gt;
&lt;P&gt;Define the extraction rules for your phone numbers, e.g. a string of 10 digits whose first two digits are 04 -&amp;gt; mobile number.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;3. Define the sequence in which to apply the business rules&lt;/P&gt;
&lt;P&gt;Example: if a substring of digits points to a landline number but the string also contains the word "Mobile", which rule wins? Will the number be classified as a landline or a mobile number? Also consider adding a field that holds a data quality score, which in such a case would be less than 1.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;4. Implement&lt;/P&gt;
&lt;P&gt;Now that you have the business rules and the order in which to apply them, you can implement them in a data step with a set of IF ... THEN ... ELSE statements.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;5. Test&lt;/P&gt;
&lt;P&gt;A: Run your code against the sample data containing all the cases. Verify that the result matches what the business rules define.&lt;/P&gt;
&lt;P&gt;B: Run your code against the full data set. Check the log for any signs of issues, and check that a phone number was extracted from every source string (or that there is a clear explanation why it was not).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you get stuck in step 4 on how to implement a business rule technically, come back to this forum and provide: a data step that creates sample data with the source string, the business rule, the code you have developed so far (even if it does not work yet), and the expected result for the sample data you provided.&lt;/P&gt;
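&lt;P&gt;To make steps 2 to 4 more concrete, here is a minimal sketch of what such a data step could look like. It is only an illustration under assumptions, not a finished solution: the data set names (have, want), the new variable names (digits, mobile, landline, dq_score), the rule that a phone number is the first run of 8 to 10 consecutive digits, the state-to-area-code lookup (only WA and NSW are filled in) and the scoring values are all made up for this example. The sample rows are the ones posted in this thread.&lt;/P&gt;

```sas
/* Sample data from the thread; data set and variable names are assumptions */
data have;
  infile datalines dlm='|' truncover;
  input current_phone :$40. state :$3.;
  datalines;
0418448841MOBILE|WA
FAX90227506|WA
CAR0L'S MOBILE 0409211082|WA
890213600|WA
90214178|WA
427866574|WA
(HOME90930658)|NSW
418934400|WA
0439394323/DESMOND|WA
;
run;

data want;
  set have;
  length digits mobile landline $10 area $2;
  retain rx;
  /* Compile the pattern once: a run of 8 to 10 consecutive digits (assumed rule) */
  if _n_ = 1 then rx = prxparse('/(\d{8,10})/');

  /* Extract the first matching run of digits, if there is one */
  if prxmatch(rx, current_phone) then digits = prxposn(rx, 1, current_phone);

  /* Assumed lookup of area code by state; extend for the remaining states */
  select (state);
    when ('WA')  area = '08';
    when ('NSW') area = '02';
    otherwise    area = ' ';
  end;

  n = lengthn(digits);
  dq_score = 1;

  /* Rules are applied in sequence: the first one that matches wins */
  if n = 10 and digits =: '04' then mobile = digits;
  else if n = 9 and digits =: '4' then mobile = cats('0', digits);
  else if n = 10 and digits =: '0' then landline = digits;
  else if n = 9 and area ne ' ' and cats('0', substr(digits, 1, 1)) = area
    then landline = cats('0', digits);
  else if n = 8 and area ne ' ' then landline = cats(area, digits);
  else dq_score = 0; /* no rule matched */

  /* Conflicting evidence: the text says MOBILE but no mobile number was derived */
  if find(current_phone, 'MOBILE', 'i') and missing(mobile) then dq_score = 0.5;

  drop rx n area;
run;
```

&lt;P&gt;With these assumed rules, the row FAX90227506 with state WA should end up with landline 0890227506, and 427866574 with mobile 0427866574. A regular expression is used here instead of compress(..., , 'kd') because a string such as CAR0L'S MOBILE 0409211082 contains a stray digit in the text, which compress would glue onto the front of the phone number.&lt;/P&gt;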
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 08:33:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345684#M273162</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-03-30T08:33:23Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Information from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345697#M273163</link>
      <description>In your dataset, the current_phone column holds each number together with text naming its category ('land', 'mobile'). Can you tell us exactly what output you are looking for?</description>
      <pubDate>Thu, 30 Mar 2017 09:55:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345697#M273163</guid>
      <dc:creator>lakshmi_74</dc:creator>
      <dc:date>2017-03-30T09:55:48Z</dc:date>
    </item>
    <item>
      <title>Re: Extract Information from text</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345991#M273164</link>
      <description>&lt;P&gt;Hi,&lt;BR /&gt;&lt;BR /&gt;I have a current phone variable that mixes text with the phone numbers. I would like to create a new phone variable that contains only the phone numbers.&lt;BR /&gt;&lt;BR /&gt;Secondly, I would like all mobile numbers beginning with 4 or 04 to begin with 04.&lt;BR /&gt;Numbers beginning with 08, 8 or 9 with state = WA should all begin with 08, and so on for all states.&lt;BR /&gt;&lt;BR /&gt;CURRENT PHONE&amp;nbsp;&amp;nbsp; &amp;nbsp;STATE&lt;BR /&gt;0418448841MOBILE&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;FAX90227506&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;CAR0L'S MOBILE 0409211082&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;890213600&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;90214178&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;427866574&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;(HOME90930658)&amp;nbsp;&amp;nbsp; &amp;nbsp;NSW&lt;BR /&gt;418934400&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;0439394323/DESMOND&amp;nbsp;&amp;nbsp; &amp;nbsp;WA&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Sally&lt;/P&gt;</description>
      <pubDate>Thu, 30 Mar 2017 23:24:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Extract-Information-from-text/m-p/345991#M273164</guid>
      <dc:creator>Selli5</dc:creator>
      <dc:date>2017-03-30T23:24:57Z</dc:date>
    </item>
  </channel>
</rss>

