Extract Information from text

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Extract Information from text

Hello,

 

I need to extract the phone numbers out of a text field to establish a new cleaned contact mobile number variable and contact landline variable.  There is no set order the phone numbers are written in the text field. Attached is a sample file.  Can anyone help?

 

thanks

Sally


Accepted Solutions
Solution
‎04-05-2017 12:12 AM
Occasional Contributor
Posts: 12

Re: Extract Information from text

as of my understand to keep only numbers there are several ways .and you can use like bellow

 

num = compress(long_string, , 'kd');

View solution in original post


All Replies
Solution
‎04-05-2017 12:12 AM
Occasional Contributor
Posts: 12

Re: Extract Information from text

as of my understand to keep only numbers there are several ways .and you can use like bellow

 

num = compress(long_string, , 'kd');

Respected Advisor
Posts: 4,131

Re: Extract Information from text

[ Edited ]

The way I would approach this task:

 

1. Data exploration and profiling

Scan through your data and collect as many different patterns of phone numbers as you can, i.e. string of 10 digits, first 2 digits 04 

 

2. Define the business rules

Define the extraction rules for your phone numbers, i.e. string with 10 digits, first 2 digits 04 -> mobile number

 

3. Define sequence how to apply business rules

Example: if substring of digits points to landline number but there is also the word "Mobile" in the string: What comes first? Will this be qualified as a landline or a mobile number? Eventually plan also for field which contains data quality score which in such a case wouldn't be 1

 

4. Implement

Now that you've got the business rules and sequence of rules you can implement via a data step and a set of IF.. THEN.. ELSE statements.

 

5. Test

A: Run your code against the sample data containing all the cases. Verify if result is as expected/defined in the business rules.

B: Run your code agains the full data set. Check log for any signs of issues, check that for all source string a phone number could get extracted (or that there is a clear explanation why this didn't happen).

 

 

Once you get stuck in step 4. on how to technically implement a business rule then come back to this forum, provide a data step which creates sample data with the source string, provide the business rule, the not yet working code you've already developed and show us how the expected result should look like when applied on the sample data you've provided.

 

Contributor
Posts: 56

Re: Extract Information from text

From your dataset i am seeing the same number with the text of specific category('land','mobile) number in the column of current_phone. Can you let us know what extactly you are looking for the output?
Occasional Contributor
Posts: 10

Re: Extract Information from text

Hi,

I have a current phone variable which has mixed text with the phone numbers.  I would like to create a new phone variable that just had the phone numbers.  

Secondly I would like all mobile numbers beginning with (4 or 04) to All begin with 04.  
And numbers beginning with 08 or 8 or 9 with a state = WA to All begin 08. and so on for all states.

CURRENT PHONE    STATE
0418448841MOBILE    WA
FAX90227506    WA
CAR0L'S MOBILE 0409211082    WA
890213600    WA
90214178    WA
427866574    WA
(HOME90930658)    NSW
418934400    WA
0439394323/DESMOND    WA

thanks
Sally

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 233 views
  • 1 like
  • 4 in conversation