BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MB_Analyst
Obsidian | Level 7

I am new to using Regex expressions using SAS and trying to parse out the a phone number from a string of text. I read through this document and also this one. This is what I have so far, and I'm not getting the result I'm looking for:

 

DATA in_string;
 INPUT @1 STRING $50.;
DATALINES;
Here is a phone number 111-123-4567
other type of number:TT44679
111-234-6583
number is (231)-390-5710
this has two: 123-523-4545 222-333-4444 ; data out_string ; set in_string; if _n_ = 1 then do; phone_prx= prxparse("/ \(?(\d\d\d).(\d\d\d).(\d\d\d\d)/"); end; retain phone_prx ; pos_phone= prxmatch(phone_prx,string); call prxposn(phone_prx, 1, areacode_pos); call prxposn(phone_prx, 2, phone1_pos); call prxposn(phone_prx, 3, phone2_pos); length phonenumber $15; phonenumber= substrn(string, areacode_pos, 3) || "-" || substrn(string, phone1_pos, 3) || "-" || substrn(string,phone2_pos, 4); run;

 The output I would like is:

obs  phonenumber

1 111-123-4567

2 111-234-6583

3 (231)-390-5710

4 123-523-4545 222-333-4444 

 

**I've tried editing the text to make the code more clear, but some spacing isn't working.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
DATA in_string;
 INPUT STRING $50.;
DATALINES;
Here is a phone number 111-123-4567
other type of number:TT44679111-234-6583
number is (231)-390-5710
this has two: 123-523-4545 222-333-4444
;

data want;
 set in_string;
 obs+1;
 pid=prxparse('/\(?\d{3}\)?-\d{3}-\d{4}/');
 s=1;e=length(string);
 call prxnext(pid,s,e,string,p,l);
 do while(p>0);
  want=substr(string,p,l);
  output;
  call prxnext(pid,s,e,string,p,l);
end;
drop s e p l pid;
run;

View solution in original post

4 REPLIES 4
Ksharp
Super User
DATA in_string;
 INPUT STRING $50.;
DATALINES;
Here is a phone number 111-123-4567
other type of number:TT44679111-234-6583
number is (231)-390-5710
this has two: 123-523-4545 222-333-4444
;

data want;
 set in_string;
 obs+1;
 pid=prxparse('/\(?\d{3}\)?-\d{3}-\d{4}/');
 s=1;e=length(string);
 call prxnext(pid,s,e,string,p,l);
 do while(p>0);
  want=substr(string,p,l);
  output;
  call prxnext(pid,s,e,string,p,l);
end;
drop s e p l pid;
run;
ballardw
Super User

Extensions? Do you want an extension if found?

Country codes or other country dialing strings?

Other separators than dash or none at all? Or Dash and space like (231) 390-5710

 

MB_Analyst
Obsidian | Level 7

Country codes and extensions shouldn't arise, but I guess I would want to be as flexible as possible when it comes to formats. Parenthesis, spaces/no spaces, dashes, could all be used. How is it best to approach casting a wide net to find the phone numbers?

ballardw
Super User

@MB_Analyst wrote:

Country codes and extensions shouldn't arise, but I guess I would want to be as flexible as possible when it comes to formats. Parenthesis, spaces/no spaces, dashes, could all be used. How is it best to approach casting a wide net to find the phone numbers?


Haven't really got a suggestion on specific issues other than possibly sequential processing of different rules to find different candidates.

I brought up some issues that appeared in phone number fields from when I worked with a company where we needed to parse numbers to create calling strings for CATI software. Even though the data came from one company the way their staff recorded things could vary quite a bit and these are some of the simpler cases.

I didn't for instance bring up the "or" such as 555-123-4567/8855 where either of the last 4 digits could be used.

 

Extensions could well be of concern if folks are entering them in a form of 555-123-4567-1 (or more digits in the last place) or with one of the other delimiters.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 2252 views
  • 2 likes
  • 3 in conversation