SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MB_Analyst
Obsidian | Level 7

I am new to using Regex expressions using SAS and trying to parse out the a phone number from a string of text. I read through this document and also this one. This is what I have so far, and I'm not getting the result I'm looking for:

 

DATA in_string;
 INPUT @1 STRING $50.;
DATALINES;Here is a phone number 111-123-4567other type of number:TT44679111-234-6583number is (231)-390-5710this has two: 123-523-4545 222-333-4444
;

data out_string ;
 	set in_string;
	if _n_ = 1 then do;
 		phone_prx= prxparse("/ \(?(\d\d\d).(\d\d\d).(\d\d\d\d)/");
 	end;
	retain phone_prx ;
	pos_phone= prxmatch(phone_prx,string);
	call prxposn(phone_prx, 1, areacode_pos);
	call prxposn(phone_prx, 2, phone1_pos);
	call prxposn(phone_prx, 3, phone2_pos);

	length phonenumber $15;

	phonenumber= substrn(string, areacode_pos, 3) || "-" || 
				 substrn(string, phone1_pos, 3) || "-" || 
				substrn(string,phone2_pos, 4);
run; 

 The output I would like is:

obs phonenumber

1 111-123-4567

2 111-234-6583

3 (231)-390-5710

4 123-523-4545 222-333-4444 

 

**I've tried editing the text to make the code more clear, but some spacing isn't working.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
DATA in_string;
 INPUT STRING $50.;
DATALINES;
Here is a phone number 111-123-4567
other type of number:TT44679111-234-6583
number is (231)-390-5710
this has two: 123-523-4545 222-333-4444
;

data want;
 set in_string;
 obs+1;
 pid=prxparse('/\(?\d{3}\)?-\d{3}-\d{4}/');
 s=1;e=length(string);
 call prxnext(pid,s,e,string,p,l);
 do while(p>0);
  want=substr(string,p,l);
  output;
  call prxnext(pid,s,e,string,p,l);
end;
drop s e p l pid;
run;

View solution in original post

4 REPLIES 4
Ksharp
Super User
DATA in_string;
 INPUT STRING $50.;
DATALINES;
Here is a phone number 111-123-4567
other type of number:TT44679111-234-6583
number is (231)-390-5710
this has two: 123-523-4545 222-333-4444
;

data want;
 set in_string;
 obs+1;
 pid=prxparse('/\(?\d{3}\)?-\d{3}-\d{4}/');
 s=1;e=length(string);
 call prxnext(pid,s,e,string,p,l);
 do while(p>0);
  want=substr(string,p,l);
  output;
  call prxnext(pid,s,e,string,p,l);
end;
drop s e p l pid;
run;
ballardw
Super User

Extensions? Do you want an extension if found?

Country codes or other country dialing strings?

Other separators than dash or none at all? Or Dash and space like (231) 390-5710

 

MB_Analyst
Obsidian | Level 7

Country codes and extensions shouldn't arise, but I guess I would want to be as flexible as possible when it comes to formats. Parenthesis, spaces/no spaces, dashes, could all be used. How is it best to approach casting a wide net to find the phone numbers?

ballardw
Super User

@MB_Analyst wrote:

Country codes and extensions shouldn't arise, but I guess I would want to be as flexible as possible when it comes to formats. Parenthesis, spaces/no spaces, dashes, could all be used. How is it best to approach casting a wide net to find the phone numbers?


Haven't really got a suggestion on specific issues other than possibly sequential processing of different rules to find different candidates.

I brought up some issues that appeared in phone number fields from when I worked with a company where we needed to parse numbers to create calling strings for CATI software. Even though the data came from one company the way their staff recorded things could vary quite a bit and these are some of the simpler cases.

I didn't for instance bring up the "or" such as 555-123-4567/8855 where either of the last 4 digits could be used.

 

Extensions could well be of concern if folks are entering them in a form of 555-123-4567-1 (or more digits in the last place) or with one of the other delimiters.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 3142 views
  • 2 likes
  • 3 in conversation