BookmarkSubscribeRSS Feed
iressa131
Calcite | Level 5

Hello!

 

I'm trying to extract phone numbers from a data set where they are included in cells with text data (such as emails and name, etc). How would I go about doing this? Another problem is that the entries are not all formatted the same. For example the phone numbers are written in different ways as follows. 

123.456.7890

123-456-7890

(123) 456-7890

 

There are 3500 entries so there may be variations I haven't seen.

 

Thank you for any help you can provide!

 

6 REPLIES 6
Reeza
Super User

@iressa131 wrote:

Hello!

 

I'm trying to extract phone numbers from a data set where they are included in cells with text data (such as emails and name, etc). How would I go about doing this? Another problem is that the entries are not all formatted the same. For example the phone numbers are written in different ways as follows. 

123.456.7890

123-456-7890

(123) 456-7890

 

There are 3500 entries so there may be variations I haven't seen.

 

Thank you for any help you can provide!

 



Use the COMPRESS() function. 

 

PhoneNum = compress(oldVarName, , ‘kd’);

 

 

iressa131
Calcite | Level 5

Thanks for the reply! When I run the code I get the followingScreen Shot 2019-02-02 at 10.47.38 AM.png

iressa131
Calcite | Level 5

I just realized the quotation marks are incorrect because Im using university. so now when I run it I don't get any errors however my output only shows 1 observation with no entry. 

PROC CONTENTS DATA=WORK.IMPORT; RUN;
data WORK.IMPORT;
PhoneNum = compress(agent_details, ,'kd'); 
run;
Reeza
Super User

@iressa131 wrote:

I just realized the quotation marks are incorrect because Im using university. so now when I run it I don't get any errors however my output only shows 1 observation with no entry. 

PROC CONTENTS DATA=WORK.IMPORT; RUN;
data WORK.IMPORT;
PhoneNum = compress(agent_details, ,'kd'); 
run;

You didn't include an input data set, in fact you overwrote your data. You'll need to import it again. 

 

data Step1; *output data set name is STEP1;

set IMPORT; *input data set name is IMPORT;

PhoneNum = compress(agent_details, ,'kd'); 

run;
Ksharp
Super User
data have;
input x $40. ;
want=prxchange('s/\D*(\d\d\d)\D+(\d\d\d)\D+(\d\d\d\d)\D*/$1 $2 $3/o',-1,x);
cards;
123.456.7890
123-456-7890
(123) 456-7890
;
run;


proc print;run;

ballardw
Super User

@iressa131 wrote:

Hello!

 

I'm trying to extract phone numbers from a data set where they are included in cells with text data (such as emails and name, etc). How would I go about doing this? Another problem is that the entries are not all formatted the same. For example the phone numbers are written in different ways as follows. 

123.456.7890

123-456-7890

(123) 456-7890

 

There are 3500 entries so there may be variations I haven't seen.

 

Thank you for any help you can provide!

 


Check the lengths of any of these after the digits only are preserved. You may find extensions or international codes in longer than expected values, missing area codes (7 digit).

 

Then go back to who ever designed the data entry and use a somewhat more modern approach that has area code, exchange (the second 3 digit group) and number entry that will collect these consistently.

Collecting cleanly is preferred over cleaning collection.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 885 views
  • 2 likes
  • 4 in conversation