Hello!
I'm trying to extract phone numbers from a data set where they are included in cells with text data (such as emails and name, etc). How would I go about doing this? Another problem is that the entries are not all formatted the same. For example the phone numbers are written in different ways as follows.
123.456.7890
123-456-7890
(123) 456-7890
There are 3500 entries so there may be variations I haven't seen.
Thank you for any help you can provide!
@iressa131 wrote:
Hello!
I'm trying to extract phone numbers from a data set where they are included in cells with text data (such as emails and name, etc). How would I go about doing this? Another problem is that the entries are not all formatted the same. For example the phone numbers are written in different ways as follows.
123.456.7890
123-456-7890
(123) 456-7890
There are 3500 entries so there may be variations I haven't seen.
Thank you for any help you can provide!
Use the COMPRESS() function.
PhoneNum = compress(oldVarName, , ‘kd’);
Thanks for the reply! When I run the code I get the following
I just realized the quotation marks are incorrect because Im using university. so now when I run it I don't get any errors however my output only shows 1 observation with no entry.
PROC CONTENTS DATA=WORK.IMPORT; RUN;
data WORK.IMPORT;
PhoneNum = compress(agent_details, ,'kd');
run;
@iressa131 wrote:
I just realized the quotation marks are incorrect because Im using university. so now when I run it I don't get any errors however my output only shows 1 observation with no entry.
PROC CONTENTS DATA=WORK.IMPORT; RUN; data WORK.IMPORT; PhoneNum = compress(agent_details, ,'kd'); run;
You didn't include an input data set, in fact you overwrote your data. You'll need to import it again.
data Step1; *output data set name is STEP1; set IMPORT; *input data set name is IMPORT; PhoneNum = compress(agent_details, ,'kd'); run;
data have;
input x $40. ;
want=prxchange('s/\D*(\d\d\d)\D+(\d\d\d)\D+(\d\d\d\d)\D*/$1 $2 $3/o',-1,x);
cards;
123.456.7890
123-456-7890
(123) 456-7890
;
run;
proc print;run;
@iressa131 wrote:
Hello!
I'm trying to extract phone numbers from a data set where they are included in cells with text data (such as emails and name, etc). How would I go about doing this? Another problem is that the entries are not all formatted the same. For example the phone numbers are written in different ways as follows.
123.456.7890
123-456-7890
(123) 456-7890
There are 3500 entries so there may be variations I haven't seen.
Thank you for any help you can provide!
Check the lengths of any of these after the digits only are preserved. You may find extensions or international codes in longer than expected values, missing area codes (7 digit).
Then go back to who ever designed the data entry and use a somewhat more modern approach that has area code, exchange (the second 3 digit group) and number entry that will collect these consistently.
Collecting cleanly is preferred over cleaning collection.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.