Good afternoon.
I have a list of telephone numbers and types all contained within the same string. What I'd like to do is to parse the data out at each instance of 'TEL:' into separate columns. Each column should contain 'TEL;' and all the characters to the beginning of the next "TEL;"
I've been doing some reading on scan but can't seem to get it to work correctly. I cannot get the array to work correctly and keep getting empty columns.
I'd sure appreciate any help anyone could give me.
Thank you.
TEL;TYPE=home,voice,pref:(XXX) XXX-XXXX |
TEL;TYPE=home,voice,pref:(XXX) XXX-XXXX |
TEL;TYPE=cell,voice,pref:+1 (XXX) XXX-XXXX |
TEL;TYPE=work,voice,pref:XXX-XXX-XXXX,TEL;TYPE=work,voice:(XXX) XXX-XXXX,TEL;TYPE=home,voice:(XXX) XXX-XXXX,TEL;TYPE=cell,voice:(XXX) XXX-XXXX,TEL;TYPE=pager:(XXX) XXX-XXXX |
TEL;TYPE=work,voice,pref:XXX-XXX-XXXX,TEL;TYPE=work,voice:(XXX) XXX-XXXX,TEL;TYPE=home,voice:(XXX) XXX-XXXX,TEL;TYPE=cell,voice:(XXX) XXX-XXXX,TEL;TYPE=pager:(XXX) XXX-XXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX |
TEL;TYPE=cell,voice,pref:+XXXXXXXXXXX |
TEL;TYPE=cell,voice,pref:+XXXXXXXXXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL;TYPE=work,voice:XXXXXXXXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL;TYPE=work,voice:XXXXXXXXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX |
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX |
First convert it to multiple observations. Converting it to multiple variables is a job for PROC TRANSPOSE.
data tall;
set have;
length num 8 string $50 ;
do num=1 by 1 until(not loc);
loc = find(col1,'TEL;',5);
if loc then do;
string=substrn(col1,1,loc-2);
col1=substrn(col1,loc);
end;
else string=col1;
output;
end;
drop col1 loc;
run;
So if you have this input:
data have;
row+1;
input;
col1=_infile_;
cards4;
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx,TEL;TYPE=home,voice:(xxx) xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx
;;;;
You get this output:
Obs row num string 1 1 1 TEL;TYPE=work,voice,pref:xxx-xxx-xxxx 2 1 2 TEL;TYPE=work,voice:(xxx) xxx-xxxx 3 1 3 TEL;TYPE=home,voice:(xxx) xxx-xxxx 4 2 1 TEL;TYPE=work,voice,pref:xxx-xxx-xxxx 5 3 1 TEL;TYPE=work,voice,pref:xxx-xxx-xxxx 6 3 2 TEL;TYPE=work,voice:(xxx) xxx-xxxx
Which you can transpose into multiple variables.
You could use PROC TRANSPOSE.
proc transpose data=tall out=wide(drop=_name_) prefix=tel;
by row;
id num;
var string;
run;
Or a data step:
proc sql noprint;
select max(num) into :&ncols from tall;
quit;
data want;
do _n_=1 to &ncols until(last.row);
set tall;
by row;
array tel [&ncols] $50 ;
tel[_n_]=string;
end;
run;
If you know the upperbound on the number of telephone numbers you will have might even be able to use the ARRAY in the step that splits the strings and avoid making the TALL dataset.
Pick a couple of these and then manually create what you expect as a result.
If your data came from a text file it might be a good idea to provide 10 or so lines of the text as it may be easier to read than parse existing data.
I was reading it in like this:
data tel_line_nums;
infile datalines truncover;
input telco_record $200.;
datalines;
TEL,TYPE=home,voice,pref:(XXX) 555-1212
TEL,TYPE=home,voice,pref:(XXX) XXX-2222
TEL,TYPE=cell,voice,pref:+1 (XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:XXX-555-1213,TEL,TYPE=work,voice:(XXX) 555-1214,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) 555-1215,TEL,TYPE=mrbig:(XXX) XXX-mrbig
TEL,TYPE=work,voice,pref:XXX-XXX-XXXX,TEL,TYPE=work,voice:(XXX) XXX-XXXX,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) XXX-XXXX,TEL,TYPE=pager:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
;
but I couldn't figure out how to make it work quite right. "TEL" as the delimiter in the SCAN function affects the T in TYPE for reasons I don't understand and I don't like the trailing commas on the broken out phone numbers.
I changed a few X's into numbers so could track things better.
Each of the characters you list as delimiters are treated as delimiters. So if you tell that the letter T is a delimiter than any T it sees will be treated as a delimiter.
If you really have lines of text you could try using the DLMSTR= option on the INFILE statement.
data want;
infile datalines dlmstr='TEL,TYPE=' truncover;
row+1;
length order 8 value $100;
do order=1 by 1 until(value=' ');
input value @;
if value ne ' ' or order=1 then output;
end;
datalines4;
TEL,TYPE=home,voice,pref:(XXX) 555-1212
TEL,TYPE=home,voice,pref:(XXX) XXX-2222
TEL,TYPE=cell,voice,pref:+1 (XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:XXX-555-1213,TEL,TYPE=work,voice:(XXX) 555-1214,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) 555-1215,TEL,TYPE=mrbig:(XXX) XXX-mrbig
TEL,TYPE=work,voice,pref:XXX-XXX-XXXX,TEL,TYPE=work,voice:(XXX) XXX-XXXX,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) XXX-XXXX,TEL,TYPE=pager:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
;;;;
proc print;
run;
Results:
Obs row order value 1 1 1 home,voice,pref:(XXX) 555-1212 2 2 1 home,voice,pref:(XXX) XXX-2222 3 3 1 cell,voice,pref:+1 (XXX) XXX-XXXX 4 4 1 work,voice,pref:XXX-555-1213, 5 4 2 work,voice:(XXX) 555-1214, 6 4 3 home,voice:(XXX) XXX-XXXX, 7 4 4 cell,voice:(XXX) 555-1215, 8 4 5 mrbig:(XXX) XXX-mrbig 9 5 1 work,voice,pref:XXX-XXX-XXXX, 10 5 2 work,voice:(XXX) XXX-XXXX, 11 5 3 home,voice:(XXX) XXX-XXXX, 12 5 4 cell,voice:(XXX) XXX-XXXX, 13 5 5 pager:(XXX) XXX-XXXX 14 6 1 work,voice,pref:(XXX) XXX-XXXX 15 7 1 work,voice,pref:(XXX) XXX-XXXX 16 8 1 work,voice,pref:(XXX) XXX-XXXX 17 9 1 work,voice,pref:(XXX) XXX-XXXX 18 10 1 cell,voice,pref:+XXXXXXXXXXX 19 11 1 cell,voice,pref:+XXXXXXXXXXX 20 12 1 work,voice,pref:(XXX) XXX-XXXX, 21 12 2 work,voice:XXXXXXXXXX 22 13 1 work,voice,pref:(XXX) XXX-XXXX, 23 13 2 work,voice:XXXXXXXXXX 24 14 1 work,voice,pref:(XXX) XXX-XXXX 25 15 1 work,voice,pref:(XXX) XXX-XXXX
If you have it as a character variable the perhaps use TRANWRD to convert the TEL or TEL,TYPE= into a single character like | or ^ that you could then use with SCAN(). Or try figuring out regular expressions.
You do not say how you will use the data if you have these values in different variables on a single observation. Often that is not the best approach for many tasks. However using @Tom 's solution you could transpose the data by the Row variable to get the values on one observation.
Thank you Tom for providing the read solution. I thought something along this line would be very possible and yours is cleaner than my thoughts.
Hi Tom.
Thanks for taking the time to reply. I should have been more clear about a few things. The data is an imported Excel sheet and I'm hoping to get each TEL to it's own column (see the example attached.
COL 1 | COL 2 | COL3 | |
HAVE | TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx,TEL;TYPE=home,voice:(xxx) xxx-xxxx | ||
WANT | TEL;TYPE=work,voice,pref:xxx-xxx-xxxx | TEL;TYPE=work,voice:(xxx) xxx-xxxx | TEL;TYPE=home,voice:(xxx) xxx-xxxx |
First convert it to multiple observations. Converting it to multiple variables is a job for PROC TRANSPOSE.
data tall;
set have;
length num 8 string $50 ;
do num=1 by 1 until(not loc);
loc = find(col1,'TEL;',5);
if loc then do;
string=substrn(col1,1,loc-2);
col1=substrn(col1,loc);
end;
else string=col1;
output;
end;
drop col1 loc;
run;
So if you have this input:
data have;
row+1;
input;
col1=_infile_;
cards4;
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx,TEL;TYPE=home,voice:(xxx) xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx
;;;;
You get this output:
Obs row num string 1 1 1 TEL;TYPE=work,voice,pref:xxx-xxx-xxxx 2 1 2 TEL;TYPE=work,voice:(xxx) xxx-xxxx 3 1 3 TEL;TYPE=home,voice:(xxx) xxx-xxxx 4 2 1 TEL;TYPE=work,voice,pref:xxx-xxx-xxxx 5 3 1 TEL;TYPE=work,voice,pref:xxx-xxx-xxxx 6 3 2 TEL;TYPE=work,voice:(xxx) xxx-xxxx
Which you can transpose into multiple variables.
You could use PROC TRANSPOSE.
proc transpose data=tall out=wide(drop=_name_) prefix=tel;
by row;
id num;
var string;
run;
Or a data step:
proc sql noprint;
select max(num) into :&ncols from tall;
quit;
data want;
do _n_=1 to &ncols until(last.row);
set tall;
by row;
array tel [&ncols] $50 ;
tel[_n_]=string;
end;
run;
If you know the upperbound on the number of telephone numbers you will have might even be able to use the ARRAY in the step that splits the strings and avoid making the TALL dataset.
This worked perfectly! Thank you very much for taking your time to help.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.