BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jeff_DOC
Pyrite | Level 9

Good afternoon.

 

I have a list of telephone numbers and types all contained within the same string. What I'd like to do is to parse the data out at each instance of 'TEL:' into separate columns. Each column should contain 'TEL;' and all the characters to the beginning of the next "TEL;"

 

I've been doing some reading on scan but can't seem to get it to work correctly. I cannot get the array to work correctly and keep getting empty columns. 

 

I'd sure appreciate any help anyone could give me.

 

Thank you.

 

TEL;TYPE=home,voice,pref:(XXX) XXX-XXXX
TEL;TYPE=home,voice,pref:(XXX) XXX-XXXX
TEL;TYPE=cell,voice,pref:+1 (XXX) XXX-XXXX
TEL;TYPE=work,voice,pref:XXX-XXX-XXXX,TEL;TYPE=work,voice:(XXX) XXX-XXXX,TEL;TYPE=home,voice:(XXX) XXX-XXXX,TEL;TYPE=cell,voice:(XXX) XXX-XXXX,TEL;TYPE=pager:(XXX) XXX-XXXX
TEL;TYPE=work,voice,pref:XXX-XXX-XXXX,TEL;TYPE=work,voice:(XXX) XXX-XXXX,TEL;TYPE=home,voice:(XXX) XXX-XXXX,TEL;TYPE=cell,voice:(XXX) XXX-XXXX,TEL;TYPE=pager:(XXX) XXX-XXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL;TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL;TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL;TYPE=work,voice:XXXXXXXXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL;TYPE=work,voice:XXXXXXXXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL;TYPE=work,voice,pref:(XXX) XXX-XXXX

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

First convert it to multiple observations.  Converting it to multiple variables is a job for PROC TRANSPOSE.

data tall;
  set have;
  length num 8 string $50 ;
  do num=1 by 1 until(not loc);
     loc = find(col1,'TEL;',5);
     if loc then do;
       string=substrn(col1,1,loc-2);
       col1=substrn(col1,loc);
     end;
     else string=col1;
     output;
  end;
  drop col1 loc;
run;

So if you have this input:

data have;
  row+1;
  input;
  col1=_infile_;
cards4;
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx,TEL;TYPE=home,voice:(xxx) xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx
;;;;

You get this output:

Obs    row    num                   string

 1      1      1     TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
 2      1      2     TEL;TYPE=work,voice:(xxx) xxx-xxxx
 3      1      3     TEL;TYPE=home,voice:(xxx) xxx-xxxx
 4      2      1     TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
 5      3      1     TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
 6      3      2     TEL;TYPE=work,voice:(xxx) xxx-xxxx


Which you can transpose into multiple variables.

You could use PROC TRANSPOSE.

proc transpose data=tall out=wide(drop=_name_) prefix=tel;
  by row;
  id num;
  var string;
run;

Or a data step:

proc sql noprint;
  select max(num) into :&ncols from tall;
quit;
data want;
  do _n_=1 to &ncols until(last.row);
    set tall;
    by row;
    array tel [&ncols] $50 ;
    tel[_n_]=string;
  end;
run;

If you know the upperbound on the number of telephone numbers you will have might even be able to use the ARRAY in the step that splits the strings and avoid making the TALL dataset.

 

 

View solution in original post

7 REPLIES 7
ballardw
Super User

Pick a couple of these and then manually create what you expect as a result.

 

If your data came from a text file it might be a good idea to provide 10 or so lines of the text as it may be easier to read than parse existing data.

HB
Barite | Level 11 HB
Barite | Level 11

I was reading it in like this:

 

data tel_line_nums;
	infile datalines truncover;
	input telco_record $200.;
datalines;
TEL,TYPE=home,voice,pref:(XXX) 555-1212
TEL,TYPE=home,voice,pref:(XXX) XXX-2222
TEL,TYPE=cell,voice,pref:+1 (XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:XXX-555-1213,TEL,TYPE=work,voice:(XXX) 555-1214,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) 555-1215,TEL,TYPE=mrbig:(XXX) XXX-mrbig
TEL,TYPE=work,voice,pref:XXX-XXX-XXXX,TEL,TYPE=work,voice:(XXX) XXX-XXXX,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) XXX-XXXX,TEL,TYPE=pager:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
;

but I couldn't figure out how to make it work quite right. "TEL" as the delimiter in the SCAN function affects the T in TYPE for reasons I don't understand and I don't like the trailing commas on the broken out phone numbers.

 

I changed a few X's into numbers so could track things better.      

Tom
Super User Tom
Super User

Each of the characters you list as delimiters are treated as delimiters.  So if you tell that the letter T is a delimiter than any T it sees will be treated as a delimiter.

 

If you really have lines of text you could try using the DLMSTR= option on the INFILE statement.

data want;
	infile datalines dlmstr='TEL,TYPE=' truncover;
  row+1;
  length order 8 value $100;
  do order=1 by 1 until(value=' ');
    input value @;
    if value ne ' ' or order=1 then output;
  end;
datalines4;
TEL,TYPE=home,voice,pref:(XXX) 555-1212
TEL,TYPE=home,voice,pref:(XXX) XXX-2222
TEL,TYPE=cell,voice,pref:+1 (XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:XXX-555-1213,TEL,TYPE=work,voice:(XXX) 555-1214,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) 555-1215,TEL,TYPE=mrbig:(XXX) XXX-mrbig
TEL,TYPE=work,voice,pref:XXX-XXX-XXXX,TEL,TYPE=work,voice:(XXX) XXX-XXXX,TEL,TYPE=home,voice:(XXX) XXX-XXXX,TEL,TYPE=cell,voice:(XXX) XXX-XXXX,TEL,TYPE=pager:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=cell,voice,pref:+XXXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX,TEL,TYPE=work,voice:XXXXXXXXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
TEL,TYPE=work,voice,pref:(XXX) XXX-XXXX
;;;;

proc print;
run;

Results:

Obs    row    order    value

  1      1      1      home,voice,pref:(XXX) 555-1212
  2      2      1      home,voice,pref:(XXX) XXX-2222
  3      3      1      cell,voice,pref:+1 (XXX) XXX-XXXX
  4      4      1      work,voice,pref:XXX-555-1213,
  5      4      2      work,voice:(XXX) 555-1214,
  6      4      3      home,voice:(XXX) XXX-XXXX,
  7      4      4      cell,voice:(XXX) 555-1215,
  8      4      5      mrbig:(XXX) XXX-mrbig
  9      5      1      work,voice,pref:XXX-XXX-XXXX,
 10      5      2      work,voice:(XXX) XXX-XXXX,
 11      5      3      home,voice:(XXX) XXX-XXXX,
 12      5      4      cell,voice:(XXX) XXX-XXXX,
 13      5      5      pager:(XXX) XXX-XXXX
 14      6      1      work,voice,pref:(XXX) XXX-XXXX
 15      7      1      work,voice,pref:(XXX) XXX-XXXX
 16      8      1      work,voice,pref:(XXX) XXX-XXXX
 17      9      1      work,voice,pref:(XXX) XXX-XXXX
 18     10      1      cell,voice,pref:+XXXXXXXXXXX
 19     11      1      cell,voice,pref:+XXXXXXXXXXX
 20     12      1      work,voice,pref:(XXX) XXX-XXXX,
 21     12      2      work,voice:XXXXXXXXXX
 22     13      1      work,voice,pref:(XXX) XXX-XXXX,
 23     13      2      work,voice:XXXXXXXXXX
 24     14      1      work,voice,pref:(XXX) XXX-XXXX
 25     15      1      work,voice,pref:(XXX) XXX-XXXX

If you have it as a character variable the perhaps use TRANWRD to convert the TEL or TEL,TYPE= into a single character like | or ^ that you could then use with SCAN().  Or try figuring out regular expressions.

ballardw
Super User

You do not say how you will use the data if you have these values in different variables on a single observation. Often that is not the best approach for many tasks. However using @Tom 's solution you could transpose the data by the Row variable to get the values on one observation.

 

Thank you Tom for providing the read solution. I thought something along this line would be very possible and yours is cleaner than my thoughts.

Jeff_DOC
Pyrite | Level 9

Hi Tom.

 

Thanks for taking the time to reply. I should have been more clear about a few things. The data is an imported Excel sheet and I'm hoping to get each TEL to it's own column (see the example attached.

 

  COL 1 COL 2 COL3
HAVE TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx,TEL;TYPE=home,voice:(xxx) xxx-xxxx    
WANT TEL;TYPE=work,voice,pref:xxx-xxx-xxxx TEL;TYPE=work,voice:(xxx) xxx-xxxx TEL;TYPE=home,voice:(xxx) xxx-xxxx
Tom
Super User Tom
Super User

First convert it to multiple observations.  Converting it to multiple variables is a job for PROC TRANSPOSE.

data tall;
  set have;
  length num 8 string $50 ;
  do num=1 by 1 until(not loc);
     loc = find(col1,'TEL;',5);
     if loc then do;
       string=substrn(col1,1,loc-2);
       col1=substrn(col1,loc);
     end;
     else string=col1;
     output;
  end;
  drop col1 loc;
run;

So if you have this input:

data have;
  row+1;
  input;
  col1=_infile_;
cards4;
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx,TEL;TYPE=home,voice:(xxx) xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
TEL;TYPE=work,voice,pref:xxx-xxx-xxxx,TEL;TYPE=work,voice:(xxx) xxx-xxxx
;;;;

You get this output:

Obs    row    num                   string

 1      1      1     TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
 2      1      2     TEL;TYPE=work,voice:(xxx) xxx-xxxx
 3      1      3     TEL;TYPE=home,voice:(xxx) xxx-xxxx
 4      2      1     TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
 5      3      1     TEL;TYPE=work,voice,pref:xxx-xxx-xxxx
 6      3      2     TEL;TYPE=work,voice:(xxx) xxx-xxxx


Which you can transpose into multiple variables.

You could use PROC TRANSPOSE.

proc transpose data=tall out=wide(drop=_name_) prefix=tel;
  by row;
  id num;
  var string;
run;

Or a data step:

proc sql noprint;
  select max(num) into :&ncols from tall;
quit;
data want;
  do _n_=1 to &ncols until(last.row);
    set tall;
    by row;
    array tel [&ncols] $50 ;
    tel[_n_]=string;
  end;
run;

If you know the upperbound on the number of telephone numbers you will have might even be able to use the ARRAY in the step that splits the strings and avoid making the TALL dataset.

 

 

Jeff_DOC
Pyrite | Level 9

This worked perfectly! Thank you very much for taking your time to help.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 752 views
  • 0 likes
  • 4 in conversation