Strange behavior when splitting a column

Reply
Frequent Contributor
Posts: 135

Strange behavior when splitting a column

This should have been a stright forward case.

I am trying to split the contents of a column into multiple columns based on the pccurance of the string '_x000D_'

 

 

data want;
  set t;
  length reason1-reason11 $10;
  array reason(11) $;
  do _i = 1 to dim(reason);
    reason[_i] = scan(lowcase(TextBox),_i,"_x000D_");
  end;
run;

The string to split looks like this:

 

 

IDN:         0742520_x000D_Surname:         JOHN_x000D_Given Names:   Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_

 

But the result of the code is erratic and it looks like SAS is not splitting by 

_x000D_

 

Sometimes it is splitting by "D", others by space etc, please see attached screenshot of splitting the string above

 

am i missing something here??

 

 


Screen Shot 2017-07-10 at 9.20.10 pm.pngScreen Shot 2017-07-10 at 9.21.47 pm.png
Esteemed Advisor
Posts: 5,007

Re: Strange behavior when splitting a column

A couple of things ...

 

The third parameter of SCAN is a list of characters, not a word.  So any instance of any of those characters is considered a delimiter.

 

Also note, your new variables are being defined as 10 characters long.  You might want to switch to a longer length, to hold some of the longer values that your original string contains.

 

Take a look at the INDEX function instead of SCAN.  That will find the location of your key word, and then SUBSTR can pick out the characters before the key word.

 

 

Super User
Posts: 1,256

Re: Strange behavior when splitting a column

@Astounding is right in his remarks, but even if the 3rd argument of scan was a string (which is not),

then after using function lowcase thd D will change into and you will not find the string _x000D_

Grand Advisor
Posts: 17,464

Re: Strange behavior when splitting a column

Because your delimiter is a string you may want to use PRXPARSE instead of FIND/INDEXW. 

Grand Advisor
Posts: 10,252

Re: Strange behavior when splitting a column

Perhaps this will give you a different starting place:

data junk;
string ="IDN:         0742520_x000D_Surname:         JOHN_x000D_Given Names:   Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_";
newstr = tranwrd(string,'_x000D_',' ');
  length reason1-reason11 $20;
  array reason(11) $;
  do _i = 1 to dim(reason);
    reason[_i] = scan(newstr,_i);
  end;

run;
Super User
Super User
Posts: 6,389

Re: Strange behavior when splitting a column

If you want to use SCAN() then first use TRANWRD() to convert your string into a single character.  You could just pick any character that is not already in your string.

data have;
  textbox=
  'IDN:         0742520_x000D_Surname:         JOHN_x000D_Given'
||' Names:   Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911'
||'_x000D_DOB: _x000D__x000D_'
  ;
run;

data want;
  set have ;
  array reason(11) $50 ;
  do i = 1 to dim(reason);
    reason[i] = scan(tranwrd(TextBox,'_x000D_','|'),i,'|');
  end;
  drop i;
run;
1336  data _null_;
1337    set want ;
1338    put (_all_) (=:$quote./);
1339  run;

textbox="IDN:         0742520_x000D_Surname:         JOHN_x000D_Given Names:   Wayen Joe_x000D_Gender:
 Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_"
reason1="IDN:         0742520"
reason2="Surname:         JOHN"
reason3="Given Names:   Wayen Joe"
reason4="Gender: Male"
reason5="DOB: 01/10/1911"
reason6="DOB:"
reason7=""
reason8=""
reason9=""
reason10=""
reason11=""
NOTE: There were 1 observations read from the data set WORK.WANT.

 

 

Grand Advisor
Posts: 17,464

Re: Strange behavior when splitting a column

Note that if you're reading this data from a text file you can use the DLMSTR option on the infile statement to specify a delimiter that's a specific string. 

Super User
Super User
Posts: 6,389

Re: Strange behavior when splitting a column

Here is a way that you could use the DLMSTR= option on the INFILE statement. This is called the "_INFILE_ Trick".

Create a dummy file with at least one line.

filename dummy temp ;
data _null_ ; 
  file dummy;
  put; 
run;

Then create a data step that purports to read from that file, but that really takes the values of TextBox and assigns them to the _INFILE_ automatic variable so that you can use an INPUT statement to parse the string.  The @1 and @@ in the INPUT statements are critical to prevent it from reading past the end of the dummy text file.

data want2 ;
  infile dummy lrecl=32767 truncover dlmstr='_x000D_';
  input @@;
  set have ;
  _infile_=textbox;
  input @1 (reason1-reason10) (:$50.) @@;
run;
Ask a Question
Discussion stats
  • 7 replies
  • 93 views
  • 3 likes
  • 6 in conversation