BookmarkSubscribeRSS Feed
ammarhm
Lapis Lazuli | Level 10

This should have been a stright forward case.

I am trying to split the contents of a column into multiple columns based on the pccurance of the string '_x000D_'

 

 

data want;
  set t;
  length reason1-reason11 $10;
  array reason(11) $;
  do _i = 1 to dim(reason);
    reason[_i] = scan(lowcase(TextBox),_i,"_x000D_");
  end;
run;

The string to split looks like this:

 

 

IDN:         0742520_x000D_Surname:         JOHN_x000D_Given Names:   Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_

 

But the result of the code is erratic and it looks like SAS is not splitting by 

_x000D_

 

Sometimes it is splitting by "D", others by space etc, please see attached screenshot of splitting the string above

 

am i missing something here??

 

 


Screen Shot 2017-07-10 at 9.20.10 pm.pngScreen Shot 2017-07-10 at 9.21.47 pm.png
7 REPLIES 7
Astounding
PROC Star

A couple of things ...

 

The third parameter of SCAN is a list of characters, not a word.  So any instance of any of those characters is considered a delimiter.

 

Also note, your new variables are being defined as 10 characters long.  You might want to switch to a longer length, to hold some of the longer values that your original string contains.

 

Take a look at the INDEX function instead of SCAN.  That will find the location of your key word, and then SUBSTR can pick out the characters before the key word.

 

 

Shmuel
Garnet | Level 18

@Astounding is right in his remarks, but even if the 3rd argument of scan was a string (which is not),

then after using function lowcase thd D will change into and you will not find the string _x000D_

Reeza
Super User

Because your delimiter is a string you may want to use PRXPARSE instead of FIND/INDEXW. 

ballardw
Super User

Perhaps this will give you a different starting place:

data junk;
string ="IDN:         0742520_x000D_Surname:         JOHN_x000D_Given Names:   Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_";
newstr = tranwrd(string,'_x000D_',' ');
  length reason1-reason11 $20;
  array reason(11) $;
  do _i = 1 to dim(reason);
    reason[_i] = scan(newstr,_i);
  end;

run;
Tom
Super User Tom
Super User

If you want to use SCAN() then first use TRANWRD() to convert your string into a single character.  You could just pick any character that is not already in your string.

data have;
  textbox=
  'IDN:         0742520_x000D_Surname:         JOHN_x000D_Given'
||' Names:   Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911'
||'_x000D_DOB: _x000D__x000D_'
  ;
run;

data want;
  set have ;
  array reason(11) $50 ;
  do i = 1 to dim(reason);
    reason[i] = scan(tranwrd(TextBox,'_x000D_','|'),i,'|');
  end;
  drop i;
run;
1336  data _null_;
1337    set want ;
1338    put (_all_) (=:$quote./);
1339  run;

textbox="IDN:         0742520_x000D_Surname:         JOHN_x000D_Given Names:   Wayen Joe_x000D_Gender:
 Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_"
reason1="IDN:         0742520"
reason2="Surname:         JOHN"
reason3="Given Names:   Wayen Joe"
reason4="Gender: Male"
reason5="DOB: 01/10/1911"
reason6="DOB:"
reason7=""
reason8=""
reason9=""
reason10=""
reason11=""
NOTE: There were 1 observations read from the data set WORK.WANT.

 

 

Reeza
Super User

Note that if you're reading this data from a text file you can use the DLMSTR option on the infile statement to specify a delimiter that's a specific string. 

Tom
Super User Tom
Super User

Here is a way that you could use the DLMSTR= option on the INFILE statement. This is called the "_INFILE_ Trick".

Create a dummy file with at least one line.

filename dummy temp ;
data _null_ ; 
  file dummy;
  put; 
run;

Then create a data step that purports to read from that file, but that really takes the values of TextBox and assigns them to the _INFILE_ automatic variable so that you can use an INPUT statement to parse the string.  The @1 and @@ in the INPUT statements are critical to prevent it from reading past the end of the dummy text file.

data want2 ;
  infile dummy lrecl=32767 truncover dlmstr='_x000D_';
  input @@;
  set have ;
  _infile_=textbox;
  input @1 (reason1-reason10) (:$50.) @@;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 971 views
  • 3 likes
  • 6 in conversation