This should have been a stright forward case.
I am trying to split the contents of a column into multiple columns based on the pccurance of the string '_x000D_'
data want;
set t;
length reason1-reason11 $10;
array reason(11) $;
do _i = 1 to dim(reason);
reason[_i] = scan(lowcase(TextBox),_i,"_x000D_");
end;
run;
The string to split looks like this:
IDN: 0742520_x000D_Surname: JOHN_x000D_Given Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_
But the result of the code is erratic and it looks like SAS is not splitting by
_x000D_
Sometimes it is splitting by "D", others by space etc, please see attached screenshot of splitting the string above
am i missing something here??
A couple of things ...
The third parameter of SCAN is a list of characters, not a word. So any instance of any of those characters is considered a delimiter.
Also note, your new variables are being defined as 10 characters long. You might want to switch to a longer length, to hold some of the longer values that your original string contains.
Take a look at the INDEX function instead of SCAN. That will find the location of your key word, and then SUBSTR can pick out the characters before the key word.
@Astounding is right in his remarks, but even if the 3rd argument of scan was a string (which is not),
then after using function lowcase thd D will change into d and you will not find the string _x000D_
Because your delimiter is a string you may want to use PRXPARSE instead of FIND/INDEXW.
Perhaps this will give you a different starting place:
data junk; string ="IDN: 0742520_x000D_Surname: JOHN_x000D_Given Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_"; newstr = tranwrd(string,'_x000D_',' '); length reason1-reason11 $20; array reason(11) $; do _i = 1 to dim(reason); reason[_i] = scan(newstr,_i); end; run;
If you want to use SCAN() then first use TRANWRD() to convert your string into a single character. You could just pick any character that is not already in your string.
data have;
textbox=
'IDN: 0742520_x000D_Surname: JOHN_x000D_Given'
||' Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911'
||'_x000D_DOB: _x000D__x000D_'
;
run;
data want;
set have ;
array reason(11) $50 ;
do i = 1 to dim(reason);
reason[i] = scan(tranwrd(TextBox,'_x000D_','|'),i,'|');
end;
drop i;
run;
1336 data _null_; 1337 set want ; 1338 put (_all_) (=:$quote./); 1339 run; textbox="IDN: 0742520_x000D_Surname: JOHN_x000D_Given Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_" reason1="IDN: 0742520" reason2="Surname: JOHN" reason3="Given Names: Wayen Joe" reason4="Gender: Male" reason5="DOB: 01/10/1911" reason6="DOB:" reason7="" reason8="" reason9="" reason10="" reason11="" NOTE: There were 1 observations read from the data set WORK.WANT.
Note that if you're reading this data from a text file you can use the DLMSTR option on the infile statement to specify a delimiter that's a specific string.
Here is a way that you could use the DLMSTR= option on the INFILE statement. This is called the "_INFILE_ Trick".
Create a dummy file with at least one line.
filename dummy temp ;
data _null_ ;
file dummy;
put;
run;
Then create a data step that purports to read from that file, but that really takes the values of TextBox and assigns them to the _INFILE_ automatic variable so that you can use an INPUT statement to parse the string. The @1 and @@ in the INPUT statements are critical to prevent it from reading past the end of the dummy text file.
data want2 ;
infile dummy lrecl=32767 truncover dlmstr='_x000D_';
input @@;
set have ;
_infile_=textbox;
input @1 (reason1-reason10) (:$50.) @@;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.