07-10-2017 03:22 PM
This should have been a stright forward case.
I am trying to split the contents of a column into multiple columns based on the pccurance of the string '_x000D_'
data want; set t; length reason1-reason11 $10; array reason(11) $; do _i = 1 to dim(reason); reason[_i] = scan(lowcase(TextBox),_i,"_x000D_"); end; run;
The string to split looks like this:
IDN: 0742520_x000D_Surname: JOHN_x000D_Given Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_
But the result of the code is erratic and it looks like SAS is not splitting by
Sometimes it is splitting by "D", others by space etc, please see attached screenshot of splitting the string above
am i missing something here??
07-10-2017 03:34 PM
A couple of things ...
The third parameter of SCAN is a list of characters, not a word. So any instance of any of those characters is considered a delimiter.
Also note, your new variables are being defined as 10 characters long. You might want to switch to a longer length, to hold some of the longer values that your original string contains.
Take a look at the INDEX function instead of SCAN. That will find the location of your key word, and then SUBSTR can pick out the characters before the key word.
07-10-2017 03:42 PM
@Astounding is right in his remarks, but even if the 3rd argument of scan was a string (which is not),
then after using function lowcase thd D will change into d and you will not find the string _x000D_
07-10-2017 04:06 PM
Perhaps this will give you a different starting place:
data junk; string ="IDN: 0742520_x000D_Surname: JOHN_x000D_Given Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_"; newstr = tranwrd(string,'_x000D_',' '); length reason1-reason11 $20; array reason(11) $; do _i = 1 to dim(reason); reason[_i] = scan(newstr,_i); end; run;
07-10-2017 05:11 PM
If you want to use SCAN() then first use TRANWRD() to convert your string into a single character. You could just pick any character that is not already in your string.
data have; textbox= 'IDN: 0742520_x000D_Surname: JOHN_x000D_Given' ||' Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911' ||'_x000D_DOB: _x000D__x000D_' ; run; data want; set have ; array reason(11) $50 ; do i = 1 to dim(reason); reason[i] = scan(tranwrd(TextBox,'_x000D_','|'),i,'|'); end; drop i; run;
1336 data _null_; 1337 set want ; 1338 put (_all_) (=:$quote./); 1339 run; textbox="IDN: 0742520_x000D_Surname: JOHN_x000D_Given Names: Wayen Joe_x000D_Gender: Male_x000D_DOB: 01/10/1911_x000D_DOB: _x000D__x000D_" reason1="IDN: 0742520" reason2="Surname: JOHN" reason3="Given Names: Wayen Joe" reason4="Gender: Male" reason5="DOB: 01/10/1911" reason6="DOB:" reason7="" reason8="" reason9="" reason10="" reason11="" NOTE: There were 1 observations read from the data set WORK.WANT.
07-10-2017 05:18 PM
Note that if you're reading this data from a text file you can use the DLMSTR option on the infile statement to specify a delimiter that's a specific string.
07-10-2017 05:21 PM
Here is a way that you could use the DLMSTR= option on the INFILE statement. This is called the "_INFILE_ Trick".
Create a dummy file with at least one line.
filename dummy temp ; data _null_ ; file dummy; put; run;
Then create a data step that purports to read from that file, but that really takes the values of TextBox and assigns them to the _INFILE_ automatic variable so that you can use an INPUT statement to parse the string. The @1 and @@ in the INPUT statements are critical to prevent it from reading past the end of the dummy text file.
data want2 ; infile dummy lrecl=32767 truncover dlmstr='_x000D_'; input @@; set have ; _infile_=textbox; input @1 (reason1-reason10) (:$50.) @@; run;