12-15-2015 02:03 PM
I currently process strings that contain comma seperated substrings.
The Example below demonstrates that I can succesfully access each of these substrings using the SCAN function...
...unless there is a substring (here, it is the 3rd one) that was enclosed in quotations marks because it contains the delimiter, i.e. the comma. In this case, the substring is only partially read until the comma occurs. Usually, it is suggested to use the q modifier to account for this situation. Unfortunately, the q modifier (just remove the comment symbols from the code below) does not work for me because of the unmatched quotation marks that can occur in the strings.
Is there an option that I'm missing or an elegant alternative to account for both: (1) substrings containing the delimiter enclosed in quotation marks and (2) unmatched quotation marks?
The wanted outcome is:
2: Jack's Place
3: Blue Sea, LA
As always, any help is appreciated!
data _null_; have = "Barcelona,Jack's Place,""Blue Sea, LA"""; put have; do i=1 to 3; want = scan( have, i, ',', /*'q'*/ ); put i ":" want; end; run;
12-15-2015 03:11 PM
This is an assignment in SAS. But how is the data sored at the very source? DBMS table, flat file, XML...?
And secondly, what's the application? I understand that you supply some simple non-relevant data just for simplicity, but it would be good to know what and why you are doing this.
12-15-2015 03:49 PM
It's not that elegant, but it should be workable. Perhaps someone who knows parsing functions can do better.
have_copy = have;
do i=1 to 3;
if have_copy =: ',' then have_copy = left(substr(have_copy, 2));
if have_copy =: '""' then do;
have_copy = substr(have_copy, 3);
end_of_string = index(have_copy, '""') - 1;
want = substr(have_copy, 1, end_of_string);
have_copy = substr(have_copy, end_of_string + 2);
want = scan(have_copy, 1, ',');
have_copy = left(substr(have_copy, length(want) + 1));
put i ':' want;
It's untested, but should be workable (perhaps with a small amount of tweaking). The LEFT function might be overkill, but removes blanks that follow a delimiting comma. Good luck.
12-15-2015 04:18 PM - edited 12-15-2015 04:32 PM
if you can be assured that only the double quote character will be used to quote strings with delimiters then you could just move the single quotes out of the way to make it work. SAS ought to let you specify what quote character you want to use with the 'Q' modifier. Perhaps borrowing the '1' and '2' modifiers from the CATQ() function.
So one way might be to translate the single quote to a vertical bar (or some other invalid character) and then back again.
want = translate(scanq( translate(have,'|',"'"), i, ',', 'qm' ),"'",'|');
You also might want to add a DEQUOTE() function call to remove the quotes.
I also notice that SCAN() works if you add double quotes around the values that contain single quotes.
have = "Barcelona,""Jack's Place"",""Blue Sea, LA""";
12-15-2015 09:49 PM
Really not easy .
data _null_; have = "Barcelona,Jack's Place,""Blue Sea, LA"""; put have; pid=prxparse('/[^",]+|"[^"]+"/'); start = 1; stop = length(have); call prxnext(pid, start, stop, have, position, length); do while (position > 0); i+1; found = dequote(substr(have, position, length)); put i ":" found ; call prxnext(pid, start, stop, have, position, length); end; run;
12-15-2015 11:16 PM
As Ksharp suggested you could use the _INFILE_ trick to convert. Basically you read one dummy record from CARDS or a dummy file and then replace the automatic variable _INFILE_ with the content of your character variable. You can then use INPUT statement to read the data.
data have; have = "Barcelona,Jack's Place,""Blue Sea, LA"""; run; data want ; set have ; length want $50; infile cards dsd truncover ; input @1 @ ; _infile_ = have; do i=1 to 3; input want @; output ; end; cards; ; proc print; run;