DATA Step, Macro, Functions and more

SCAN function dealing w/ unmatched quotation marks

Reply
Contributor
Posts: 38

SCAN function dealing w/ unmatched quotation marks

Hi all,

 

I currently process strings that contain comma seperated substrings.

 

The Example below demonstrates that I can succesfully access each of these substrings using the SCAN function...

...unless there is a substring (here, it is the 3rd one) that was enclosed in quotations marks because it contains the delimiter, i.e. the comma. In this case, the substring is only partially read until the comma occurs. Usually, it is suggested to use the q modifier to account for this situation. Unfortunately, the q modifier (just remove the comment symbols from the code below) does not work for me because of the unmatched quotation marks that can occur in the strings.

 

Is there an option that I'm missing or an elegant alternative to account for both: (1) substrings containing the delimiter enclosed in quotation marks and (2) unmatched quotation marks?

 

The wanted outcome is:

1: Barcelona

2: Jack's Place

3: Blue Sea, LA

 

As always, any help is appreciated!

 

 

data _null_;

	have = "Barcelona,Jack's Place,""Blue Sea, LA""";

	put have;

	do i=1 to 3;
		want = scan( have, i, ',', /*'q'*/ );
		put i ":" want;
	end;

run;

 

 

Super User
Posts: 5,433

Re: SCAN function dealing w/ unmatched quotation marks

Posted in reply to Georg_UPB

This is an assignment in SAS. But how is the data sored at the very source? DBMS table, flat file, XML...?

And secondly, what's the application? I understand that you supply some simple non-relevant data just for simplicity, but it would be good to know what and why you are doing this.

Data never sleeps
Super User
Posts: 5,515

Re: SCAN function dealing w/ unmatched quotation marks

Posted in reply to Georg_UPB

It's not that elegant, but it should be workable.  Perhaps someone who knows parsing functions can do better.

 

have_copy = have;

do i=1 to 3;

   if have_copy =: ',' then have_copy = left(substr(have_copy, 2));

    if have_copy =: '""' then do;

      have_copy = substr(have_copy, 3);

      end_of_string = index(have_copy, '""') - 1;

      want = substr(have_copy, 1, end_of_string);

      have_copy = substr(have_copy, end_of_string + 2);

   end;

   else do;

      want = scan(have_copy, 1, ',');

      have_copy = left(substr(have_copy, length(want) + 1));

   end;

   put i ':' want;

end;

 

It's untested, but should be workable (perhaps with a small amount of tweaking).  The LEFT function might be overkill, but removes blanks that follow a delimiting comma.  Good luck.

Super User
Super User
Posts: 7,065

Re: SCAN function dealing w/ unmatched quotation marks

[ Edited ]
Posted in reply to Georg_UPB

if you can be assured that only the double quote character will be used to quote strings with delimiters then you could just move the single quotes out of the way to make it work.  SAS ought to let you specify what quote character you want to use with the 'Q' modifier.  Perhaps borrowing the '1' and '2' modifiers from the CATQ() function.

 

So one way might be to translate the single quote to a vertical bar (or some other invalid character) and then back again.

 

 want = translate(scanq( translate(have,'|',"'"), i, ',', 'qm' ),"'",'|');

 

You also might want to add a DEQUOTE() function call to remove the quotes.

want=dequote(want);

 

I also notice that SCAN() works if you add double quotes around the values that contain single quotes.

have = "Barcelona,""Jack's Place"",""Blue Sea, LA""";
Super User
Posts: 10,041

Re: SCAN function dealing w/ unmatched quotation marks

Posted in reply to Georg_UPB

Really not easy .

 

data _null_;
	have = "Barcelona,Jack's Place,""Blue Sea, LA""";
	put have;

pid=prxparse('/[^",]+|"[^"]+"/');
start = 1;
stop = length(have);
call prxnext(pid, start, stop, have, position, length);
do while (position > 0);
i+1;
found = dequote(substr(have, position, length));
put i ":" found ;
call prxnext(pid, start, stop, have, position, length);
end;

	
run;

 

Super User
Posts: 10,041

Re: SCAN function dealing w/ unmatched quotation marks

Posted in reply to Georg_UPB

Just one more think , Can you put these into a csv/txt file , and use  

infile ........  dsd 

import it ?

Super User
Super User
Posts: 7,065

Re: SCAN function dealing w/ unmatched quotation marks

Posted in reply to Georg_UPB

As Ksharp suggested you could use the _INFILE_ trick to convert.  Basically you read one dummy record from CARDS or a dummy file and then replace the automatic variable _INFILE_ with the content of your character variable. You can then use INPUT statement to read the data.

 

data have;
  have = "Barcelona,Jack's Place,""Blue Sea, LA""";
run;

data want ;
  set have ;
  length want $50;
  infile cards dsd truncover ;
  input @1 @ ;
  _infile_ = have;
  do i=1 to 3;
    input want @;
    output ;
  end;
cards;

;

proc print; run;

 

Ask a Question
Discussion stats
  • 6 replies
  • 404 views
  • 0 likes
  • 5 in conversation