<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to read nested double quotation in txt files? in SAS Studio</title>
    <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680175#M9368</link>
    <description>Hello, the pattern-matching approach does resolve this issue, but how does the pattern matching work? For example, how should I understand the variable p='s/^"|"$|"(?=,)|(?&amp;lt;=,)"'||"/'/" in the code? Could you please explain it a little? Thanks.</description>
    <pubDate>Sat, 29 Aug 2020 04:02:04 GMT</pubDate>
    <dc:creator>wilsonli</dc:creator>
    <dc:date>2020-08-29T04:02:04Z</dc:date>
    <item>
      <title>How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308211#M1396</link>
      <description>&lt;P&gt;Hi, how do I read a txt file like the one below, where a quoted value contains nested double quotes (see the highlighted name)?&amp;nbsp;&lt;/P&gt;&lt;P&gt;For example, the txt file looks like this:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;"row.names","pclass","survived","name","age","embarked","home.dest","room","ticket","boat","sex"&lt;/P&gt;&lt;P&gt;"37","1st",1,&lt;FONT color="#0000FF"&gt;"Brown, Mrs James Joseph (Margaret "Molly" Tobin)"&lt;/FONT&gt;,44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have attached the file below. Thank you very much for your help.&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2016 08:51:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308211#M1396</guid>
      <dc:creator>Arcturuz</dc:creator>
      <dc:date>2016-10-31T08:51:51Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308230#M1397</link>
      <description>&lt;P&gt;The DSD option on the INFILE statement should be fine.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Apart from that, your delimiter is a comma, so it should read properly. I think...&lt;/P&gt;
&lt;P&gt;What errors are you getting?&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2016 10:04:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308230#M1397</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-10-31T10:04:29Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308236#M1398</link>
      <description>&lt;P&gt;How about this one .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;filename x '/folders/myfolders/titanicExample.txt';
filename y temp;
data _null_;
 infile x lrecl=2000 length=len;
 file y lrecl=2000;
 input x $varying2000. len;
 p='s/^"|"$|"(?=,)|(?&amp;lt;=,)"'||"/'/";
 x=prxchange(p,-1,strip(x));
 put x;
run;
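
/* Illustration only, not part of the original reply: apply the same
   substitution to a shortened copy of the sample record and print the
   result to the log. A double quote is turned into a single quote only
   when it sits at the start or end of the line or right next to a comma,
   so the quotes around Molly stay as they are. DEMO and LINE are
   arbitrary names. */
data demo;
 length line $200;
 line='"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000';
 line=prxchange('s/^"|"$|"(?=,)|(?&amp;lt;=,)"'||"/'/",-1,strip(line));
 put line=;
run;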


proc import datafile=y dbms=csv out=have replace;
guessingrows=max;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 31 Oct 2016 10:49:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308236#M1398</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2016-10-31T10:49:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308238#M1399</link>
      <description>&lt;P&gt;Where did you get the file? &amp;nbsp;It looks like something exported from Excel. &amp;nbsp;Because the delimiter appears in the value, it has decided that text strings need to be quoted. &amp;nbsp;Unfortunately, your string also contains quote marks. &amp;nbsp;Normally I would expect to see either single quotes around the outside or doubled quotes inside (I have also seen tags used). &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would suggest fixing the source, i.e. exporting it as pipe-delimited or similar:&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"37"|"1st"|1|&lt;/SPAN&gt;&lt;FONT color="#0000FF"&gt;"Brown, Mrs James Joseph (Margaret "Molly" Tobin)"|&lt;/FONT&gt;&lt;SPAN&gt;44.0000|...&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;, no, it doesn't, because of the comma inside the string as well.&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2016 10:52:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308238#M1399</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-10-31T10:52:49Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308248#M1400</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;The DSD option on the INFILE statement should be fine.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Apart from that, your delimiter is a comma, so it should read properly. I think...&lt;/P&gt;
&lt;P&gt;What errors are you getting?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;No errors occur, but something quite peculiar happens:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test;
infile cards dlm=',' dsd truncover;
input rownames $ pclass $ survived name :$50. age embarked $ homedest $ room $ ticket $ boat sex $;
cards;
"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"
;
run;

proc print noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Result:&lt;/P&gt;
&lt;PRE&gt;rownames    pclass    survived     name     age    embarked    homedest     room      ticket    boat    sex

   37        1st          1       "Brown     .     44.0000     Cherbour    Denver,                .      6 
&lt;/PRE&gt;
&lt;P&gt;As you can see, SAS interprets the comma within the double quotes as a delimiter; everything after it up to the next comma (all further double quotes are disregarded) ends up as input for the next column, which shifts the remaining columns and causes missing values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My tests with three different programs (SAS, Excel, LibreOffice Calc) revealed that LibreOffice is the only one that reads this correctly. I'm not really surprised by that, but SAS should do better.&lt;/P&gt;
&lt;P&gt;I'd raise that with SAS Technical Support.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Replacing the double quotes around the name with single quotes made SAS read this as intended, BTW.&lt;/P&gt;
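&lt;P&gt;A minimal sketch of that variant, assuming the same single record as above. Only two things change: the name is enclosed in single quotes, and the informats are widened (the widths are illustrative guesses) so that values such as Cherbourg are not cut off at the default length of 8. The data set name test2 is arbitrary.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data test2;
infile cards dlm=',' dsd truncover;
/* the name is now enclosed in single quotes; with DSD, SAS treats single quotes as quoting characters too */
input rownames :$4. pclass :$4. survived name :$50. age embarked :$12. homedest :$20. room :$8. ticket :$20. boat sex :$8.;
cards;
"37","1st",1,'Brown, Mrs James Joseph (Margaret "Molly" Tobin)',44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"
;
run;

proc print data=test2 noobs;run;&lt;/CODE&gt;&lt;/PRE&gt;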
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit: replaced "commas" in one place with "double quotes".&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2016 12:01:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308248#M1400</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-10-31T12:01:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308369#M1405</link>
      <description>&lt;P&gt;How very interesting. 'dsd' doesn't do exactly what you want, but SAS's behaviour with embedded quotes is, well, odd.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I nutted out some code, I got what I expect you were getting: Molly's name confuses everything and the following columns get out of whack. What I didn't expect is how 'name' would actually be treated. I would have thought that name would contain "Brown, Mrs James Joseph (Margaret " and then stop. What actually happens is that name is cut off at the first comma: "Brown (note that it keeps the leading double quote).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not inordinately proud of this code, but it does work for the example you provided. There may be other problems in the following data, but this is a reasonable first attempt:&lt;/P&gt;
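&lt;PRE&gt;data titanic;
/* assumes the fileref TITANIC was already assigned to the attached file with a FILENAME statement */
infile titanic 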
       firstobs=2 dsd dlm=',' missover;
attrib row_names length=$ 6 label="Row Names";
attrib pclass length=$ 6 label="Pclass";
attrib survived length=4 label="Survived";
attrib name length=$ 60 label="Name";
attrib age length=4 label="Age";
attrib embarked length=$ 30 label="Embarked";
attrib home_dest length=$ 30 label="Home Dest";
attrib room length=$ 6 label="Room";
attrib ticket length=$ 30 label="Ticket";
attrib boat length=$ 6 label="Boat";
attrib sex length=$ 6 label="Sex";
attrib name2 length=$ 60;
input row_names @;
if missing(row_names) then 
   delete;
input pclass 
      survived 
      name @;
/* a leading quote means the embedded comma split the name: read the rest into name2 and rejoin */
if name =: '"' then do;
   input name2 @;
   name = strip(substr(name, 2)) || ', ' || substr(name2, 1, length(name2) - 1);
   end;
input age ?? 
      embarked 
      home_dest 
      room 
      ticket 
      boat 
      sex;
drop name2;
run;
&lt;/PRE&gt;
&lt;P&gt;Note the check on name to see whether it starts with a quote. If it does, the step reads name2 and concatenates it with name (minus its first character). Note also that name2 ends with a quote, which the SUBSTR call drops, but the internal quotes around Molly are retained.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also put double question marks (??) after age, so that NA is treated as missing without an error.&lt;/P&gt;</description>
      <pubDate>Mon, 31 Oct 2016 19:50:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/308369#M1405</guid>
      <dc:creator>LaurieF</dc:creator>
      <dc:date>2016-10-31T19:50:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680175#M9368</link>
      <description>Hello, the pattern-matching approach does resolve this issue, but how does the pattern matching work? For example, how should I understand the variable p='s/^"|"$|"(?=,)|(?&amp;lt;=,)"'||"/'/" in the code? Could you please explain it a little? Thanks.</description>
      <pubDate>Sat, 29 Aug 2020 04:02:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680175#M9368</guid>
      <dc:creator>wilsonli</dc:creator>
      <dc:date>2020-08-29T04:02:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680198#M9369</link>
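      <description>The SAS documentation explains the PRX functions very well; check it.&lt;BR /&gt;&lt;BR /&gt;^"|"$|"(?=,)|(?&amp;lt;=,)"&lt;BR /&gt;==&amp;gt;&lt;BR /&gt;^"  matches a " at the start of the line&lt;BR /&gt;|    means OR&lt;BR /&gt;"$  matches a " at the end of the line&lt;BR /&gt;"(?=,)  is a lookahead: a " that is followed by a comma, as in ",&lt;BR /&gt;(?&amp;lt;=,)"  is a lookbehind: a " that is preceded by a comma, as in ,"&lt;BR /&gt;&lt;BR /&gt;/'/  replaces every " matched above with '&lt;BR /&gt;&lt;BR /&gt;The quotes around Molly are not at the start or end of the line and are not next to a comma, so they are left unchanged.</description>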
      <pubDate>Sat, 29 Aug 2020 12:13:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680198#M9369</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-08-29T12:13:39Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680227#M9372</link>
      <description>&lt;P&gt;The problem is that the source file is poorly formed so that there is no unambiguous way to parse it.&amp;nbsp; It has a quoted value with embedded quotes that are not doubled up.&amp;nbsp; If you want to put the text&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;Brown, Mrs James Joseph (Margaret "Molly" Tobin)&lt;/PRE&gt;
&lt;P&gt;in quotes you either need to use single quotes.&lt;/P&gt;
&lt;PRE&gt;'Brown, Mrs James Joseph (Margaret "Molly" Tobin)'&lt;/PRE&gt;
&lt;P&gt;Or double the existing quotes.&lt;/P&gt;
&lt;PRE&gt;"Brown, Mrs James Joseph (Margaret ""Molly"" Tobin)"&lt;/PRE&gt;</description>
      <pubDate>Sat, 29 Aug 2020 17:10:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680227#M9372</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-08-29T17:10:25Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680228#M9373</link>
      <description>&lt;P&gt;What is the source of that file?&amp;nbsp; It looks like some program tried to write a CSV file but did not follow the rules for handling values with quotes in them.&amp;nbsp; Many popular tools (Oracle comes to mind) frequently generate gibberish like that.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you get it in a form that is not corrupted?&lt;/P&gt;</description>
      <pubDate>Sat, 29 Aug 2020 17:17:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680228#M9373</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-08-29T17:17:41Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680587#M9375</link>
      <description>Thank you. It seems this solution can't work consistently because the data source is corrupted: it breaks other rows that contain raw values like "Mike's...", because once the double quotes are replaced with single quotes the value becomes 'Mike's...' and has the same issue again.&lt;BR /&gt;&lt;BR /&gt;So my question is: in a real-world workplace, is it possible to manually fix the raw data from "Brown, Mrs James Joseph (Margaret "Molly" Tobin)" to "Brown, Mrs James Joseph (Margaret 'Molly' Tobin)" before reading it?</description>
      <pubDate>Tue, 01 Sep 2020 00:59:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680587#M9375</guid>
      <dc:creator>wilsonli</dc:creator>
      <dc:date>2020-09-01T00:59:04Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680588#M9376</link>
      <description>Hi, so do you think "Brown, Mrs James Joseph (Margaret "Molly" Tobin)" is actually corrupted? And do you think it is good practice to manually fix the raw data, changing "Molly" to 'Molly' or to plain Molly inside the double quotes, before reading it? Thanks.</description>
      <pubDate>Tue, 01 Sep 2020 01:03:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680588#M9376</guid>
      <dc:creator>wilsonli</dc:creator>
      <dc:date>2020-09-01T01:03:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680601#M9377</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/329093"&gt;@wilsonli&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;Hi, so do you think "Brown, Mrs James Joseph (Margaret "Molly" Tobin)" is actually corrupted? And do you think it is good practice to manually fix the raw data, changing "Molly" to 'Molly' or to plain Molly inside the double quotes, before reading it? Thanks.&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;No.&amp;nbsp; &lt;STRONG&gt;It is best practice to write the text file properly the first time.&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;It might be possible to fix your existing file, but usually only when you know how many fields are on each line and only one field could possibly contain embedded quotes.&amp;nbsp; Then you can parse the fields before it from the left and the fields after it from the right, and whatever is left over goes into the problem field.&lt;/P&gt;</description>
      <pubDate>Tue, 01 Sep 2020 02:43:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680601#M9377</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2020-09-01T02:43:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to read nested double quotation in txt files?</title>
      <link>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680659#M9378</link>
      <description>Please start a new thread to discuss this question.&lt;BR /&gt;Then other SAS users can offer you good ideas or code!</description>
      <pubDate>Tue, 01 Sep 2020 10:54:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/m-p/680659#M9378</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-09-01T10:54:04Z</dc:date>
    </item>
  </channel>
</rss>

