Write and run SAS programs in your web browser

How to read nested double quotation in txt files?

Reply
New Contributor
Posts: 3

How to read nested double quotation in txt files?

Hi, How do I read nested double quotation in txt files and txt files as such? P.S the highlighted name. 

for example the txt is as such: 

 

"row.names","pclass","survived","name","age","embarked","home.dest","room","ticket","boat","sex"

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

 

I have attached the file below. Thank you much for help.

Grand Advisor
Posts: 17,338

Re: How to read nested double quotation in txt files?

DSD option in an infile statement should be fine. 

Your delimiter is a comma otherwise so it should read properly. I think...

What errors are you getting?

Esteemed Advisor
Posts: 6,656

Re: How to read nested double quotation in txt files?

[ Edited ]

Reeza wrote:

DSD option in an infile statement should be fine. 

Your delimiter is a comma otherwise so it should read properly. I think...

What errors are you getting?


No errors happen, but something quite peculiar:

data test;
infile cards dlm=',' dsd truncover;
input rownames $ pclass $ survived name :$50. age embarked $ homedest $ room $ ticket $ boat sex $;
cards;
"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"
;
run;

proc print noobs;run;

Result:

rownames    pclass    survived     name     age    embarked    homedest     room      ticket    boat    sex

   37        1st          1       "Brown     .     44.0000     Cherbour    Denver,                .      6 

As you can see, SAS interprets the comma within the double quotes as a delimiter; everything after that up to the next comma (disregards all further double quotes) ends up as input for the next column, causing a shift and missing values.

 

My tests with 3 different programs (SAS, Excel, LibreOffice calc) have revealed that the only software that reads this correctly is LibreOffice. Not really surprised by that, but SAS should do better.

I'd put that to SAS TS.

 

Replacing the double quotes around the name with single quotes made SAS read this as intended, BTW.

 

Edit: replaced "commas" in one place with "double quotes".

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Grand Advisor
Posts: 9,576

Re: How to read nested double quotation in txt files?

[ Edited ]

How about this one .

 

filename x '/folders/myfolders/titanicExample.txt';
filename y temp;
data _null_;
 infile x lrecl=2000 length=len;
 file y lrecl=2000;
 input x $varying2000. len;
 p='s/^"|"$|"(?=,)|(?<=,)"'||"/'/";
 x=prxchange(p,-1,strip(x));
 put x;
run;


proc import datafile=y dbms=csv out=have replace;
guessingrows=max; run;
Esteemed Advisor
Esteemed Advisor
Posts: 7,211

Re: How to read nested double quotation in txt files?

Where did you get the file?  It looks like something exported from Excel.  It has decided that as the delimiter appears in the value that text strings need to be quoted.  Unfortunately your string also contains quote marks.  Normally I would expect to see either sinlge quotes surrounding the outside, or double quotes inside (also seen tags as well).  

 

I would suggesting fixing the source, i.e. export as pipe delimted or similar:

"37"|"1st"|1|"Brown, Mrs James Joseph (Margaret "Molly" Tobin)"|44.0000|...

 

@Reeza, nope, it doesn't due to the comman in the string also.

Super Contributor
Posts: 251

Re: How to read nested double quotation in txt files?

How very interesting. 'dsd' doesn't do what you want exactly, but SAS's behaviour with imbedded quotes is - well - odd.

 

When I nutted out some code, I got what I expect you were getting: Molly's name confuses everything and the following columns get out of whack. What I didn't expect is actually how 'name' would be treated. I would have thought that name would contain "Brown, Mrs James Joseph (Margaret " and then stop. What actually occurred is name is truncated after the first comma: "Brown - note that it contains the double quote.

 

I'm not inordinately proud of this code, but it does work for the example you provided. There may be other errors in following data - but this is a reasonable first attempt:

data titanic;
infile titanic 
       firstobs=2 dsd dlm=',' missover;
attrib row_names length=$ 6 label="Row Names";
attrib pclass length=$ 6 label="Pclass";
attrib survived length=4 label="Survived";
attrib name length=$ 60 label="Name";
attrib age length=4 label="Age";
attrib embarked length=$ 30 label="Embarked";
attrib home_dest length=$ 30 label="Home Dest";
attrib room length=$ 6 label="Room";
attrib ticket length=$ 30 label="Ticket";
attrib boat length=$ 6 label="Boat";
attrib sex length=$ 6 label="Sex";
attrib name2 length=$ 60; 
input row_names @; if missing(row_names) then delete; input pclass survived name @; if name =: '"' then do; input name2 @; name = strip(substr(name, 2)) || ', ' || substr(name2, 1, length(name2) - 1); end; input age ?? embarked home_dest room ticket boat sex; drop name2; run;

Note the checking of name to see if it starts with a quote. If it does, it reads name2, then concatenates it with name (without its first character). Note also that name2 is suffixed with a quote, but the internal quotes around Molly are retained.

 

I also put double question-marks after age, to treat NA as missing without an error.

Ask a Question
Discussion stats
  • 5 replies
  • 316 views
  • 0 likes
  • 6 in conversation