Write and run SAS programs in your web browser

reading quotation marks from a text files

Reply
Occasional Contributor
Posts: 10

reading quotation marks from a text files

hi,  below are my codes of importing a text files. One of the value from the name variabes consists of " " quotation mark which i need to output it. But i cant seems to get that output using the ~ function. As the example below,  my name variables must read the quotation mark --> Brown, Mrs James Joseph (Margaret "Molly" Tobin) .

 

 

example this row :

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

------------------------------------------------------------------------------------------------------

length row_names 3 pclass $3 survived 3 name $82 _age $3 age 3
embarked $3 home_dest $70 room $5 ticket $20 boat $3 sex $10;

 

infile '/home/yewkeong0/assignemnt/titanic.txt' dsd dlm=',' missover;
input row_names pclass $ survived name ~ $82. _age
embarked $ home_dest $ room $ ticket $ boat $ sex $;
if _age = 'NA' then age = . ;
else age = input(_age , 3.);
drop _age;

run;

 

any suggestion guys??? =/

Grand Advisor
Posts: 9,682

Re: reading quotation marks from a text files

If you don't have many of these issues it may be better to just change the "Molly" to 'Molly' to read the file.  Then if you really really need double quotes you could replace the ' with ".

Another option would be if the input file was exported from another program to see if there would be if the data could be exported witha  different delimiter such as | or even connect using ODBC.

 

Or use _infile_ and spend hours if not days writing code to parse a file line by line to implement rules that resolve to fields are separated by comma unless the comma occurs between two " but if there is more words without a comma but have more" keep reading.

I wouldn't spend the time.

 

If you have a lot of files doing this then you need to go to the source and find out why and fix it there. Perhaps have them actually use left and right quotes instead of ".

Occasional Contributor
Posts: 10

Re: reading quotation marks from a text files

hi. i also dont mind but i asked my lecture, she said cannot touch the data sets. so we need to output excaly the same as the text files given. my output example is this :

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

rowname = 37
pclass = 1st
survived = 1
name = Brown, Mrs James Joseph (Margaret "Molly" Tobin)
age = 44
embarked = Cherbourg
homedest = Denver, CO
room = .
ticket = 17610 L27 15s 5d
boat = 6
sex = female

wow. i would'nt spend so much time also but btw thanks for the suggestion.
Super User
Posts: 1,073

Re: reading quotation marks from a text files

[ Edited ]

your row example is:

 

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

 

your problem is because there is a comma in the name field which splits it into two variables.

 

using DLM=',' is not good enough for your data and input method.

Super User
Posts: 1,073

Re: reading quotation marks from a text files

did you try import your data by PROC IMPORT or directly from the excel file ?
Occasional Contributor
Posts: 10

Re: reading quotation marks from a text files

hi. u said using dlm= ',' is not good enough for my data input method? then what is the best for this??? and no i didnt try the proc import method, i straight import directly from text files
Super User
Super User
Posts: 5,960

Re: reading quotation marks from a text files

The issue is the embedded double quote characters. Like:

 

 

"Margaret "Molly" Tobin"

 

 If you want to include the quoting characters into a quoted string they need to be doubled so SAS can tell that they do not mark the end of the string.

 

"Margaret ""Molly"" Tobin"

 

Can you get the generator of the file to fix that bug?  

 

You might be able to fix the issue. Probably it would take a human to figure out what was intended for some of the more messy strings. But if you are limited to just this type of issue and only for one particular field you might  be able to make a program to fix it.  For example you could use the COUNTW() function to see how many fields SAS sees in the line and if it is more than expected then you know there is a quoting issue.

 

 

 

Occasional Contributor
Posts: 10

Re: reading quotation marks from a text files

hi. I will show u the example of the output.

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

rowname = 37
pclass = 1st
survived = 1
name = Brown, Mrs James Joseph (Margaret "Molly" Tobin)
age = 44
embarked = Cherbourg
homedest = Denver, CO
room = .
ticket = 17610 L27 15s 5d
boat = 6
sex = female

And yes this is the only field that consist of quotation mark in the name variables.
Super User
Super User
Posts: 5,960

Re: reading quotation marks from a text files

The SCAN() and COUNTW() function can handle those incorrectly quoted fields.  So use them to generate a new version of the file that properly doubles up the embedded quotes.  This code will do that. It will also get rid of the many unneeded quotes that are in the original.  You can then read the converted file using data step or even PROC IMPORT.

 

data _null_ ;
  infile 'badfile.csv' ;
  file 'goodfile.csv' dsd ;
  length word $200 ;
  input ;
  do i=1 to countw(_infile_,',','qm');
    word = scan(_infile_,i,',','qm');
    if word =: '"' then word=substrn(word,2,length(word)-2);
    put word @;
  end;
  put;
run;

 

 

Super User
Posts: 1,073

Re: reading quotation marks from a text files

Based on @Tom suggestion, attached is a tested code to read and create desired output:

 

data want;
       length row_names 3 pclass $3 survived 3 name $82 _age $3 age 3
                 embarked $3 home_dest $70 room $5 ticket $20 boat $3 sex $10;
      infile cards truncover;
       length a_row $250 word $90 var1-var11 $90;
       a_row = ' ';
       input a_row $250.;

 

      array varx $ var1 - var11;
     do i=1 to countw(a_row,',','qm');
               word = scan(a_row,i,',','qm');
               if word =: '"' then word=substrn(word,2,length(word)-2);
               varx(i) = word;
     end ;

/* 1*/ row_names = input(left(varx(1)),best3.);
/* 2*/ pclass = left(varx(2));
/* 3*/ survived = input(left(varx(3)),best3.);
/* 4*/ name = left(varx(4));
/* 5*/ _age = left(varx(5));
       if _age = 'NA' then age = . ;
       else age = input(_age , 3.);

/* 6*/ embarked = left(varx(6));
/* 7*/ home_dest = left(varx(7));
/* 8*/ room = left(varx(8));
/* 9*/ ticket = left(varx(9));
/*10*/ boat = left(varx(10));
/*11*/ sex = left(varx(11));

drop _age a_row word var1-var11;

/*-------------*/
cards;
"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"
run;

 

Grand Advisor
Posts: 9,307

Re: reading quotation marks from a text files

Did you check this ?


https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/td-p/308211

Occasional Contributor
Posts: 10

Re: reading quotation marks from a text files

yes i did. is the same data sets surprisingly, but i think is not the output i want. My output is :

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

rowname = 37
pclass = 1st
survived = 1
name = Brown, Mrs James Joseph (Margaret "Molly" Tobin)
age = 44
embarked = Cherbourg
homedest = Denver, CO
room = .
ticket = 17610 L27 15s 5d
boat = 6
sex = female
Grand Advisor
Posts: 9,307

Re: reading quotation marks from a text files

OK. Appearntly Someone made such a file purposely .

Try this one.

 

filename x '/folders/myfolders/titanic.txt';
filename y temp;
data _null_;
 infile x lrecl=2000 length=len;
 file y lrecl=2000;
 input x $varying2000. len;
 p='s/,(?=")|(?<="),'||"/|/";
 x=prxchange(p,-1,strip(x));
 put x;
run;


proc import datafile=y dbms=csv out=have replace;
delimiter='|';
guessingrows=max;
run;
data have;
 set have;
 array x{*} $ _character_;
 do i=1 to dim(x);
  x{i}=prxchange('s/^"|"$//',-1,strip(x{i}));
 end;
drop i;
run;
proc print noobs;run;

 

Post a Question
Discussion Stats
  • 12 replies
  • 291 views
  • 0 likes
  • 5 in conversation