BookmarkSubscribeRSS Feed
yewkeong
Calcite | Level 5

hi,  below are my codes of importing a text files. One of the value from the name variabes consists of " " quotation mark which i need to output it. But i cant seems to get that output using the ~ function. As the example below,  my name variables must read the quotation mark --> Brown, Mrs James Joseph (Margaret "Molly" Tobin) .

 

 

example this row :

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

------------------------------------------------------------------------------------------------------

length row_names 3 pclass $3 survived 3 name $82 _age $3 age 3
embarked $3 home_dest $70 room $5 ticket $20 boat $3 sex $10;

 

infile '/home/yewkeong0/assignemnt/titanic.txt' dsd dlm=',' missover;
input row_names pclass $ survived name ~ $82. _age
embarked $ home_dest $ room $ ticket $ boat $ sex $;
if _age = 'NA' then age = . ;
else age = input(_age , 3.);
drop _age;

run;

 

any suggestion guys??? =/

12 REPLIES 12
ballardw
Super User

If you don't have many of these issues it may be better to just change the "Molly" to 'Molly' to read the file.  Then if you really really need double quotes you could replace the ' with ".

Another option would be if the input file was exported from another program to see if there would be if the data could be exported witha  different delimiter such as | or even connect using ODBC.

 

Or use _infile_ and spend hours if not days writing code to parse a file line by line to implement rules that resolve to fields are separated by comma unless the comma occurs between two " but if there is more words without a comma but have more" keep reading.

I wouldn't spend the time.

 

If you have a lot of files doing this then you need to go to the source and find out why and fix it there. Perhaps have them actually use left and right quotes instead of ".

yewkeong
Calcite | Level 5
hi. i also dont mind but i asked my lecture, she said cannot touch the data sets. so we need to output excaly the same as the text files given. my output example is this :

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

rowname = 37
pclass = 1st
survived = 1
name = Brown, Mrs James Joseph (Margaret "Molly" Tobin)
age = 44
embarked = Cherbourg
homedest = Denver, CO
room = .
ticket = 17610 L27 15s 5d
boat = 6
sex = female

wow. i would'nt spend so much time also but btw thanks for the suggestion.
Shmuel
Garnet | Level 18

your row example is:

 

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

 

your problem is because there is a comma in the name field which splits it into two variables.

 

using DLM=',' is not good enough for your data and input method.

Shmuel
Garnet | Level 18
did you try import your data by PROC IMPORT or directly from the excel file ?
yewkeong
Calcite | Level 5
hi. u said using dlm= ',' is not good enough for my data input method? then what is the best for this??? and no i didnt try the proc import method, i straight import directly from text files
Tom
Super User Tom
Super User

The issue is the embedded double quote characters. Like:

 

 

"Margaret "Molly" Tobin"

 

 If you want to include the quoting characters into a quoted string they need to be doubled so SAS can tell that they do not mark the end of the string.

 

"Margaret ""Molly"" Tobin"

 

Can you get the generator of the file to fix that bug?  

 

You might be able to fix the issue. Probably it would take a human to figure out what was intended for some of the more messy strings. But if you are limited to just this type of issue and only for one particular field you might  be able to make a program to fix it.  For example you could use the COUNTW() function to see how many fields SAS sees in the line and if it is more than expected then you know there is a quoting issue.

 

 

 

yewkeong
Calcite | Level 5
hi. I will show u the example of the output.

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

rowname = 37
pclass = 1st
survived = 1
name = Brown, Mrs James Joseph (Margaret "Molly" Tobin)
age = 44
embarked = Cherbourg
homedest = Denver, CO
room = .
ticket = 17610 L27 15s 5d
boat = 6
sex = female

And yes this is the only field that consist of quotation mark in the name variables.
Tom
Super User Tom
Super User

The SCAN() and COUNTW() function can handle those incorrectly quoted fields.  So use them to generate a new version of the file that properly doubles up the embedded quotes.  This code will do that. It will also get rid of the many unneeded quotes that are in the original.  You can then read the converted file using data step or even PROC IMPORT.

 

data _null_ ;
  infile 'badfile.csv' ;
  file 'goodfile.csv' dsd ;
  length word $200 ;
  input ;
  do i=1 to countw(_infile_,',','qm');
    word = scan(_infile_,i,',','qm');
    if word =: '"' then word=substrn(word,2,length(word)-2);
    put word @;
  end;
  put;
run;

 

 

Shmuel
Garnet | Level 18

Based on @Tom suggestion, attached is a tested code to read and create desired output:

 

data want;
       length row_names 3 pclass $3 survived 3 name $82 _age $3 age 3
                 embarked $3 home_dest $70 room $5 ticket $20 boat $3 sex $10;
      infile cards truncover;
       length a_row $250 word $90 var1-var11 $90;
       a_row = ' ';
       input a_row $250.;

 

      array varx $ var1 - var11;
     do i=1 to countw(a_row,',','qm');
               word = scan(a_row,i,',','qm');
               if word =: '"' then word=substrn(word,2,length(word)-2);
               varx(i) = word;
     end ;

/* 1*/ row_names = input(left(varx(1)),best3.);
/* 2*/ pclass = left(varx(2));
/* 3*/ survived = input(left(varx(3)),best3.);
/* 4*/ name = left(varx(4));
/* 5*/ _age = left(varx(5));
       if _age = 'NA' then age = . ;
       else age = input(_age , 3.);

/* 6*/ embarked = left(varx(6));
/* 7*/ home_dest = left(varx(7));
/* 8*/ room = left(varx(8));
/* 9*/ ticket = left(varx(9));
/*10*/ boat = left(varx(10));
/*11*/ sex = left(varx(11));

drop _age a_row word var1-var11;

/*-------------*/
cards;
"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"
run;

 

Ksharp
Super User
Did you check this ?


https://communities.sas.com/t5/SAS-Studio/How-to-read-nested-double-quotation-in-txt-files/td-p/308211

yewkeong
Calcite | Level 5
yes i did. is the same data sets surprisingly, but i think is not the output i want. My output is :

"37","1st",1,"Brown, Mrs James Joseph (Margaret "Molly" Tobin)",44.0000,"Cherbourg","Denver, CO","","17610 L27 15s 5d","6","female"

rowname = 37
pclass = 1st
survived = 1
name = Brown, Mrs James Joseph (Margaret "Molly" Tobin)
age = 44
embarked = Cherbourg
homedest = Denver, CO
room = .
ticket = 17610 L27 15s 5d
boat = 6
sex = female
Ksharp
Super User

OK. Appearntly Someone made such a file purposely .

Try this one.

 

filename x '/folders/myfolders/titanic.txt';
filename y temp;
data _null_;
 infile x lrecl=2000 length=len;
 file y lrecl=2000;
 input x $varying2000. len;
 p='s/,(?=")|(?<="),'||"/|/";
 x=prxchange(p,-1,strip(x));
 put x;
run;


proc import datafile=y dbms=csv out=have replace;
delimiter='|';
guessingrows=max;
run;
data have;
 set have;
 array x{*} $ _character_;
 do i=1 to dim(x);
  x{i}=prxchange('s/^"|"$//',-1,strip(x{i}));
 end;
drop i;
run;
proc print noobs;run;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 4357 views
  • 0 likes
  • 5 in conversation