BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
wwwz
Calcite | Level 5

Hi, I have trouble in reading a csv file in SAS, because comma exists in a string, which should be one column. 

The CSV file is like below:

id,x1,x2,text,dt,x3,x4,x5,x6
101,"","","VISTOR said, \"good."\",2022-11-03,0,0,x,y
310380,2,Inf,"Flu, but not get it yet, but said \"don't want it\". Education provided.",2022-11-04,"","",0,0

My code:

data  aa;
infile 
"c:\test.csv" 
MISSOVER DSD lrecl=40000 firstobs=2 TERMSTR=CRLF delimiter= ',' ;
informat id $20. ;
informat x1 $10. ;
informat x2 $10. ;
informat text $300. ;
informat dt $10. ;
informat x3 $2. ;
informat x4 $2. ;
informat x5 $2.;
informat x6 $2.;

format id $20. ;
format x1 $10.  ;
format x2 $10. ;
format text $300. ;
format dt $10. ;
format x3 $2. ;
format x4 $2. ;
format x5 $2. ;
format x6 $2.;

input
id  $
x1 $ 
x2  $
text  $
dt $ 
x3 $ 
x4 $ 
x5 $ 
x6 $
;
run;
 
Then the output is messed up and like :
wwwz_0-1683224371295.png

How do I fix this?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
yabwon
Onyx | Level 15

Thing is that for SAS  (in contrary to C, R or Python) backslash is not an escape character so (\") does not escape "quotes inside quotes".

For sas escape symbol id double doublequote ("")

 

Translate \" into "" and your file should read-in ok.

Example how to translate:

filename T TEMP;

data  _null_;
infile "c:\test.csv" 
lrecl=40000 firstobs=2;
file T lrecl=40000 TERMSTR=CRLF;
input;
length x $ 32767; 
x=tranwrd(_infile_,'\"','""');
put x;
run;


data  aa;
infile 
T 
DSD lrecl=40000 /*firstobs=2*/ TERMSTR=CRLF delimiter= ',' MISSOVER;
informat id $20. ;
informat x1 $10. ;
informat x2 $10. ;
informat text $300. ;
informat dt $10. ;
informat x3 $2. ;
informat x4 $2. ;
informat x5 $2.;
informat x6 $2.;
 
format id $20. ;
format x1 $10.  ;
format x2 $10. ;
format text $300. ;
format dt $10. ;
format x3 $2. ;
format x4 $2. ;
format x5 $2. ;
format x6 $2.;

input
id  $
x1 $ 
x2  $
text  $
dt $ 
x3 $ 
x4 $ 
x5 $ 
x6 $
;

run;
proc print;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



View solution in original post

3 REPLIES 3
yabwon
Onyx | Level 15

Thing is that for SAS  (in contrary to C, R or Python) backslash is not an escape character so (\") does not escape "quotes inside quotes".

For sas escape symbol id double doublequote ("")

 

Translate \" into "" and your file should read-in ok.

Example how to translate:

filename T TEMP;

data  _null_;
infile "c:\test.csv" 
lrecl=40000 firstobs=2;
file T lrecl=40000 TERMSTR=CRLF;
input;
length x $ 32767; 
x=tranwrd(_infile_,'\"','""');
put x;
run;


data  aa;
infile 
T 
DSD lrecl=40000 /*firstobs=2*/ TERMSTR=CRLF delimiter= ',' MISSOVER;
informat id $20. ;
informat x1 $10. ;
informat x2 $10. ;
informat text $300. ;
informat dt $10. ;
informat x3 $2. ;
informat x4 $2. ;
informat x5 $2.;
informat x6 $2.;
 
format id $20. ;
format x1 $10.  ;
format x2 $10. ;
format text $300. ;
format dt $10. ;
format x3 $2. ;
format x4 $2. ;
format x5 $2. ;
format x6 $2.;

input
id  $
x1 $ 
x2  $
text  $
dt $ 
x3 $ 
x4 $ 
x5 $ 
x6 $
;

run;
proc print;
run;

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Tom
Super User Tom
Super User

Fix the process that created the file so that makes a valid CSV file.

 

You need to add quotes around values that include the delimiter.  That means that if the value also contains quotes you need to double those quotes so they do not cause the quoted string to end too soon.

 

So your text file should look this:

id,x1,x2,text,dt,x3,x4,x5,x6
101,,,"VISTOR said, ""good.""",2022-11-03,0,0,x,y
310380,2,Inf,"Flu, but not get it yet, but said ""don't want it"". Education provided.",2022-11-04,,,0,0

 

Looks your file is something intended to be read by the Unix shell instead of an actual CSV file.  Unix shell is an example of system that uses the backslash character as an "ESCAPE" character instead of following the normal process for quoting.

 

If the lines in your file are short enough (less than 32K bytes) you can fix it on the fly by modifying the _INFILE_ buffer character before trying to read the fields.   For example you could use TRANWRD() function to convert the \" combinations into "" instead.

 

Let's convert your posted lines into an actual file we can test with.

filename csv temp;
options parmcards=csv;
parmcards4;
id,x1,x2,text,dt,x3,x4,x5,x6
101,"","","VISTOR said, \"good."\",2022-11-03,0,0,x,y
310380,2,Inf,"Flu, but not get it yet, but said \"don't want it\". Education provided.",2022-11-04,"","",0,0
;;;;

 For your file this simple data step will work:

data aa;
  infile csv dsd truncover firstobs=2;
  length id $20 X1 X2 $10 text $300 dt 8 x3-x6 $2 ;
  informat dt yymmdd.;
  format dt yymmdd10.;
  input @;
  _infile_=tranwrd(_infile_,'\"','""');
  input id -- x6 ;
run;

Result

Obs   id       X1   X2    text                                                                             dt   x3   x4   x5   x6

 1    101                 VISTOR said, "good."                                                     2022-11-03   0    0    x    y
 2    310380   2    Inf   Flu, but not get it yet, but said "don't want it". Education provided.   2022-11-04             0    0

If you also have some literal backslash characters in the data you will need to use slightly more complicated logic to avoid making the mistake of converting \\" to \"" instead of \".

  _infile_=tranwrd(_infile_,'\\','00'x);
  _infile_=tranwrd(_infile_,'\"','""');
  _infile_=tranwrd(_infile_,'00'x,'\');

 

 

wwwz
Calcite | Level 5
Thank both, both are working well!!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1293 views
  • 1 like
  • 3 in conversation