DATA Step, Macro, Functions and more

INFILE problem in combination with DSD when reading CSV with quotes

Reply
Occasional Contributor
Posts: 9

INFILE problem in combination with DSD when reading CSV with quotes

 

I have troubles reading CSV files in with the following situation occurs:

<xx><delim><quote><……..><quote><delim><delim><xx>. So some ware in a recordline there is a delimiter, followed with a quote, then one or more fields, then some characters closed with a quote before the next delimiter. I tried to solve it with the DSD option but that doesn’t work. The DSD skips the delimiter between the quotes. Without DSD two delimiters behind each others are transformed to one delimiter. Double delimiters can occur because it represents a missing value for that variable. The problem is the last quote directly followed by a delimiter.

Who can help me to solve this problem?

Following below are two identical examples. The first is with DSD, the second without.

 

* Infile with DSD. Record 3 (KEY='C') fails because the string 's Gravenhage,Mataw' leads to one variable;
data work.OPER_A_DSD; 
   infile datalines  delimiter = ","  MISSOVER DSD;
   attrib KEY length = $12; 
   attrib CITY length = $60; 
   attrib NAME length = $50;
   attrib EMPTY length =$1;
   attrib EX length = $1; 
   input KEY CITY NAME EMPTY EX $; 
   put _all_;
datalines;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;
run; 
* Infile without DSD. Record 1 (KEY='A') fails because 2 cancatenated delimitters (,,) is converted to one. The value X is placed in the variable EMPTY instead of EX;
data work.OPER_A_NODSD; 
   infile datalines  delimiter = ","  MISSOVER ;
   attrib KEY length = $12; 
   attrib CITY length = $60; 
   attrib NAME length = $50;
   attrib EMPTY length =$1;
   attrib EX length = $1; 
   input KEY CITY NAME EMPTY EX $; 
   put _all_;
datalines;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;
run;
Super User
Super User
Posts: 9,406

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof

Neither of these are valid CSV files.  If a CSV file's data item contains a quote, then the whole column should be enclosed in quotes (preferably double to avoid single quotes):

A,"Utrecht,James",,X
B,"'s Gravenhage,Scott",N,X
C,"'s Gravenhage,Mataw'",N,X
D,"'s Gravenhage,Mataw'",,X
E,"'s Gravenhage,Mata'w",,X

 Fix the export of data would be my suggestion.  Otherwise you would need to do some parsing on the file yourself.

Occasional Contributor
Posts: 9

Re: INFILE problem in combination with DSD when reading CSV with quotes

Thanks RW9 but I do not have the possibility to change the export to a 'valid' CSV. It is delivered by an external party. What is a 'valid' CSV file. I don't think there is a formal definition regulating the form of a CSV-file. Therefore there are a lot of variations in CSV-files. Parsing is an alternative but that's the last rescue I want to use. SAS can read 'everything' so I hope there is a better solution.

Ronald

Super User
Super User
Posts: 9,406

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof

https://en.wikipedia.org/wiki/Comma-separated_values

RFC-4180

 

Anyways, irrespective of a standard or not, you need to have in the data some clear way of identifying when a quote or delimiter is part of a variable or is a quote or delimiter.

In this instance:

A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X

's Gravenhage,Scott - is the comma part of the text or a delimiter?  You may be able to look at this and make some decision, however to write a generic import to cover these types of things is pretty difficult - your assumption is not going to be universal.   

Occasional Contributor
Posts: 9

Re: INFILE problem in combination with DSD when reading CSV with quotes

Hi Tom,

I see, there is some kind of standard. Next problem is how quick I can motivate the data-deliverer to change this export. That takes a lot of time because this export-format is used by a lot of customers.

'S Gravenhage  (with the quote) is the name of a city in the Netherlands. The comma is the delimiter so Scott is the value of the third column.

Ronald

Super User
Super User
Posts: 7,932

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof

There is a standard, RCF-4180.   https://www.ietf.org/rfc/rfc4180.txt

 

The only change from that you need to tell them to make is to not send any end of line markers in the value of character variables. 

PROC Star
Posts: 8,146

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof

One possibility would be to correct the file on the fly. e.g.:

data work.OPER_A_NODSD;
  input @;
  _infile_= translate(_infile_,'~',"'");
   infile datalines  delimiter = "," dsd MISSOVER ;
   attrib KEY length = $12; 
   attrib CITY length = $60; 
   attrib NAME length = $50;
   attrib EMPTY length =$1;
   attrib EX length = $1; 
   input KEY CITY NAME EMPTY EX $; 
   city=translate(city,"'",'~');
   name=translate(name,"'",'~');
   put _all_;
datalines;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;
run;

Art, CEO, AnalystFinder.com

 

 

SAS Super FREQ
Posts: 9,320

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof
Hi:
Aside from all other issues, how do you envision the row for E to be read? Do you envision it like this:
Key=E
City='s Granvenhage
Name=Mata'w
Empty=(blank)
EX = X

Do you eventually want to strip the single quotes out of the variable values, are they correct? Are they incorrect? Are the quotes in the file like this ALWAYS single quotes?

cynthia

Occasional Contributor
Posts: 9

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to Cynthia_sas

Hey Cyntia,

 

Your assumption is correct. Variable EMPTY shout be missing and EX the value 'X'. The first dataset with DSD reads this record correct in because there is a character between quote and the delimiter (letter w). Without DSD it goes wrong in my example because the value 'X' appears in EMPTY.

Occasional Contributor
Posts: 9

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof

I forgot something. The single quotes are part of the delivered data and shoud not be skipped.

PROC Star
Posts: 8,146

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof

Did you try the solution I proposed? I think it does what you want!

 

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 9

Re: INFILE problem in combination with DSD when reading CSV with quotes

Hi Art,

 

Parsing every file, what you suggest,will work but is not what I want. Only as a last rescue. Reason for that is that the solution must be implemented in the File Reader transformation in SAS/DI Studio. That is possible but than you have created for every External File that you ant to read a user writen code. Within the File Reader it is not possible to parse then _infile_ lines. That is why I hope that there is an other option in f.i. the infile statement that I can use.

Ronald

Super User
Super User
Posts: 7,932

Re: INFILE problem in combination with DSD when reading CSV with quotes

[ Edited ]
Posted in reply to RonaldHof

If you want to process without DSD option then add a period between the adjacent commas.  Here is how you could do it in the data step that reads the data.

data one ;
  infile cards dlm=',' truncover ;
  input @;
  if _infile_=:',' then _infile_='.'||_infile_;
  do while (index(_infile_,',,'));_infile_=tranwrd(_infile_,',,',',.,'); end;
  input (x1-x6) (:$20.) ;
  put _infile_ @40 (x1-x6) (=) ;
cards4;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;;;;

If you want to use the DSD option then you need to convert those single quotes into something else so that values that start with a single quote don't cause it hunt for the closing quote. You could translate the quotes to something else and then process all of your character variables to reverse the translation. Make sure to use a character that is not in your data file.

data one ;
  infile cards dsd  truncover ;
  input @;
  _infile_=translate(_infile_,"`","'");
  input (x1-x6) (:$20.) ;
  array _c _character_;
  do over _c; _c=translate(_c,"'","`"); end;
  put _infile_ @40 (x1-x6) (=) ;
cards4;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;;;;
Occasional Contributor
Posts: 9

Re: INFILE problem in combination with DSD when reading CSV with quotes

Hi Tom,

This is a solution but difficult to implement in the File Reader in SAS/DI Studio. Most of the replies I get suggest to parse the file with the _infile_ buffer. Not ideal but maybe there is not an other solution.

Ronald

Super User
Super User
Posts: 7,932

Re: INFILE problem in combination with DSD when reading CSV with quotes

Posted in reply to RonaldHof

@RonaldHof wrote:

Hi Tom,

This is a solution but difficult to implement in the File Reader in SAS/DI Studio. Most of the replies I get suggest to parse the file with the _infile_ buffer. Not ideal but maybe there is not an other solution.

Ronald


Will DI studio allow you to create a code step that you could use to create a corrected version of the file that you could then read using normal process?

Ask a Question
Discussion stats
  • 14 replies
  • 142 views
  • 1 like
  • 5 in conversation