BookmarkSubscribeRSS Feed
RonaldHof
Calcite | Level 5

 

I have troubles reading CSV files in with the following situation occurs:

<xx><delim><quote><……..><quote><delim><delim><xx>. So some ware in a recordline there is a delimiter, followed with a quote, then one or more fields, then some characters closed with a quote before the next delimiter. I tried to solve it with the DSD option but that doesn’t work. The DSD skips the delimiter between the quotes. Without DSD two delimiters behind each others are transformed to one delimiter. Double delimiters can occur because it represents a missing value for that variable. The problem is the last quote directly followed by a delimiter.

Who can help me to solve this problem?

Following below are two identical examples. The first is with DSD, the second without.

 

* Infile with DSD. Record 3 (KEY='C') fails because the string 's Gravenhage,Mataw' leads to one variable;
data work.OPER_A_DSD; 
   infile datalines  delimiter = ","  MISSOVER DSD;
   attrib KEY length = $12; 
   attrib CITY length = $60; 
   attrib NAME length = $50;
   attrib EMPTY length =$1;
   attrib EX length = $1; 
   input KEY CITY NAME EMPTY EX $; 
   put _all_;
datalines;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;
run; 
* Infile without DSD. Record 1 (KEY='A') fails because 2 cancatenated delimitters (,,) is converted to one. The value X is placed in the variable EMPTY instead of EX;
data work.OPER_A_NODSD; 
   infile datalines  delimiter = ","  MISSOVER ;
   attrib KEY length = $12; 
   attrib CITY length = $60; 
   attrib NAME length = $50;
   attrib EMPTY length =$1;
   attrib EX length = $1; 
   input KEY CITY NAME EMPTY EX $; 
   put _all_;
datalines;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;
run;
14 REPLIES 14
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Neither of these are valid CSV files.  If a CSV file's data item contains a quote, then the whole column should be enclosed in quotes (preferably double to avoid single quotes):

A,"Utrecht,James",,X
B,"'s Gravenhage,Scott",N,X
C,"'s Gravenhage,Mataw'",N,X
D,"'s Gravenhage,Mataw'",,X
E,"'s Gravenhage,Mata'w",,X

 Fix the export of data would be my suggestion.  Otherwise you would need to do some parsing on the file yourself.

RonaldHof
Calcite | Level 5

Thanks RW9 but I do not have the possibility to change the export to a 'valid' CSV. It is delivered by an external party. What is a 'valid' CSV file. I don't think there is a formal definition regulating the form of a CSV-file. Therefore there are a lot of variations in CSV-files. Parsing is an alternative but that's the last rescue I want to use. SAS can read 'everything' so I hope there is a better solution.

Ronald

RW9
Diamond | Level 26 RW9
Diamond | Level 26

https://en.wikipedia.org/wiki/Comma-separated_values

RFC-4180

 

Anyways, irrespective of a standard or not, you need to have in the data some clear way of identifying when a quote or delimiter is part of a variable or is a quote or delimiter.

In this instance:

A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X

's Gravenhage,Scott - is the comma part of the text or a delimiter?  You may be able to look at this and make some decision, however to write a generic import to cover these types of things is pretty difficult - your assumption is not going to be universal.   

RonaldHof
Calcite | Level 5

Hi Tom,

I see, there is some kind of standard. Next problem is how quick I can motivate the data-deliverer to change this export. That takes a lot of time because this export-format is used by a lot of customers.

'S Gravenhage  (with the quote) is the name of a city in the Netherlands. The comma is the delimiter so Scott is the value of the third column.

Ronald

Tom
Super User Tom
Super User

There is a standard, RCF-4180.   https://www.ietf.org/rfc/rfc4180.txt

 

The only change from that you need to tell them to make is to not send any end of line markers in the value of character variables. 

art297
Opal | Level 21

One possibility would be to correct the file on the fly. e.g.:

data work.OPER_A_NODSD;
  input @;
  _infile_= translate(_infile_,'~',"'");
   infile datalines  delimiter = "," dsd MISSOVER ;
   attrib KEY length = $12; 
   attrib CITY length = $60; 
   attrib NAME length = $50;
   attrib EMPTY length =$1;
   attrib EX length = $1; 
   input KEY CITY NAME EMPTY EX $; 
   city=translate(city,"'",'~');
   name=translate(name,"'",'~');
   put _all_;
datalines;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;
run;

Art, CEO, AnalystFinder.com

 

 

Cynthia_sas
SAS Super FREQ
Hi:
Aside from all other issues, how do you envision the row for E to be read? Do you envision it like this:
Key=E
City='s Granvenhage
Name=Mata'w
Empty=(blank)
EX = X

Do you eventually want to strip the single quotes out of the variable values, are they correct? Are they incorrect? Are the quotes in the file like this ALWAYS single quotes?

cynthia

RonaldHof
Calcite | Level 5

Hey Cyntia,

 

Your assumption is correct. Variable EMPTY shout be missing and EX the value 'X'. The first dataset with DSD reads this record correct in because there is a character between quote and the delimiter (letter w). Without DSD it goes wrong in my example because the value 'X' appears in EMPTY.

RonaldHof
Calcite | Level 5

I forgot something. The single quotes are part of the delivered data and shoud not be skipped.

art297
Opal | Level 21

Did you try the solution I proposed? I think it does what you want!

 

Art, CEO, AnalystFinder.com

 

RonaldHof
Calcite | Level 5

Hi Art,

 

Parsing every file, what you suggest,will work but is not what I want. Only as a last rescue. Reason for that is that the solution must be implemented in the File Reader transformation in SAS/DI Studio. That is possible but than you have created for every External File that you ant to read a user writen code. Within the File Reader it is not possible to parse then _infile_ lines. That is why I hope that there is an other option in f.i. the infile statement that I can use.

Ronald

Tom
Super User Tom
Super User

If you want to process without DSD option then add a period between the adjacent commas.  Here is how you could do it in the data step that reads the data.

data one ;
  infile cards dlm=',' truncover ;
  input @;
  if _infile_=:',' then _infile_='.'||_infile_;
  do while (index(_infile_,',,'));_infile_=tranwrd(_infile_,',,',',.,'); end;
  input (x1-x6) (:$20.) ;
  put _infile_ @40 (x1-x6) (=) ;
cards4;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;;;;

If you want to use the DSD option then you need to convert those single quotes into something else so that values that start with a single quote don't cause it hunt for the closing quote. You could translate the quotes to something else and then process all of your character variables to reverse the translation. Make sure to use a character that is not in your data file.

data one ;
  infile cards dsd  truncover ;
  input @;
  _infile_=translate(_infile_,"`","'");
  input (x1-x6) (:$20.) ;
  array _c _character_;
  do over _c; _c=translate(_c,"'","`"); end;
  put _infile_ @40 (x1-x6) (=) ;
cards4;
A,Utrecht,James,,X
B,'s Gravenhage,Scott,N,X
C,'s Gravenhage,Mataw',N,X
D,'s Gravenhage,Mataw',,X
E,'s Gravenhage,Mata'w,,X
;;;;
RonaldHof
Calcite | Level 5

Hi Tom,

This is a solution but difficult to implement in the File Reader in SAS/DI Studio. Most of the replies I get suggest to parse the file with the _infile_ buffer. Not ideal but maybe there is not an other solution.

Ronald

Tom
Super User Tom
Super User

@RonaldHof wrote:

Hi Tom,

This is a solution but difficult to implement in the File Reader in SAS/DI Studio. Most of the replies I get suggest to parse the file with the _infile_ buffer. Not ideal but maybe there is not an other solution.

Ronald


Will DI studio allow you to create a code step that you could use to create a corrected version of the file that you could then read using normal process?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3163 views
  • 1 like
  • 5 in conversation