SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

Reply
Contributor SYN
Contributor
Posts: 39

CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

CSV file contains embedded carriage return (CR) and linefeed characters (LF) inside double quotes.

Please guide me how to handle these fields in DI Studio 3.4

Thanks in advance.

Super Contributor
Posts: 349

Re: CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

Hi Syn,

Just check if your CSV file is corrupted,if not just get original CSV file and try...

Are you converting your files to CSV format from different format?

Thanks,

Shiva

Contributor SYN
Contributor
Posts: 39

Re: CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

CSV file contains embedded carriage return (CR) and linefeed characters (LF) inside double quotes.

Please guide me how to handle these fields in DI Studio 3.4

Thanks in advance.

Super User
Super User
Posts: 6,502

Re: CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

I have no idea about DI Studio but here are some things to test using SAS.

Check that whether it has embedded CRLF or just one or the other.  On the INFILE statement you can tell SAS to treat CRLF as the end of line marker (standard used by Windows) using the TERMSTR=CRLF option.  Then embedded CR or LF characters that do not bump up next to each other to form a CRLF pair will be read.

If you still have trouble then convert the file using a SAS data step or other programming language.

Read the file as bytes and keep your own counter so the program can tell when it is inside a quoted string and then replace existing CRLF with something else.

Frequent Contributor
Posts: 78

Re: CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

Tom,

You've replied about termstr=CRLF many times.  I know, becuase I just searched for "embedded linefeed" (a problem I needed help with) and found them.  It solved my problem!

Respected Advisor
Posts: 3,895

Re: CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

You could implement a user written transformation which has as input your external file, removes all CR and LF which are inside of quotes and then writes the result to a cleaned new .csv. You then use the file reader on this cleaned version.

You could use a global macro variable as part of the filename of the external file and you then change the value of this macro variable inside the user written transformation (e.g. from myfile.csv to myfile_clean.csv). This way you can use the same external file metadata for both .csv versions.

As for Tom's suggestion: I believe you have to go with the second approach ("read the file as bytes") as TERMSTR=CRLF has only been introduced with SAS9.2

Respected Advisor
Posts: 3,895

Re: CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

DIS 3.4 means you are still using SAS 9.1.3

Make sure you test the effect of termstr thoroughly as according to this note it's not properly working with 9.1.3 37159 - TERMSTR does not work correctly prior to SAS 9.2

...but may be there has been a patch and the note is outdated.

It's a bit of a hack but what worked for me in the past for removing unwanted characters from the input buffer before mapping variables against it, is to "inject" code as below:

In the external file definition add the following (the ';' at the beginning terminates the INFILE statement):

Capture.PNG

Using the File Reader you then get code as below:

data <table>;

   infile '<external file>'

          lrecl = 1000

          delimiter = ','

          dsd

          missover

          firstobs = 6

             ;

      input @;

      _infile_=translate(_infile_,' ','0A'x,' ','0D'x);

   ;

   ;

<attrib statements>

 

   input <list of input variables mapped agains cleansed input buffer>;

 

run;

Super User
Posts: 9,682

Re: CSV file contains embedded CR and LF inside double quotes. How to handle in DI 3.4?

Or you can use recfm= option to let SAS take this csv file as a stream file( only has one row) .

infile '.......'  recfm=n ;

Xia Keshan

Ask a Question
Discussion stats
  • 7 replies
  • 2629 views
  • 1 like
  • 6 in conversation