DATA Step, Macro, Functions and more

Problem reading csv files. (Some records are wrapped).

Accepted Solution Solved
Reply
Regular Contributor
Posts: 151
Accepted Solution

Problem reading csv files. (Some records are wrapped).

Hello SAS-users.

I have to import a lot of .csv files to SAS dataset. The csv delimiter is Tab ('09'x).

The files were imported from a Business Objects query.

Some records are wrapped to two strings in csv files (by Alt+Enter in a description character field).

How can i import these files correctly with infile statement?

I use SAS EG 4.3 under Windows server.

I've tried option termstr="crlf" but the output dataset contains no observations in this case. The log is below:

14 data test;

15 infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv"

16 firstobs=2 dlm='09'x dsd truncover encoding='utf-8' termstr="crlf";

17 informat dt ymddttm19. debet $20. credit $20. sum_rur commax17. txt $100.;

18 input dt -- txt;

19 run;

NOTE: The infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv" is:

Filename=\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv,

RECFM=V,LRECL=1536,File Size (bytes)=1434841,

Last Modified=26Mar2012:11:37:42,

Create Time=26Mar2012:12:06:55

NOTE: 0 records were read from the infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv".

NOTE: The data set WORK.TEST has 0 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

user cpu time 0.03 seconds

system cpu time 0.00 seconds

Memory 22971k

OS Memory 33096k

Timestamp 26.03.2012 14:58:41

I've also tried termstr=crlf (without quotes) but this cause an error:

ERROR: Invalid string.

FATAL: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.


Accepted Solutions
Solution
‎03-26-2012 01:24 PM
Super Contributor
Posts: 358

Problem reading csv files. (Some records are wrapped).

When I read a csv (or any variable length record) I always use:

infile myinfile  lrecl=32000 missover pad;

input @001 text $char256.;

seems to prevent any missed or skipped records.

Then I can examine variable "text" and rebuilld the input statement with individual variables.  The length of "text" can be adjusted to suit the situation. 

View solution in original post


All Replies
Super User
Super User
Posts: 6,502

Problem reading csv files. (Some records are wrapped).

What version of SAS is running on your SAS server (NOT the version of Enterprise Guide)?

The the TERMSTR option does not really care if you use quotes or not.

I suspect that your CSV file does not have CRLF ('0D0A'x) to mark the end of lines.

Where was the file generated?  PC's use CRLF for end of line, but Unix uses just LF.

You change check by getting SAS to dump the file to the SAS log for you.

data _null_;

   infile "......" lrecl=50 recfm=f;

    input ;

    list;

    if _n_ > 20 then stop;

run;

Regular Contributor
Posts: 151

Problem reading csv files. (Some records are wrapped).

Tom,

thanks for your reply.

The version of SAS is 9.2 under Windows server 64x.

Files were generated on my PC workstation under Windows XP from Business Objects query and saved through Business Objects interface on my PC.

If i use default option TERMSTR then i have errors in a few records only (where records are wrapped).

Oleg.

PROC Star
Posts: 7,364

Problem reading csv files. (Some records are wrapped).

Oleg: Can you attach a copy of the file here?

Regular Contributor
Posts: 151

Problem reading csv files. (Some records are wrapped).

Art,

i can not attach the copy of the file here. The information is confidential.

OS2Rules: thanks for you reply. I'll try your approach.

Oleg.

Solution
‎03-26-2012 01:24 PM
Super Contributor
Posts: 358

Problem reading csv files. (Some records are wrapped).

When I read a csv (or any variable length record) I always use:

infile myinfile  lrecl=32000 missover pad;

input @001 text $char256.;

seems to prevent any missed or skipped records.

Then I can examine variable "text" and rebuilld the input statement with individual variables.  The length of "text" can be adjusted to suit the situation. 

Super User
Super User
Posts: 6,502

Problem reading csv files. (Some records are wrapped).

Note that the "new" TRUNCOVER option can be used instead of MISSOVER PAD.

Regular Contributor
Posts: 151

Problem reading csv files. (Some records are wrapped).

OS2Rules,

thanks for your advise.

I find a solution by using your code, retain and scan function.

Oleg.

Super User
Super User
Posts: 6,502

Problem reading csv files. (Some records are wrapped).

You never explained what your problem was.

Did your lines NOT end with CRLF?

Or did the embedded line breaks include CRLF?

Regular Contributor
Posts: 151

Problem reading csv files. (Some records are wrapped).

Tom,

how can i determine that?

I suppose that description field was wrapped by entering Alt+Enter in the system where data was generated.

I see that wrapped lines contain square symbol.

I don't know how to explain this more definitely.

Oleg.

Frequent Contributor
Posts: 101

Problem reading csv files. (Some records are wrapped).

I encountered a similar situation where the last field in a text file contained a long comment with embedded crlf's or lf's. This solution begins by reading the variables of the first record as you normally would except that I add a line pointer hold (@). At this point, the problem is that you do not know if the value of txt var is complete or if parts of it are in subsequent lines of the text file. You need a way to determine if the next line is a new record or a continuation of txt var. Since each record begins with a datetime field, you could easily test if the next line begins with a datetime. Here's the code:

data test;
infile "c:\data\EmbeddedLineBreak.txt"
   firstobs=2 dlm='09'x dsd end=eof;
informat dt ymddttm19. debet $20. credit $20. sum_rur comma17. txt $100.;
format dt datetime.;
if eof then stop;
input @1 dt -- txt @;

* this section reads the next line to determine if the first
* variable is a datetime. if so, then the next record is found. if not,
* then another part of the txt var is found.;
length _temp $100;
do while ( not eof );
   input _temp :$100. @@; * <-- this is important. holds the line pointer.
   * if the input function is not missing then you found the
   * beginning of the next record;
   if input( _temp, ymddttm19. ) ne . then leave; * exit the loop;
   * if the input func returns mising, the _temp var must contain part of the txt var.
   * concatenate _temp to txt;
   txt = dequote( catx( ' ', txt, _temp ));
   _error_ = 0;
end;
drop _:;
run;

I created a tab-delimited file in Excel. The records look like this:

dt debit credit sum_rur txt

2012-03-20 09:30:30 client a client b " 10,000 " "This field has an embedded linefeed here

causing a line break in the text file."

2012-03-21 09:30:30 client c client d " 20,000 " "This field has no embedded linefeeds."

2012-03-22 21:30:30 client e client f " 30,000 " "2 embedded linefeeds. The first is here

and another one is here

causing another line break."

Regular Contributor
Posts: 151

Problem reading csv files. (Some records are wrapped).

Thank you very much for your help.

It works.

I removed the line from your code:

    if eof then stop;

Oleg.

Super User
Super User
Posts: 6,502

Problem reading csv files. (Some records are wrapped).

As I said before in the first reply to your thread you can use SAS to get a picture of what is in your file using the LIST statement.

Regular Contributor
Posts: 151

Re: Problem reading csv files. (Some records are wrapped).

Tom,

the embedded line breaks include '0D0A'x and the lines end with '0A'x

Oleg.

Super User
Super User
Posts: 6,502

Re: Problem reading csv files. (Some records are wrapped).

Because the end-of-line and middle of line strings are different you might be able to use the file.

For example here is a sample program to convert those embedded CRLF into |.

%let lf='0A'x;

%let cr='0D'x;

filename tmpfile1 temp;

filename tmpfile2 temp;

* Generate a sample file with an embedded CRLF ;

data _null_;

  file tmpfile1 recfm=n;

  put '1,Line is fine' &LF

      '2,Line has ' &CR &LF 'a line break' &LF

      '3,Line is fine' &LF

      &LF

      '5,Previous line is blank' &LF

  @;

run;

* Look at it ;

data _null_;

  infile tmpfile1 recfm=f ;

  input;

  list;

run;

* Replace terminal CRLF with | instead ;

data _null_;

  infile tmpfile1 length=ls;

  file tmpfile2;

  input;

  if ls then if substr(_infile_,ls)=&CR then do;

    substr(_infile_,ls,1)='|';

    put _infile_ $varying. ls @;

    return;

  end;

  put _infile_;

run;

* Look at the new file;

data _null_;

  infile tmpfile2;

  input;

  list;

run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 6328 views
  • 2 likes
  • 6 in conversation