BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Oleg_L
Obsidian | Level 7

Hello SAS-users.

I have to import a lot of .csv files to SAS dataset. The csv delimiter is Tab ('09'x).

The files were imported from a Business Objects query.

Some records are wrapped to two strings in csv files (by Alt+Enter in a description character field).

How can i import these files correctly with infile statement?

I use SAS EG 4.3 under Windows server.

I've tried option termstr="crlf" but the output dataset contains no observations in this case. The log is below:

14 data test;

15 infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv"

16 firstobs=2 dlm='09'x dsd truncover encoding='utf-8' termstr="crlf";

17 informat dt ymddttm19. debet $20. credit $20. sum_rur commax17. txt $100.;

18 input dt -- txt;

19 run;

NOTE: The infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv" is:

Filename=\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv,

RECFM=V,LRECL=1536,File Size (bytes)=1434841,

Last Modified=26Mar2012:11:37:42,

Create Time=26Mar2012:12:06:55

NOTE: 0 records were read from the infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv".

NOTE: The data set WORK.TEST has 0 observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

user cpu time 0.03 seconds

system cpu time 0.00 seconds

Memory 22971k

OS Memory 33096k

Timestamp 26.03.2012 14:58:41

I've also tried termstr=crlf (without quotes) but this cause an error:

ERROR: Invalid string.

FATAL: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.

1 ACCEPTED SOLUTION

Accepted Solutions
OS2Rules
Obsidian | Level 7

When I read a csv (or any variable length record) I always use:

infile myinfile  lrecl=32000 missover pad;

input @001 text $char256.;

seems to prevent any missed or skipped records.

Then I can examine variable "text" and rebuilld the input statement with individual variables.  The length of "text" can be adjusted to suit the situation. 

View solution in original post

18 REPLIES 18
Tom
Super User Tom
Super User

What version of SAS is running on your SAS server (NOT the version of Enterprise Guide)?

The the TERMSTR option does not really care if you use quotes or not.

I suspect that your CSV file does not have CRLF ('0D0A'x) to mark the end of lines.

Where was the file generated?  PC's use CRLF for end of line, but Unix uses just LF.

You change check by getting SAS to dump the file to the SAS log for you.

data _null_;

   infile "......" lrecl=50 recfm=f;

    input ;

    list;

    if _n_ > 20 then stop;

run;

Oleg_L
Obsidian | Level 7

Tom,

thanks for your reply.

The version of SAS is 9.2 under Windows server 64x.

Files were generated on my PC workstation under Windows XP from Business Objects query and saved through Business Objects interface on my PC.

If i use default option TERMSTR then i have errors in a few records only (where records are wrapped).

Oleg.

art297
Opal | Level 21

Oleg: Can you attach a copy of the file here?

Oleg_L
Obsidian | Level 7

Art,

i can not attach the copy of the file here. The information is confidential.

OS2Rules: thanks for you reply. I'll try your approach.

Oleg.

OS2Rules
Obsidian | Level 7

When I read a csv (or any variable length record) I always use:

infile myinfile  lrecl=32000 missover pad;

input @001 text $char256.;

seems to prevent any missed or skipped records.

Then I can examine variable "text" and rebuilld the input statement with individual variables.  The length of "text" can be adjusted to suit the situation. 

Tom
Super User Tom
Super User

Note that the "new" TRUNCOVER option can be used instead of MISSOVER PAD.

Oleg_L
Obsidian | Level 7

OS2Rules,

thanks for your advise.

I find a solution by using your code, retain and scan function.

Oleg.

Tom
Super User Tom
Super User

You never explained what your problem was.

Did your lines NOT end with CRLF?

Or did the embedded line breaks include CRLF?

Oleg_L
Obsidian | Level 7

Tom,

how can i determine that?

I suppose that description field was wrapped by entering Alt+Enter in the system where data was generated.

I see that wrapped lines contain square symbol.

I don't know how to explain this more definitely.

Oleg.

FloydNevseta
Pyrite | Level 9

I encountered a similar situation where the last field in a text file contained a long comment with embedded crlf's or lf's. This solution begins by reading the variables of the first record as you normally would except that I add a line pointer hold (@). At this point, the problem is that you do not know if the value of txt var is complete or if parts of it are in subsequent lines of the text file. You need a way to determine if the next line is a new record or a continuation of txt var. Since each record begins with a datetime field, you could easily test if the next line begins with a datetime. Here's the code:

data test;
infile "c:\data\EmbeddedLineBreak.txt"
   firstobs=2 dlm='09'x dsd end=eof;
informat dt ymddttm19. debet $20. credit $20. sum_rur comma17. txt $100.;
format dt datetime.;
if eof then stop;
input @1 dt -- txt @;

* this section reads the next line to determine if the first
* variable is a datetime. if so, then the next record is found. if not,
* then another part of the txt var is found.;
length _temp $100;
do while ( not eof );
   input _temp :$100. @@; * <-- this is important. holds the line pointer.
   * if the input function is not missing then you found the
   * beginning of the next record;
   if input( _temp, ymddttm19. ) ne . then leave; * exit the loop;
   * if the input func returns mising, the _temp var must contain part of the txt var.
   * concatenate _temp to txt;
   txt = dequote( catx( ' ', txt, _temp ));
   _error_ = 0;
end;
drop _:;
run;

I created a tab-delimited file in Excel. The records look like this:

dt debit credit sum_rur txt

2012-03-20 09:30:30 client a client b " 10,000 " "This field has an embedded linefeed here

causing a line break in the text file."

2012-03-21 09:30:30 client c client d " 20,000 " "This field has no embedded linefeeds."

2012-03-22 21:30:30 client e client f " 30,000 " "2 embedded linefeeds. The first is here

and another one is here

causing another line break."

Oleg_L
Obsidian | Level 7

Thank you very much for your help.

It works.

I removed the line from your code:

    if eof then stop;

Oleg.

Tom
Super User Tom
Super User

As I said before in the first reply to your thread you can use SAS to get a picture of what is in your file using the LIST statement.

Oleg_L
Obsidian | Level 7

Tom,

the embedded line breaks include '0D0A'x and the lines end with '0A'x

Oleg.

Tom
Super User Tom
Super User

Because the end-of-line and middle of line strings are different you might be able to use the file.

For example here is a sample program to convert those embedded CRLF into |.

%let lf='0A'x;

%let cr='0D'x;

filename tmpfile1 temp;

filename tmpfile2 temp;

* Generate a sample file with an embedded CRLF ;

data _null_;

  file tmpfile1 recfm=n;

  put '1,Line is fine' &LF

      '2,Line has ' &CR &LF 'a line break' &LF

      '3,Line is fine' &LF

      &LF

      '5,Previous line is blank' &LF

  @;

run;

* Look at it ;

data _null_;

  infile tmpfile1 recfm=f ;

  input;

  list;

run;

* Replace terminal CRLF with | instead ;

data _null_;

  infile tmpfile1 length=ls;

  file tmpfile2;

  input;

  if ls then if substr(_infile_,ls)=&CR then do;

    substr(_infile_,ls,1)='|';

    put _infile_ $varying. ls @;

    return;

  end;

  put _infile_;

run;

* Look at the new file;

data _null_;

  infile tmpfile2;

  input;

  list;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 10857 views
  • 2 likes
  • 6 in conversation