Hello SAS-users.
I have to import a lot of .csv files to SAS dataset. The csv delimiter is Tab ('09'x).
The files were imported from a Business Objects query.
Some records are wrapped to two strings in csv files (by Alt+Enter in a description character field).
How can i import these files correctly with infile statement?
I use SAS EG 4.3 under Windows server.
I've tried option termstr="crlf" but the output dataset contains no observations in this case. The log is below:
14 data test;
15 infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv"
16 firstobs=2 dlm='09'x dsd truncover encoding='utf-8' termstr="crlf";
17 informat dt ymddttm19. debet $20. credit $20. sum_rur commax17. txt $100.;
18 input dt -- txt;
19 run;
NOTE: The infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv" is:
Filename=\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv,
RECFM=V,LRECL=1536,File Size (bytes)=1434841,
Last Modified=26Mar2012:11:37:42,
Create Time=26Mar2012:12:06:55
NOTE: 0 records were read from the infile "\\sasabm01.region.vtb.ru\UOLib\PLI_FX\Input\2012\Loans_cnt\MSK_20120201_20120215.csv".
NOTE: The data set WORK.TEST has 0 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
user cpu time 0.03 seconds
system cpu time 0.00 seconds
Memory 22971k
OS Memory 33096k
Timestamp 26.03.2012 14:58:41
I've also tried termstr=crlf (without quotes) but this cause an error:
ERROR: Invalid string.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program. Aborted during the EXECUTION phase.
When I read a csv (or any variable length record) I always use:
infile myinfile lrecl=32000 missover pad;
input @001 text $char256.;
seems to prevent any missed or skipped records.
Then I can examine variable "text" and rebuilld the input statement with individual variables. The length of "text" can be adjusted to suit the situation.
What version of SAS is running on your SAS server (NOT the version of Enterprise Guide)?
The the TERMSTR option does not really care if you use quotes or not.
I suspect that your CSV file does not have CRLF ('0D0A'x) to mark the end of lines.
Where was the file generated? PC's use CRLF for end of line, but Unix uses just LF.
You change check by getting SAS to dump the file to the SAS log for you.
data _null_;
infile "......" lrecl=50 recfm=f;
input ;
list;
if _n_ > 20 then stop;
run;
Tom,
thanks for your reply.
The version of SAS is 9.2 under Windows server 64x.
Files were generated on my PC workstation under Windows XP from Business Objects query and saved through Business Objects interface on my PC.
If i use default option TERMSTR then i have errors in a few records only (where records are wrapped).
Oleg.
Oleg: Can you attach a copy of the file here?
Art,
i can not attach the copy of the file here. The information is confidential.
OS2Rules: thanks for you reply. I'll try your approach.
Oleg.
When I read a csv (or any variable length record) I always use:
infile myinfile lrecl=32000 missover pad;
input @001 text $char256.;
seems to prevent any missed or skipped records.
Then I can examine variable "text" and rebuilld the input statement with individual variables. The length of "text" can be adjusted to suit the situation.
Note that the "new" TRUNCOVER option can be used instead of MISSOVER PAD.
OS2Rules,
thanks for your advise.
I find a solution by using your code, retain and scan function.
Oleg.
You never explained what your problem was.
Did your lines NOT end with CRLF?
Or did the embedded line breaks include CRLF?
Tom,
how can i determine that?
I suppose that description field was wrapped by entering Alt+Enter in the system where data was generated.
I see that wrapped lines contain square symbol.
I don't know how to explain this more definitely.
Oleg.
I encountered a similar situation where the last field in a text file contained a long comment with embedded crlf's or lf's. This solution begins by reading the variables of the first record as you normally would except that I add a line pointer hold (@). At this point, the problem is that you do not know if the value of txt var is complete or if parts of it are in subsequent lines of the text file. You need a way to determine if the next line is a new record or a continuation of txt var. Since each record begins with a datetime field, you could easily test if the next line begins with a datetime. Here's the code:
data test;
infile "c:\data\EmbeddedLineBreak.txt"
firstobs=2 dlm='09'x dsd end=eof;
informat dt ymddttm19. debet $20. credit $20. sum_rur comma17. txt $100.;
format dt datetime.;
if eof then stop;
input @1 dt -- txt @;
* this section reads the next line to determine if the first
* variable is a datetime. if so, then the next record is found. if not,
* then another part of the txt var is found.;
length _temp $100;
do while ( not eof );
input _temp :$100. @@; * <-- this is important. holds the line pointer.
* if the input function is not missing then you found the
* beginning of the next record;
if input( _temp, ymddttm19. ) ne . then leave; * exit the loop;
* if the input func returns mising, the _temp var must contain part of the txt var.
* concatenate _temp to txt;
txt = dequote( catx( ' ', txt, _temp ));
_error_ = 0;
end;
drop _:;
run;
I created a tab-delimited file in Excel. The records look like this:
dt debit credit sum_rur txt
2012-03-20 09:30:30 client a client b " 10,000 " "This field has an embedded linefeed here
causing a line break in the text file."
2012-03-21 09:30:30 client c client d " 20,000 " "This field has no embedded linefeeds."
2012-03-22 21:30:30 client e client f " 30,000 " "2 embedded linefeeds. The first is here
and another one is here
causing another line break."
Thank you very much for your help.
It works.
I removed the line from your code:
if eof then stop;
Oleg.
As I said before in the first reply to your thread you can use SAS to get a picture of what is in your file using the LIST statement.
Tom,
the embedded line breaks include '0D0A'x and the lines end with '0A'x
Oleg.
Because the end-of-line and middle of line strings are different you might be able to use the file.
For example here is a sample program to convert those embedded CRLF into |.
%let lf='0A'x;
%let cr='0D'x;
filename tmpfile1 temp;
filename tmpfile2 temp;
* Generate a sample file with an embedded CRLF ;
data _null_;
file tmpfile1 recfm=n;
put '1,Line is fine' &LF
'2,Line has ' &CR &LF 'a line break' &LF
'3,Line is fine' &LF
&LF
'5,Previous line is blank' &LF
@;
run;
* Look at it ;
data _null_;
infile tmpfile1 recfm=f ;
input;
list;
run;
* Replace terminal CRLF with | instead ;
data _null_;
infile tmpfile1 length=ls;
file tmpfile2;
input;
if ls then if substr(_infile_,ls)=&CR then do;
substr(_infile_,ls,1)='|';
put _infile_ $varying. ls @;
return;
end;
put _infile_;
run;
* Look at the new file;
data _null_;
infile tmpfile2;
input;
list;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.