BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Would the infile option TERMSTR= CRLF solve the problem of embedded line feeds in CSV files created by excel being read in a DATA step ?

The option is highlighted on latest edition of "The Missing Semicolon" newsletter from http://www.sys-seminar.com/sas_publications.php.

Not running SAS on a windows platform right now, I'm unable to test such a simple alternative to this old data line parsing problem.

PeterC
11 REPLIES 11
deleted_user
Not applicable
follow-up
no success.
Despite setting the INFILE option TERMSTR=CRLF, the line is still split into _infile_ buffers based on the LF 😞

Is that a defect ?

If not, it implies that CRLF may as well be LF .... imho


PeterC
deleted_user
Not applicable
What do you mean by "embedded" and why is that buggering you up?
Peter_C
Rhodochrosite | Level 12
some "people" like to have long strings formatted on more than one line in an excel worksheet cell - not with default line breaks, but breaking at points of their choosing.
One way to insert linefeed is to press alt+enter when editing the cell in excel.

When that worksheet is exported as a text or CSV file, the cell with that linefeed is delivered with a '0A'x where alt+enter was pressed.
This is the embedded linefeed I referred to.
That '0A'x causes infile processing to recognise end-of-line. This is before normal input processing allows us to translate the '0A'x into something meaningful, or blank. That '0A'x does not go into the _infile_ buffer.

There are few satisfactory ways of cleaning up the data. The most reliable is to get the originator to replace those embedded linefeeds with character(s) or blank.

In a SAS program the neatest solution is to treat the file as having no record structure (recfm = U or N) and look for a '0A'x without the preceeding '0D'x (CR).

There is a reasonable argument to be made that text files should not contain control characters like linefeed in their data, so these should be removed by excel when creating the CSV file (formulas and macro features and other such bits are removed, so why should that embedded LF not be removed?). SAS Institute seem to listen more than other suppliers.

Rather than wait for Microsoft to accept my suggestion, I think it will be easier to suggest to SAS Institute that the CRLF value for the TERMSTR= infile option means that (and that a stand-alone LF is not the TERMSTR value so should not be treatred as indicating end-of-line.

Am I being unreasonable ?

PeterC
deleted_user
Not applicable
No, you are not being unreasonable.

But, it is what it is right now.

Could you not use exported text files and just read the excel spreadsheet directly? We do.
Peter_C
Rhodochrosite | Level 12
do any of your workbooks contain embedded linefeeds?
Peter_C
Rhodochrosite | Level 12
my preference is to recommend the excel library engine that is part of SAS/Access to PC Files.
Then the embedded line feed remains a problem, among others.
However it is the cleanest connection I've seen.

have you something better?

PeterC
deleted_user
Not applicable
I have embedded line feeds in a tab delimited text file in Unix land that I "fix" with an "awk" script.

I just tested some stuff with embedded linefeeds in the Excel world.

First was to save the spreadsheet as a .csv file.
When viewed with notepad, the cell values were quoted -- " " -- and there were square symbols where the embedded linefeeds were supposed to be.
But, when I tried to read it in, with the DSD and TERMSTR options set, it was as your frustration is.

Second experiment read the Excel spreadsheet directly -- first row = column1, column2, column3. It read the data properly. "column1" has a linefeed in the midst of its character string, as does "column3", as they were supposed to be.

Third experiment attempts to translate the embedded linefeed = success. It is in fact a linefeed and not a carriage return, and I successfully translated the '0A'x to an 'L'.
deleted_user
Not applicable
sounds good
are you able to demonstrate code, either here, or in San Antonio next week?
deleted_user
Not applicable
?????

I'm in Wilson, NC.

My "play" code is as follows:

[pre]
data dummy;
length in_line $256;
infile 'book1.csv' termstr=CRLF DSD;
input;
in_line = _infile_;
run;
quit;

proc print;
run;
quit;

libname book1 excel "book1.xls";

data dummy;
set book1."book1$"n;
column1 = translate(column1,'CR','0D'x,'LF','0A'x);
run;
quit;

proc print;
run;
quit;
[/pre]


=====================================================

Why doesn't the [ pre ] ... [ / pre ] retain blank lines? The non-pre lines do.
deleted_user
Not applicable
Thank you Chuck

as I posted earlier, I recommend using that excel libname engine for access to excel data. ( the working method of your second data step)

Just sometimes that is not available....

My reference to San Antonio was because that's where SAS Global Forum will be, next week.

PeterC
deleted_user
Not applicable
In response to
=====================================================
Why doesn't the [ pre ] ... [ / pre ] retain blank lines? The non-pre lines do.
=====================================================

[ pre ] will retain empty lines if they have at least one blank

I do not know if that is intentional or accidental


PeterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2032 views
  • 0 likes
  • 2 in conversation