DATA Step, Macro, Functions and more

embedded lineFeed from excel

Reply
N/A
Posts: 0

embedded lineFeed from excel

Would the infile option TERMSTR= CRLF solve the problem of embedded line feeds in CSV files created by excel being read in a DATA step ?

The option is highlighted on latest edition of "The Missing Semicolon" newsletter from http://www.sys-seminar.com/sas_publications.php.

Not running SAS on a windows platform right now, I'm unable to test such a simple alternative to this old data line parsing problem.

PeterC
N/A
Posts: 0

Re: embedded lineFeed from excel

Posted in reply to deleted_user
follow-up
no success.
Despite setting the INFILE option TERMSTR=CRLF, the line is still split into _infile_ buffers based on the LF :-(

Is that a defect ?

If not, it implies that CRLF may as well be LF .... imho


PeterC
N/A
Posts: 0

Re: embedded lineFeed from excel

Posted in reply to deleted_user
What do you mean by "embedded" and why is that buggering you up?
Valued Guide
Posts: 2,177

Re: embedded lineFeed from excel

Posted in reply to deleted_user
some "people" like to have long strings formatted on more than one line in an excel worksheet cell - not with default line breaks, but breaking at points of their choosing.
One way to insert linefeed is to press alt+enter when editing the cell in excel.

When that worksheet is exported as a text or CSV file, the cell with that linefeed is delivered with a '0A'x where alt+enter was pressed.
This is the embedded linefeed I referred to.
That '0A'x causes infile processing to recognise end-of-line. This is before normal input processing allows us to translate the '0A'x into something meaningful, or blank. That '0A'x does not go into the _infile_ buffer.

There are few satisfactory ways of cleaning up the data. The most reliable is to get the originator to replace those embedded linefeeds with character(s) or blank.

In a SAS program the neatest solution is to treat the file as having no record structure (recfm = U or N) and look for a '0A'x without the preceeding '0D'x (CR).

There is a reasonable argument to be made that text files should not contain control characters like linefeed in their data, so these should be removed by excel when creating the CSV file (formulas and macro features and other such bits are removed, so why should that embedded LF not be removed?). SAS Institute seem to listen more than other suppliers.

Rather than wait for Microsoft to accept my suggestion, I think it will be easier to suggest to SAS Institute that the CRLF value for the TERMSTR= infile option means that (and that a stand-alone LF is not the TERMSTR value so should not be treatred as indicating end-of-line.

Am I being unreasonable ?

PeterC
N/A
Posts: 0

Re: embedded lineFeed from excel

No, you are not being unreasonable.

But, it is what it is right now.

Could you not use exported text files and just read the excel spreadsheet directly? We do.
Valued Guide
Posts: 2,177

Re: embedded lineFeed from excel

Posted in reply to deleted_user
do any of your workbooks contain embedded linefeeds?
Valued Guide
Posts: 2,177

Re: embedded lineFeed from excel

Posted in reply to deleted_user
my preference is to recommend the excel library engine that is part of SAS/Access to PC Files.
Then the embedded line feed remains a problem, among others.
However it is the cleanest connection I've seen.

have you something better?

PeterC
N/A
Posts: 0

Re: embedded lineFeed from excel

I have embedded line feeds in a tab delimited text file in Unix land that I "fix" with an "awk" script.

I just tested some stuff with embedded linefeeds in the Excel world.

First was to save the spreadsheet as a .csv file.
When viewed with notepad, the cell values were quoted -- " " -- and there were square symbols where the embedded linefeeds were supposed to be.
But, when I tried to read it in, with the DSD and TERMSTR options set, it was as your frustration is.

Second experiment read the Excel spreadsheet directly -- first row = column1, column2, column3. It read the data properly. "column1" has a linefeed in the midst of its character string, as does "column3", as they were supposed to be.

Third experiment attempts to translate the embedded linefeed = success. It is in fact a linefeed and not a carriage return, and I successfully translated the '0A'x to an 'L'.
N/A
Posts: 0

Re: embedded lineFeed from excel

Posted in reply to deleted_user
sounds good
are you able to demonstrate code, either here, or in San Antonio next week?
N/A
Posts: 0

Re: embedded lineFeed from excel

Posted in reply to deleted_user
?????

I'm in Wilson, NC.

My "play" code is as follows:

[pre]
data dummy;
length in_line $256;
infile 'book1.csv' termstr=CRLF DSD;
input;
in_line = _infile_;
run;
quit;

proc print;
run;
quit;

libname book1 excel "book1.xls";

data dummy;
set book1."book1$"n;
column1 = translate(column1,'CR','0D'x,'LF','0A'x);
run;
quit;

proc print;
run;
quit;
[/pre]


=====================================================

Why doesn't the [ pre ] ... [ / pre ] retain blank lines? The non-pre lines do.
N/A
Posts: 0

Re: embedded lineFeed from excel

Posted in reply to deleted_user
Thank you Chuck

as I posted earlier, I recommend using that excel libname engine for access to excel data. ( the working method of your second data step)

Just sometimes that is not available....

My reference to San Antonio was because that's where SAS Global Forum will be, next week.

PeterC
N/A
Posts: 0

Re: embedded lineFeed from excel

Posted in reply to deleted_user
In response to
=====================================================
Why doesn't the [ pre ] ... [ / pre ] retain blank lines? The non-pre lines do.
=====================================================

[ pre ] will retain empty lines if they have at least one blank

I do not know if that is intentional or accidental


PeterC
Ask a Question
Discussion stats
  • 11 replies
  • 327 views
  • 0 likes
  • 2 in conversation