some "people" like to have long strings formatted on more than one line in an excel worksheet cell - not with default line breaks, but breaking at points of their choosing.
One way to insert linefeed is to press alt+enter when editing the cell in excel.
When that worksheet is exported as a text or CSV file, the cell with that linefeed is delivered with a '0A'x where alt+enter was pressed.
This is the embedded linefeed I referred to.
That '0A'x causes infile processing to recognise end-of-line. This is before normal input processing allows us to translate the '0A'x into something meaningful, or blank. That '0A'x does not go into the _infile_ buffer.
There are few satisfactory ways of cleaning up the data. The most reliable is to get the originator to replace those embedded linefeeds with character(s) or blank.
In a SAS program the neatest solution is to treat the file as having no record structure (recfm = U or N) and look for a '0A'x without the preceeding '0D'x (CR).
There is a reasonable argument to be made that text files should not contain control characters like linefeed in their data, so these should be removed by excel when creating the CSV file (formulas and macro features and other such bits are removed, so why should that embedded LF not be removed?). SAS Institute seem to listen more than other suppliers.
Rather than wait for Microsoft to accept my suggestion, I think it will be easier to suggest to SAS Institute that the CRLF value for the TERMSTR= infile option means that (and that a stand-alone LF is not the TERMSTR value so should not be treatred as indicating end-of-line.
my preference is to recommend the excel library engine that is part of SAS/Access to PC Files.
Then the embedded line feed remains a problem, among others.
However it is the cleanest connection I've seen.
I have embedded line feeds in a tab delimited text file in Unix land that I "fix" with an "awk" script.
I just tested some stuff with embedded linefeeds in the Excel world.
First was to save the spreadsheet as a .csv file.
When viewed with notepad, the cell values were quoted -- " " -- and there were square symbols where the embedded linefeeds were supposed to be.
But, when I tried to read it in, with the DSD and TERMSTR options set, it was as your frustration is.
Second experiment read the Excel spreadsheet directly -- first row = column1, column2, column3. It read the data properly. "column1" has a linefeed in the midst of its character string, as does "column3", as they were supposed to be.
Third experiment attempts to translate the embedded linefeed = success. It is in fact a linefeed and not a carriage return, and I successfully translated the '0A'x to an 'L'.
In response to
Why doesn't the [ pre ] ... [ / pre ] retain blank lines? The non-pre lines do.
[ pre ] will retain empty lines if they have at least one blank
I do not know if that is intentional or accidental