I encountered a similar situation where the last field in a text file contained a long comment with embedded CRLFs or LFs.

This solution begins by reading the variables of the first record as you normally would, except that I add a line-pointer hold (@). At this point, the problem is that you do not know whether the value of txt is complete or whether parts of it are on subsequent lines of the text file. You need a way to determine whether the next line is a new record or a continuation of txt. Since each record begins with a datetime field, you can simply test whether the next line begins with a datetime.

Here's the code:

    data test;
      infile "c:\data\EmbeddedLineBreak.txt" firstobs=2 dlm='09'x dsd end=eof;
      informat dt ymddttm19. debet $20. credit $20. sum_rur comma17. txt $100.;
      format dt datetime.;
      if eof then stop;
      input @1 dt -- txt @;

      * this section reads the next line to determine if the first
      * variable is a datetime. if so, then the next record is found. if not,
      * then another part of the txt var is found.;
      length _temp $100;
      do while ( not eof );
        input _temp :$100. @@; * <-- this is important. holds the line pointer.
        * if the input function is not missing then you found the
        * beginning of the next record;
        if input( _temp, ymddttm19. ) ne . then leave; * exit the loop;
        * if the input func returns missing, the _temp var must contain part of the txt var.
        * concatenate _temp to txt;
        txt = dequote( catx( ' ', txt, _temp ));
        _error_ = 0; * reset the error flag set by the failed datetime conversion;
      end;
      drop _:;
    run;

I created a tab-delimited file in Excel. The records look like this (tabs shown as spaces):

    dt                   debit     credit    sum_rur     txt
    2012-03-20 09:30:30  client a  client b  " 10,000 "  "This field has an embedded linefeed here
    causing a line break in the text file."
    2012-03-21 09:30:30  client c  client d  " 20,000 "  "This field has no embedded linefeeds."
    2012-03-22 21:30:30  client e  client f  " 30,000 "  "2 embedded linefeeds. The first is here
    and another one is here
    causing another line break."
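One optional variation on the datetime test in the loop, offered as a sketch rather than something I ran against the file above: the INPUT function accepts a ?? modifier in front of the informat, which suppresses the invalid-data messages in the log and keeps _ERROR_ from being set to 1 when the conversion fails. With it, the `_error_ = 0;` reset should no longer be needed. The step below only demonstrates the test on two literal strings; `is_new_record` is a hypothetical variable name used for illustration.

```sas
data _null_;
  length _temp $100;
  /* first value looks like the start of a record, second like a continuation line */
  do _temp = '2012-03-21 09:30:30', 'causing a line break in the text file.';
    /* ?? suppresses invalid-data notes and leaves _ERROR_ untouched */
    is_new_record = ( input( _temp, ?? ymddttm19. ) ne . );
    put _temp= is_new_record=;
  end;
run;
```

If you adopt this, the test inside the loop would become `if input( _temp, ?? ymddttm19. ) ne . then leave;`.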