02-06-2017 02:23 PM
I'm trying to remove soft returns that are embedded within some records included in a .csv file. I've read several threads about dealing with this issue and I've tried different options (including TERMSTR=CR and TERMSTR=LF) but the problem records always end with the soft return, putting the remaining portion of the record into the next iteration. Any help is appreciated. See my code below. I had to add the truncover option because I was only able to read in every other record. Maybe truncover negates the TERMSTR option?
infile "c:\temp\test.csv" lrecl=2000 truncover termstr=CRLF; * firstobs=1;
length lines $2000;
input lines $1-2000;
keep lines tag;
retain find_CR_re; /* Keep the compiled regex across iterations */
if _N_ eq 1 then find_CR_re=prxparse('/\n/'); /* Find embedded line feeds */
if prxmatch(find_CR_re,lines) then tag=1;
if substr(strip(lines),1,13) eq '5555555555555'; /* Subsetting IF: a specific record known to have an embedded line break */
02-06-2017 03:24 PM
Did your CSV file originate as Excel where people were randomly entering text and used Alt-Enter to create rows of text in a cell?
If so, your best bet may be to go back to Excel, highlight the column(s), go to Format Cells, Alignment, and remove the check mark in "Wrap text". If it is grayed, then click it twice. Repeat for each afflicted column. Then re-save as CSV.
02-06-2017 03:34 PM
That's a fine suggestion. However, this is an extract from a database that can't be altered. Extracts are generated from the raw data entered without modification by rule.
These are archived data and probably either entered at some time just as you mentioned, or it was a cut and paste job from a similar document, retaining the carriage returns.
I'm really searching for something that can be used repeatedly and automated using SAS. Maybe there's no straightforward solution to this problem. It's interesting that these files can be opened correctly in Excel but not in SAS. It seems like there would be a way to ask SAS to behave like Excel when opening these files.
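One repeatable approach, if the fields with embedded breaks are double-quoted the way Excel writes them (an assumption; check your extract), is to pre-process the file byte by byte and strip line feeds that fall inside an open quoted field. A minimal sketch, with illustrative file names:

```sas
/* Sketch only: assumes embedded breaks sit inside double-quoted fields. */
data _null_;
    infile "c:\temp\test.csv" recfm=n;       /* read one byte at a time */
    file "c:\temp\test_fixed.csv" recfm=n;   /* write the cleaned copy */
    retain inquote 0;
    input byte $char1.;
    if byte = '"' then inquote = not inquote;          /* toggle on each quote */
    if inquote and byte in ('0D'x, '0A'x) then byte = ' '; /* CR/LF inside a field */
    put byte $char1.;
run;
```

The cleaned copy can then be read with an ordinary INFILE/INPUT step. If your extract does not quote those fields, this sketch will not apply and something like the comma-counting workaround below the thread is needed instead.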
Thanks for your reply.
02-06-2017 05:37 PM
I find the statement "However, this is an extract from a database that can't be altered. Extracts are generated from the raw data entered without modification by rule." and a requirement to remove something, the linefeed, a tad incompatible.
Are you going to be doing this with many files or is this a one time shot? If this is going to be repeated then I strongly suggest working with whoever provides the extract to remove those characters before exporting them, possibly replacing them with some other character to indicate where the original break was which may allow you to put them back in after reading. Or do the Open in Excel, format cells and save as CSV. Every time.
02-06-2017 06:25 PM
Yeah. These are historical records entered via an older system that need to remain intact. For now I'd like to solve the problem under the constraints that I've been given. It would be better for me if I could find a solution. This is going to be needed every time an extract is created which uses records from the older system.
I found a work-around using the code below, but it's not as clean as I'd like it to be. It appears to work for at least one of the extracts.
Thanks for the feedback.
infile "&input_file" lrecl=2000 truncover;
length lines holdover $2000;
retain holdover; * Carry a partial record across iterations;
input lines $1-2000;
keep lines;
if _N_ eq 1 then holdover=''; * Initialize holdover;
if holdover ne '' then lines=catx('',holdover,lines); * Prepend the held fragment from the previous, split record;
if countc(lines,',') ge 185 then do; * A complete record has at least 185 commas;
    holdover='';
    output;
end;
else holdover=lines; * Fewer than 185 commas: record was split by an embedded break, so hold it;
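A possible variation on the comma-counting workaround above, to avoid hard-coding 185: derive the expected comma count from the first row of the file. This sketch assumes row 1 is a complete header row; the data set name `want` is illustrative.

```sas
/* Sketch: same hold-and-rejoin logic, threshold taken from the header row. */
data want;
    infile "&input_file" lrecl=2000 truncover;
    length lines holdover $2000;
    retain holdover n_commas;
    input lines $1-2000;
    if _N_ eq 1 then n_commas = countc(lines, ','); * Header defines full width;
    if holdover ne '' then lines = catx('', holdover, lines);
    if countc(lines, ',') ge n_commas then do; * Complete record;
        holdover = '';
        output;
    end;
    else holdover = lines; * Partial record: hold for the next iteration;
    keep lines;
run;
```

Note this still assumes no field contains a literal comma that would inflate the count; quoted commas would need the quote-aware pre-processing suggested earlier in the thread.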