DATA Step, Macro, Functions and more

Removing embedded carriage returns

Reply
Contributor
Posts: 29

Removing embedded carriage returns

I'm trying to remove soft returns that are embedded within some records included in a .csv file.  I've read several threads about dealing with this issue and I've tried different options (including TERMSTR=CR and TERMSTR=LF) but the problem records always end with the soft return, putting the remaining portion of the record into the next iteration.  Any help is appreciated.  See my code below.  I had to add the truncover option because I was only able to read in every other record.  Maybe truncover negates the TERMSTR option?

 

data cleaned;
missing;
infile "c:\temp\test.csv" lrecl=2000 truncover TERMSTR=CRLF;* firstobs=1;
length lines $2000;
input lines $1-2000;
keep lines tag;

if _N_ eq 1 then do;
find_CR_re=prxparse('/\n/'); /* Find carriage returns */
end;
retain find_CR_re;

if prxmatch(find_CR_re,lines) then tag=1;

if substr(strip(lines),1,13) eq '5555555555555'; /* Looking for a specific record known to have an embedded carriage return */
run;

Super Contributor
Posts: 252

Re: Removing embedded carriage returns

Can you attach your test.csv file? It'd be much easier to track things down with that.

Contributor
Posts: 29

Re: Removing embedded carriage returns

Thanks for the reply.  Unfortunately it's a confidential file so I can't attach it.  

Super User
Posts: 11,343

Re: Removing embedded carriage returns

Did your CSV file orginate as Excel where people were randomly entering text and used the Alt-Enter to create rows of text in a cell?

 

If so your best bet may be to go back to Excel, hightlight the column(s), go to format cells, alignment and remove the check mark in "wrap text". If it is grayed, then click it twice. Repeat for each afflicted column. The re-save as CSV.

Contributor
Posts: 29

Re: Removing embedded carriage returns

That's a fine suggestion. However, this is an extract from a database that can't be altered.  Extracts are generated from the raw data entered without modification by rule.  

 

These are archived data and probably either entered at some time just as you mentioned, or it was a cut and paste job from a similar document, retaining the carriage returns.

 

I'm really searching for something that can be used repeatedly and automated using SAS.  Maybe there's not a straigtforward solution to this problem.  It's interesting these files can be opened correctly in Excel but not in SAS.  It seems like there would be a way to ask SAS to behave like excel when opening these files.

 

Thanks for your reply.

Super User
Posts: 11,343

Re: Removing embedded carriage returns

I find the statement


Ryanb2 wrote:

. However, this is an extract from a database that can't be altered.  Extracts are generated from the raw data entered without modification by rule.  

 


And a requirement to remove somthing, the linefeed, a tad incompatible.

 

Are you going to be doing this with many files or is this a one time shot? If this is going to be repeated then I strongly suggest working with whoever provides the extract to remove those characters before exporting them, possibly replacing them with some other character to indicate where the original break was which may allow you to put them back in after reading. Or do the Open in Excel, format cells and save as CSV. Every time.

Contributor
Posts: 29

Re: Removing embedded carriage returns

Yeah.  These are historical records entered via an older system that need to remain intact.  For now I'd like to solve the problem under the constraints that I've been given.  It would be better for me if I could find a solution.  This is going to be needed every time an extract is created which uses records from the older system.

 

I found a work-around using the code below, but it's not as clean as I'd like it to be.  It appears to work for at least one of the extracts.

 

Thanks for the feedback.

 

data cleaned;
missing;
infile "&input_file" lrecl=2000 truncover;
length lines holdover $2000;
input lines $1-2000;
keep lines ;
retain holdover;

if _N_ eq 1 then holdover=''; * Initialize holdover;

if holdover ne '' then lines=catx('',holdover,lines); * Append this record to the previous one if there were fewer than 185 commas in the previous record;

if countc(lines,',') ge 185 then holdover=''; * If there are fewer than 185 commas then record has likely been truncated. Assign to holdover.;
else do;
holdover=lines; delete;
end;
run;

Ask a Question
Discussion stats
  • 6 replies
  • 197 views
  • 3 likes
  • 3 in conversation