BookmarkSubscribeRSS Feed
Ryanb2
Quartz | Level 8

I'm trying to remove soft returns that are embedded within some records included in a .csv file.  I've read several threads about dealing with this issue and I've tried different options (including TERMSTR=CR and TERMSTR=LF) but the problem records always end with the soft return, putting the remaining portion of the record into the next iteration.  Any help is appreciated.  See my code below.  I had to add the truncover option because I was only able to read in every other record.  Maybe truncover negates the TERMSTR option?

 

data cleaned;
missing;
infile "c:\temp\test.csv" lrecl=2000 truncover TERMSTR=CRLF;* firstobs=1;
length lines $2000;
input lines $1-2000;
keep lines tag;

if _N_ eq 1 then do;
find_CR_re=prxparse('/\n/'); /* Find carriage returns */
end;
retain find_CR_re;

if prxmatch(find_CR_re,lines) then tag=1;

if substr(strip(lines),1,13) eq '5555555555555'; /* Looking for a specific record known to have an embedded carriage return */
run;

6 REPLIES 6
LaurieF
Barite | Level 11

Can you attach your test.csv file? It'd be much easier to track things down with that.

Ryanb2
Quartz | Level 8

Thanks for the reply.  Unfortunately it's a confidential file so I can't attach it.  

ballardw
Super User

Did your CSV file orginate as Excel where people were randomly entering text and used the Alt-Enter to create rows of text in a cell?

 

If so your best bet may be to go back to Excel, hightlight the column(s), go to format cells, alignment and remove the check mark in "wrap text". If it is grayed, then click it twice. Repeat for each afflicted column. The re-save as CSV.

Ryanb2
Quartz | Level 8

That's a fine suggestion. However, this is an extract from a database that can't be altered.  Extracts are generated from the raw data entered without modification by rule.  

 

These are archived data and probably either entered at some time just as you mentioned, or it was a cut and paste job from a similar document, retaining the carriage returns.

 

I'm really searching for something that can be used repeatedly and automated using SAS.  Maybe there's not a straigtforward solution to this problem.  It's interesting these files can be opened correctly in Excel but not in SAS.  It seems like there would be a way to ask SAS to behave like excel when opening these files.

 

Thanks for your reply.

ballardw
Super User

I find the statement


@Ryanb2 wrote:

. However, this is an extract from a database that can't be altered.  Extracts are generated from the raw data entered without modification by rule.  

 


And a requirement to remove somthing, the linefeed, a tad incompatible.

 

Are you going to be doing this with many files or is this a one time shot? If this is going to be repeated then I strongly suggest working with whoever provides the extract to remove those characters before exporting them, possibly replacing them with some other character to indicate where the original break was which may allow you to put them back in after reading. Or do the Open in Excel, format cells and save as CSV. Every time.

Ryanb2
Quartz | Level 8

Yeah.  These are historical records entered via an older system that need to remain intact.  For now I'd like to solve the problem under the constraints that I've been given.  It would be better for me if I could find a solution.  This is going to be needed every time an extract is created which uses records from the older system.

 

I found a work-around using the code below, but it's not as clean as I'd like it to be.  It appears to work for at least one of the extracts.

 

Thanks for the feedback.

 

data cleaned;
missing;
infile "&input_file" lrecl=2000 truncover;
length lines holdover $2000;
input lines $1-2000;
keep lines ;
retain holdover;

if _N_ eq 1 then holdover=''; * Initialize holdover;

if holdover ne '' then lines=catx('',holdover,lines); * Append this record to the previous one if there were fewer than 185 commas in the previous record;

if countc(lines,',') ge 185 then holdover=''; * If there are fewer than 185 commas then record has likely been truncated. Assign to holdover.;
else do;
holdover=lines; delete;
end;
run;

sas-innovate-2024.png

Today is the last day to save with the early bird rate! Register today for just $695 - $100 off the standard rate.

 

Plus, pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1871 views
  • 3 likes
  • 3 in conversation