Carriage Returns - Need to be removed

Accepted Solution Solved
Reply
Super Contributor
Posts: 418
Accepted Solution

Carriage Returns - Need to be removed

Hello everyone. I have come across numerous csv files that have carraige returns, and line feeds, and the combination of Carraige Returns+Line Feeds  within large sections of comment data.

Does anyone know a process that can be used to properly read in these files that does not use the popular infile "sharebuffers" trick?

I am operating in a windows environment. My current process is to open the file within excel, add a column of smiley faces to the end of the csv file, and then open cygwin to strip out all CR , LF, or CR+LF and then replace the smiley faces with CR+LF. This process works fine, however I'd like to remove excel and Cygwin completely from the equation for obvious reasons.

Thanks!

Brandon



Accepted Solutions
Solution
‎02-04-2013 07:05 PM
Super User
Super User
Posts: 6,323

Re: Carriage Returns - Need to be removed

One question is how messy are the files.  If the problem is just that the text in some cells contain CR or LF (but not any CR+LF combinations) then you should be able to read the file properly with SAS by using the TERMSTR=CRLF option on the INFILE statement.

Next level is if the file does contain CR+LF embedded in the middle of a cell, but those cells are properly quoted then you might be able to fix it by pre-processing the files with SAS.  See this thread. 

View solution in original post


All Replies
Super Contributor
Posts: 418

Re: Carriage Returns - Need to be removed

If anyone can help with this problem it would be greatly appreciated.

The smiley face method is now causing issues because users are adding free string text to the end of their excel files, however the business would like these files to be identified so they can go in and concatenate the string variables into the final column (like they're supposed to be).

However since the files have carriage return and line feeds in them, I cannot read them using any standard fashion. This combination of problems has basically rendered the business request impossible, unless I can get some assistance.

Thank you kindly!

Brandon

Solution
‎02-04-2013 07:05 PM
Super User
Super User
Posts: 6,323

Re: Carriage Returns - Need to be removed

One question is how messy are the files.  If the problem is just that the text in some cells contain CR or LF (but not any CR+LF combinations) then you should be able to read the file properly with SAS by using the TERMSTR=CRLF option on the INFILE statement.

Next level is if the file does contain CR+LF embedded in the middle of a cell, but those cells are properly quoted then you might be able to fix it by pre-processing the files with SAS.  See this thread. 

Super Contributor
Posts: 418

Re: Carriage Returns - Need to be removed

Hello Tom. Sadly I cannot use the TERMSTR option (this is what I did when I was taught SAS recently), and I also cannot use an infile - file option, as the software my company has does not have either of these options.

My question is basically is this possible without the use of the TERMSTR option?

Thanks!


Super User
Super User
Posts: 6,323

Re: Carriage Returns - Need to be removed

Read the answer to the thread I linked. It should work for any CSV file that Excel would write (or be able to read).

Super Contributor
Posts: 418

Re: Carriage Returns - Need to be removed

Hiya Tom. I did read your thread and I tried the code and it did not work sadly. I am not using base SAS, but a SAS emulator, and sadly this emulator does not have the TERMSTR option or the ability to read in a csv file and then output to it using the infile , file option (Which it looks like your code did). to make sure we're on the same page, when you say the code that should work, you're refering to the one that was selected as "best answer" correct?

Thanks again!

Brandon

Super User
Super User
Posts: 6,323

Re: Carriage Returns - Need to be removed

You do not need TERMSTR, or use shared buffers to modify the original file.  Just read from the original file and create a new file that has the extra line breaks removed.


filename old 'original.csv';

filename new 'fixed.csv';


data _null_ ;

  if eof then put 'NOTE: Records read=' newn 'Records with missing quotes=' missq ;

  infile old lrecl=10000 end=eof ;

  file new lrecl=10000;

  nq=0;

  do until (mod(nq,2)=0 or eof );

     input;

     newn+1;

     nq = nq + countc(_infile_,'"');

     put _infile_ @;

     if mod(nq,2) then do;

       missq+1;

       put '|' @;

     end;

  end;

  put;

run;

Super Contributor
Posts: 418

Re: Carriage Returns - Need to be removed

Hello again Tom. Okay so that's good news, however this code looks like it only removes line feeds. I actually need a code that will remove line feeds AND carriage returns (plus the combination). It appears people are inputting ALT-ENTER into their excel files, which the csv will read as (I believe) CR+LF.

Actually below is the copy of the Word Document version of the csv I need fixed. Basically Var2 in row 1 (key variable 1) are the words hey How, however there is an Alt Enter between them. I need sas to put those onto one line (or ast least one variable) but still created a second row after var3. So my final output should look like the dataset below it.

  ORIGINAL

key,var1,var2,var3

1,a,"hey

how",a

2,a,a,a

3,a,a,a

NEED

key,var1,var2,var3

1,a,"hey how",a

2,a,a,a

3,a,a,a

Sorry if my explanation is or was not correct. I am by no means a SAS expert and this was the first computer language I taught myself about a year ago, so still have a lot left to learn.

Please let me know if you need any more information from me! Thanks!

Super User
Super User
Posts: 6,323

Re: Carriage Returns - Need to be removed

Should work fine. You might end up with CR ('0D'X) appearing as part of the text (rather than as line breaks) as SAS reads the original file. You could convert them to spaces by adding this line right after the INPUT statement.

_INFILE_ = TRANSLATE(_INFILE_,' ' ,'0D'x);

Super Contributor
Posts: 418

Re: Carriage Returns - Need to be removed

Hurray! Thank you for that extremely detailed answer. I do have one last question to mark this a complete solution. If someone actually entered a CRLF combination into a quoted box would I be able to add the following line right after the new _infile_ statement to "fix" this occurance?  Users are sometimes copying complete values from PDF documents which span multiple lines, so that distinct combination is making the boxes with smiley faces appear within Excel. (I believe the boxes with smiley faces are CRLF, but please correct me if I am incorrect).

_INFILE_ = TRANSLATE(_INFILE_,' ' ,'0DA'x);

Thank you very much Tom!

Super User
Super User
Posts: 6,323

Re: Carriage Returns - Need to be removed

Basically yes.  You need to include both HEX digits for each character specifying the literal.  '0D0A'x

The TRANSLATE function is a little confusing.  It translates character by character. Plus the order of the arguments is backwards as the replacement characters are listed first.  So TRANSLATE(X,'  ','0D0A'x) will convert all CR to spaces and all LF to spaces.  Whether they are adjacent in the source string or not.  In general if you just want to convert the garbage into something useful that should be good enough.  If however you want to preserve the concept of having a line break embedded into the string they you might want to convert them to something other than a space. If you want to specifically convert CR+LF and not just convert all CR and all LF then you would want to use TRANWRD function instead.  But if you are reading the data from text file then I do not believe you will ever get the CR+LF into your character variables.   If you use PROC IMPORT directly from the Excel file then you might be able to have CR+LF appear in the character strings in SAS.

Super Contributor
Posts: 418

Re: Carriage Returns - Need to be removed

In general I do not want to preserve the idea of a line break within my comments. I want them to just be replaced with a space... So to confirm, if this is the case I can just put the _infile_=TRANSLATE(_infile,' ','0D0A'x);.  after the input statement.

Would i be able to remove the other portion of the code than, the portion after the put _infile_ @; statement (or would leaving it cause any distinct problems)?

Thanks again, I am marking your answer as correct!

Super User
Super User
Posts: 6,323

Re: Carriage Returns - Need to be removed

You need the rest of the code. That is the part that is gluing together the lines that SAS's normal INFILE processing is treating as separate lines.

The basic idea of the program is that when the currently constructed line has unbalanced quotes then it is an indication that the line has been split.  So by outputting the line to the new file with a trailing @ the line break is removed.

The reason that you will also need to include the TRANSLATE function call is that when SAS sees just a CR by itself it will not treat that as a new line.  So the trailing @ trick will not effect those embedded CR characters.  Then when you try to print the value of those variables in your final dataset they will cause strange results.  From my experiments when reading files SAS will always treat the LF as marking a new line.  On Windows it will also remove any CR that appear right before the LF (that is the normal end of line on Windows).  On Unix the CR before the LF will stay in the data, appearing as the last character on the line.  Any CR by itself will also stay in the data rather than being treated as a line break.  So by converting the CR to spaces before writing them out the new file they will be gone.  In my experience the LF will not appear in the _INFILE_ automatic variable, but there is no harm is trying to convert them to spaces in case I am wrong.

Super Contributor
Posts: 418

Re: Carriage Returns - Need to be removed

Thanks a bunch Tom. This problem has haunted me for the last year. I have tested the code on several production files and it does appears to working exactly the way I need it too.

Thank you for showing me that trick as well, I really like the simplicity in the Do Until loop testing for the double quotes!

Brandon

Frequent Contributor
Posts: 124

Re: Carriage Returns - Need to be removed

Hi Tom,

 

I tried to use your code to get rid of linefeed characters.  Somehow it is not working for me.

 

I have Pipe delimited csv file and it has linefeed characters.    I would appreciate if you can help me how to read pipe delimited file with Linefeed characters.

 

Thanks

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 16535 views
  • 2 likes
  • 3 in conversation