BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Anotherdream
Quartz | Level 8

Hello everyone. I have come across numerous csv files that have carraige returns, and line feeds, and the combination of Carraige Returns+Line Feeds  within large sections of comment data.

Does anyone know a process that can be used to properly read in these files that does not use the popular infile "sharebuffers" trick?

I am operating in a windows environment. My current process is to open the file within excel, add a column of smiley faces to the end of the csv file, and then open cygwin to strip out all CR , LF, or CR+LF and then replace the smiley faces with CR+LF. This process works fine, however I'd like to remove excel and Cygwin completely from the equation for obvious reasons.

Thanks!

Brandon


1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

One question is how messy are the files.  If the problem is just that the text in some cells contain CR or LF (but not any CR+LF combinations) then you should be able to read the file properly with SAS by using the TERMSTR=CRLF option on the INFILE statement.

Next level is if the file does contain CR+LF embedded in the middle of a cell, but those cells are properly quoted then you might be able to fix it by pre-processing the files with SAS.  See this thread. 

View solution in original post

18 REPLIES 18
Anotherdream
Quartz | Level 8

If anyone can help with this problem it would be greatly appreciated.

The smiley face method is now causing issues because users are adding free string text to the end of their excel files, however the business would like these files to be identified so they can go in and concatenate the string variables into the final column (like they're supposed to be).

However since the files have carriage return and line feeds in them, I cannot read them using any standard fashion. This combination of problems has basically rendered the business request impossible, unless I can get some assistance.

Thank you kindly!

Brandon

Tom
Super User Tom
Super User

One question is how messy are the files.  If the problem is just that the text in some cells contain CR or LF (but not any CR+LF combinations) then you should be able to read the file properly with SAS by using the TERMSTR=CRLF option on the INFILE statement.

Next level is if the file does contain CR+LF embedded in the middle of a cell, but those cells are properly quoted then you might be able to fix it by pre-processing the files with SAS.  See this thread. 

Anotherdream
Quartz | Level 8

Hello Tom. Sadly I cannot use the TERMSTR option (this is what I did when I was taught SAS recently), and I also cannot use an infile - file option, as the software my company has does not have either of these options.

My question is basically is this possible without the use of the TERMSTR option?

Thanks!


Tom
Super User Tom
Super User

Read the answer to the thread I linked. It should work for any CSV file that Excel would write (or be able to read).

Anotherdream
Quartz | Level 8

Hiya Tom. I did read your thread and I tried the code and it did not work sadly. I am not using base SAS, but a SAS emulator, and sadly this emulator does not have the TERMSTR option or the ability to read in a csv file and then output to it using the infile , file option (Which it looks like your code did). to make sure we're on the same page, when you say the code that should work, you're refering to the one that was selected as "best answer" correct?

Thanks again!

Brandon

Tom
Super User Tom
Super User

You do not need TERMSTR, or use shared buffers to modify the original file.  Just read from the original file and create a new file that has the extra line breaks removed.


filename old 'original.csv';

filename new 'fixed.csv';


data _null_ ;

  if eof then put 'NOTE: Records read=' newn 'Records with missing quotes=' missq ;

  infile old lrecl=10000 end=eof ;

  file new lrecl=10000;

  nq=0;

  do until (mod(nq,2)=0 or eof );

     input;

     newn+1;

     nq = nq + countc(_infile_,'"');

     put _infile_ @;

     if mod(nq,2) then do;

       missq+1;

       put '|' @;

     end;

  end;

  put;

run;

Anotherdream
Quartz | Level 8

Hello again Tom. Okay so that's good news, however this code looks like it only removes line feeds. I actually need a code that will remove line feeds AND carriage returns (plus the combination). It appears people are inputting ALT-ENTER into their excel files, which the csv will read as (I believe) CR+LF.

Actually below is the copy of the Word Document version of the csv I need fixed. Basically Var2 in row 1 (key variable 1) are the words hey How, however there is an Alt Enter between them. I need sas to put those onto one line (or ast least one variable) but still created a second row after var3. So my final output should look like the dataset below it.

  ORIGINAL

key,var1,var2,var3

1,a,"hey

how",a

2,a,a,a

3,a,a,a

NEED

key,var1,var2,var3

1,a,"hey how",a

2,a,a,a

3,a,a,a

Sorry if my explanation is or was not correct. I am by no means a SAS expert and this was the first computer language I taught myself about a year ago, so still have a lot left to learn.

Please let me know if you need any more information from me! Thanks!

Tom
Super User Tom
Super User

Should work fine. You might end up with CR ('0D'X) appearing as part of the text (rather than as line breaks) as SAS reads the original file. You could convert them to spaces by adding this line right after the INPUT statement.

_INFILE_ = TRANSLATE(_INFILE_,' ' ,'0D'x);

Anotherdream
Quartz | Level 8

Hurray! Thank you for that extremely detailed answer. I do have one last question to mark this a complete solution. If someone actually entered a CRLF combination into a quoted box would I be able to add the following line right after the new _infile_ statement to "fix" this occurance?  Users are sometimes copying complete values from PDF documents which span multiple lines, so that distinct combination is making the boxes with smiley faces appear within Excel. (I believe the boxes with smiley faces are CRLF, but please correct me if I am incorrect).

_INFILE_ = TRANSLATE(_INFILE_,' ' ,'0DA'x);

Thank you very much Tom!

Tom
Super User Tom
Super User

Basically yes.  You need to include both HEX digits for each character specifying the literal.  '0D0A'x

The TRANSLATE function is a little confusing.  It translates character by character. Plus the order of the arguments is backwards as the replacement characters are listed first.  So TRANSLATE(X,'  ','0D0A'x) will convert all CR to spaces and all LF to spaces.  Whether they are adjacent in the source string or not.  In general if you just want to convert the garbage into something useful that should be good enough.  If however you want to preserve the concept of having a line break embedded into the string they you might want to convert them to something other than a space. If you want to specifically convert CR+LF and not just convert all CR and all LF then you would want to use TRANWRD function instead.  But if you are reading the data from text file then I do not believe you will ever get the CR+LF into your character variables.   If you use PROC IMPORT directly from the Excel file then you might be able to have CR+LF appear in the character strings in SAS.

Anotherdream
Quartz | Level 8

In general I do not want to preserve the idea of a line break within my comments. I want them to just be replaced with a space... So to confirm, if this is the case I can just put the _infile_=TRANSLATE(_infile,' ','0D0A'x);.  after the input statement.

Would i be able to remove the other portion of the code than, the portion after the put _infile_ @; statement (or would leaving it cause any distinct problems)?

Thanks again, I am marking your answer as correct!

Tom
Super User Tom
Super User

You need the rest of the code. That is the part that is gluing together the lines that SAS's normal INFILE processing is treating as separate lines.

The basic idea of the program is that when the currently constructed line has unbalanced quotes then it is an indication that the line has been split.  So by outputting the line to the new file with a trailing @ the line break is removed.

The reason that you will also need to include the TRANSLATE function call is that when SAS sees just a CR by itself it will not treat that as a new line.  So the trailing @ trick will not effect those embedded CR characters.  Then when you try to print the value of those variables in your final dataset they will cause strange results.  From my experiments when reading files SAS will always treat the LF as marking a new line.  On Windows it will also remove any CR that appear right before the LF (that is the normal end of line on Windows).  On Unix the CR before the LF will stay in the data, appearing as the last character on the line.  Any CR by itself will also stay in the data rather than being treated as a line break.  So by converting the CR to spaces before writing them out the new file they will be gone.  In my experience the LF will not appear in the _INFILE_ automatic variable, but there is no harm is trying to convert them to spaces in case I am wrong.

Anotherdream
Quartz | Level 8

Thanks a bunch Tom. This problem has haunted me for the last year. I have tested the code on several production files and it does appears to working exactly the way I need it too.

Thank you for showing me that trick as well, I really like the simplicity in the Do Until loop testing for the double quotes!

Brandon

pp2014
Fluorite | Level 6

Hi Tom,

 

I tried to use your code to get rid of linefeed characters.  Somehow it is not working for me.

 

I have Pipe delimited csv file and it has linefeed characters.    I would appreciate if you can help me how to read pipe delimited file with Linefeed characters.

 

Thanks

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 36234 views
  • 7 likes
  • 3 in conversation