Hi,
I have a CSV file with line feeds ('0A'x) appearing occasionally (I think someone typed in "hard" carriage returns during data entry). The lines correctly end with a carriage return line feed (CRLF '0D'x '0A'x) pair. I want to remove the "stand alone" line feeds, and keep the carriage return line feed pairs intact.
I have tried using the code described here (reproduced below):
https://support.sas.com/kb/26/065.html
This works, but some of the line feeds are missed, as they are not all within double quotation marks. Can someone help, please?
/* Create sample .csv file containing CR/LF bytes. '0D'x is the */
/* hexadecimal representation of CR and '0A'x is the hexadecimal */
/* representation of LF. */
data _null_;
file "c:\sample.csv";
/* Code the PUT statement so the hex values will be inside the */
/* the double quotes in the output file SAMPLE.CSV. */
put '"field1","field2","field3'
'0D'x
'",'
'"field4","field5","field6'
'0A'x
'",'
'"field7","field8","field9"'
;
run;
/* Read in the test file created above. */
data out;
/* The DSD option assumes the delimiter is a comma. You can override */
/* the delimiter with the DLM= option if needed. */
infile "c:\sample.csv" dsd truncover;
length var1 - var9 $15;
input var1 - var9 $;
run;
proc print;
title 'File is read incorrectly due to embedded CR/LF';
run;
/************************** CAUTION ***************************/
/* */
/* This program UPDATES IN PLACE, create a backup copy before */
/* running. */
/* */
/************************** CAUTION ***************************/
/* Replace carriage return and linefeed characters inside */
/* double quotes with a specified character. This sample */
/* uses '@' and '$', but any character can be used, including */
/* spaces. CR/LFs not in double quotes will not be replaced. */
%let repA='@'; /* replacement character LF */
%let repD='$'; /* replacement character CR */
%let dsnnme="c:\sample.csv"; /* use full path of CSV file */
data _null_;
/* RECFM=N reads the file in binary format. The file consists */
/* of a stream of bytes with no record boundaries. SHAREBUFFERS */
/* specifies that the FILE statement and the INFILE statement */
/* share the same buffer. */
infile &dsnnme recfm=n sharebuffers;
file &dsnnme recfm=n;
/* OPEN is a flag variable used to determine if the CR/LF is within */
/* double quotes or not. Retain this value. */
retain open 0;
input a $char1.;
/* If the character is a double quote, set OPEN to its opposite value. */
if a = '"' then open = ^(open);
/* If the CR or LF is after an open double quote, replace the byte with */
/* the appropriate value. */
if open then do;
if a = '0D'x then put &repD;
else if a = '0A'x then put &repA;
end;
run;
/* Read in new version of file to check for success. */
data outp;
infile "c:\sample.csv" dsd dlm=',' truncover;
length var1 - var9 $ 15;
input var1 - var9 $ ;
run;
proc print;
title1 'File is read correctly after transformation ';
title2 "Look for printable characters &repa &repd in variable values ";
run;
/* OPTIONAL -- Delete external file */
data _null_;
fname="tempfile";
rc=filename(fname,"c:\sample.csv");
if rc = 0 and fexist(fname) then rc=fdelete(fname);
rc=filename(fname);
run;
Since you are working in Windows you might try the Infile option TRMSTR to specify that you need a combination of CR and LF to terminate a line.
data out;
/* The DSD option assumes the delimiter is a comma. You can override */
/* the delimiter with the DLM= option if needed. */
infile "c:\sample.csv" dsd truncover TRMSTR=CRLF;
length var1 - var9 $15;
input var1 - var9 $;
run;
Your statement
'File is read incorrectly due to embedded CR/LF';
Is not quite correct. CR/LF is actualy TWO characters, you are typically encountering only the LF, a single character.
Since you are working in Windows you might try the Infile option TRMSTR to specify that you need a combination of CR and LF to terminate a line.
data out;
/* The DSD option assumes the delimiter is a comma. You can override */
/* the delimiter with the DLM= option if needed. */
infile "c:\sample.csv" dsd truncover TRMSTR=CRLF;
length var1 - var9 $15;
input var1 - var9 $;
run;
Your statement
'File is read incorrectly due to embedded CR/LF';
Is not quite correct. CR/LF is actualy TWO characters, you are typically encountering only the LF, a single character.
Further to your advice, I used the following:
TERMSTR=CRLF
It looks like it has worked!
Thanks for the swift reply.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.