mcook
Quartz | Level 8

I am aware that there are many other posts about this, but those solutions do not appear to work for me. 

 

I am trying to import CSVs that have carriage returns in them. Up to this point we have been manually removing them via the CTRL+H, CTRL+J replace. The tables I am working with are confidential, so I cannot post their output, but I have created a small table that, while not misbehaving in exactly the same ways, hopefully will give me some insight into what is going on.

 

I have this table.

[Screenshot: CSV_Test.PNG]

Notice the CRs in cells B3 and A6.

 

Running this code with TERMSTR=CRLF:

filename CSV "&FILEPATH.\FileName.csv" termstr=CRLF;

proc import
	datafile=CSV
	out=NOCR
	replace
	dbms=csv;
	getnames=no;
run;

This outputs 

[Screenshot: CSV_CRLF.PNG]

Notice that the words after the carriage returns in B3 (Smith) and A6 (Williams) are gone. I wanted the cells to read "Allen Smith" and "Brayden Williams", not to drop the second word.

 

Then, running the same code but with TERMSTR=CR:

filename CSV "&FILEPATH.\FileName.csv" termstr=CR;

proc import
	datafile=CSV
	out=NOCR
	replace
	dbms=csv;
	getnames=no;
run;

It outputs the following.  

[Screenshot: CSV_CR.PNG]

Notice the missing FName data and, again, the missing "Smith" after "Allen" (the "Smith" below in row 4 is from the original CSV; I should have named it differently to avoid confusion). Also notice the added blank row 8. The previous picture did not show it, but row 7 was the last row when TERMSTR=CRLF; there was no row 8.

 

Any hints or tips as to what is going on?  

13 Replies
Shmuel
Garnet | Level 18

I have created my own test file with one row and used '0D'x as the ENTER (carriage return) character.

Check the next code and adapt it to your input file.

I suggest not using PROC IMPORT but writing your own DATA step to read the CSV file.

/* 1 - creating test csv file */
filename mytest '/folders/myfolders/flat/test.csv';
data _null_;
   fname = 'barry';
   lname = 'Allen'||'0D'x||'Smith';   /* '0D'x = embedded carriage return */
   mi = 'Q';
   line = catx(',',fname,lname,mi);
   file mytest;
   put line;
run;
/* importing the csv file by a data step */
data test;
   length fname lname $20;
   infile mytest dlm=',' termstr=CRLF truncover;
   input fname $ lname $ mi $;
   lname = translate(lname,'-','0D'x);  /* replace the embedded CR with '-' */
   put lname=;
run;
Tom
Super User

Looks to me like your PROC IMPORT worked.  If it had really treated the line breaks as marking the end of a line, your photograph of the resulting SAS dataset would look different, because the inserted line breaks would have pushed the columns out of order.

Check what is actually in the dataset instead of taking a photograph of how the GUI displays it on the screen.

 

Try using code like this to replace any carriage return or line feed characters in your character fields with spaces.

data nocr;
  set nocr ;
  array _c _character_;
  do over _c;
     _c=translate(_c,'  ','0D0A'x);  /* replace CR and LF with blanks */
  end;
run;
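
If it helps to verify the cleanup, a quick check along the same lines (only a sketch; it assumes the dataset is still named NOCR) can count any CR/LF bytes that remain:

/* Count any CR or LF bytes still present in the character variables of NOCR */
data _null_;
  set nocr end=done;
  array _c _character_;
  do over _c;
     nbad + countc(_c,'0D0A'x);   /* occurrences of CR or LF in this field */
  end;
  if done then put 'NOTE: remaining CR/LF bytes: ' nbad;
run;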
mcook
Quartz | Level 8

Ahhh. I see, that does solve that problem. The data was there, just not being displayed.

Unfortunately my hope that solving that problem would shed light on my original problem was dashed.

 

I have a table with roughly 110 columns and 195 obs. When I run the TERMSTR=CRLF code

filename CSV "&FilePath.\FileName.csv" termstr=CRLF;

proc import
	datafile=CSV
	out=NOCR
	replace
	dbms=csv;
	getnames=no;
run;

I get this in the log

 

Errors detected in submitted DATA step. Examine log.
9 rows created in WORK.NOCR from CSV.

ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.

The output table NOCR is created, but instead of 110 columns and 195 obs it has 7316 columns and 9 obs.

It is as if it read through 7316 entries before it reached its first end-of-line character. And at least some (but possibly not all) of the carriage returns I wanted to remove are still in there, causing an end of line at each one.

 

I'm completely befuddled.

Shmuel
Garnet | Level 18

Can you post the code generated by PROC IMPORT from the log?

As you have a lot of variables, it may be convenient to adapt the generated code and use it as a DATA step with an INFILE statement.

mcook
Quartz | Level 8
That would require a different adaptation for each CSV file, correct? I am trying to figure out a way to import many CSV files without having to manually attend to the carriage returns in each.
Tom
Super User

@mcook wrote:
That would require a different adaptation for each CSV file, correct? I am trying to figure out a way to import many CSV files without having to manually attend to the carriage returns in each.

No.  It is that SAS requires that the CSV be capable of being parsed.  It does not natively know how to handle fields that have embedded end-of-line characters in them.  Since SAS does not seem to be working very hard on fixing this, it is up to each programmer to make sure the files are structured in a way that SAS can read them.

 

Here is a link to one of the many solutions to this issue on this Forum:

https://communities.sas.com/t5/New-SAS-User/Reading-CSV-file-with-double-quotes-line-breaks-and-spac...

 

You can run the fix on each CSV file you want to read if you are uncertain whether the source is providing compatible CSV files since it does not harm files that don't have the issue.

Shmuel
Garnet | Level 18

@mcook wrote:
That would require a different adaptation for each CSV file, correct? I am trying to figure out a way to import many CSV files without having to manually attend to the carriage returns in each.

The alternative code below may be used with any CSV file that contains embedded line breaks:

/* 1 - creating test csv file */
filename mytest '/folders/myfolders/flat/test.csv';
data _null_;
   fname = 'barry';
   lname = 'Allen'||'0D'x||'Smith';   /* '0D'x = embedded carriage return */
   mi = 'Q';
   line = catx(',',fname,lname,mi);
   file mytest;
   put line;
run;

/* 2 - remove breaklines */
filename newtest '/folders/myfolders/flat/testnw.csv';
data _NULL_;
   infile mytest;
   file newtest;
   input;
   infile = translate(_infile_,' ','0D'x);  /* replace the embedded CR with a blank */
   put infile;
run;

/* 3 - importing the csv file by a data step */
data test;
   length fname lname $20;
   infile newtest dlm=',' termstr=CRLF truncover;
   input fname $ lname $ mi $;
   put lname=;
run;
Tom
Super User

You told it that lines are marked with a CR+LF pair.  It found only 9 (or possibly only 8), so there are only 9 observations.

 

If the problem is that your embedded line breaks are not distinct from the actual end-of-line markers, then you will be forced to pre-process the text file to fix that before SAS can properly parse it.

 

This is a frequently asked question on this site; search for answers.  With data from Excel, values with embedded end-of-line characters should be enclosed in double quotes.  So you should be able to fix it by counting the quote characters you have seen and removing any end-of-line characters encountered while that count is odd.
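
A rough sketch of that quote-counting idea, assuming placeholder file paths and writing a cleaned copy rather than editing the original in place, might look like this:

/* Copy the CSV byte by byte, dropping any CR/LF that falls inside an */
/* open double-quoted value. The paths below are placeholders.        */
filename incsv  "c:\temp\original.csv";
filename outcsv "c:\temp\cleaned.csv";

data _null_;
  infile incsv recfm=n;           /* read the file as a raw byte stream */
  file outcsv recfm=n;            /* write the copy the same way        */
  retain inquote 0;               /* 1 while the quote count is odd     */
  input byte $char1.;
  if byte = '"' then inquote = ^inquote;
  /* keep the byte unless it is a CR/LF inside a quoted value */
  if not (inquote and byte in ('0D'x,'0A'x)) then put byte $char1.;
run;

The cleaned copy can then be read with PROC IMPORT or a DATA step as before.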

mcook
Quartz | Level 8

If I set TERMSTR=LF and just accept line feeds as end-of-line characters, it gives me a nearly identical table to the one I get when I just import the CSV without regard to any EOL characters or CTRL+H/CTRL+J replacements.

 

 

 

Running a PROC COMPARE with the OUTBASE, OUTDIF, OUTCOMP, and OUTNOEQUAL options, the only differences are that in 10 cells the space after the last character in the cell is different, though visibly the same. So I assume a CR, LF, or CRLF that was there was altered somehow?

 

filename CSV "&FilePath.\FileName.csv" termstr=LF;

proc import
	datafile=CSV
	out=NOCR
	replace
	dbms=csv;
	getnames=no;
run;

proc import datafile="&FilePath.\FileName.csv"
	out=NOCR_2
	replace
	dbms=csv;
	getnames=no;
run;

proc compare base=NOCR compare=NOCR_2 out=NOCR_Comp outbase outdif outcomp outnoequal;
run;

 

And am I correct that this seems to show that the CSV files I am working with have LF as the end-of-line character, not CRLF, since TERMSTR=LF correctly reads most rows except those with embedded carriage returns?

 

 

mcook
Quartz | Level 8

I was finally able to track down some code on the SAS website to remove carriage returns from a CSV prior to importing.

 

https://support.sas.com/kb/26/065.html

 

/************************** CAUTION ***************************/
/*                                                            */
/* This program UPDATES IN PLACE, create a backup copy before */
/* running.                                                   */
/*                                                            */
/************************** CAUTION ***************************/
/* Replace carriage return and linefeed characters inside     */
/* double quotes with a specified character.  This sample     */
/* uses '@' and '$', but any character can be used, including */
/* spaces.  CR/LFs not in double quotes will not be replaced. */

%let repA='@';                    /* replacement character LF */
%let repD='$';                    /* replacement character CR */
%let dsnnme="c:\sample.csv";      /* use full path of CSV file */

data _null_;
	/* RECFM=N reads the file in binary format. The file consists    */
	/* of a stream of bytes with no record boundaries.  SHAREBUFFERS */
	/* specifies that the FILE statement and the INFILE statement    */
	/* share the same buffer.                                        */
	infile &dsnnme recfm=n sharebuffers;
	file &dsnnme recfm=n;

	/* OPEN is a flag variable used to determine if the CR/LF is within */
	/* double quotes or not.  Retain this value.                        */
	retain open 0;
	input a $char1.;

	/* If the character is a double quote, set OPEN to its opposite value. */
	if a = '"' then
		open = ^(open);

	/* If the CR or LF is after an open double quote, replace the byte with */
	/* the appropriate value.                                               */
	if open then
		do;
			if a = '0D'x then
				put &repD;
			else if a = '0A'x then
				put &repA;
		end;
run;
Tom
Super User

Make sure to make a backup of the original CSV file before trying to use that code.  It is using the SHAREBUFFERS option, so it is modifying the original file.
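
For example, a minimal sketch of making that backup with the FCOPY function (paths are placeholders; RECFM=N keeps the copy byte for byte):

/* Back up the CSV before running the in-place fix. Paths are placeholders. */
filename src "c:\sample.csv" recfm=n;
filename bkp "c:\sample_backup.csv" recfm=n;

data _null_;
  rc = fcopy('src','bkp');          /* 0 means the copy succeeded */
  if rc ne 0 then put 'ERROR: backup copy failed. ' rc=;
run;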

mcook
Quartz | Level 8
Any quick tips on how to create a copy of a file and then run this code on the copy?

Are PROC COPY and FCOPY the only options?
Tom
Super User

And am I correct that this seems to show that the CSV files I am working with have LF as the end-of-line character, not CRLF, since TERMSTR=LF correctly reads most rows except those with embedded carriage returns?

Not quite. 

 

If using TERMSTR=LF does not read the right number of rows/observations, then there are embedded line feed characters (LF or '0A'x) in some of the field values.  Any line that just had an embedded carriage return would be read fine (unless the CR or '0D'x appeared in a field that was being read as a number), and you would have read the right number of observations.  It might be that CR+LF pairs are embedded, in which case the LF will cause a split line and the CR will become part of the last value read from the first of the resulting two lines.

 

Another thing that can happen if you use TERMSTR=LF on a file that uses carriage return plus line feed as the end-of-line marker is that the carriage return character (CR or '0D'x) becomes part of the value of the last field on the line.  If that field is read as character, the '0D'x will be included in the value.  If it is read as a number, the '0D'x character will cause an error and result in a missing value for the numeric variable.
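
If it would help to confirm which terminator the file really uses before choosing a TERMSTR= value, a small sketch like this (the path is a placeholder) can count the raw CR, LF, and CR+LF bytes:

/* Count raw CR, LF, and CR+LF pairs to see which terminator the file uses */
filename rawcsv "c:\temp\FileName.csv";

data _null_;
  infile rawcsv recfm=n eof=report;  /* read byte by byte; jump to REPORT at end of file */
  length b prev $1;
  retain prev;
  input b $char1.;
  if b = '0D'x then ncr + 1;                    /* carriage returns               */
  if b = '0A'x then do;
     nlf + 1;                                   /* line feeds                     */
     if prev = '0D'x then ncrlf + 1;            /* CR immediately followed by LF  */
  end;
  prev = b;
  return;
report:
  put 'NOTE: ' ncr= nlf= ncrlf=;
  stop;
run;

Roughly speaking, if NCRLF is close to NLF the file is CRLF-terminated and any extra CRs are embedded ones; if NCR is near zero, LF alone is the terminator.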

 
