mcook
Quartz | Level 8

In a previous post, I asked about code to remove carriage returns from CSV files, as an alternative to the CTRL+J replacement trick (https://communities.sas.com/t5/SAS-Programming/Import-CSV-with-Carriage-Returns/m-p/726808#M225905).

I found some code on the SAS support site: https://support.sas.com/kb/26/065.html

Much of that code creates a dummy table and then checks that the code works. The pertinent part is this:

/************************** CAUTION ***************************/
/*                                                            */
/* This program UPDATES IN PLACE, create a backup copy before */
/* running.                                                   */
/*                                                            */
/************************** CAUTION ***************************/
/* Replace carriage return and linefeed characters inside     */
/* double quotes with a specified character.  This sample     */
/* uses '@' and '$', but any character can be used, including */
/* spaces.  CR/LFs not in double quotes will not be replaced. */
%let repA='@';                    /* replacement character LF */
%let repD='$';                    /* replacement character CR */
%let dsnnme="c:\sample.csv";      /* use full path of CSV file */

data _null_;
	/* RECFM=N reads the file in binary format. The file consists    */
	/* of a stream of bytes with no record boundaries.  SHAREBUFFERS */
	/* specifies that the FILE statement and the INFILE statement    */
	/* share the same buffer.                                        */
	infile &dsnnme recfm=n sharebuffers;
	file &dsnnme recfm=n;

	/* OPEN is a flag variable used to determine if the CR/LF is within */
	/* double quotes or not.  Retain this value.                        */
	retain open 0;
	input a $char1.;

	/* If the character is a double quote, set OPEN to its opposite value. */
	if a = '"' then
		open = ^(open);

	/* If the CR or LF is after an open double quote, replace the byte with */
	/* the appropriate value.                                               */
	if open then
		do;
			if a = '0D'x then
				put &repD;
			else if a = '0A'x then
				put &repA;
		end;
run;
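The sample warns that it updates the file in place. One way to take the backup it asks for without leaving SAS is the FCOPY function on two filerefs opened with RECFM=N, so the bytes are copied unchanged. This is a minimal sketch; both paths are hypothetical placeholders.

```sas
/* Hypothetical paths: point SRC at the CSV the sample will modify. */
filename src "c:\sample.csv"        recfm=n;  /* RECFM=N: raw bytes, no records */
filename bak "c:\sample_backup.csv" recfm=n;

data _null_;
   length msg $ 384;
   rc = fcopy('src', 'bak');      /* 0 means the copy succeeded */
   if rc = 0 then put 'Copied SRC to BAK.';
   else do;
      msg = sysmsg();             /* text of the last error */
      put rc= msg=;
   end;
run;

filename src clear;
filename bak clear;
```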

This code works. However, comparing two tables imported from otherwise identical files, one with the carriage returns removed using CTRL+J and the other processed with this code, shows that formats are being changed.

I pasted a few screenshots of the compare output below. The base table (Test_J) had the CTRL+J replacement done; the compare table (Test_C) had the SAS code run on it.

Notice the change in formats, and the change in case from FALSE to False.
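To see exactly which attributes differ, the variable metadata of the two tables can be listed side by side. A sketch, assuming the imported tables are WORK.TEST_J and WORK.TEST_C as named above; it uses the OUT= data set of PROC CONTENTS, which holds one row per variable with its type, length, format and informat.

```sas
/* Assumes WORK.TEST_J (CTRL+J version) and WORK.TEST_C (SAS-code version). */
proc contents data=work.test_j noprint
     out=attr_j(keep=name type length format formatl informat informl);
run;

proc contents data=work.test_c noprint
     out=attr_c(keep=name type length format formatl informat informl);
run;

proc sort data=attr_j; by name; run;
proc sort data=attr_c; by name; run;

/* Keep one row per variable whose attributes differ between the tables. */
data attr_diff;
   merge attr_j(rename=(type=type_j length=len_j format=fmt_j formatl=fmtl_j
                        informat=infmt_j informl=infmtl_j))
         attr_c(rename=(type=type_c length=len_c format=fmt_c formatl=fmtl_c
                        informat=infmt_c informl=infmtl_c));
   by name;
   if type_j ne type_c or len_j ne len_c
      or fmt_j ne fmt_c or fmtl_j ne fmtl_c
      or infmt_j ne infmt_c or infmtl_j ne infmtl_c;
run;

proc print data=attr_diff noobs;
run;
```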

 

What in the code is causing these changes?   

[Screenshots of the compare output: Compare1.PNG, Compare2.PNG, Compare3.PNG]

4 REPLIES
ballardw
Super User

Since the code shown does not create a SAS data set, you are leaving out an important step.

 

I am going to guess that you are comparing two data sets made by using PROC IMPORT (or a wizard) to read the "before" and "after" text files.

PROC IMPORT guesses variable properties separately for every file it reads. So if you remove a character or two, the resulting variable is likely to be shorter, because the missing characters were not there to influence the length "guess" for that column.

 

That is one reason why production work should not rely on PROC IMPORT; otherwise you spend a lot of time validating and fixing data when using multiple files that should have the same structure.
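If PROC IMPORT has to stay in the workflow, the GUESSINGROWS= statement makes its guesses less dependent on where a value happens to appear. By default PROC IMPORT bases its guesses on the first 20 data rows; GUESSINGROWS=MAX makes it scan every row. This is a sketch only: the path is hypothetical, releases that do not accept MAX need a large number instead, and each file is still guessed on its own, so two files can still end up with different attributes.

```sas
/* Hypothetical path; the output name follows the thread. */
proc import datafile="c:\sample.csv"
            out=work.test_c
            dbms=csv
            replace;
   getnames=yes;      /* first row holds the column names       */
   guessingrows=max;  /* scan all rows, not just the default 20 */
run;
```

For DBMS=CSV, PROC IMPORT also writes the DATA step it generated to the log; copying that step and fixing its attribute statements is a common way to move to explicitly defined types and formats.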

mcook
Quartz | Level 8
Yes, I am using PROC IMPORT after removing the carriage returns.

I understand that removing or replacing characters would change the length and format upon import.
But the changes in the date format and in the uppercase values have me stumped. There should have been no carriage returns to replace in either of those columns, so I would assume they would import identically in both tables.

Reeza
Super User
PROC IMPORT is a guessing procedure. I would expect it to produce the same results when run again on the same file, but I wouldn't trust that with a different file, even if you can assume the structure is exactly the same. Generally, you cannot rely on your types and formats unless you specify them explicitly, and this is true of all data imports.
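A DATA step that declares every attribute itself removes the guessing, so every file read with it gets identical types, lengths and formats. A minimal sketch, assuming a hypothetical four-column layout (id, event_date, active, note); the real names, informats and lengths must come from your own file. To keep it runnable as is, it first writes a tiny sample CSV to a temporary fileref; in practice, point INFILE at the cleaned CSV. The '$@' in the sample stands for a CR/LF pair replaced by the KB code.

```sas
/* Hypothetical sample data written to a temporary file. */
filename demo temp;

data _null_;
   file demo;
   put 'id,event_date,active,note';
   put '1,2021-03-15,FALSE,"first line$@second line"';
   put '2,2021-03-16,True,"single line"';
run;

data work.explicit;
   infile demo dsd truncover firstobs=2;  /* DSD: commas, quoted fields */

   /* Attributes are fixed here, not guessed from the data. */
   length id 8 active $5 note $200;
   informat event_date yymmdd10.;
   format   event_date date9.;

   input id event_date active $ note $;

   /* Optional, hypothetical clean-up: make FALSE and False compare equal. */
   active = upcase(active);
run;

filename demo clear;
```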

ballardw
Super User

@mcook wrote:
Yes, I am using PROC IMPORT after removing the carriage returns.

I understand that removing or replacing characters would change the length and format upon import.
But the changes in the date format and in the uppercase values have me stumped. There should have been no carriage returns to replace in either of those columns, so I would assume they would import identically in both tables.


I would really need to see the files, before and after, as well as the PROC IMPORT code and anything else done to the data sets before running PROC COMPARE.
