BookmarkSubscribeRSS Feed
BozJ3
Fluorite | Level 6

Hello, 

 

I am having an issue with the INFILE section in the Data step using Base SAS 9.4 by way of Enterprise Guide. The file that I am importing in question has 1523 records (not including column names on first row), but the statement is only pulling in 144 from the csv. When I look at the dataset that the code has produced, there exists a field where the quotation marks that surround each field in a csv pop out for whatever reason. When I check the csv, there is an odd character (an arrow) that is in the field where the issue starts.

 

Now, I wouldn't mind tossing the record itself later in the program, but the file completely stops reading at that claim. I am confident that the informants and lengths and such are correct because I used the SAS EG Import option to do the heavy lifting on that end. When I read the logs, the following is found, suggesting that there was no error (I am using dummy names to obscure potentially sensitive information):

 

NOTE: The infile "location\file.csv" is:

Filename=location,

RECFM=V,LRECL=32767,File Size (bytes)=1734349,

Last Modified=16May2018:05:59:01,

Create Time=16May2018:05:41:48

NOTE: 144 records were read from the infile "location\file.csv".

The minimum record length was 876.

The maximum record length was 1491.

NOTE: The data set WORK.DATASET has 144 observations and 154 variables.

NOTE: DATA statement used (Total process time):

real time 0.42 seconds

cpu time 0.07 seconds

 

Here is a truncated version of the Data step code that I am using. X is the field preceding the problem field, and Y is the problem field. I will note that field x is not populated anywhere in the document, so there should be nothing in field X to cause field Y to break.

 

data work.dataset;

length

X $ 1

Y $ 1492

;

label

X = "X"

Y = "Y"

;

format

X $CHAR1.

Y $CHAR1492.

;

informat

X $CHAR1.

Y $CHAR1492.

;

INFILE "location\file.csv"

firstobs=2

MISSOVER

DSD

;

input

X : $CHAR1.

Y : $CHAR1492.

;

run;

 

So, to summarize my question: does anyone know why the csv may be breaking, and what can I do to get all of the records in, even if some of them are broken?

 

 

4 REPLIES 4
ballardw
Super User

You have likely already identified the problem:

When I check the csv, there is an odd character (an arrow) that is in the field where the issue starts.

You don't say how you examined the CSV but unusual characters either from foreign language encoding or entry may be getting treated as some special character by your operating system, such as a ctrl-z may be treated as an "end of file".

 

 

If there is only once of these you might try deleting the odd character using a plain text editor such as Notepad or similar.

BozJ3
Fluorite | Level 6

I appreciate the reply.

 

I deleted the character in the file, and everything loaded correctly. However, this is a program that is going to be used on new data per a monthly basis, so is there anything that can be done on the programming side to avoid the issue? I would like to avoid manually deleting a few characters every time that new data is pulled.

Reeza
Super User

1. Figure out why character is in file - is it encoding, different language, etc?

2. Ask data provider to modify data - this does actually work if the file has inconsistencies, such as this.

3. Use SAS or PowerShell or any language to read in the file, find and replace the character. There's examples of this on here, usually from Tom :). Then you can use the code above to read it. 

 

 

ballardw
Super User

@BozJ3 wrote:

I appreciate the reply.

 

I deleted the character in the file, and everything loaded correctly. However, this is a program that is going to be used on new data per a monthly basis, so is there anything that can be done on the programming side to avoid the issue? I would like to avoid manually deleting a few characters every time that new data is pulled.


If the data field corresponds to something that users enter free-form text into, such as a comment box, case notes or similar this may be an issue that doesn't occur often. It may be that letting the source of the file know that "unexpected" characters are appearing in one or more fields may help them adjust the creation of the extract to exclude such or filter at an earlier stage of processing.

 

If the field is the result of some operation then it may help to file provider clean up a process.

 

And sometimes files get corrupted from any number of causes. You might verify file size, number of records and bytes both with the file provider to see if what you see is what was intended.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 4028 views
  • 0 likes
  • 3 in conversation