plllu
Calcite | Level 5

I used the following line of code to import a large CSV file (more than 10 million rows):

 

proc import datafile = "\path\file.csv" dbms=csv out=file replace; run;

 

In the log I received: 

 

Note: "Invalid data for ID in line 10158465 1-24. ID =. TYPE = .

Error: "Import unsuccessful."

 

The first 13,402,830 rows were imported successfully. Based on the log, my understanding is the next few rows in the file are empty. Since the file is too large to open in Notepad or Excel, I can't verify if there are additional data beyond the empty rows.

 

How can I ensure that all rows, including any empty rows or rows that may follow, are imported properly?

 

4 REPLIES
Tom
Super User

Empty lines do not normally cause errors like that. Instead, they would just cause all the variables to have missing values.

 

There is no need to open the file in Excel or a text editor. You can just use a data step to read the file and see if your hypothesis about the empty lines is correct. For example, you could use the FIRSTOBS= and OBS= options on the INFILE statement to look around near the line you mentioned.

data _null_;
  infile "\path\file.csv" firstobs=10158464 obs=10158470;
  input;
  list;
run;
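
Along the same lines, a rough sketch for checking whether there is more data after any empty lines: read every line without parsing it and count the total number of lines and the number of blank ones. The counter names n_lines and n_empty are just made up for this illustration, and LRECL=32767 is an assumption about the maximum line length.

```sas
data _null_;
  /* END= sets eof to 1 when the last line has been read */
  infile "\path\file.csv" end=eof lrecl=32767;
  input;                                   /* load the line into _INFILE_ without parsing it */
  n_lines + 1;                             /* sum statement: value is retained across lines */
  if missing(_infile_) then n_empty + 1;   /* count lines that are completely blank */
  if eof then put n_lines= n_empty=;       /* write both totals to the log */
run;
```

If n_lines is larger than the number of observations PROC IMPORT produced (plus the header line), there is more data in the file than was imported.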

 

I would never use PROC IMPORT to read such a large file.

You might use it to help you GUESS how to read the file, but once you know what the variables are (and what type and storage length they need) just write the DATA step to read the file.
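
One possible way to get that guess without scanning the whole file is to copy the first lines to a temporary file and run PROC IMPORT on the copy only. This is a sketch, not the only way to do it: the fileref name sample, the output data set name guess, and the 1000-line sample size are arbitrary choices for illustration.

```sas
/* temporary file to hold the first lines of the CSV */
filename sample temp;

data _null_;
  infile "\path\file.csv" obs=1000;   /* header line plus the first 999 data lines */
  file sample;
  input;
  put _infile_;                        /* copy each line unchanged */
run;

/* let PROC IMPORT guess from the small sample only */
proc import datafile=sample dbms=csv out=guess replace;
  guessingrows=max;                    /* scan every line of the sample */
run;
```

PROC IMPORT writes the DATA step it generated to the log. You can copy that code, correct the variable types and lengths (for example, make ID character), and point the INFILE statement at the full file.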

 

 

Kurt_Bremser
Super User

Do not use PROC IMPORT if you don't have to. Write the DATA step to read the file yourself, according to the documentation of the file. That way you do not have to fix all the mistakes caused by the guessing of PROC IMPORT.

Kurt_Bremser
Super User

Looks like IMPORT incorrectly guessed ID to be numeric. Given that this column seems to be 24 characters long, this would on its own cause problems as SAS can only store numbers correctly up to 15 decimal digits.

ID values should always be read as character.
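
A small illustration of why a long ID should not be numeric. The 24-digit value below is made up for the example; converting it to a number and printing it back should show that the trailing digits are not preserved.

```sas
data _null_;
  id_text = "123456789012345678901234";   /* hypothetical 24-digit ID */
  id_num  = input(id_text, 32.);          /* same value stored as a number */
  put id_text=;
  put id_num= 32.;                        /* compare the digits with id_text in the log */
run;
```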

Tom
Super User

Note: "Invalid data for ID in line 10158465 1-24. ID =. TYPE = .

Does the CSV file really only have two columns?  Definitely do NOT use PROC IMPORT in that case.

data want;
  infile "\path\file.csv" dsd truncover firstobs=2;
  length id $30 type $20 ;
  input id type;
run;

And if it does have more columns the data step code is not really any harder.
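
For example, a sketch assuming the file had two more columns after ID and TYPE. The names amount and event_date, and the YYYY-MM-DD date layout, are hypothetical and only serve to show the pattern; use the real column names, types, and lengths of your file.

```sas
data want;
  infile "\path\file.csv" dsd truncover firstobs=2;
  length id $30 type $20;            /* character variables need explicit lengths */
  informat event_date yymmdd10.;     /* hypothetical date column, e.g. 2024-01-31 */
  format event_date yymmdd10.;
  input id type amount event_date;   /* amount is a hypothetical numeric column */
run;
```

Variables are listed on the INPUT statement in the same order as the columns in the file. Anything not declared as character or given an informat is read as an ordinary number.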
