jody4001
Calcite | Level 5
I have a large CSV file. I only want to load 10,000 rows at a time. I don’t know exactly how large it is, but it is over 4 million rows. I played around with PROC IMPORT, but I can’t get it to work for a range of rows.
Tom
Super User

You don't need to use PROC IMPORT to read a CSV file.  Those are just simple TEXT files.  You can just write your own data step to read them.  

 

You do know what is in the file, right? Or is someone expecting you to GUESS what it contains? That seems like a strange request for a file with that many rows.

 

If you are forced to guess how to read it you might want to use this macro instead. https://github.com/sasutils/macros/blob/master/csv2ds.sas

 

In addition to working for some files that PROC IMPORT cannot handle, it also lets you use a random sample of the data rows, which makes guessing how to read the file faster.

 

It also writes cleaner data step code to read the file, and you can ask it to save the generated code to a file for you. That will make it easier to use that code as your starting point for reading only a few of the rows, if you really still need to do that.

 

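To read only some of the observations from a text file, use the FIRSTOBS= and OBS= options on the INFILE statement. Note that OBS= is the number of the last line to read, not a count of lines, so firstobs=2 obs=10001 skips line 1 (presumably a header row) and reads the next 10,000 lines.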

data part1;
  infile 'myfile.csv' dsd truncover firstobs=2 obs=10001;
  ....
run;

data part2;
  infile 'myfile.csv' dsd truncover firstobs=10002 obs=20001;
  ....
run;

 

Sajid01
Meteorite | Level 14

Hello
If you want to use only SAS, then the approach by @Tom is the one to follow.
However, if you have access to a bash shell, the large file can be split into a number of smaller files of 10,000 lines each.
The command would be as follows. Please do test it.

 split -l 10000 --numeric-suffixes input_filename output_prefix

You will get output files with the names output_prefix00, output_prefix01, and so on (numeric suffixes start at 00).
The next step would be to write code to read these files one file at a time.
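
One thing to watch for with the split command above: if the CSV has a header row, that row ends up only in the first output file, so the first piece would need to be read differently from the rest. A minimal sketch of one way around that, assuming GNU coreutils and a single header line (the function name split_csv and the header file name are made up for illustration), is to save the header separately and split only the data rows:

```shell
# Split a CSV into pieces that contain data rows only.
# Assumes GNU coreutils (split --numeric-suffixes) and a one-line header.
# split_csv and the "<prefix>header.txt" name are illustrative, not standard tools.
split_csv() {
  local input="$1" prefix="$2" rows="${3:-10000}"

  # Save the header row once so every piece holds data rows only.
  head -n 1 "$input" > "${prefix}header.txt"

  # tail -n +2 starts output at line 2; "-" makes split read standard input.
  tail -n +2 "$input" | split -l "$rows" --numeric-suffixes - "$prefix"
}
```

Calling split_csv input_filename output_prefix would then write pieces named with the given prefix plus a numeric suffix, each with at most 10,000 data rows, so every piece can be read by the same data step without a FIRSTOBS=2 special case. Please do test it on a small file first.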

Reeza
Super User
Why?

Proc import is slow, so if you read it in using a data step it's quite easy. 4 million isn't much for SAS to process at all.
