Help using Base SAS procedures

Effective way to import huge CSV file

Accepted Solution Solved
Reply
Regular Contributor
Posts: 168
Accepted Solution

Effective way to import huge CSV file

What is the best way to import the .csv effectively with less time consumption?

I've several thousands of records in my .CSV and I used to import the file as follows. If the file is large (say 200 MB) it is taking hours to complete. Can it be complete in minutes by tweaking the import process?

filename rawdata "/data/04001_Compa_12312014.csv" termstr=crlf lrecl=32760;

proc import datafile=rawdata out=rawdata1 dbms=csv replace;

          guessingrows=3000000;

run;

Thanks in advance for your inputs.


Accepted Solutions
Solution
‎03-05-2015 02:40 AM
SAS Employee
Posts: 340

Re: Effective way to import huge CSV file

Posted in reply to KurtBremser

After the input statement we need to put a variable name. Otherwise the created dataset (test) does not have columns.

data test;

  infile rawdata obs = 10;

  input someColName $1000.;

  put _infile_;

run;

But I think @KurtBremser  original suggestion was to use the data step generated by the PROC IMPORT: modify it by including an obs=10 to the infile statement.

View solution in original post


All Replies
Respected Advisor
Posts: 3,799

Re: Effective way to import huge CSV file

Just write a data step to read it.  You will need to know the field attributes.

Regular Contributor
Posts: 168

Re: Effective way to import huge CSV file

Posted in reply to data_null__

Even reading the .csv file is also time consumes. I used to get warning message like 'File not loaded completely' when I try to open large volume file. I don't know the field attributes without reading the file though.

Any other suggestions?

Respected Advisor
Posts: 3,799

Re: Effective way to import huge CSV file

RamKumar wrote:

I don't know the field attributes without reading the file though.

Any other suggestions?

You're kidding right?  You have 200MB of data and you know nothing about the contents and have no documentation. 

SAS Employee
Posts: 340

Re: Effective way to import huge CSV file

- Decreasing guessingrows (if possible)

- compress=yes

How much time does it take to simply copy this file with operating system command? The speed of import depends largely on the speed of the hard drive.

Regular Contributor
Posts: 168

Re: Effective way to import huge CSV file

Posted in reply to gergely_batho

How much time does it take to simply copy this file with operating system command? It takes only 5-7 seconds to copy a file in UNIX box. I did this using cp command.

Super User
Posts: 7,782

Re: Effective way to import huge CSV file

Forget proc import. Take the documented structure(!!!), write a data step, and it won't take much longer than a simple cp.

I read files up to > 10 GB, and I rarely drop below 50MB/sec throughput during the data step. And that's a p520, almost 10 years old.

If you get 200 MB of undocumented data, print it out on paper and use that stack to whack the idiot that delivered it over the head. Maybe that gets him(her) thinking.

OTOH, how do you technically read the file? You stated that it takes 5-7 secs for the copy. Is SAS running on the same UNIX machine, or do you read the file over the network?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Regular Contributor
Posts: 168

Re: Effective way to import huge CSV file

Posted in reply to KurtBremser

Is SAS running on the same UNIX machine, or do you read the file over the network? Yes, SAS running on UNIX server.


I've used my proc import in SAS EG. I could see a difference after I put a compress option and trimming one zero's from guessing zeros as Gergely suggests.

Super User
Super User
Posts: 7,042

Re: Effective way to import huge CSV file

If you are using EG where is the SAS server relative to the file?  If your SAS server is remote from the machine running EG then EG might be uploading the file across the network to the SAS server.

Much better to store the file on a disk that the SAS server can see and read it from there.

Do not use PROC IMPORT to read a CSV file.  It will have to guess at how to read the file.  If you really have no idea what is in the file then read the first few lines and look at them.  Then write your DATA step to read the file.

Regular Contributor
Posts: 168

Re: Effective way to import huge CSV file

If you really have no idea what is in the file then read the first few lines and look at them - how to read first few lines in SAS?

Super User
Posts: 7,782

Re: Effective way to import huge CSV file

It is a .csv in UNIX, so head -5 filename from the commandline where you did the cp will show you the first 5 lines. This way you can make an educated guess about the record length.

Then do

data test;

infile "your_file" lrecl=your_guess obs=how_may_you_want;

length dataline $ your_guess;

input;

dataline = _infile_;

run;

This will give you a file with the complete lines to look at.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 3,252

Re: Effective way to import huge CSV file

Look at the SAS code being generated in the SAS log by CSV PROC IMPORT as I have explained above...you can use that directly to read in the file.

To read the first 10 lines:

data test;

  infile rawdata obs = 10;

  input;

  put _infile_;

run;

Regular Contributor
Posts: 168

Re: Effective way to import huge CSV file

can we create a dataset with first few observations instead of reading it in log?

Super User
Posts: 7,782

Re: Effective way to import huge CSV file

Look at my answer; the obs= option in the infile statement does the trick.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Solution
‎03-05-2015 02:40 AM
SAS Employee
Posts: 340

Re: Effective way to import huge CSV file

Posted in reply to KurtBremser

After the input statement we need to put a variable name. Otherwise the created dataset (test) does not have columns.

data test;

  infile rawdata obs = 10;

  input someColName $1000.;

  put _infile_;

run;

But I think @KurtBremser  original suggestion was to use the data step generated by the PROC IMPORT: modify it by including an obs=10 to the infile statement.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 23 replies
  • 4075 views
  • 8 likes
  • 7 in conversation