BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RamKumar
Fluorite | Level 6

What is the best way to import the .csv effectively with less time consumption?

I've several thousands of records in my .CSV and I used to import the file as follows. If the file is large (say 200 MB) it is taking hours to complete. Can it be complete in minutes by tweaking the import process?

filename rawdata "/data/04001_Compa_12312014.csv" termstr=crlf lrecl=32760;

proc import datafile=rawdata out=rawdata1 dbms=csv replace;

          guessingrows=3000000;

run;

Thanks in advance for your inputs.

1 ACCEPTED SOLUTION

Accepted Solutions
gergely_batho
SAS Employee

After the input statement we need to put a variable name. Otherwise the created dataset (test) does not have columns.

data test;

  infile rawdata obs = 10;

  input someColName $1000.;

  put _infile_;

run;

But I think @KurtBremser  original suggestion was to use the data step generated by the PROC IMPORT: modify it by including an obs=10 to the infile statement.

View solution in original post

23 REPLIES 23
data_null__
Jade | Level 19

Just write a data step to read it.  You will need to know the field attributes.

RamKumar
Fluorite | Level 6

Even reading the .csv file is also time consumes. I used to get warning message like 'File not loaded completely' when I try to open large volume file. I don't know the field attributes without reading the file though.

Any other suggestions?

data_null__
Jade | Level 19

RamKumar wrote:

I don't know the field attributes without reading the file though.

Any other suggestions?

You're kidding right?  You have 200MB of data and you know nothing about the contents and have no documentation. 

gergely_batho
SAS Employee

- Decreasing guessingrows (if possible)

- compress=yes

How much time does it take to simply copy this file with operating system command? The speed of import depends largely on the speed of the hard drive.

RamKumar
Fluorite | Level 6

How much time does it take to simply copy this file with operating system command? It takes only 5-7 seconds to copy a file in UNIX box. I did this using cp command.

Kurt_Bremser
Super User

Forget proc import. Take the documented structure(!!!), write a data step, and it won't take much longer than a simple cp.

I read files up to > 10 GB, and I rarely drop below 50MB/sec throughput during the data step. And that's a p520, almost 10 years old.

If you get 200 MB of undocumented data, print it out on paper and use that stack to whack the idiot that delivered it over the head. Maybe that gets him(her) thinking.

OTOH, how do you technically read the file? You stated that it takes 5-7 secs for the copy. Is SAS running on the same UNIX machine, or do you read the file over the network?

RamKumar
Fluorite | Level 6

Is SAS running on the same UNIX machine, or do you read the file over the network? Yes, SAS running on UNIX server.


I've used my proc import in SAS EG. I could see a difference after I put a compress option and trimming one zero's from guessing zeros as Gergely suggests.

Tom
Super User Tom
Super User

If you are using EG where is the SAS server relative to the file?  If your SAS server is remote from the machine running EG then EG might be uploading the file across the network to the SAS server.

Much better to store the file on a disk that the SAS server can see and read it from there.

Do not use PROC IMPORT to read a CSV file.  It will have to guess at how to read the file.  If you really have no idea what is in the file then read the first few lines and look at them.  Then write your DATA step to read the file.

RamKumar
Fluorite | Level 6

If you really have no idea what is in the file then read the first few lines and look at them - how to read first few lines in SAS?

Kurt_Bremser
Super User

It is a .csv in UNIX, so head -5 filename from the commandline where you did the cp will show you the first 5 lines. This way you can make an educated guess about the record length.

Then do

data test;

infile "your_file" lrecl=your_guess obs=how_may_you_want;

length dataline $ your_guess;

input;

dataline = _infile_;

run;

This will give you a file with the complete lines to look at.

SASKiwi
PROC Star

Look at the SAS code being generated in the SAS log by CSV PROC IMPORT as I have explained above...you can use that directly to read in the file.

To read the first 10 lines:

data test;

  infile rawdata obs = 10;

  input;

  put _infile_;

run;

RamKumar
Fluorite | Level 6

can we create a dataset with first few observations instead of reading it in log?

gergely_batho
SAS Employee

After the input statement we need to put a variable name. Otherwise the created dataset (test) does not have columns.

data test;

  infile rawdata obs = 10;

  input someColName $1000.;

  put _infile_;

run;

But I think @KurtBremser  original suggestion was to use the data step generated by the PROC IMPORT: modify it by including an obs=10 to the infile statement.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 23 replies
  • 13847 views
  • 8 likes
  • 7 in conversation