10-07-2014 06:10 AM
I am importing a sample of one million observations from a CSV file into a local SAS dataset, but the import is taking a long time. Please suggest tuning options to improve the performance of PROC IMPORT.
10-07-2014 07:31 AM
When I ran PROC IMPORT on 1,000 observations in a CSV file, the import was successful. But when I ran the same PROC IMPORT on a million observations, the process status stayed at running. I waited for 15 minutes and then cancelled it.
02-11-2015 06:47 AM
If 1,000 obs are done in, say, 5 seconds, you can expect a million obs to need about 5,000 seconds, which is roughly 1 1/2 hours (about 83 minutes). So I suggest working up in orders of magnitude (10,000, then 100,000) to get a feeling for how the run time scales.
You also need to take a look at the record size. What matters is the total volume of data, not just the row count: a million records of 500 bytes each is about 500 MB of input, far more than a million records that each hold a single number.
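A rough way to run that scaling test, as a sketch only: cap the number of records read with the OBS= system option and compare the timings in the log. The file path and output dataset names below are hypothetical placeholders, and it is worth checking the record count in the log to confirm the cap applied to the raw file as expected.

```sas
/* Hypothetical path; adjust to wherever your CSV lives. */
%let csvfile = /folders/myfolders/sample.csv;

options fullstimer;      /* write detailed timing statistics to the log */

options obs=10000;       /* stop reading at record 10,000 */
proc import datafile="&csvfile"
            out=work.sample_10k      /* hypothetical dataset name */
            dbms=csv
            replace;
    getnames=yes;
run;

options obs=100000;      /* next order of magnitude */
proc import datafile="&csvfile"
            out=work.sample_100k     /* hypothetical dataset name */
            dbms=csv
            replace;
    getnames=yes;
run;

options obs=max;         /* restore the default so later steps read everything */
```

If the second run takes about ten times as long as the first, the import is scaling linearly and the full file can be estimated from there.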
10-07-2014 08:01 AM
Thanks for your response. I will try what you suggested above.
I work on SAS University Edition, and both the SAS session and the CSV file are on my local machine.
02-11-2015 05:25 AM
Try modifying the performance parameters in the config file, which is usually located at: D:\SAS\SASFoundation\9.4\nls\en\sasv9.cfg
The relevant parameters include MEMSIZE, SORTSIZE, BUFSIZE, and BUFNO. Search the Base SAS help for the keyword "performance" to read about these options.
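Before editing the config file, it may help to see what the session is currently using. A minimal sketch, with illustrative values only: BUFSIZE= and BUFNO= can be changed with an OPTIONS statement during a session, whereas MEMSIZE= has to be set at SAS startup.

```sas
/* List the performance-related system options and their current values. */
proc options group=performance;
run;

/* Print individual option values to the log. */
%put MEMSIZE=%sysfunc(getoption(memsize));
%put SORTSIZE=%sysfunc(getoption(sortsize));
%put BUFSIZE=%sysfunc(getoption(bufsize));
%put BUFNO=%sysfunc(getoption(bufno));

/* Illustrative values, not recommendations: try a setting, rerun, compare. */
options bufsize=64k bufno=10;
```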
It takes some experimenting, but after a few tries you should arrive at a faster import. By the way, what is the size of the input file, and how large is the dataset that the imported data usually ends up in?
02-11-2015 06:08 AM
Personally I would prefer to import CSV data via a DATA step with an INFILE statement. Many examples can be found on the forum; here is one: https://communities.sas.com/thread/60374
This gives you more flexibility with the import. PROC IMPORT basically interprets your file and then internally generates a DATA step to read it, so programming the DATA step yourself avoids the guessing phase. You also know your data better, so you can set lengths and formats more appropriately.
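As a minimal sketch of that approach: the file path, variable names, types, and lengths below are all hypothetical and would need to match the actual CSV layout.

```sas
/* Sketch only: path and column definitions are assumed, not from the thread. */
data work.sample;
    infile "/folders/myfolders/sample.csv"
           dsd            /* comma-delimited; handles quoted fields and empty values */
           truncover      /* do not read into the next line when a record is short */
           firstobs=2     /* skip the header row */
           lrecl=32767;   /* allow long records */

    /* Declare types and lengths up front instead of letting SAS guess. */
    length id 8 name $40 amount 8;
    informat order_date yymmdd10.;
    format order_date date9.;

    input id name $ order_date amount;
run;
```

Because the column definitions are stated explicitly, SAS does not have to scan the file to infer types before reading it.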
Also, why are you working with a million rows in the free University Edition? I assumed the free version was meant for learning SAS, so there is little point in using huge datasets that a learner would not generally have to deal with or learn optimization techniques for.