Hi! The problem, as I understand it, is to read the contents of a .csv file into a SAS data set with several columns. This solution only has one (1) column. In my opinion this is not a correct solution!
/ Br Anders Sköllermo
You are right. The correct solution comes from all the answers here. Let's summarize:
- If the file is really large and you don't want to wait for PROC IMPORT to read all of it, take only the first N rows of the file. You can do this with UNIX head -N, or by writing a DATA step with INFILE + FILE + INPUT + PUT statements, including OBS=N on the INFILE statement.
- Run PROC IMPORT once on the sample, possibly with a low GUESSINGROWS= value.
- Copy the DATA step code generated by PROC IMPORT from the log.
- Modify the DATA step by adding FIRSTOBS=2, and perhaps OBS= if you want to limit the number of observations read (e.g. for testing purposes).
- Add TRUNCOVER or MISSOVER if needed.
- You might need to correct the generated INPUT statement if PROC IMPORT guessed a column type or length wrong.
- Consider the data set option COMPRESS=YES to save space.
Now your DATA step is ready to run with OBS=MAX to import all the observations.
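The steps above can be sketched as follows. This is a minimal sketch, assuming the input file is called big.csv and a 100-row sample is enough for guessing; adjust names and counts to your situation.

/* Step 1: extract the first 100 rows into a small sample file */
data _null_;
  infile 'big.csv' obs=100;
  file 'sample.csv';
  input;
  put _infile_;
run;

/* Step 2: let PROC IMPORT guess the layout from the sample;
   the generated DATA step appears in the SAS log */
proc import datafile='sample.csv' out=work.sample dbms=csv replace;
  guessingrows=100;
run;

/* Step 3: copy the DATA step from the log, point it at big.csv,
   add firstobs=2 and truncover, fix any wrong informats, and run it */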
You can read it into variables using dummy variable names.
For example, to read the first 20 columns of the first 20 data rows, use FIRSTOBS=2 to skip the header line:
data sample;
infile 'myfile.csv' dsd truncover lrecl=300000 firstobs=2 obs=21 ; /* firstobs=2 plus obs=21 reads 20 data rows */
length x1-x20 $200 ;
input x1-x20 ;
run;
Isn't TRUNCOVER for formatted and/or column input?
Since delimited input is a form of LIST input, would there ever be a situation where TRUNCOVER is needed? MISSOVER seems more appropriate.
I know it serves the same role here, but it seems wrong.
In my opinion MISSOVER is deprecated and should never be used unless you really do want to throw away short values.
The point is that there is no such thing as a "short value" when reading with LIST input, so TRUNCOVER is harmless here.
I agree that the effect is the same for this program. However, since it is an INFILE option and not an INPUT option, there can be a disconnect between the option and the input style actually used.
Because the MISSOVER option is so misleadingly named and potentially harmful with other types of INPUT statements, I think its use should be discouraged.
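A minimal sketch of the difference being discussed: with formatted input, a record shorter than the informat width triggers the option, and MISSOVER silently discards the value while TRUNCOVER keeps it.

data demo;
  infile datalines missover; /* change to truncover and compare */
  input name $10.;           /* formatted input: wants 10 columns */
datalines;
Bob
;
/* With MISSOVER, name is set to missing because the record 'Bob'
   is shorter than 10 characters; with TRUNCOVER, name = 'Bob'.
   With LIST input neither option would lose the value. */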
In my experience, using GUESSINGROWS= with a large value can drastically slow down the import of CSV files.
Behind the scenes, PROC IMPORT builds DATA step code when reading CSVs, and you can see this code in the SAS log. Just copy that DATA step into your SAS editor and run it. You will be amazed how much quicker it is.
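For reference, the code PROC IMPORT writes to the log looks roughly like the sketch below; the file name, variable names, and informats here are placeholders and depend entirely on your file.

data WORK.RAW;
  infile 'myfile.csv' delimiter=',' missover dsd lrecl=32767 firstobs=2;
  informat CustomerId best32.;
  informat CustomerName $20.;
  format CustomerId best12.;
  format CustomerName $20.;
  input CustomerId CustomerName $;
run;

Once copied into your editor, this is the code you adjust (TRUNCOVER, corrected informats, OBS=) and rerun directly, skipping the guessing pass entirely.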
Hi! Some years ago I used to import data from Excel sheets.
The data were characters (text), characters that had to be interpreted as numbers in SAS, and numbers (with a period or a comma as the decimal sign).
My solution: read each data line into one long character variable CV (32000 bytes).
Then scan CV from the left and decide what is alphabetic text, what is a number, etc., handling all the details.
In my case the Excel sheets had a column structure like: CustomerId (number), CustomerName (text), NumericValue (period or comma as decimal sign, perhaps E-format), another NumericValue, and so on.
Basically, you have to use this structure to program the reading of the parts of the string.
We used this in production on Excel sheets of 5000-10000 lines, in a very mixed style.
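A rough sketch of this approach (not the original production code): read each line whole, then pick it apart with SCAN and convert the numeric pieces. The file name, the semicolon delimiter, and the column layout are assumptions for illustration.

data parsed;
  infile 'sheet.csv' truncover lrecl=32000;
  length cv $32000 custname $100;
  input cv $char32000.;                          /* whole line into CV */
  custid   = input(scan(cv, 1, ';'), ?? best32.);
  custname = scan(cv, 2, ';');
  /* COMMAX. accepts a comma as the decimal sign; ?? suppresses
     error messages for values that are not valid numbers */
  value    = input(scan(cv, 3, ';'), ?? commax32.);
  drop cv;
run;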
Send me an email to anders.skollermo@one.se plus more info. I will solve it. / Br Anders Sköllermo