Hi! The problem, as I understand it, is to read the contents of a .csv file into a SAS data set with several columns. This solution only has one (1) column. In my opinion this is not a correct solution!
/ Br Anders Sköllermo
You are right. The correct solution comes from all the answers here. Let's summarize:
- If the file is really large and you don't want to wait for PROC IMPORT to read all of it, take only the first N rows of the file. You can do this with UNIX head -N, or by writing a DATA step with INFILE + FILE + INPUT + PUT statements, including OBS=N on the INFILE statement.
- Run PROC IMPORT once on the sample, possibly with a low GUESSINGROWS= value.
- Copy the DATA step code generated by PROC IMPORT from the log.
- Modify the DATA step by adding FIRSTOBS=2, and perhaps OBS= if you want to limit the number of observations read (e.g. for testing purposes).
- Add TRUNCOVER or MISSOVER if needed.
- You might need to correct the generated INPUT statement if PROC IMPORT guessed a column type or length wrong.
- Consider the data set option COMPRESS=YES to save space.
Now your DATA step is ready to run with OBS=MAX to import all the observations.
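The steps above can be sketched as follows. This is a minimal sketch, assuming the input file is called big.csv and a 100-row sample is enough for guessing; adjust names and counts to your situation.

/* Step 1: extract the first 100 rows into a small sample file */
data _null_;
  infile 'big.csv' obs=100;
  file 'sample.csv';
  input;
  put _infile_;
run;

/* Step 2: let PROC IMPORT guess the layout from the sample;
   the generated DATA step appears in the SAS log */
proc import datafile='sample.csv' out=work.sample dbms=csv replace;
  guessingrows=100;
run;

/* Step 3: copy the DATA step from the log, point it at big.csv,
   add firstobs=2 and truncover, fix any wrong informats, and run it */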
You can read it into variables using dummy variable names.
For example, to read the first 20 columns of the first 20 data rows, use FIRSTOBS=2 to skip the header line:
data sample;
infile 'myfile.csv' dsd truncover lrecl=300000 firstobs=2 obs=21 ; /* firstobs=2 plus obs=21 reads 20 data rows */
length x1-x20 $200 ;
input x1-x20 ;
run;
Isn't TRUNCOVER for formatted and/or column input?
Since delimited input is a form of LIST input, would there ever be a situation where TRUNCOVER is needed? MISSOVER seems more appropriate.
I know it serves the same role here, but it seems wrong.
In my opinion MISSOVER is deprecated and should never be used unless you really do want to throw away short values.
The point is that there is no such thing as a "short value" when reading with LIST input, so TRUNCOVER is harmless here.
I agree that the effect is the same for this program. However, since it is an INFILE option and not an INPUT option, there can be a disconnect between the option and the input style actually used.
Because the MISSOVER option is so misleadingly named and potentially harmful with other types of INPUT statements, I think its use should be discouraged.
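A minimal sketch of the difference being discussed: with formatted input, a record shorter than the informat width triggers the option, and MISSOVER silently discards the value while TRUNCOVER keeps it.

data demo;
  infile datalines missover; /* change to truncover and compare */
  input name $10.;           /* formatted input: wants 10 columns */
datalines;
Bob
;
/* With MISSOVER, name is set to missing because the record 'Bob'
   is shorter than 10 characters; with TRUNCOVER, name = 'Bob'.
   With LIST input neither option would lose the value. */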
In my experience, using GUESSINGROWS= with a large value can drastically slow down the import of CSV files.
Behind the scenes, PROC IMPORT builds DATA step code when reading CSVs, and you can see this code in the SAS log. Just copy that DATA step into your SAS editor and run it. You will be amazed how much quicker it is.
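For reference, the code PROC IMPORT writes to the log looks roughly like the sketch below; the file name, variable names, and informats here are placeholders and depend entirely on your file.

data WORK.RAW;
  infile 'myfile.csv' delimiter=',' missover dsd lrecl=32767 firstobs=2;
  informat CustomerId best32.;
  informat CustomerName $20.;
  format CustomerId best12.;
  format CustomerName $20.;
  input CustomerId CustomerName $;
run;

Once copied into your editor, this is the code you adjust (TRUNCOVER, corrected informats, OBS=) and rerun directly, skipping the guessing pass entirely.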
Hi! Some years ago I used to import data from Excel sheets.
The data were characters (text), characters that had to be interpreted as numbers in SAS, and numbers (with a period or a comma as the decimal sign).
My solution: read each data line into one long character variable CV (32000 bytes).
Then scan CV from the left and decide what is alphabetic text, what is a number, etc., handling all the details.
In my case the Excel sheets had a column structure like: CustomerId (number), CustomerName (text), NumericValue (period or comma as decimal sign, perhaps E-format), another NumericValue, and so on.
Basically, you have to use this structure to program the reading of the parts of the string.
We used this in production on Excel sheets of 5000-10000 lines, in a very mixed style.
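A rough sketch of this approach (not the original production code): read each line whole, then pick it apart with SCAN and convert the numeric pieces. The file name, the semicolon delimiter, and the column layout are assumptions for illustration.

data parsed;
  infile 'sheet.csv' truncover lrecl=32000;
  length cv $32000 custname $100;
  input cv $char32000.;                          /* whole line into CV */
  custid   = input(scan(cv, 1, ';'), ?? best32.);
  custname = scan(cv, 2, ';');
  /* COMMAX. accepts a comma as the decimal sign; ?? suppresses
     error messages for values that are not valid numbers */
  value    = input(scan(cv, 3, ';'), ?? commax32.);
  drop cv;
run;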
Send me an email to anders.skollermo@one.se plus more info. I will solve it. / Br Anders Sköllermo