12-07-2012 02:13 PM
I have delimited txt files that I want to import into SAS. Each contains character and numeric variables, only some of which I actually need. Each file contains millions of records, so importing all of the cases and all of the variables is taking way too long.
I'm currently running a PROC IMPORT (sample code below), but it's taking forever. Is there a more efficient way to do this? Can PROC IMPORT drop certain variables before they are even read into the program data vector? For instance, if I only want to import var1 and var3, can I specify that in the PROC IMPORT?
proc import datafile="C:\rawdata\sales.txt" out=output.sales dbms=dlm; run;

data output.sales;
    set output.sales (keep=var1 var3);
run;
12-07-2012 02:23 PM
You could put the dataset options in the PROC IMPORT statement:
proc import datafile="C:\rawdata\sales.txt" out=output.sales(keep=var1 var3) dbms=dlm;
run;
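A fuller sketch of that call, for a tab-delimited file with a header row; the DELIMITER and GUESSINGROWS values here are assumptions, not part of the original reply:

```
/* KEEP= on OUT= trims the output data set, though PROC IMPORT
   still scans every field in the file to guess types. */
proc import datafile="C:\rawdata\sales.txt"
            out=output.sales(keep=var1 var3)
            dbms=dlm
            replace;
    delimiter='09'x;      /* assumed tab delimiter */
    getnames=yes;         /* field names in the first record */
    guessingrows=1000;    /* assumed; limits the type-guessing scan */
run;
```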
12-07-2012 02:32 PM
If your files have a field name row as the first record, and IF you know the type of each field, you can use a program like the one described here http://www.lexjansen.com/pharmasug/2011/TT/PharmaSUG-2011-TT04.pdf to read the files. If you have many files, you can read them all at once, gathering the fields of interest from each.
If you don't have the field name row, you can still use a similar technique, and it is simpler: you just specify the fields' relative positions. You will still need to know the type and informat.
You won't actually need to specify an informat for most fields, since most character and numeric fields can be read with the SAS defaults.
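A minimal sketch of the DATA-step approach described above. The variable names, the character length, the tab delimiter, and the header row are all assumptions for illustration:

```
/* Read the file directly with a DATA step instead of PROC IMPORT.
   Assumptions: tab-delimited, header row in record 1, var1 is
   character (up to 20 chars), var2 and var3 are numeric. */
data output.sales;
    infile "C:\rawdata\sales.txt" dlm='09'x dsd firstobs=2 truncover;
    input var1 :$20. var2 var3;   /* list input by relative position */
    drop var2;                    /* unwanted fields are read past and dropped */
run;
```

Unlike dataset options on OUT=, this skips PROC IMPORT's type-guessing scan entirely, which is often where most of the time goes on multi-million-row files.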