I have delimited txt files that I want to import into SAS. Each contains character and numeric variables, some of which are more interesting than others. Each file contains millions of records, so importing all of the cases and all of the variables is taking way too long.
I'm currently running a proc import (sample code below), but it's taking forever. Is there a more efficient way to do this? Can the proc import specify certain variables to drop before even reading in the program datavector? For instance, if I only want to import var1 and var3, can I specify that in the proc import?
sample code:
proc import datafile="C:\rawdata\sales.txt" out=output.sales dmbs=dlm;
delimiter="|";
getnames=yes;
run;
data sales_small;
set output.sales (keep= var1 var3);
run;
You could put the dataset options in the proc import statement :
proc import datafile="C:\rawdata\sales.txt" out=output.sales(keep=var1 var3) dmbs=dlm;
PG
If your CSVs have a field name row as the first record and IF you know
You can use a program like that described here http://www.lexjansen.com/pharmasug/2011/TT/PharmaSUG-2011-TT04.pdf to read the files. If you have many files you can read them all at once, gathering the fields of interest from each.
If you don't have the field name row you could still use a similar technique but it would be simplified by just specifying the field relative positions. You will still need to know the type and informat.
You won't actually need to specify informat for most fields as most character and numeric fields will be read with the SAS default.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.