I have dozens of CSV files with about 80,000 records in each. At the top of each file are between 2 and 6 lines of header information, one of which starts with the text string VARIABLE_NAMES. Following this string are, unsurprisingly, the comma-separated variable names. The variable names differ from file to file, as does the number of fields.
I would like to write a macro (I don't think Proc Import will work in this instance) which will:
(a) identify which of the first 2-6 rows holds the variable names (this is easily done, probably best outside the macro, with the row number passed to the macro as a parameter)
(b) read in each record after the header information, using these variable names as the names of the comma-separated data fields.
The second parameter to the macro will be the file name and path.
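Step (a) might look something like this minimal sketch. It assumes the names line literally begins with VARIABLE_NAMES in column 1 and falls within the first six lines. The path and the macro variable name `namerow` are hypothetical.

```sas
%let infile  = C:\data\file01.csv;   /* hypothetical path */
%let namerow = 0;                     /* stays 0 if no names line is found */

data _null_;
  /* read at most the first 6 lines; LRECL= allows long header lines */
  infile "&infile" obs=6 lrecl=32767 truncover;
  input;                              /* load the raw line into _INFILE_ */
  if _infile_ =: 'VARIABLE_NAMES' then do;
    call symputx('namerow', _n_);     /* one line per iteration, so _N_ = line number */
    stop;
  end;
run;

%put NOTE: variable names found on line &namerow;
```

`&namerow` could then be handed to the macro as its row-number parameter, as described above.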
I would tend to attack this by writing a DATA step that copies the file into a working .csv file, stripping off everything before the VARIABLE_NAMES line as well as the VARIABLE_NAMES label itself, so that the first line holds only the variable names. That is fairly simple using _infile_ and string functions. The result is a clean CSV file that PROC IMPORT can read.
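A minimal sketch of that DATA step followed by PROC IMPORT. It assumes the names line has the form `VARIABLE_NAMES,name1,name2,...` (a colon after the label is also tolerated). The input path, the fileref `cleaned` and the output dataset `work.file01` are hypothetical.

```sas
%let infile = C:\data\file01.csv;     /* hypothetical input path */
filename cleaned temp;                /* temporary working .csv file */

data _null_;
  infile "&infile" lrecl=32767 truncover;
  file cleaned lrecl=32767;
  length names $32767;
  retain found 0;
  input;                                    /* raw line is now in _INFILE_ */
  if found then put _infile_;               /* data record: copy unchanged */
  else if _infile_ =: 'VARIABLE_NAMES' then do;
    found = 1;
    /* drop the VARIABLE_NAMES label and the separator after it (assumed , or :) */
    names = left(substr(_infile_, length('VARIABLE_NAMES') + 1));
    if names =: ',' or names =: ':' then names = left(substr(names, 2));
    len = lengthn(names);
    put names $varying32767. len;           /* write the names as the header row */
  end;
  /* lines before VARIABLE_NAMES are header text and are not written */
run;

proc import datafile=cleaned out=work.file01 dbms=csv replace;
  getnames=yes;           /* first line of the cleaned file holds the names */
  guessingrows=32767;     /* scan many rows before choosing column types */
run;
```

Because the scan for VARIABLE_NAMES happens inside the DATA step, this approach does not need the row number from step (a) at all.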
This can be macro-ized, with one parameter for the input file name and maybe another for the output SAS dataset name, unless you want to go to the effort of parsing that from the input file name.
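One way to wrap it, as a sketch only: the macro name `%import_varnames` and the fileref `_clean` are hypothetical, and the names line is again assumed to look like `VARIABLE_NAMES,name1,name2,...`. If `out=` is left blank, the dataset name is parsed from the file name, which only works when that name is already a valid SAS name.

```sas
/* infile = full path of one raw CSV file                               */
/* out    = output dataset; if blank, derived from the file name        */
%macro import_varnames(infile=, out=);
  %if %length(&out) = 0 %then
    %let out = work.%scan(%scan(&infile, -1, /\), 1, .);   /* e.g. file01 */

  filename _clean temp;               /* temporary working .csv file */

  data _null_;
    infile "&infile" lrecl=32767 truncover;
    file _clean lrecl=32767;
    length names $32767;
    retain found 0;
    input;
    if found then put _infile_;       /* data record: copy unchanged */
    else if _infile_ =: 'VARIABLE_NAMES' then do;
      found = 1;
      /* strip the label and the separator that follows it */
      names = left(substr(_infile_, length('VARIABLE_NAMES') + 1));
      if names =: ',' or names =: ':' then names = left(substr(names, 2));
      len = lengthn(names);
      put names $varying32767. len;   /* header row for PROC IMPORT */
    end;
  run;

  proc import datafile=_clean out=&out dbms=csv replace;
    getnames=yes;
    guessingrows=32767;
  run;

  filename _clean clear;
%mend import_varnames;

/* one call per file */
%import_varnames(infile=C:\data\file01.csv)
%import_varnames(infile=C:\data\file02.csv, out=work.second)
```

With dozens of files, the calls could also be generated from a directory listing, but a plain list of calls is the simplest starting point.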