Create a statement to control which row PROC IMPORT extracts the variable names from when reading a comma-, tab-, or otherwise delimited file.
Not sure what others' opinions are, but delimited files (CSV etc.) are, to my understanding, rows of data separated by a delimiter, one record per row, with an *optional* header row. Granted, delimited files are not as structured as XML, for instance, but in most cases you should only see data, or first-row header information followed by data. So I don't see the value of adding an option to read headers from a non-first row, as such files are no longer CSV but whoever-wrote-the-file's own format. How many cases do we want to cover there? (reference: Comma-separated values - Wikipedia, the free encyclopedia, http://tools.ietf.org/html/rfc4180)
As a side note, the INFILE statement is a far better option for reading delimited files anyway; PROC IMPORT is left in just to give headaches, I am sure.
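To illustrate the point above, here is a minimal sketch of reading a delimited file with a data step instead of PROC IMPORT. The file path, variable names, and lengths are hypothetical; the idea is that you state the layout explicitly rather than letting PROC IMPORT guess.

```sas
/* Hypothetical sketch: read a CSV with one header row using INFILE.
   DSD handles quoted fields and consecutive delimiters,
   FIRSTOBS=2 skips the header, TRUNCOVER avoids lost records
   when a line ends early. */
data work.sales;
    infile 'c:\data\sales.csv' dsd dlm=',' firstobs=2 truncover;
    input region :$20. product :$30. amount;
run;
```

Because you write the INPUT statement yourself, a header on row 2 (or no header at all) is just a different FIRSTOBS= value, with no guessing involved.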
You can define ranges in Excel. Named ranges are used by SAS to read that area. This option is already available, just not well documented.
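As a sketch of that existing option: PROC IMPORT accepts a RANGE= statement for Excel workbooks, so a named range defined inside the workbook can be read directly. The file path, range name, and output dataset below are hypothetical.

```sas
/* Hypothetical sketch: import only the named range "SalesData"
   from an Excel workbook, taking variable names from its first row. */
proc import datafile='c:\data\report.xls'
    out=work.sales
    dbms=xls
    replace;
    range='SalesData';
    getnames=yes;
run;
```

If the named range is positioned so that its first row holds the headers, this sidesteps the "names not on row 1" problem for Excel files.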
PROC IMPORT for XLSX (not the old-style XLS 2003) does not have that. For XLSX files it would make more sense to follow the Open Office approach, as MS Excel's format is based on that.
The GETNAMES option is valid for that.
It seems only valid if the variable names are in the first row.
Then set GETNAMES=NO (Base SAS(R) 9.3 Procedures Guide, Second Edition).
Then you will get the default variable names VAR1, VAR2, ..., while in fact you have variable names in the raw file, just not located in the first row. Sometimes, if a file has many variables, like hundreds, it's quite useful to get all the variable names automatically.
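For reference, this is what the GETNAMES=NO situation looks like. PROC IMPORT's DATAROW= statement controls where the data records start, but the names still come out as VAR1, VAR2, ... because there is no way to point GETNAMES at another row. The file path and row numbers are hypothetical.

```sas
/* Hypothetical sketch: the real header sits on row 2, so GETNAMES=NO
   is forced; SAS assigns VAR1, VAR2, ... and reads data from row 3. */
proc import datafile='c:\data\vendor.csv'
    out=work.vendor
    dbms=csv
    replace;
    getnames=no;
    datarow=3;
run;
```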
There is no way to predict every possible approach. For the common conventions SAS already has much trouble keeping the interfaces up to date (e.g. Open Office, JSON). With the data step you can program whatever conventions you need yourself. That is why programming work exists.
Seems unnecessary to me. For unusually formatted files, use a data step and INFILE. I'd rather see the dev team work on other problems that aren't easily solved.
I'd vote yes. Last year I had a problem reading in CSV files created by a hardware vendor's report. They put their own report name into the CSV file as the first data record. The field names were on the second record, but GETNAMES doesn't allow specifying which row. I wound up having to use a data step to strip out the first record, then pass the output file to PROC IMPORT. Messy and aggravating.
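A minimal sketch of the workaround described above, with hypothetical file paths: copy the file minus its title line, so the header lands on row 1 of the cleaned copy, then let PROC IMPORT read that.

```sas
/* Step 1 (hypothetical paths): write out the CSV without its
   first record, so the header row becomes the first line. */
data _null_;
    infile 'c:\data\vendor.csv' firstobs=2;
    file 'c:\data\vendor_clean.csv';
    input;
    put _infile_;
run;

/* Step 2: the names are now on row 1, so GETNAMES=YES works. */
proc import datafile='c:\data\vendor_clean.csv'
    out=work.vendor
    dbms=csv
    replace;
    getnames=yes;
run;
```

It works, but it doubles the I/O and leaves a temporary copy of the file around, which is exactly the kind of mess a row-selection option on GETNAMES would avoid.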
Lloydc, that was frustrating for you at the time, but the approach itself is not messy and aggravating.
The real mess comes from the surprises of PROC IMPORT's guessing approach: you are never sure what its decision will be.
You could just as well have gotten a data checksum or more as the first record, or data delivered not on one line but spread over multiple lines (there are many variations). Any construction that is well documented, as it should be, can be reliably handled by a data step. Guessing with surprises is something done in a casino (Monte Carlo, another area of statistical approaches).