On a recent project, we went a bit beyond "simple" and got a lot more specific.
Metadata was maintained in the excel sheet, providing each variable's name and type - as defined by storage length and/or informat - plus an optional format and the expected column header.
The order of the metadata rows determined the output column order.
Each incoming column header was matched against the "expected column header"; unexpected columns would be "read" as $20. (a sketch of that matching follows below).
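To make that concrete, here is a minimal sketch of the idea, assuming a metadata table with one row per expected column. Every dataset, variable and file name below is invented for illustration, not the project's actual code: a small stand-in metadata table, the header row of the csv read back, and a join that falls back to $20. for anything unexpected.

   /* stand-in metadata: name, informat, optional format, expected header */
   data meta;
      length varname $32 informat $32 format $32 expected_header $64;
      infile datalines dlm=',' dsd truncover;
      input varname informat format expected_header;
      datalines;
   subjid,$10.,$10.,Subject
   visitdt,yymmdd10.,date9.,Visit Date
   weight,best12.,8.1,Weight (kg)
   ;
   run;

   /* read only the header row of the delivered csv */
   data header_map;
      infile "delivery.csv" obs=1;
      input;
      length header $64;
      do colnum = 1 to countw(_infile_, ',', 'q');
         header = scan(_infile_, colnum, ',', 'q');
         output;
      end;
      keep colnum header;
   run;

   /* match headers to metadata; unexpected columns default to $20. */
   proc sql;
      create table import_plan as
      select h.colnum,
             h.header,
             coalesce(m.varname, cats('unexpected_', h.colnum)) as varname  length=32,
             coalesce(m.informat, '$20.')                       as informat length=32,
             m.format                                           as format   length=32
      from header_map as h
           left join meta as m
             on upcase(strip(h.header)) = upcase(strip(m.expected_header))
      order by h.colnum;
   quit;

The import_plan table then drives the generated import step (see the load sketch further down).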
It was designed as an iterative process to support new data. It was built because the delivered information had "creep": the use of the underlying data columns adapted over time and new columns were added.
It overcame a few problems in the original "excel data collection".
1 mixed-type columns
2 special missing values (cells with symbols tied to special conditions instead of numbers) - see the informat sketch after this list
3 columns delivered that don't always match the list of required columns
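For point 2, one way to handle those symbolic cells is a custom numeric informat that maps them to special missing values - a hedged sketch, assuming symbols like "ND" and "BLQ" (the symbols and the informat name are invented):

   proc format;
      invalue symnum
         'ND'  = .N         /* "not done" */
         'BLQ' = .B         /* "below the limit of quantification" */
         other = [best12.]  /* any other cell is read as an ordinary number */
      ;
   run;

A numeric column that can carry those symbols would then get symnum. as its informat in the metadata instead of best12.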
The data were received as a formatted excel sheet. It was "unformatted" (all columns given "format=text"), then the data were saved from excel as a plain csv file. It is this csv file that is loaded subject to the rules of the metadata, roughly as sketched below.
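Roughly, the load looks like this sketch (again with invented names), using the import_plan table from the matching sketch above to build the INFORMAT, FORMAT and INPUT statements:

   proc sql noprint;
      select catx(' ', varname, informat)
         into :informat_list separated by ' '
         from import_plan;
      select catx(' ', varname, format)
         into :format_list separated by ' '
         from import_plan
         where not missing(format);   /* assumes at least one format is defined */
      select varname
         into :input_list separated by ' '
         from import_plan;
   quit;

   data imported;
      infile "delivery.csv" dsd firstobs=2 truncover;
      informat &informat_list.;
      format &format_list.;
      input &input_list.;
   run;

To make the output column order follow the metadata order rather than the csv order, a RETAIN statement built from the metadata table could be placed ahead of the INFORMAT statement.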
While the metadata could be maintained at a "quality" that is easy and reliable for SAS to read from excel (with just a libname statement making the connection), the primary data received could not be controlled in such a reliable way - hence the elaborate import management.
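The "just a libname statement" part is the easy half - something along these lines, assuming the XLSX engine is available (SAS/ACCESS to PC Files) and with an invented workbook and sheet name:

   libname mdlib xlsx "metadata.xlsx";

   data work.meta;
      set mdlib.columns;   /* the sheet holding one metadata row per expected column */
   run;

   libname mdlib clear;

In the real process this is what would replace the datalines stand-in used in the sketch above.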
Of course, once we thought we had a reliable method for handling almost any data supplied, new complexities demonstrated its shortcomings: sometimes column headers are not unique ("which metadata row should we choose?"), and sometimes the data needed more than one row of column header to define it.
Basically, these problems are caused by data that does not conform to "standard csv" - but that is the nature of "data in excel".
At least "excel data" never causes a "problem of volume" ;-)