07-28-2017 05:07 AM
I have an Excel file with over 100 worksheets, and importing them into SAS one by one would be a nightmare. Unfortunately, I don't have SAS/ACCESS Interface to PC Files, so PROC IMPORT doesn't work :/ Is there any way to do this efficiently?
Thank you in advance!
07-28-2017 05:31 AM
Well, I will tell you straight up that importing data from Excel will always be a nightmare; yours will be 100 * nightmare. What OS are you working on? If it's Windows, you have Office installed, and you are using SAS 9.4, then it is a simple task:
libname imp excel "<path to your file>/<your file>.xlsx";
...
libname imp clear;
Pointing a libname at the .xlsx file creates a usable library containing one dataset per tab.
Of course, if you are not on Windows with Office and do not have PC Files, then you can't import the data as is. Also, if you don't have SAS 9.4, then you can't use the libname. In those cases the simplest suggestion would be to copy the file somewhere that does have Office (or LibreOffice/OpenOffice), open it, and write a simple VBA script that loops over each tab and writes it out to a CSV file. CSV files are plain text that anything can read (and are a far better data medium). Alternatively, you could dump the data out as SQL create statements, which you then just run in SAS (again a text file, usable anywhere). So really, it's a case of providing enough information to decide on a method.
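For the "SQL create statements" route, here is a minimal Python sketch (Python only because it is one of the alternatives raised in this thread) that turns one exported sheet's CSV text into a PROC SQL script. The function names `csv_to_proc_sql` and `sas_name` are hypothetical, the typing rule (a column is numeric only if every non-empty value looks like a plain number) is an assumption, and the generated SAS has not been run here, so treat it as a starting point rather than a finished tool.

```python
import csv
import io
import re

# Assumption: only plain decimal / scientific-notation strings count as numbers.
NUMBER = re.compile(r"^[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?$")


def sas_name(raw: str, fallback: str) -> str:
    """Clean a header into a SAS-style name: letters, digits, _; max 32 chars."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw.strip()) or fallback
    if name[0].isdigit():
        name = "_" + name
    return name[:32]


def csv_to_proc_sql(csv_text: str, table: str) -> str:
    """Build a PROC SQL script (CREATE TABLE + INSERTs) from one sheet's CSV text."""
    header, *data = list(csv.reader(io.StringIO(csv_text)))
    names = [sas_name(h, f"col{i + 1}") for i, h in enumerate(header)]

    def cell(row, i):
        return row[i] if i < len(row) else ""

    # Hypothetical typing rule: numeric only if every non-empty value matches NUMBER.
    numeric = [
        all(NUMBER.match(cell(r, i)) for r in data if cell(r, i) != "")
        for i in range(len(header))
    ]
    col_defs = []
    for i, name in enumerate(names):
        if numeric[i]:
            col_defs.append(f"{name} num")
        else:  # SAS character lengths are in bytes, hence the UTF-8 encode
            width = max([len(cell(r, i).encode("utf-8")) for r in data] + [1])
            col_defs.append(f"{name} char({width})")

    lines = ["proc sql;", f"  create table {table} ({', '.join(col_defs)});"]
    for r in data:
        values = []
        for i in range(len(header)):
            v = cell(r, i)
            if numeric[i]:
                values.append(v or ".")  # "." is SAS's numeric missing value
            else:
                values.append("'" + v.replace("'", "''") + "'")  # double embedded quotes
        lines.append(f"  insert into {table} values ({', '.join(values)});")
    lines.append("quit;")
    return "\n".join(lines)
```

Calling `csv_to_proc_sql(text, "work.sheet1")` once per exported tab gives one text file per tab to `%include` in SAS. Be aware that the numeric rule drops leading zeros (an ID like `007` becomes 7) and that duplicate column headers are not de-duplicated.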
07-28-2017 05:46 AM
07-28-2017 06:35 AM
Well, I would really suggest moving up to 9.4. Also, and this is always one of the problems with Excel: if each tab has a different structure, how are you going to work with it? However you get the data into SAS, you will need to program for each tab separately, because each tab will be different. I mean, it's relatively simple to write a macro around PROC IMPORT and pass in each tab name from a list, but each of those output datasets will be different. There isn't a magic "fix my Excel mess" button, I am afraid.
(To note: it doesn't help in this case, as it won't fix your multiple structures, but for reference, in VBA you just loop over the sheet names and save each one as CSV; Excel does the writing for you.)
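To illustrate the "pass each tab name from a list" idea, this Python sketch generates plain PROC IMPORT steps (rather than a SAS macro) for tabs that have already been saved as `<tab name>.csv` in one folder. The folder layout, the function names, and the rule for turning tab names into dataset names are all assumptions, and it does nothing about the differing structures, which still need per-tab programming.

```python
import re


def sas_dataset_name(sheet: str, used: set) -> str:
    """Map a tab name to a unique, valid SAS dataset name (max 32 chars)."""
    base = re.sub(r"[^A-Za-z0-9_]", "_", sheet.strip()) or "sheet"
    if base[0].isdigit():
        base = "_" + base
    base = base[:32]
    name, n = base, 1
    while name.lower() in used:  # SAS names are case-insensitive
        n += 1
        suffix = f"_{n}"
        name = base[: 32 - len(suffix)] + suffix
    used.add(name.lower())
    return name


def proc_import_steps(csv_dir: str, sheets: list) -> str:
    """Return SAS code with one PROC IMPORT step per tab.

    Assumption: each tab was already exported to <csv_dir>/<tab name>.csv.
    """
    used = set()
    steps = []
    for sheet in sheets:
        ds = sas_dataset_name(sheet, used)
        steps.append(
            f'proc import datafile="{csv_dir}/{sheet}.csv"\n'
            f"    out=work.{ds} dbms=csv replace;\n"
            "    getnames=yes;\n"
            "run;"
        )
    return "\n\n".join(steps)
```

The de-duplication matters because tab names such as `Q1 2017` and `Q1-2017` clean up to the same SAS name; here the second becomes `Q1_2017_2`.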
07-28-2017 11:46 AM - edited 07-28-2017 11:47 AM
There are ways other than a VBA script (for example, I use C# and read Excel with a library, but VBA is probably an easy option for you). Alternatives probably abound in tools such as Python and R.
Loop over the tabs and handle each one's structure in VBA. It should be pretty easy to do.
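As one example of the Python route, here is a minimal sketch that loops over the tabs and produces CSV text for each, using only the standard library by reading the .xlsx as the zip of XML files it really is. It is not a full reader: it assumes a typical Excel-written workbook (shared strings at `xl/sharedStrings.xml`, every row and cell carrying its `r` reference), returns raw stored values (dates come out as Excel serial numbers and number formats are ignored), and `xlsx_to_csvs` is a hypothetical name.

```python
import csv
import io
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main",
    "pr": "http://schemas.openxmlformats.org/package/2006/relationships",
}
# The sheet's r:id attribute, in ElementTree's {namespace}name form.
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"


def col_index(ref: str) -> int:
    """Zero-based column number from a cell reference: 'A1' -> 0, 'AB7' -> 27."""
    n = 0
    for ch in re.match(r"[A-Z]+", ref).group():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def text_of(el) -> str:
    """Join plain (<t>) and rich-text (<r><t>) runs; phonetic hints are skipped."""
    return "".join(t.text or "" for t in el.findall("m:t", NS) + el.findall("m:r/m:t", NS))


def xlsx_to_csvs(xlsx) -> dict:
    """Return {tab name: CSV text} for every tab, in workbook order."""
    with zipfile.ZipFile(xlsx) as zf:
        shared = []
        # Assumption: Excel's usual location for the shared-string table.
        if "xl/sharedStrings.xml" in zf.namelist():
            sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            shared = [text_of(si) for si in sst.findall("m:si", NS)]

        # workbook.xml names the tabs; its .rels file maps each r:id to a part path.
        rels = ET.fromstring(zf.read("xl/_rels/workbook.xml.rels"))
        targets = {r.get("Id"): r.get("Target") for r in rels.findall("pr:Relationship", NS)}
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))

        result = {}
        for sheet in workbook.findall("m:sheets/m:sheet", NS):
            target = targets[sheet.get(R_ID)]
            path = target.lstrip("/") if target.startswith("/") else "xl/" + target
            rows = []
            for row in ET.fromstring(zf.read(path)).findall("m:sheetData/m:row", NS):
                # Empty rows are simply absent from the XML, so pad them back in.
                while len(rows) < int(row.get("r")) - 1:
                    rows.append([])
                cells = {}
                for c in row.findall("m:c", NS):
                    v = c.find("m:v", NS)
                    if c.get("t") == "s":  # index into the shared-string table
                        value = shared[int(v.text)]
                    elif c.get("t") == "inlineStr":
                        value = text_of(c.find("m:is", NS))
                    else:  # numbers, booleans, errors, cached formula results
                        value = v.text if v is not None and v.text else ""
                    cells[col_index(c.get("r"))] = value
                width = max(cells) + 1 if cells else 0
                rows.append([cells.get(i, "") for i in range(width)])
            out = io.StringIO()
            csv.writer(out, lineterminator="\n").writerows(rows)
            result[sheet.get("name")] = out.getvalue()
    return result
```

Each entry can then be written to its own `<tab name>.csv`; tab names containing characters such as `/` would need cleaning before being used as file names.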
Another alternative is to use Excel as an ODBC source if you have SAS/ACCESS to ODBC. That doesn't save you from needing the table (sheet) names, but you can get those by reading the .xlsx file as a zip and extracting the pertinent metadata from the Excel XML files inside it.
Take one of your .xlsx files, copy it, then rename the copy with a .zip extension and open it. You can see the structure (the sheet names, for example, are listed in xl/workbook.xml).
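To pull just the tab names out programmatically, a small Python sketch can read `xl/workbook.xml` straight from the zip. The `$` suffix reflects how Microsoft's Excel ODBC driver usually exposes worksheets as tables (e.g. `Sales$`); whether your SAS/ACCESS to ODBC library lists them exactly that way is an assumption worth checking against what the libname actually shows. Both function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(xlsx) -> list:
    """Tab names in workbook order, read from xl/workbook.xml inside the .xlsx zip.

    `xlsx` may be a path, an open binary file, or the raw bytes of the file.
    """
    if isinstance(xlsx, bytes):
        xlsx = io.BytesIO(xlsx)
    with zipfile.ZipFile(xlsx) as zf:
        workbook = ET.fromstring(zf.read("xl/workbook.xml"))
    return [s.get("name") for s in workbook.findall("m:sheets/m:sheet", NS)]


def odbc_table_names(names: list) -> list:
    """Assumption: the Excel ODBC driver names each worksheet table '<tab name>$'."""
    return [name + "$" for name in names]
```

Names with spaces or punctuation (such as `Q1 2017$`) will typically need quoting when referenced through the ODBC connection.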
There are probably also free command-line tools that will do this for you; exporting Excel to CSV is a fairly common task.