07-11-2011 10:22 AM
I am looking for a way to import all of the worksheets in an excel book without specifying the names of those sheets. Ideally, they would all be then joined to make one large dataset.
I imagine there has to be a macro that can handle this....
07-11-2011 10:29 AM
no macro required, if you are running SAS on windows platforms with SAS/Access for PC File Formats available
Just a libname statement, like
libname myxl excel 'path/workbook.xls' ;
proc contents data= myxl._all_ ; run ;
07-11-2011 10:36 AM
I don't see how I would join those into one file then. The myxl folder contains each of the worksheets, but then how to join them?
Is there a function to set all files in a directory together?
07-11-2011 12:12 PM
Sorry wcpatton, I didn't expect you wanted all sheets in a single table;
once you have the names in a macro variable, like &xlNames (created with code like)
proc sql noprint ;
select cats( "myxl.'", memname, "'n" ) into : xlNames separated by ' '
where libname = 'MYXL' and Nvar > 0
order by memname ;
%put found &sqlobs worksheets with columns in MYXL ;
just use that in code like:
data together ;
length from_sheet sht $50 ;
set &xlNames indsname= sht ;
from_sheet = sht ;
07-14-2011 11:52 AM
This works great, brilliant piece of code.
Any way to speed up the first step, though? Is there any thing that would stop the process once it has identified the sheet name? It runs for a while on these large workbooks, like it is reading the whole sheet into memory.
07-14-2011 01:01 PM
Here is an alternative program to find the sheet names that you can use to compare performance. If you have to work with EXCEL files a lot you may want to learn a bit about Office Automation and this is a good a place to start as any.
filename workbook "path to workbook.xls";
length path script filevar command $256;
path = pathname('WORKBOOK');
script = catx('\',pathname('WORK'),'SHEETNAMESS.vbs');
filevar = script;
/* write the script */
file dummy1 filevar=filevar;
put 'Const ' path=:$quote256.;
put 'Set objExcel = CreateObject("Excel.Application")';
put 'With objExcel';
put +3 '.Visible = False';
put +3 '.DisplayAlerts = False';
put +3 'Set objWorkbook = .Workbooks.Open(path)';
put +3 'Set colSheets = .Worksheets';
put +3 'For Each objSheet In colSheets';
put +6 'WScript.echo objsheet.name';
put +6 'Next';
put +3 '.Application.Quit';
put +3 'End With';
/* close the script file by opening another, not used */
filevar = catx('\',pathname('WORK'),'DUMMY.vbs');
file dummy1 filevar=filevar;
/* look at the script, not necessary but may be useful */
infile dummy2 filevar=script end=eof;
do _n_ = 1 by 1 while(not eof);
putlog _n_ z3. +1 _infile_;
/* call the script */
command = catx(' ','cscript //nologo',quote(strip(script)));
infile dummy3 pipe filevar=command end=eof length=l;
do while(not eof);
input sheet $varying256. l;
07-14-2011 01:28 PM
One should always take note of data_null's suggestions as they almost always bring expedient solutions, but I don't understand why the first part of Peter's code would take long. It is only reading file names from dictionary.tables. It never even accesses the spreadsheets.
07-14-2011 01:43 PM
I'm pretty sure the XL libname engine requires some interaction with EXCEL, but I don't know how it works. The information needed, in the case the MEMNAME aka SHEET Name, does not exist until is is asked for and the XL engine has to go get it.
Also the VBSCRIPT methods is not exactly speedy it has to open EXCEL load the workbook, etc...