Interesting problem. And a good chance to learn some new things about enhanced XLMV2 engine options that make it easier to deal with XML files that are inside of ZIP files (such as XLSX files).
So basically what I see is that the XLSX file has an XML file named xl/workbook.xml that contains a SHEETS tab with the worksheet names. And potentially it also has a RANGES tag that has the list of named ranges (if any exist). It might also have files named xl/worksheets/_rels/sheet<#>.xml.rels where the # is the suffix on the ID tag for the SHEET. In these xml.rels files are RELATIONSHIP tags which have a field named TARGET that might point to files with names like ../tables/table<#>.xml. Those table#.xml files will have NAME and RANGE tags.
At least with SAS version 9.4 (TS1M7) you can use the XMLV2 libref engine to read a file that you pointed a fileref using the ZIP filename engine.
filename _wb zip "myfile.xlsx" member="xl/workbook.xml";
libname _wb xmlv2 xmlfileref=_wb xmlmap=_map ;
If you include tables in your XMLMAP definition that don't appear in the XML file you are reading there is no error generated. You just get an empty table. So no need for multiple MAP files, one file can cover all three XML file types.
If you use FILENAME() and LIBNAME() functions and HASH objects you can find all of the worksheets, named ranges and "ranges formats as tables" from and XLSX file in one data step.
/*
Get list of Sheets, Named Ranges, and Format as Table Ranges
from an XLSX file
Needs the XML MAP file to read the XML file.
Needs two inputs: XSLX filename and target dataset name
Output is dataset with:
TYPE Type of range:
RANGE = Named Range
SHEET = Sheet name
TABLE = Name of "format as Table" areas
ORDER Order within TYPE
NAME Name of sheet/name/table
REFERENCE String to use for SHEET= or RANGE= statements
of proc import to read the range.
XLSXFILE Path of the source XLXS file
This program will use the following fileref and libref names
xlsxmap - File with XML mapping for the XLM files used from the XLSX file
_wb - Point to workbook.xml file
_rels - Point to .xml.rels files
_tbl - Point to table#.xlm files
*/
%let xlsxfile=c:\downloads\tables.xlsx;
%let xlsxfile=c:\downloads\PRAMS-MCH-Indicators-2016-2021.xlsx;
%let xlsxfile=c:\downloads\tables_ranges.xlsx;
%let dsn=xlsx_ranges;
* Make XML map file ;
filename xlsxmap temp;
data _null_;
file xlsxmap;
put '<SXLEMAP version="2.1">'
/ '<!-- Table definitions for xl/workbook.xml -->'
/ '<TABLE name="sheets">'
/ ' <TABLE-PATH>/workbook/sheets/sheet</TABLE-PATH>'
/ ' <COLUMN name="Id">'
/ ' <TYPE>character</TYPE>'
/ ' <LENGTH>32</LENGTH>'
/ ' <PATH>/workbook/sheets/sheet/@r:id</PATH>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' </COLUMN>'
/ ' <COLUMN name="Name">'
/ ' <TYPE>character</TYPE>'
/ ' <LENGTH>32</LENGTH>'
/ ' <PATH>/workbook/sheets/sheet/@name</PATH>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' </COLUMN>'
/ '</TABLE>'
/ '<TABLE name="ranges">'
/ ' <TABLE-PATH>/workbook/definedNames/definedName</TABLE-PATH>'
/ ' <COLUMN name="Name">'
/ ' <PATH>/workbook/definedNames/definedName/@name</PATH>'
/ ' <TYPE>character</TYPE>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' <LENGTH>200</LENGTH>'
/ ' </COLUMN>'
/ ' <COLUMN name="Range">'
/ ' <PATH>/workbook/definedNames/definedName</PATH>'
/ ' <TYPE>character</TYPE>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' <LENGTH>200</LENGTH>'
/ ' </COLUMN>'
/ '</TABLE>'
/ '<!-- Table definitions for xl/worksheet/table#.xml -->'
/ '<TABLE name="table_range">'
/ ' <TABLE-PATH>/table</TABLE-PATH>'
/ ' <COLUMN name="Id">'
/ ' <PATH>/table/@id</PATH>'
/ ' <TYPE>character</TYPE>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' <LENGTH>32</LENGTH>'
/ ' </COLUMN>'
/ ' <COLUMN name="Name">'
/ ' <PATH>/table/@name</PATH>'
/ ' <TYPE>character</TYPE>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' <LENGTH>100</LENGTH>'
/ ' </COLUMN>'
/ ' <COLUMN name="Range">'
/ ' <PATH>/table/@ref</PATH>'
/ ' <TYPE>character</TYPE>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' <LENGTH>200</LENGTH>'
/ ' </COLUMN>'
/ '</TABLE>'
/ '<!-- Table definition for .rels files -->'
/ '<TABLE name="relationship">'
/ ' <TABLE-PATH>/Relationships/Relationship</TABLE-PATH>'
/ ' <COLUMN name="Id">'
/ ' <PATH>/Relationships/Relationship/@Id</PATH>'
/ ' <TYPE>character</TYPE>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' <LENGTH>32</LENGTH>'
/ ' </COLUMN>'
/ ' <COLUMN name="Target">'
/ ' <PATH>/Relationships/Relationship/@Target</PATH>'
/ ' <TYPE>character</TYPE>'
/ ' <DATATYPE>string</DATATYPE>'
/ ' <LENGTH>200</LENGTH>'
/ ' </COLUMN>'
/ '</TABLE>'
/ '</SXLEMAP>'
;
run;
data _null_;
length type $5 order 8 name $32 reference $200;
length xlsxfile dsn memname fname msg $200 fileref libref $8 rc 8;
length id $32 target range $200;
call missing(of _all_);
xlsxfile = symget('xlsxfile');
dsn = symget('dsn');
* Point libref at workbook.xml file ;
memname="xl/workbook.xml";
fileref='_wb';
libref='_wb';
link make_libref;
if rc then stop;
* Setup hash object and iters ;
* RANGES - This is the output results ;
declare hash ranges(ordered:'YES');
ranges.definekey('type','order');
ranges.definedata('type','order','name','reference','xlsxfile');
ranges.definedone();
* SHEETS - Read from xl/workbook.xml file ;
declare hash sheets(dataset:'_wb.sheets',ordered:'YES');
sheets.definekey('Id');
sheets.definedata('Id','Name');
sheets.definedone();
declare hiter sheet('sheets');
* NAMED_RANGES - Read from xl/workbook.xml file ;
declare hash named_ranges(dataset:'_wb.ranges',ordered:'YES');
named_ranges.definekey('Name');
named_ranges.definedata('Name','Range');
named_ranges.definedone();
declare hiter named_range('named_ranges');
* RELS - Read from xl/worksheets/sheet#.xml.rels file ;
declare hash rels;
declare hiter rel;
* TABLES - Read from xl/tables/table#.xml files ;
declare hash tables;
declare hiter table;
* Add each named range ;
eof0=named_range.first();
do ord=1 by 1 while(0=eof0);
type='RANGE';
order=ord;
reference=name;
ranges.add();
eof0=named_range.next();
end;
* Add each sheet ;
eof1=sheet.first();
do ord=1 by 1 while(0=eof1);
sheetid=id;
sheetname=name;
type='SHEET';
order=ord;
reference=name;
ranges.add();
* Point libref at this sheets relationships file (if it exists) ;
libref='_rels';
fileref='_rels';
memname=cats('xl/worksheets/_rels/sheet',substr(id,4),'.xml.rels');
link make_libref;
if not rc then do;
* Read in the sheet relationships ;
rels = _new_ hash(dataset:'_rels.relationship(where=(target like ''../tables/table%.xml''))');
rels.definekey('Id');
rels.definedata('Id','Target');
rels.definedone();
rel = _new_ hiter('rels');
eof2= rel.first();
do while(0=eof2);
* Point libref at this tables definition file ;
fileref='_tbl';
libref='_tbl';
memname=cats('xl/tables/',scan(target,-1,'/'));
link make_libref;
if rc then put 'ERROR: Unable to assign libref for table range.' target= memname= fileref= libref=;
else do;
* Read in the table range file ;
tables = _new_ hash(dataset:'_tbl.table_range');
tables.definekey('Id');
tables.definedata('Id','Name','Range');
tables.definedone();
table = _new_ hiter('tables');
range=' ';
eof3= table.first();
type='TABLE';
order=input(substr(scan(target,-2,'./'),6),32.);
reference=catx('$',sheetname,range);
ranges.add();
tables.delete();
end;
rc=libname('_tbl');
rc=filename('_tbl');
eof2=rel.next();
end;
rels.delete();
end;
rc=libname('_rels');
rc=filename('_rels');
eof1=sheet.next();
end;
* Close libref/fileref ;
rc=libname('_wb');
rc=filename('_wb');
* Write ranges hash to dataset ;
ranges.output(dataset: dsn);
return;
make_libref:
* Make a libref pointing to XML file inside XLSX file. ;
* Input vars: xlsxfile, memname, fileref, libref ;
* Output vars: rc ;
* Point fileref to member in XLSX file ;
rc=filename(fileref,xlsxfile,'zip',cats('member=',quote(trim(memname))));
if rc then do;
msg=sysmsg();
put 'ERROR: Unable to create ' fileref= 'pointing to ' memname= 'in ' xlsxfile= ;
put 'ERROR- ' msg ;
end;
else rc=fileref(fileref);
if not rc then do;
* When member file exists then point libref to it ;
rc=libname(libref,,'xmlv2',catx(' ','xmlfileref=',fileref,'xmlmap=xlsxmap access=readonly'));
if rc then do;
msg=sysmsg();
put 'ERROR: Unable to create XMLV2 libref pointing to ' memname=;
put 'ERROR- ' msg ;
end;
end;
return;
run;
proc print;
run;
Example Results:
NOTE: Processing XMLMap version 2.1.
NOTE: There were 3 observations read from the data set _WB.sheets.
NOTE: There were 1 observations read from the data set _WB.ranges.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 2 observations read from the data set _RELS.relationship.
WHERE target like '../tables/table%.xml';
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _TBL.table_range.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _TBL.table_range.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _RELS.relationship.
WHERE target like '../tables/table%.xml';
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _TBL.table_range.
NOTE: The data set WORK.XLSX_RANGES has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds
Use the REFERENCE value with the SHEET= statement for TYPE=SHEET and with the RANGE= statement for the other two types.
proc import dbms=xlsx file="c:\downloads\tables_ranges.xlsx" out=Name1 replace;
range="Name1";
run;
proc import dbms=xlsx file="c:\downloads\tables_ranges.xlsx" out=WasSheet2 replace;
sheet="WasSheet2";
run;
proc import dbms=xlsx file="c:\downloads\tables_ranges.xlsx" out=WasTable2 replace;
range="Sheet1$C6:E7";
run;
Here is another file that had a different type of relationship record for one of the sheets which why the WHERE= dataset option is used to only find the relationships to tables.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 6 observations read from the data set _WB.sheets.
NOTE: There were 1 observations read from the data set _WB.ranges.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 0 observations read from the data set _RELS.relationship.
WHERE target like '../tables/table%.xml';
NOTE: The data set WORK.XLSX_RANGES has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
... View more