BookmarkSubscribeRSS Feed
RichardAD
Obsidian | Level 7

Proc IMPORT has the SHEET= statement to read data from a NamedRange or an explicit cell range reference.  

 

What do you do when you are given a workbook with a Table in it, and you need to read its data into SAS?

  • The SHEET= statement will fail if you use the name of the Table
  • Open the workbook and copy the range into my SAS code? No way I'm doing that more than twice.  The same workbook a week later and the table size could have changed and now my SAS code is not aligned with the table.
  • The functionality demonstrated here should be in the Proc IMPORT engine for DBMS=xlsx.  Either a new statement TABLE= or update SHEET= to also handle tables.

Base SAS does have the tools to discover the table range and then Proc IMPORT from the discovered range.

 

Here is some code to read in the Table data that is in an Excel file.

%let workbook_file=c:\temp\blank.xlsx ;

dm 'clear log; clear output;' ;

options source ;
options linesize=200 ;

proc template;
%let font = Calibri ;
define style styles.pearl_custom;
   parent = styles.pearl;
   class fonts /
      'TitleFont2'          = ("&font",10pt,bold)
      'TitleFont'           = ("&font",11pt,bold)
      'StrongFont'          = ("&font",8pt,bold)
      'EmphasisFont'        = ("&font",8pt,italic)
      'headingEmphasisFont' = ("&font",9pt,bold italic)
      'headingFont'         = ("&font",8pt,bold)
      'docFont'             = ("&font",8pt);
  end;
run;

ods _all_ close ;
ods html path='c:\temp' file="excel-data-table.html" style=styles.pearl_custom ;

*----------------------------------------------------------------------;
* Find each table definition                                        ;
*----------------------------------------------------------------------;

%macro halt () ; %abort cancel ; %mend ;
%macro noop () ; %mend ;

data _null_ ;
  rc = filename('opentest',"&workbook_file") ;
  if rc ne 0 then do ;
    msg = sysmsg() ; put msg "!" ;
    call execute('%nrstr(%halt;)') ;
  end ;
  fid = fopen ('opentest','i',0,'B') ;
  if fid = 0 then do ; 
    msg = sysmsg() ; put msg ; 
    call execute('%nrstr(%halt;)') ; 
  end ;
  else do ;
    rc = fclose (fid) ;
    put 'NOTE: Workbook file can be opened' ;
  end ;
  rc = filename('opentest') ;
run ;

*----------------------------------------------------------------------;
* Locate .xlsx items that specify data table range and related sheet   ;
*----------------------------------------------------------------------;

filename wbkrel catalog 'WORK.SXLEMAPS.WBKREL.SOURCE' ;
filename wshtbl catalog 'WORK.SXLEMAPS.WSHTBL.SOURCE' ;
filename wshrel catalog 'WORK.SXLEMAPS.WSHREL.SOURCE' ;
filename wbkmap catalog 'WORK.SXLEMAPS.WORKBOOK.SOURCE' ;
filename tblref catalog 'WORK.SXLEMAPS.TBLREF.SOURCE' ;
filename null temp ;
data _null_ ;
  infile datalines truncover ;
  file wbkrel ; do until (0) ; input line $char200. ; if line =: '<!' then leave ; l=length(line) ; put line $varying. l ; end ;
  file wshtbl ; do until (0) ; input line $char200. ; if line =: '<!' then leave ; l=length(line) ; put line $varying. l ; end ;
  file wshrel ; do until (0) ; input line $char200. ; if line =: '<!' then leave ; l=length(line) ; put line $varying. l ; end ;
  file wbkmap ; do until (0) ; input line $char200. ; if line =: '<!' then leave ; l=length(line) ; put line $varying. l ; end ;
  file tblref ; do until (0) ; input line $char200. ; if line =: '<!' then leave ; l=length(line) ; put line $varying. l ; end ;
  stop ;
datalines ;
<SXLEMAP version="2.1">
<TABLE name="WorkbookRelationships">
    <TABLE-PATH>/Relationships/Relationship</TABLE-PATH>
    <COLUMN name="Id">
        <PATH>/Relationships/Relationship/@Id</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>32</LENGTH>
    </COLUMN>
    <COLUMN name="Target">
        <PATH>/Relationships/Relationship/@Target</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>200</LENGTH>
    </COLUMN>
</TABLE>
</SXLEMAP>
<!-- Next -->
<SXLEMAP version="2.1">
<TABLE name="SheetTableParts">
    <TABLE-PATH>/worksheet/tableParts/tablePart</TABLE-PATH>
    <COLUMN name="Id">
        <PATH>/worksheet/tableParts/tablePart/@r:id</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>32</LENGTH>
    </COLUMN>
</TABLE>
</SXLEMAP>
<!-- Next -->
<SXLEMAP version="2.1">
<TABLE name="WorksheetRelationships">
    <TABLE-PATH>/Relationships/Relationship</TABLE-PATH>
    <COLUMN name="Id">
        <PATH>/Relationships/Relationship/@Id</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>32</LENGTH>
    </COLUMN>
    <COLUMN name="Target">
        <PATH>/Relationships/Relationship/@Target</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>200</LENGTH>
    </COLUMN>
</TABLE>
</SXLEMAP>
<!-- Next -->
<SXLEMAP version="2.1">
<TABLE name="sheets">
    <TABLE-PATH>/workbook/sheets/sheet</TABLE-PATH>
    <COLUMN name="name">
        <TYPE>character</TYPE>
        <LENGTH>32</LENGTH>
        <PATH>/workbook/sheets/sheet/@name</PATH>
        <DATATYPE>string</DATATYPE>
    </COLUMN>
    <COLUMN name="id">
        <TYPE>character</TYPE>
        <LENGTH>32</LENGTH>
        <PATH>/workbook/sheets/sheet/@r:id</PATH>
        <DESCRIPTION>http://schemas.openxmlformats.org/officeDocument/2006/relationships</DESCRIPTION>
        <DATATYPE>string</DATATYPE>
    </COLUMN>
</TABLE>
<TABLE name="definedNames">
    <TABLE-PATH>/workbook/definedNames/definedName</TABLE-PATH>
    <COLUMN name="definedName">
        <PATH>/workbook/definedNames/definedName/@name</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>200</LENGTH>
    </COLUMN>
    <COLUMN name="range">
        <PATH>/workbook/definedNames/definedName</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>200</LENGTH>
    </COLUMN>
</TABLE>
</SXLEMAP>
<!-- Next -->
<SXLEMAP version="2.1">
<TABLE name="tableRef">
    <TABLE-PATH>/table</TABLE-PATH>
    <COLUMN name="name">
        <PATH>/table/@name</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>100</LENGTH>
    </COLUMN>
    <COLUMN name="id">
        <PATH>/table/@id</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>8</LENGTH>
    </COLUMN>
    <COLUMN name="ref">
        <PATH>/table/@ref</PATH>
        <TYPE>character</TYPE>
        <DATATYPE>string</DATATYPE>
        <LENGTH>32</LENGTH>
    </COLUMN>
</TABLE>
</SXLEMAP>
;

%macro GetWorkbookItems(file=, member=, mapref=, print=0) ;
  %if %sysfunc(fileref(&mapref)) ne 0 %then %do ;
    %put %sysfunc(sysmsg()) ;
    %abort cancel ;
  %end ;

  filename src zip "&file" member="&member" ;
  filename dst temp ;

  %put NOTE: fcopy rc = %sysfunc(fcopy (src, dst)) ;

  %if tblref = &mapref %then %do ;
  data _null_ ; infile &mapref ; input ; put _infile_ ; run ;
  data _null_ ; infile dst     ; input ; put _infile_ ; run ;
  %end ;

  libname member xmlv2 xmlfileref=dst xmlmap=&mapref access=READONLY;

  proc copy in=member out=work ;
  run ;

  %let memnames = '' ;
  proc sql noprint ;
    select quote(trim(memname)) into :memnames separated by ',' from dictionary.tables
    where libname = 'MEMBER' ;
  %let memname_count = &sqlobs ;

  data _null_ ;
    length memname $32 ;
    if &memname_count then do memname = &memnames ;
      call execute ('data ' || trim(memname) || ';') ;
      call execute ("retain member '&member' ;") ;
      call execute ("retain mapref '&mapref' ;") ;
      call execute ('set ' || trim(memname) || ';') ;
      call execute ('run ;') ;
      if "&print"="1" then
      call execute ('title ' || quote(trim(memname)) || ' ; proc print noobs data=' || trim(memname) || ' ; run ; title ;') ;
    end ;
  run ;

  libname member ;
  filename src ;
  filename dst ;
%mend GetWorkbookItems ;

%macro ReadExcelTables(workbook_file=) ;

options mprint ; title ;
options nocenter ;

%GetWorkbookItems (file=&workbook_file, member=xl/workbook.xml, mapref=wbkmap) ;
%GetWorkbookItems (file=&workbook_file, member=xl/_rels/workbook.xml.rels, mapref=wbkrel) ;

%* keep only worksheet relationships ;
proc sort data=WorkbookRelationships(where=(target like 'worksheets/%')) ;
  by id ;
run ;

title "sheets" ; proc print noobs data=sheets ; run ; title ;
title "definedNames" ; proc print noobs data=definedNames ; run ; title ;
title "workbookRelationships" ; proc print noobs data=workbookRelationships ; run ; title ;

%*-------------------------------------------------------------------- get table sheet+id relationships ;
proc delete data=allTableParts ;
data _null_ ;
  set WorkbookRelationships ;
  call execute ('%nrstr(%getWorkbookItems)(file=&workbook_file, mapref=wshtbl, member=xl/'||trim(Target)||')') ;
  call execute ('proc append base=allTableParts data=sheetTableParts ; run ;') ;
  call execute ('proc delete data=sheetTableParts ; run ;') ;
run ;

title "allTableParts" ;
proc sort data=allTableParts; by member Id ; run ;
proc print noobs data=allTableParts ; title ; run ;

%*-------------------------------------------------------------------- ;
proc delete data=allWorksheetRelationships ;
data _null_ ;
  set allTableParts ;
  by member ;
  if first.member ;
  length relmember $1000 ;
  relmember = trim(transtrn(member,'worksheets/','worksheets/_rels/')) || '.rels' ;
  call execute ('%nrstr(%getWorkbookItems)(file=&workbook_file, mapref=wshrel, member='||trim(relmember)||')') ;
  call execute ('proc append base=allWorksheetRelationships data=worksheetRelationships ; run ;') ;
  call execute ('proc delete data=worksheetRelationships ; run ;') ;
run ;

%if not %sysfunc(exist(allWorksheetRelationships)) %then %do ;
  %put NOTE: &workbook_file does not contain any Excel tables ;
  %return ;
%end ;

title "allWorksheetRelationships" ;
proc sort data=allWorksheetRelationships; by member Id ; run ;
proc print noobs data=allWorksheetRelationships ; run ; title ;

%*-------------------------------------------------------------------- ;
proc delete data=allTableRefs ;
data _null_ ;
  set allWorksheetRelationships ;
  length refmember $1000 ;
  refmember = 'xl/tables/' || scan(target,-1,'/') ;
  put 'NOTE: ' refmember= ;
  call execute ('%nrstr(%getWorkbookItems)(file=&workbook_file, mapref=tblref, member='||trim(refmember)||')') ;
  call execute ('proc append base=allTableRefs data=tableRef ; run ;') ;
run ;

title "allTableRefs" ;
/*proc sort data=allWorkbooksRelationships; by member Id ; run ;*/
proc print noobs data=allTableRefs ; run ; title ;

proc sql noprint ;
  create table allTableRanges as
  select 
    A.name as tableName
  , E.name as sheetName
  , A.ref as range
  from allTableRefs A
  join allWorksheetRelationships B on substr(B.Target,3) = substr(A.member,3)
  join allTableParts C on C.Id = B.Id and prxchange('s#/?[_.]rels##', 2, B.member) = C.member
  join workbookRelationships D on 'xl/'||D.Target = C.member
  join sheets E on E.Id = D.Id
  ;

%macro compact(data=) ;
%let syslast = &data ;
%local libname memname ;
%let libname = %scan(&syslast,1,.) ;
%let memname = %scan(&syslast,2,.) ;
proc sql noprint ;
  select 
    'max(length('||trim(name)||'))' ,
    ':l' || name ,
    trim(name) || ' char(&l' || trim(name) || ')'
    into 
    :measures separated by ',' ,
    :intos separated by ',' ,
    :modifies separated by ','
    from dictionary.columns where libname = "&libname" and memname = "&memname" and type='char'
    order by varnum
  ;

  %if &sqlobs = 0 %then %return ;

  select &measures into &intos from &data ;
  alter table &data modify &modifies ;
quit ;
%mend compact ;

%compact(data=allTableRanges) ;

title "Table Ranges" ;
proc print noobs data=allTableRanges; run ; title ;

filename workbook "&workbook_file" ;
data _null_ ;
  do until (lastTable) ;
    set allTableRanges end=lastTable;
    call execute ('proc import dbms=xlsx file=workbook replace out=' || trim(tableName) || ' ;') ;
    call execute ('range='||quote(catx('$',sheetName,range))||' ;') ;
    call execute ('run ;') ;
    call execute ('title ' || quote(trim(tableName)) || ' ; proc print noobs data=' || trim(tableName) || ' ; run ; title ;') ;
  end ;
run ;
filename workbook ;

ods html close ;

%mend ReadExcelTables ;

%ReadExcelTables(workbook_file=&workbook_file) ;
5 REPLIES 5
Tom
Super User Tom
Super User

What do you mean by "name of the Table"?  Is that something different the Excel file than the NamedRange you mentioned in the first sentence?

RichardAD
Obsidian | Level 7

An Excel "Table" is a data table created in Excel when you highlight a range of cell and then do "Format as Table".   The default names are Table1, Table2, etc... but the table name can be changed using the Edit feature of the "Name Manager" in the Formulas ribbon.  A named table is a scenario I encounter when a 'template' workbook is distributed for data collection or entry by other departments and groups.

 

Tables are not the same as NamedRanges.  There is no automatic named range created for a table, so the deep dive is needed to discover the cell range of the table.

 

 

Tom
Super User Tom
Super User

Can you explain the logic of how your program is decoding the XML in the XLSX file to find the table reference ranges?

To play around I made a simple workbook with three sheets and three "format as table" areas.

Tom_0-1732630614707.pngTom_1-1732630630888.pngTom_2-1732630648386.png

I then pointed a fileref at the workbook using the ZIP engine so I could see what files it contained.

 

Spoiler
filename x zip 'c:\downloads\tables.xlsx';

data files;
  did=dopen('x');
  do i=1 to dnum(did);
    length fname $200;
    fname=dread(did,i);
    output;
  end;
  rc=dclose(did);
  drop did rc ;
run;

Tom_3-1732630776465.png

It looks to me like the important files are:

  • xl/workbook.xml - Has the list of sheets
  • xl/tables/table*.xml  - Has the table names and reference ranges
  • xl/worksheets/_rels/sheet*.xml.rels - Has which tables are in which sheets

It is probably more correct to use the XML map files to read those, like in your program.  But since they follow a pretty consistent pattern it is also possible to just read them using data steps.

 

Spoiler
data sheets;
  infile x('xl/workbook.xml') recfm=n dlm=' >' ;
  input @'<sheets>' @ ;
  do sheetord=1 by 1 ;
    input @'<sheet ' @'name=' sheetname :$40. @;
    sheetname=dequote(sheetname);
    output;
  end;
run;

proc print;
 title 'sheets';
run;


data tables;
  set files;
  where fname =: 'xl/tables/';
  memvar=fname;
  infile x memvar=memvar recfm=n end=eof;
  do while (not eof);
    input @'<table ' @'id=' tableid :$10.
          @'name=' name :$50.
          @'ref=' ref :$100.
    ;
    tableid=dequote(tableid);
    name=dequote(name);
    ref=dequote(ref);
  end;
  keep tableid name ref;
run;

proc print data=tables;
 title 'tables';
run;


data rels;
  set files;
  where fname =: 'xl/worksheets/_rels/sheet';
  memvar=fname;
  infile x memvar=memvar dlm=' >' recfm=n end=eof;
  sheetord = input(substr(scan(fname,-3,'./'),6),32.);
  length tableid $10 word $200;
  input @'<Relationships' @'<Relationship ' @;
  do until(word=:'</Relationships');
    input word word word @;
    tableid=substr(scan(word,3,'./'),6);
    output;
    input word @;
  end;
  keep sheetord tableid ;
run;

proc print data=rels;
title 'rels';
run;
Tom_6-1732631432034.pngTom_7-1732631447162.pngTom_8-1732631460721.png

 

Once you have read those in you can combine to get a list of table names and the RANGE= value to use for PROC IMPORT.

proc sql;
  create table tablelist as
    select a.tableid,c.name as tablename,catx('$',b.sheetname,c.ref) as range
    from rels a
    inner join sheets b
    on a.sheetord = b.sheetord
    inner join tables c
    on a.tableid = c.tableid
    order by 1
  ;
quit;

Tom_4-1732631233881.png

Which you can then use to generate PROC IMPORT code.

filename code temp;
data _null_;
  set tablelist;
  file code ;
  put 'proc import dbms=xlsx file="c:\downloads\tables.xlsx" out=' tablename 'replace;'
    / range= :$quote. ';'
    / 'run;'
    / 'proc print data=' tablename '; title ' tablename :$quote. ';run;'
  ;
run;
%include code / source2;

Tom_5-1732631296595.png

 

 

 

RichardAD
Obsidian | Level 7

This could a paper or article...

For example:

.xlsx Structure, 3 Sheets.
  1 sheet has 2 tables, 1 sheet has 1 table, and 1 sheet has no tables

xl/
  workbook.xml
    <sheets> name, sheetId (#), r:id 
    <definedNames> name, range
  _rels/
    workbook.xml.rels
      <relationshipssheets> Id, Target
  tables/
    table1.xml
    table2.xml
    table3.xml
      <table> id, name, ref(range {just cells, no sheet})
  worksheets/
    _rels/
      sheet1.xml.rels
      sheet2.xml.rels
        <relationships> Id, Target(a table<n> xml)
   sheet1.xml
   sheet2.xml
   sheet3.xml
     <sheetData> ...ignore...
     <tableParts> r:id

The blue text are the xml tags and content therein and are retrieved by the xmlmaps I created for the XMLV2 library engine.

There is no quick association of which worksheet a table resides in.  It must be discovered by following values through the relationships.

Each table, sheet and sheet relationship are in different .xml files, so they have to be processed one at a time and stacked for use in Proc SQL.

 

A worksheet can have more than one table in it, so simple assumptions based on just member names (xml filenames) is not robust.

 

The macro GetWorkbookItems(file=, member=, mapref=, print=0) retrieves the table(s) defined by the xmlmap and adds the additional columns member and mapref for debugging, traceability and connectivity that is inferred by member name.

 

  The SQL join is a traversing inner join that connects table to worksheet and is quite a trek to get the SheetName for code generating SHEET=<sheet>$<range>.

RichardAD_0-1732638898173.png

 

 

Tom
Super User Tom
Super User

Interesting problem.  And a good chance to learn some new things about enhanced XLMV2 engine options that make it easier to deal with XML files that are inside of ZIP files (such as XLSX files).

 

So basically what I see is that the XLSX file has an XML file named xl/workbook.xml that contains a SHEETS tab with the worksheet names.  And potentially it also has a RANGES tag that has the list of named ranges (if any exist).  It might also have files named xl/worksheets/_rels/sheet<#>.xml.rels where the # is the suffix on the ID tag for the SHEET.  In these xml.rels files are RELATIONSHIP tags which have a field named TARGET that might point to files with names like ../tables/table<#>.xml.  Those table#.xml files will have NAME and RANGE tags.

 

At least with SAS version 9.4 (TS1M7) you can use the XMLV2 libref engine to read a file that you pointed a fileref using the ZIP filename engine.

filename _wb zip "myfile.xlsx" member="xl/workbook.xml";
libname _wb xmlv2 xmlfileref=_wb xmlmap=_map ;

If you include tables in your XMLMAP definition that don't appear in the XML file you are reading there is no error generated.  You just get an empty table.  So no need for multiple MAP files, one file can cover all three XML file types.

 

If you use FILENAME() and LIBNAME() functions and HASH objects you can find all of the worksheets, named ranges and "ranges formats as tables" from and XLSX file in one data step.

 

Spoiler
/* 
Get list of Sheets, Named Ranges, and Format as Table Ranges 
from an XLSX file 

Needs the XML MAP file to read the XML file.

Needs two inputs:  XSLX filename and target dataset name
Output is dataset with:

TYPE      Type of range: 
           RANGE = Named Range 
           SHEET = Sheet name
           TABLE = Name of "format as Table" areas
ORDER     Order within TYPE
NAME      Name of sheet/name/table
REFERENCE String to use for SHEET= or RANGE= statements
          of proc import to read the range.
XLSXFILE  Path of the source XLXS file

This program will use the following fileref and libref names
    xlsxmap - File with XML mapping for the XLM files used from the XLSX file
    _wb   - Point to workbook.xml file
    _rels - Point to .xml.rels files
    _tbl  - Point to table#.xlm files


*/
%let xlsxfile=c:\downloads\tables.xlsx;
%let xlsxfile=c:\downloads\PRAMS-MCH-Indicators-2016-2021.xlsx;
%let xlsxfile=c:\downloads\tables_ranges.xlsx;
%let dsn=xlsx_ranges;


* Make XML map file ;
filename xlsxmap temp;
data _null_;
  file xlsxmap;
put '<SXLEMAP version="2.1">'
  / '<!-- Table definitions for xl/workbook.xml -->'
  / '<TABLE name="sheets">'
  / '  <TABLE-PATH>/workbook/sheets/sheet</TABLE-PATH>'
  / '  <COLUMN name="Id">'
  / '    <TYPE>character</TYPE>'
  / '    <LENGTH>32</LENGTH>'
  / '    <PATH>/workbook/sheets/sheet/@r:id</PATH>'
  / '    <DATATYPE>string</DATATYPE>'
  / '  </COLUMN>'
  / '  <COLUMN name="Name">'
  / '    <TYPE>character</TYPE>'
  / '    <LENGTH>32</LENGTH>'
  / '    <PATH>/workbook/sheets/sheet/@name</PATH>'
  / '    <DATATYPE>string</DATATYPE>'
  / '  </COLUMN>'
  / '</TABLE>'
  / '<TABLE name="ranges">'
  / '  <TABLE-PATH>/workbook/definedNames/definedName</TABLE-PATH>'
  / '  <COLUMN name="Name">'
  / '    <PATH>/workbook/definedNames/definedName/@name</PATH>'
  / '    <TYPE>character</TYPE>'
  / '    <DATATYPE>string</DATATYPE>'
  / '    <LENGTH>200</LENGTH>'
  / '  </COLUMN>'
  / '  <COLUMN name="Range">'
  / '    <PATH>/workbook/definedNames/definedName</PATH>'
  / '    <TYPE>character</TYPE>'
  / '    <DATATYPE>string</DATATYPE>'
  / '    <LENGTH>200</LENGTH>'
  / '  </COLUMN>'
  / '</TABLE>'
  / '<!-- Table definitions for xl/worksheet/table#.xml -->'
  / '<TABLE name="table_range">'
  / '  <TABLE-PATH>/table</TABLE-PATH>'
  / '  <COLUMN name="Id">'
  / '    <PATH>/table/@id</PATH>'
  / '    <TYPE>character</TYPE>'
  / '    <DATATYPE>string</DATATYPE>'
  / '    <LENGTH>32</LENGTH>'
  / '  </COLUMN>'
  / '  <COLUMN name="Name">'
  / '    <PATH>/table/@name</PATH>'
  / '    <TYPE>character</TYPE>'
  / '    <DATATYPE>string</DATATYPE>'
  / '    <LENGTH>100</LENGTH>'
  / '  </COLUMN>'
  / '  <COLUMN name="Range">'
  / '    <PATH>/table/@ref</PATH>'
  / '    <TYPE>character</TYPE>'
  / '    <DATATYPE>string</DATATYPE>'
  / '    <LENGTH>200</LENGTH>'
  / '  </COLUMN>'
  / '</TABLE>'
  / '<!-- Table definition for .rels files -->'
  / '<TABLE name="relationship">'
  / '  <TABLE-PATH>/Relationships/Relationship</TABLE-PATH>'
  / '  <COLUMN name="Id">'
  / '    <PATH>/Relationships/Relationship/@Id</PATH>'
  / '    <TYPE>character</TYPE>'
  / '    <DATATYPE>string</DATATYPE>'
  / '    <LENGTH>32</LENGTH>'
  / '  </COLUMN>'
  / '  <COLUMN name="Target">'
  / '    <PATH>/Relationships/Relationship/@Target</PATH>'
  / '    <TYPE>character</TYPE>'
  / '    <DATATYPE>string</DATATYPE>'
  / '    <LENGTH>200</LENGTH>'
  / '    </COLUMN>'
  / '</TABLE>'
  / '</SXLEMAP>'
;
run;


data _null_;
  length type $5 order 8 name $32 reference $200;
  length xlsxfile dsn memname fname msg $200 fileref libref $8 rc 8;
  length id $32 target range $200;
  call missing(of _all_);

  xlsxfile = symget('xlsxfile');
  dsn = symget('dsn');

* Point libref at workbook.xml file ;
  memname="xl/workbook.xml";
  fileref='_wb';
  libref='_wb';
  link make_libref;
  if rc then stop;

* Setup hash object and iters ;
* RANGES - This is the output results ;
  declare hash ranges(ordered:'YES');
  ranges.definekey('type','order');
  ranges.definedata('type','order','name','reference','xlsxfile');
  ranges.definedone();

* SHEETS - Read from xl/workbook.xml file ;
  declare hash sheets(dataset:'_wb.sheets',ordered:'YES');
  sheets.definekey('Id');
  sheets.definedata('Id','Name');
  sheets.definedone();
  declare hiter sheet('sheets');

* NAMED_RANGES - Read from xl/workbook.xml file ;
  declare hash named_ranges(dataset:'_wb.ranges',ordered:'YES');
  named_ranges.definekey('Name');
  named_ranges.definedata('Name','Range');
  named_ranges.definedone();
  declare hiter named_range('named_ranges');

* RELS - Read from xl/worksheets/sheet#.xml.rels file ;
  declare hash rels;
  declare hiter rel;

* TABLES - Read from xl/tables/table#.xml files ;
  declare hash tables;
  declare hiter table;

* Add each named range ;
  eof0=named_range.first();
  do ord=1 by 1 while(0=eof0);
    type='RANGE';
    order=ord;
    reference=name;
    ranges.add();
    eof0=named_range.next();
  end;

* Add each sheet ;
  eof1=sheet.first();
  do ord=1 by 1 while(0=eof1);
    sheetid=id;
    sheetname=name;
    type='SHEET';
    order=ord;
    reference=name;
    ranges.add();

* Point libref at this sheets relationships file (if it exists) ;
    libref='_rels';
    fileref='_rels';
    memname=cats('xl/worksheets/_rels/sheet',substr(id,4),'.xml.rels');
    link make_libref;

    if not rc then do;
* Read in the sheet relationships ;
      rels = _new_ hash(dataset:'_rels.relationship(where=(target like ''../tables/table%.xml''))');
      rels.definekey('Id');
      rels.definedata('Id','Target');
      rels.definedone();
      rel = _new_ hiter('rels');
      eof2= rel.first();
      do while(0=eof2);
* Point libref at this tables definition file ;
        fileref='_tbl';
        libref='_tbl';
        memname=cats('xl/tables/',scan(target,-1,'/'));
        link make_libref;
        if rc then put 'ERROR: Unable to assign libref for table range.' target= memname= fileref= libref=;
        else do;
* Read in the table range file ;
          tables = _new_ hash(dataset:'_tbl.table_range');
          tables.definekey('Id');
          tables.definedata('Id','Name','Range');
          tables.definedone();
          table = _new_ hiter('tables');
          range=' ';
          eof3= table.first();
          type='TABLE';
          order=input(substr(scan(target,-2,'./'),6),32.);
          reference=catx('$',sheetname,range);
          ranges.add();
          tables.delete();
        end;
        rc=libname('_tbl');
        rc=filename('_tbl');
        eof2=rel.next();
      end;
      rels.delete();
    end;
    rc=libname('_rels');
    rc=filename('_rels');
    eof1=sheet.next();
  end;
* Close libref/fileref ;
  rc=libname('_wb');
  rc=filename('_wb');

* Write ranges hash to dataset ;
  ranges.output(dataset: dsn);
return;

make_libref:
* Make a libref pointing to XML file inside XLSX file. ;
* Input vars: xlsxfile, memname, fileref, libref ;
* Output vars: rc ;

* Point fileref to member in XLSX file ;
  rc=filename(fileref,xlsxfile,'zip',cats('member=',quote(trim(memname))));
  if rc then do;
    msg=sysmsg();
    put 'ERROR: Unable to create ' fileref= 'pointing to ' memname= 'in ' xlsxfile= ;
    put 'ERROR- ' msg ;
  end;
  else rc=fileref(fileref);
  if not rc then do;
* When member file exists then point libref to it ;
    rc=libname(libref,,'xmlv2',catx(' ','xmlfileref=',fileref,'xmlmap=xlsxmap access=readonly'));
    if rc then do;
      msg=sysmsg();
      put 'ERROR: Unable to create XMLV2 libref pointing to ' memname=;
      put 'ERROR- ' msg ;
    end;
  end;
return;

run;

proc print;
run;

Example Results:

NOTE: Processing XMLMap version 2.1.
NOTE: There were 3 observations read from the data set _WB.sheets.
NOTE: There were 1 observations read from the data set _WB.ranges.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 2 observations read from the data set _RELS.relationship.
      WHERE target like '../tables/table%.xml';
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _TBL.table_range.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _TBL.table_range.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _RELS.relationship.
      WHERE target like '../tables/table%.xml';
NOTE: Processing XMLMap version 2.1.
NOTE: There were 1 observations read from the data set _TBL.table_range.
NOTE: The data set WORK.XLSX_RANGES has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.04 seconds

Tom_0-1732942216310.png

Use the REFERENCE value with the SHEET= statement for TYPE=SHEET and with the RANGE= statement for the other two types.

proc import dbms=xlsx file="c:\downloads\tables_ranges.xlsx" out=Name1 replace;
  range="Name1";
run;
proc import dbms=xlsx file="c:\downloads\tables_ranges.xlsx" out=WasSheet2 replace;
  sheet="WasSheet2";
run;
proc import dbms=xlsx file="c:\downloads\tables_ranges.xlsx" out=WasTable2 replace;
  range="Sheet1$C6:E7";
run;

Here is another file that had a different type of relationship record for one of the sheets which why the WHERE= dataset option is used to only find the relationships to tables.

NOTE: Processing XMLMap version 2.1.
NOTE: There were 6 observations read from the data set _WB.sheets.
NOTE: There were 1 observations read from the data set _WB.ranges.
NOTE: Processing XMLMap version 2.1.
NOTE: There were 0 observations read from the data set _RELS.relationship.
      WHERE target like '../tables/table%.xml';
NOTE: The data set WORK.XLSX_RANGES has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds

Tom_1-1732942232266.png

 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 340 views
  • 1 like
  • 2 in conversation