I had to deal with XLSX files with quirky table and column names that were not known in advance. So I created a program to use PROC CONTENTS information to write code to copy all of the sheets. It lets SAS try to create valid variable names using the VALIDVARNAME=V7 option, but includes a step to generate more user friendly variable names from the column headers. If includes code to generate valid member names from the sheet names.
To use it first create a libref pointing to the XLSX file using the XLSX libname engine. And another pointing to where you want to write the copied data. This code assumes the source libref is A and the target libref is OUTA.
%let path=...\;
%let fnameA=A.xlsx;
libname A xlsx "&path.&fnameA";
libname outA "&path.sas";
* Set VALIDMEMNAME option to EXTEND to handle sheetnames with spaces ;
* Set VALIDVARNAME option to V7 ;
options validmemname=extend validvarname=v7;
* Get contents information from source workbook ;
proc contents data=A._all_ noprint out=contents;
run;
proc sort data=contents;
by memname varnum;
run;
* Process the contents and use it to generate code to create datasets ;
filename code temp;
data contents;
retain memname varnum name xlsxname type length label memlabel;
set contents (keep=memname memlabel varnum name type length label );
by memname ;
* Save current memname as quoted string in MEMLABEL to help with code gen;
memlabel=quote(trim(memname),"'");
* Build valid memname from existing memname ;
memname=translate(trim(prxchange('s/([^a-zA-Z0-9]+)/ /',-1,memname)),' _','_ ');
memname=prxchange('s/(^[0-9])/_$1/',1,memname);
* Remove tabs, lf or cr from labels ;
label=compbl(translate(label,' ','090A0D'x));
* Generate valid name from LABEL value ;
* Save name that the XLSX engine geneated when different;
XLSXname=name;
name=translate(trim(prxchange('s/([^a-zA-Z0-9]+)/ /',-1,label)),' _','_ ');
name=prxchange('s/(^[0-9])/_$1/',1,name);
if upcase(xlsxname)=upcase(name) then xlsxname=' ';
* Generate code to copy data from XLSX to permanent library ;
* Rename any variables where a better name was derived ;
* Remove formats and informats ;
file code;
if first.memname then put
'data outa.' memname '( label=' memlabel ');'
/ ' set a.' memlabel +(-1) 'n (rename=('
;
if xlsxname ne ' ' then put
@4 xlsxname '=' name
;
if last.memname then put
'));'
/ ' if cmiss(of _all_)=' varnum 'then delete;'
/ ' format _all_;'
/ ' informat _all_;'
/ 'run;'
;
* Remove the qutoes from the member label ;
memlabel=dequote(memlabel);
run;
/*
* Show generated code in SAS log ;
data _null_;
infile code;
input;
put _infile_;
run;
*/
* Run generated code ;
%include code/source2;
* Print the contents information ;
proc print data=contents;
by memname memlabel ;
id varnum name;
run;
I included generating a FORMAT statement to remove formats as SAS makes the mistake of attaching $xx formats to character variables. You might want to remove that or modify it to not remove formats from any variables that are DATE, TIME or DATETIME values. Should be possible if you keep the FORMAT column from the PROC CONTENTS output and use the FMTINFO() function to check if the format attached is in one of those categories.
... View more