Dear All:
This is a question that puzzled me a long time: How to import multiples Excel files (be it .xls or .xlsx) in a given folder (say, c:\excelfolder\) to SAS dataset? Preferably, I can get all the Excesl appended to one SAS dataset. But if that is too difficult, I hope to have multiple converted SAS files so that I can merge them later.
Thus, I have:
c:\excelfolder\pig.xlsx
c:\excelfolder\rabit.xlsx
c:\excelfolder\big rabit.xlsx
c:\excelfolder\monster.xlsx
...
I want:
c:\sasfolder\animals.sas7bdat
If that is too challenging, I hope to have:
c:\sasfolder\pig.sas7bdat
c:\sasfolder\rabit.sas7bdat
c:\sasfolder\big rabit.sas7bdat
c:\sasfolder\monster.sas7bdat
I tried multiple macro and codes online without success.. Thanks !!!
Here is another program that the author didn't specify how to modify the code such that it can be used in other directory.
** 1) Create a macro variable &PATH with the folder location of this program and XLS
files;
proc sql noprint;
select tranwrd(xpath,"\" || scan(xpath,-1,'\'),'') into :path from
sashelp.vextfl
where upcase(xpath) like '%.SAS';
quit;
%put &path;
** 2) Create a file with one row per XLS file to be converted;
data dirxls(keep=file_name);
length fullname file_name $200;
rc=filename("dir","&path");
d=dopen("dir");
n=dnum(d);
do i=1 to n;
fullname=dread(d,i);
if index(UPCASE(fullname),".XLS") then do;
file_name=scan(fullname,1,".");
output;
end;
end;
rc=dclose(d);
run;
** 3) Read each XLS file into SAS with PROC IMPORT and save permanently to curr
library;
libname curr "%trim(&path)";
data _null_;
set dirxls;
call execute('PROC IMPORT OUT=curr._'||strip(file_name)||
' DATAFILE= "%trim(&path)\'||strip(file_name)||'.xls" DBMS=EXCEL REPLACE;'||
' GETNAMES=YES; MIXED=YES; SCANTEXT=YES; USEDATE=YES; SCANTIME=YES; RUN;');
run;
This might be helpful.
I was trying to use the following code. The funny thing is there is no place even for me to specify the directory where my Excel files sit in:
****************************************************************************;
* Multifileimporterattrib2b.sas *;
* Version 1.40 *;
* Date 2011/05/06 *;
* Function - to import any .csv .txt, .dbf, stata .dta, spss .sav or .xls files *;
* in a directory and read it into the SAS system *;
* To read dbf dta sav or xls file you need SAS/Access to PC files licensed *;
* It also has an option to append all the files into a final dataset *;
* Current limitations, filenames must be two level names such as *;
* fred.xls or myfile.txt , it does not handle *;
* fred.barney.xls, or myfile.filename.txt *;
* For excel will only read the first alphabetical sheet in the xls file *;
* or named range in the xls file *;
* To read multiple sheets see autoimporterv9 program *;
* *;
* This program assumes that the variable names are in the 1st row *;
* *;
* PLEASE LOOK AT THE SECTIONS OF THE PROGRAM YOU MUST CHANGE *;
* *;
* There are 3 sections at the top and one section at the bottom *;
* that you must change *;
* Features or Fixes added: *;
* 2002/08/29 - added the ability to specify the libref where *;
* individual datasets are stored *;
* - added a new routine to clean out the master dataset *;
* 2002/10/21 - added csv support *;
* 2004/10/21 - dbf support added *;
* fixed bug with scan function *;
* 2006/12/19 - added a sheetname option *;
* 2007/11/12 - added a checking routine for spreadsheets with all *;
* numbers as their names *;
* 2007/11/14 - added a removal routine to remove empty excel files *;
* from the input stream *;
* based on the size in bytes *;
* 2007/11/16 - made the numcheck routine an option here rather *;
* than required *;
* 2007/12/04 - added all the version 9 options to the import xls *;
* 2007/12/07 - added smoother dbf routines for importing *;
* 2008/02/05 - fixed branching routine for excel files *;
* 2010/02/01 - smoothed appending routine *;
* 2010/06/24 - added SPSS and STATA support *;
* 2011/04/29 - added variable scanning support for appending *;
* into the master dataset *;
* *;
*********************************************************************;
dm wpgm 'clear log' wpgm;
options macrogen symbolgen mprint;
* first change the directory of the file you want to read here *;
* note if you have spaces in your directory you must bound them *;
* with double quotes like this *;
* filename fred pipe 'dir k:\"program files"\"my sas files" ' *;
***** >>>>>>>>>>>>> Change the directory here <<<<<<<<<<<<<<<<<<< *;
filename fred pipe 'dir c:\sastest2\ditter';
data temp1;
* use the same directory here note the extra \ used however *;
***** >>>>>>>>>>>>>>>>>>>>> Change the directory here <<<<<<<<<<< *;
dir1="c:\sastest2\ditter\";
**** >>>>>>>>>>>>>>>>>> Change the value here <<<<<<<<<<<<<<<<< *;
* change the value to what you want to read, tab, csv, dbf, dta, sav, or xls *;
* make sure your extensions in the directory are LOWER case , that is *;
* xls , not XLS *;
db='xls';
infile fred truncover;
length var1 $ 150;
input var1 $ 1-150;
put _infile_;
run; quit;
* this strips out the headers found in the data *;
data temp2;
set temp1;
if _n_ > 7;
ordervar=_n_;
run; quit;
*********************************************************************;
* We then sort the results descending to get put the trailer data *;
* on the top *;
*********************************************************************;
proc sort out=temp3;
by descending ordervar;
run;
*********************************************************************;
* Then we strip out the trailer data *;
*********************************************************************;
data temp3; set temp3;
if _n_ > 2;
run;
* then resort it back *;
proc sort data=temp3;
by ordervar;
run;
*****************************************************************;
* Then we setup the data to be processed *;
* We substring out the dataset name and get the extension *;
* so we know what to do *;
*****************************************************************;
data preset; set temp3;
length dbms $ 10;
length sasname $ 100;
dsname=reverse(scan(reverse(left(var1)),1,' '));
dsname2=reverse(scan(reverse(left(var1)),2,'20'x));
dsname3=reverse(scan(reverse(left(var1)),3,'20'x));
dsext=scan(dsname,2,'.');
sasname=scan(dsname,1,'.');
quoter='"';
if (dsname3='AM') or (dsname3='PM') then do;
dectectit='NO' ;
end;
else do;
detectit='YES';
end;
if detectit='YES' then do;
dsnameb=dsname;
sasnewname=left(trim(dsname2))||left(trim(dsnameb));
newfield2=compress(sasnewname);
dsname=newfield2;
sasname2=left(trim(dsname2))||left(trim(sasname));
end;
fullname=left(trim(quoter))||left(trim(dir1))||left(trim(dsname))||left(trim(quoter));
xlssize=substr(var1,28,11);
xlssize2=compress(xlssize,',');
truesize=xlssize2 * 1.0;
if dsext='txt' then dbms='tab';
else if dsext='xls' then dbms='xls';
else if dsext='csv' then dbms='csv';
else if dsext='dbf' then dbms='dbf';
else if dsext='sav' then dbms='sav';
else if dsext='dta' then dbms='dta';
else if sasname='xls' then dbms='xls';
run;
data setup; set preset;
if dbms=db;
sasname=translate(sasname,'_','-');
run;
quit;
proc print data=setup;
run;
********************************************;
* Now to start the macro *;
* the first loop here takes all the *;
* dataset names and figures out how many *;
* there are and creates a counter *;
* see the actual macro call for the *;
* explanation of the parameters *;
********************************************;
options macrogen symbolgen mprint;
%macro imploop(lib1=,appender=,replacer=,lib2=,final=,specsht=,mysheet=,
getnames=,mixed=,scantext=,usedate=,scantime=,textsize=,
prechar=,xlsrem=,xlsremsz=,numcheck=,lengthcall=,lengthcode=);
*this routine checks for the presence of numbers *;
* in column1 which is illegal for sas names and adds the string of your choice *;
*to the name of the output dataset *;
* call the empty xls file remove routine here *;
%if &xlsrem=yes %then %do;
data setup; set setup;
if truesize > &xlsremsz;
run;
%end;
%if &numcheck=yes %then %do;
data setup;
set setup;
numcheck=substr(sasname,1,1);
numcheck=(numcheck*1.0);
if numcheck >= 0 then do;
sasname=&prechar||sasname;
end;
run;
%end;
data _null_;
set setup end=last;
call symput('dsnamem'||left(_n_),trim(fullname));
call symput('dbmsm' ||left(_n_),trim(dbms));
call symput('sasnamem' ||left(_n_),trim(sasname));
if last then call symput('counter',_n_);
run;
%do i= 1 % to &counter;
%put &&dsnamem&i;
%put &&dbmsm&i;
%put &&sasnamem&i;
%if &&dbmsm&i=xls %then %do;
%if &specsht=yes %then %do;
proc import datafile=&&dsnamem&i
out=&lib1..&&sasnamem&i
dbms=excel replace;
sheet=&mysheet;
getnames=&getnames;
mixed=&mixed;
scantext=&scantext;
usedate=&usedate;
scantime=&scantime;
textsize=&textsize;
run;
%end;
%if &specsht=no %then %do;
proc import datafile=&&dsnamem&i
out=&lib1..&&sasnamem&i
dbms=excel replace;
getnames=&getnames;
mixed=&mixed;
scantext=&scantext;
usedate=&usedate;
scantime=&scantime;
textsize=&textsize;
run;
run; quit;
%end;
%end;
%if (&&dbmsm&i ^= xls) %then %do;
proc import datafile=&&dsnamem&i
out=&lib1..&&sasnamem&i
dbms=&&dbmsm&i replace;
run; quit;
%end;
%end;
%if &replacer=yes %then %do;
proc datasets library=&lib2;
delete &final;
run; quit;
%end;
******************************************************************;
* set lengths on variables before appending on to master dataset *;
******************************************************************;
%if &lengthcall=yes %then %do;
proc datasets;
delete bigmastervar;
delete maxmastervar;
run;
%do i= 1 %to &counter;
proc sql;
create table colmaster&i
as select name as myname, type, length as mylength from dictionary.columns
where memname=upcase("&&sasnamem&i");
quit;
data colmaster&i; set colmaster&i;
obsno=_n_;
run;
proc append base=bigmastervar data=colmaster&i;
run;
%end;
proc sql;
create table maxmaster as
select obsno, myname, type, max(mylength) as maxlength from bigmastervar
group by obsno, myname;
quit;
proc sort nodup data=maxmaster out=maxmaster2;
by obsno myname type maxlength;
run;
data maxmaster2;
set maxmaster2;
maxlength2=maxlength||'.';
run;
data _null_; set maxmaster2;
file &lengthcode ;
if type='char' then do ;
put 'length ' myname ' $' maxlength2 ' ;' ;
end;
else do ;
put 'length ' myname ' ' maxlength2 ' ;' ;
end;
run;
data &lib2..&final;
%include &lengthcode;
run;
%end;
******************************************************************;
* end of 1st lengthcall routine *;
******************************************************************;
%if &appender=yes %then %do;
%do i= 1 % to &counter;
proc append base=&lib2..&final data=&lib1..&&sasnamem&i force;
run; quit;
%end;
%end;
%mend imploop;
*****************************************************************;
* now call the macro itself *;
* there are five components *;
* ALL OF THESE COMPONENTS MUST BE SPECIFIED WHETHER YOU USE *;
* THAT PARTICULAR FEATURE OR NOT, THEY WILL BE IGNORED IF *;
* THE FEATURE IS NOT USED BUT THE MACRO STILL REQUIRES THEM *;
* *;
* 1. the libref where the individual datasets should be stored *;
* 2. the append routine is called if the value is yes *;
* this places all the individual datasets into one final *;
* final dataset *;
* 3. whether we should create a new master dataset to append *;
* the datasets into or whether we should use an existing one *;
* if it is available *;
* Yes means delete the existing master and start again for *;
* the run appending to it *;
* No means add to the existing master for the run *;
* 4. the libref where the final dataset should be stored *;
* 5. the name of the final dataset *;
* 6. specifying a specific sheet to import from each xls *;
* yes or no *;
* If you specify no then we will read the first sheet in *;
* the xls file. *;
* If you want to read all the sheets in the xls file *;
* contact tech support for a different program *;
* 7. The actual sheetname in quotes as specified here *;
* 8. Getnames - yes or no - default is YES *;
* whether the first row of the *;
* sheet is used to extract the names of the variables *;
* This parameter is also used to set SCANMEMO when *
* importing ACCESS databases *;
*************************************************************;
* 9. Mixed - yes or no - Default is NO *;
* specifies whether to convert numeric data values into *;
* character data values data for a column with mixed data *;
* types. This option is valid only while importing data *;
* from Excel. The default is NO, which means that numeric *;
* data will be imported as missing values in a character *;
* column. If MIXED_DATATYPE=YES, the engine will assign a *;
* SAS character type for the column and convert all numeric *;
* data values to characters data. This option is only valid *;
* while reading(importing) data into SAS. *;
*************************************************************;
* 10. Scantext Yes or No, default is YES *;
* specifies whether to scan the length of text data for *;
* a DBMS column and use the length of the longest string *;
* data found as SAS column width. For MS Excel, This *;
* applies to all character data type columns. For *;
* MS Access, this applies to MEMO data type fields only *;
* and doesnot apply to TEXT(less than 256 characters long)*;
* fields. If the maximum length found is greater than *;
* what is specified in DBMAX_TEXT=, the small value *;
* specified in DBMAX_TEXT= will be applied. *;
*************************************************************;
* 11. Usedate Yes or No - Default is YES *;
* specifies whether to use DATE. format for date/time *;
* columns/fileds while importing data from MS Excel or *;
* MS Access file. YES means to use the SAS DATE. format,*;
* NO means to use SAS DATETIME. format. *;
*************************************************************;
* 12. Scantime Yes or No - Default is YES *;
* specifies whether to scan all row values in a date/time*;
* data type column and automatically determine to use the*;
* TIME. format if only time values(i.e. no date or *;
* datetime values) found in the column. To import data,*;
* by default, this libname engine assigns SAS DATE. *;
* format for date/time columns in MS Excel and assigns *;
* SAS DATETIME. format for MS Access date/time fields. *;
* However, if this option is turned on, a column with only *;
* time values will be assigned with a SAS TIME. format. *;
*******************************************************************;
* Textsize *;
* 13. Controls the largest length of a text string imported *;
* used in conjunction with scantext *;
* 14. this is the parameter that adds a character to the front *;
* of the dataset if the sheet starts with a number which is *;
* illegal in SAS dataset naming conventions by default it is *;
* set to sas_ *;
* 15 Call the drop routine to strip out empty xls files *;
* 16. If calling the drop routine how small are the empty excel *;
* in bytes. Use windows explorer to see this *;
* set this for the largest empty xls file *;
* PLEASE USE OPTIONS 15 and 16 CAREFULLY *;
* 17. Call the numcheck string. This is good when your *;
* sheet tab names are all numbers *;
* 18. lengthcall , this checks the lengths of all the data *;
* variables in all datasets and create a length statement *;
* that is long enough to cover them all *;
* 19. where to write the length statement calls *;
* this a flat file to store the .sas program with the length *;
* statements *;
* this will generate a note that the variables are unitialized *;
* because they have no values at that point *;
*******************************************************************;
%imploop(lib1 =work, /*libref where datasets are stored */
appender =yes, /*call append routine */
replacer =yes, /*create a new master dataset */
lib2 =work, /*libref for the new master dataset */
final =final, /*name of the new master dataset */
specsht =no, /*specify a specific sheet (xls files only) */
mysheet ="sheet1", /*name of the sheet (xls files only) */
getnames =yes, /* getnames */
mixed =yes, /* mixed data (xls files only) */
scantext =yes, /* scan text (xls files only) */
usedate =yes, /* use date (xls files only) */
scantime =yes, /* scantime (xls files only) */
textsize =1024, /* textsize (xls files only) */
prechar ="sas_", /*extension to add to spreadsheets with only numbers */
xlsrem =no, /*call the drop routine */
xlsremsz =18432, /*size of empty xls files */
numcheck =yes, /*call the numcheck routing routine for all numbers sheet names */
lengthcall=yes, /* call the routine to check the lengths variables in imported data */
lengthcode='c:\sastest2\ditter\mylength.sas'); /* where do I write the length statements to set */
caveman529 wrote:
I was trying to use the following code. The funny thing is there is no place even for me to specify the directory where my Excel files sit in:
dm wpgm 'clear log' wpgm;
options macrogen symbolgen mprint;
* first change the directory of the file you want to read here *;
* note if you have spaces in your directory you must bound them *;
* with double quotes like this *;
* filename fred pipe 'dir k:\"program files"\"my sas files" ' *;
***** >>>>>>>>>>>>> Change the directory here <<<<<<<<<<<<<<<<<<< *;
filename fred pipe 'dir c:\sastest2\ditter';
data temp1;
* use the same directory here note the extra \ used however *;
***** >>>>>>>>>>>>>>>>>>>>> Change the directory here <<<<<<<<<<< *;
dir1="c:\sastest2\ditter\";
**** >>>>>>>>>>>>>>>>>> Change the value here <<<<<<<<<<<<<<<<< *;
* change the value to what you want to read, tab, csv, dbf, dta, sav, or xls *;
* make sure your extensions in the directory are LOWER case , that is *;
* xls , not XLS *;
db='xls';
I don't think you need to make it so complicated. Something like this should be adequate. My OS is UNIX but you change LS to DIR.
Could you please how to use it?
I get "Unable to import, please specify a DATAFILE."
All I did is modify the following line:
command = "/usr/bin/ls ~/*.xlsx";
to
command = "dir c:\excelfolder\*.xlsx";
In addition to the above, I also tried to add the following at the end, but it doesn't seem to run... Thanks -
%impt(c:\excelfolder\,i);
What OS are you on?
Post code that works for importing a single excel file as well.
I have SAS 94 x64 on Win 7 x64 with 32 bit Office. SAS File Server installed. Thanks -
The following code runs on both SAS and SAS EG
PROC IMPORT OUT= WANT DATAFILE= "H:\file1.xlsx"
DBMS=xlsx REPLACE;
SHEET="Sheet1";
GETNAMES=YES;
RUN;
proc contents data=want; run;
I have SAS 94 x64 on Win 7 x64 with 32 bit Office. SAS File Server installed. Thanks -
The following code runs on both SAS and SAS EG
PROC IMPORT OUT= WANT DATAFILE= "H:\file1.xlsx"
DBMS=xlsx REPLACE;
SHEET="Sheet1";
GETNAMES=YES;
RUN;
proc contents data=want; run;
I thing you first step is to get the output from DIR read in correctly and then you can gen the call to the PROC IMPORT macro.
Run the following and post the result. Also look at help for DIR so that it does not display any "extra stuff" you don't need.'
data _null_; 
   command = "your-dir-and the rest""; 
   infile dummy pipe filevar=command end=eof truncover; 
   do i = 1 by 1 while(not eof);
      input; 
      put _infile_;
      end; 
   stop; 
   run; 
Here is the log after running the code. It seem there is a small typo in the code.
1 The SAS System 22:19 Saturday, January 25, 2014
1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROJECTPATH='';
5 %LET _CLIENTPROJECTNAME='';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGSR TEMP;
12 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
13 STYLE=HtmlBlue
14 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/6.1/Styles/HtmlBlue.css")
15 NOGTITLE
16 NOGFOOTNOTE
17 GPATH=&sasworklocation
18 ENCODING=UTF8
19 options(rolap="on")
20 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
21
22 GOPTIONS ACCESSIBLE;
23 data _null_;
24 command = "dir h:\*.xlsx /b"";
25 infile dummy pipe filevar=command end=eof truncover;
26 do i = 1 by 1 while(not eof);
27 input;
28 put _infile_;
29 end;
30 stop;
31 run;
32
33 GOPTIONS NOACCESSIBLE;
34 %LET _CLIENTTASKLABEL=;
35 %LET _CLIENTPROJECTPATH=;
36 %LET _CLIENTPROJECTNAME=;
37 %LET _SASPROGRAMFILE=;
38
39 ;*';*";*/;quit;run;
____
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.05 seconds
cpu time 0.04 seconds
40 ODS _ALL_ CLOSE;
41
42
43 QUIT; RUN;
44
I would think 2 things: (1) your command line isn't correct and (2) with windows the dir command doesn't include the path (in your case the macro variable filename.
Dir, by itself, includes a couple of extra fields. I would think that you need to use:
command = "dir c:\excelfolder\*.xlsx /b";
Then, while I can't test it, I'd think you want to add a line in your macro. Namely, where you have:
input path $128.;
I'd think you want to add another line something like
path=catt("&filename.",path);
Hi, Arthur:
I have to admit that I'm not that familiar with these procedures with SAS. Do you mind telling me how to incorporate your point to Data_Null's code? The thing with programming is that a tiny mistake can render the whole program not able to run. Thanks -
%macro impt(filename,i); 
   proc import DATAFILE="&filename" OUT=XL&i DBMS=XLSX REPLACE;
      run; 
   %mend impt;
data _null_; 
   command = "/usr/bin/ls ~/*.xlsx"; 
   infile dummy pipe filevar=command end=eof truncover; 
   do i = 1 by 1 while(not eof);
      input path $128.; 
      put 'NOTE: ' path=;
      call execute(cats('%nrstr(%impt(',path,',',i,'));'));
      end; 
   stop; 
   run; 
data all;
   set XL:;
   run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
