08-30-2017 10:30 AM
I’m trying to import a bunch of excel files (~30 odd files, with over 10 tabs in each.) into SAS using proc import , and I’d then append them into a single SAS file for my analysis. However, I’m facing a few challenges in putting this entire import into a SAS macro because the work sheet names in each of the excel files are different.All of them have the same 2-3 words at the beginning, but can have very different names towards the end.
I was wondering if there’s a work around for this? Can we use some function/ operation similar to substr where I can select the worksheets/tabs that start with “Cost Cg PRCH” and then import them?
Please do let me know if we can thing about an alternative.
Thanks a lot.
08-30-2017 10:39 AM
Assuming the latest version of SAS you can try this method:
libname myFiles XLSX 'path to my excel file.xlsx'; proc copy in=myFiles out=WORK; run;quit; libname myFiles;
08-31-2017 04:08 AM
Thank you for your reply. But ,I am in doubt how it will help me in getting conditional sheets of the excel . So, If I explain it in more details I have 2 requirements::
1. I have to import multiple excel files.
2. Each excel file have multiple sheets with diffeent names . But, all the files have some sheets having names starting with "Cost Cg PRCH" . I want to import only those sheets of all the excel files . Per my understanding some filter or conditional statement needed to take the specific sheets.
08-31-2017 10:33 AM
Per my understanding some filter or conditional statement needed to take the specific sheets.
Import all the worksheets and delete what you don't want instead.
Regarding importing multiple files you're likely going to need to turn that into a macro or copy and paste it a few times, depending on what you're comfortable with.
08-30-2017 04:11 PM
I suggest taking a close look at all of the datasets created from multiple Excel sheets. Probability approaches unity that at least one variable will have a different data type or length that will cause issues when appending the data.
08-31-2017 11:02 AM - edited 08-31-2017 11:04 AM
There are many posts on this sites about how to deal with reading multiple files and/or multiple sheets. Just combine the techniques.
So first get the list of files you need to read. If you do not have the list then for me the easist is to use operating system command to get the filenames into a dataset, but if your SAS installation doesn't allow using PIPE engine there are SAS functions that can do that also.
/* %let path=/path/to/excel/files; */ %let path=c:\path\to\excel\files ; data files ; /* infile "cd &path ; ls -d *.xlsx" pipe truncover ; */ infile "dir /b &path\*.xlsx" pipe truncover ; input filename $256. ; run;
Once you have the list of filenames then you can assign a libref to each one. Might be easiest to use the LIBNAME() function to do that, but you could also use some type of code generatation technique (macros, call execute(), or combination of PUT and %INCLUDE).
data librefs; length libref $8 ; set files ; libref = 'XLSX' || put(_n_,Z4.); rc = libname(libref,"&path/"||filename,'xlsx'); run;
Now you can query the metadata about what members exists in the libraries you created.
proc sql noprint ; select catx('.',libname,nliteral(memname)) into :memlist separated by ' ' from dictionary.members where libname like 'XLSX%' and memname like 'COST CG PRCH%' ; quit;
And then use the list of member names in your data step.
data want ; set &memlist ; run;
But as others have pointed out that unless there are strict controls on the generation of these EXCEL sheets the odds are almost 100% that when you try to read from multiple EXCEL sheets you will get conflicting variable lengths ($10 vs $20 for example) or even conflicting variable types (num vs char) or names (VisitDate vs VisDT). You can analyze other metadata about the sheets to see if there are conflicts and adapt your combination step(s) to adjust, if possible, or at least report on the conflicts.