06-21-2017 06:05 PM
I'm a relatively new SAS 9.3 user, so please bear with me.
I have the following library structure:
libname jc_&jcname 'W:\JCs\JC Analyses 2016-2017\JC Data 2016-2017\Received from JCs\&jcname\JC_2016-2017_&jcname.xls';
There are 63 &jcname values, so it follows that there are 63 folders and 63 files, all with the same variables in the same order.
Can anyone please help me with extracting those files and stacking them into a single file? Had they been in the same location, this would have worked:
data all; set JC_2016_jcA JC_2016_jcB JC_2016_jcC JC_2016_jcD JC_2016_jcE JC_2016_jcF ... ... JC_2016_jcBJ; run;
but, yeah, they're not...
Thanks in advance!!!
06-21-2017 06:57 PM
Don't you get an error with that filename? Such as file or path doesn't exist?
Or did you actually mean to post:
libname jc_&jcname "W:\JCs\JC Analyses 2016-2017\JC Data 2016-2017\Received from JCs\&jcname.\JC_2016-2017_&jcname..xls";
A macro variable won't resolve inside single quotes, so the path needs double quotes. The added periods matter too: a period immediately after a macro variable reference is treated as the delimiter ending that reference. That's why the file name needs &jcname..xls — with only one period, the delimiter consumes it, the dot that should start the file extension is lost, and the resolved name ends in jcAxls instead of jcA.xls. The period after the folder-level &jcname. likewise marks the end of that reference explicitly.
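A quick way to see these quoting rules in action (a minimal sketch, assuming a hypothetical value jcA and a shortened file name):

```sas
%let jcname=jcA;

data _null_;
  f1='JC_2016-2017_&jcname.xls';    /* single quotes: not scanned, stays literal */
  f2="JC_2016-2017_&jcname.xls";    /* one period is consumed as the delimiter  */
  f3="JC_2016-2017_&jcname..xls";   /* two periods: delimiter + extension dot   */
  put f1= / f2= / f3=;
run;
```

The log shows f2 resolving to JC_2016-2017_jcAxls (no extension dot) while f3 resolves to the intended JC_2016-2017_jcA.xls.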
You likely also need something on the libname to indicate the engine to read the file.
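For example, a sketch using the path from the original post (EXCEL being the PC Files engine available in 9.3; the jcA value is hypothetical):

```sas
%let jcname=jcA;

/* Assign the workbook with an explicit engine so SAS knows how to read it */
libname jc_&jcname excel
  "W:\JCs\JC Analyses 2016-2017\JC Data 2016-2017\Received from JCs\&jcname.\JC_2016-2017_&jcname..xls";

/* List the sheets/ranges the engine exposes as data sets */
proc contents data=jc_&jcname.._all_;
run;
```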
06-22-2017 04:21 AM
Well, firstly: no matter how carefully the Excel files are laid out, or how similar they are, if you use a guessing import procedure (i.e. you don't specify the data structure) on a weak data format such as Excel, you will end up with tables whose structures don't match. I'm just mentioning this because you will run into it going forward.
As for doing the code:
/* Import the first one to get a base */
libname tmp excel "c:\folder\JC_2016_jcA.xlsx";

data base;
  set tmp.JC_2016_jcA;
run;

/* Append the rest */
data _null_;
  do f="JC_2016_jcb","JC_2016_jcc", ...;
    call execute(cats('libname tmp excel "c:\folder\',f,'.xlsx"; proc append base=base data=tmp.',f,' force; run;'));
  end;
run;
07-17-2017 06:00 PM - edited 07-17-2017 06:00 PM
Thank you -- I think I had kind of subconsciously ignored the potential for format disagreement across files. Do you have a suggestion for how to account for that in a procedure similar to the one you propose below?
06-22-2017 04:35 AM
Excel files are messy, unstructured heaps of data. Just like PROC IMPORT, the EXCEL libname engine has to guess at the structure, and you will end up with all kinds of WARNINGs about possibly truncated data.
Imagine that your first Excel file contains strings in column 'D' that have a maximum length of 12.
Now suppose your third Excel file has a string in 'D' that is 25 bytes long: you'll lose the last 13 bytes, because the length was fixed by the guess made from the first file.
Or the Excel engine goes the safe path and makes all strings 255 bytes long, wasting lots of space for nothing.
Therefore you should dump the Excel file format for such operations and switch to something that works. Have the data delivered in csv files, for instance. There you will define the structure in the data step, and it will work. Period.
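As a sketch of what that looks like (the column names, lengths, and informats here are made up; the real layout would come from the data specification):

```sas
data jc_a;
  /* Read a delimited file with an explicitly defined structure: no guessing */
  infile "C:\folder\JC_2016_jcA.csv" dsd truncover firstobs=2;
  length site $8 name $40;
  input site $ name $ visit_date :mmddyy10. score;
  format visit_date date9.;
run;
```

Because the lengths and informats are stated in the DATA step, every file is read identically, so the stacked result is consistent by construction.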
Excel files are NOT, I repeat NOT, suited for such a task. Never ever. They may be fine for a one-off test, but that's it.
07-17-2017 05:53 PM
Your first sentence could not be truer! I can't stand using Excel for these purposes. Unfortunately, I work for an agency that doesn't exactly embrace change/efficiency/my sanity. Thank you, though!
06-22-2017 08:01 AM
The following code doesn't circumvent all the issues with Excel sources. What it does:
1. Create a list of all Excel files under a parent folder (the files can be in sub-folders)
2. Import the first sheet of each of these Excel files into a separate SAS table in Work
3. Use a SQL OUTER UNION CORR set operator to combine all the tables into a single table
- OUTER UNION CORR aligns columns by name and gives each variable the longest length found for it across all the tables.
- This helps avoid truncation issues.
%let sourcePath=C:\temp\JCs\JC Analysis 2016-2017\JC Data 2016-2017\Received from JCs;

%macro collectTable(file=,no=);
  proc import file="&file" dbms=XLSX replace out=collectedTable&no;
  run;quit;
%mend;

/* clean up Work from previous runs */
proc datasets lib=work nolist nowarn;
  delete collectedTable:;
run;quit;

/* import the first tab of every Excel file found under &sourcePath into WORK */
filename files pipe "dir /b/s/a-d ""&sourcePath\*.xlsx""";
data _null_;
  infile files truncover;
  input thisFile $500.;
  call execute(cats('%collectTable(file=',thisFile,',no=',_n_,')'));
run;
filename files;

/* consolidate the imported data */
%let union_list=;
proc sql noprint;
  /* 1. build a macro variable holding the UNION set operation over all tables */
  select catx(' ','select * from',memname)
    into :union_list separated by ' outer union corr '
    from dictionary.tables
    where libname='WORK' and memname like 'COLLECTEDTABLE%';
  /* 2. execute the generated UNION set operation */
  create table consolidated as &union_list;
quit;
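To illustrate the length behaviour on two toy tables (names and values made up):

```sas
/* Two tables with the same variable at different lengths */
data t1; length x $5;  x='abc';             run;
data t2; length x $20; x='a longer string'; run;

/* OUTER UNION CORR stacks them, aligning columns by name */
proc sql;
  create table combined as
  select * from t1
  outer union corr
  select * from t2;
quit;

/* In COMBINED, x gets length 20, so the longer value is not truncated */
proc contents data=combined;
run;
```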