- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I'm a relatively new SAS 9.3 user, so please bear with me.
I have the following library structure:
libname jc_&jcname 'W:\JCs\JC Analyses 2016-2017\JC Data 2016-2017\Received from JCs\&jcname\JC_2016-2017_&jcname.xls';
There are 63 &jcname values, so it follows that there are 63 folders and 63 files, all with the same variables in the same order.
Can anyone please help me with extracting those files and stacking them into a single file? Had they been in the same location, this would have worked:
data all;
set JC_2016_jcA
JC_2016_jcB
JC_2016_jcC
JC_2016_jcD
JC_2016_jcE
JC_2016_jcF ...
... JC_2016_jcBJ;
run;
but, yeah, they're not...
Thanks in advance!!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Don't you get an error with that filename? Such as file or path doesn't exist?
Or did you actually mean to post:
libname jc_&jcname "W:\JCs\JC Analyses 2016-2017\JC Data 2016-2017\Received from JCs\&jcname.\JC_2016-2017_&jcname..xls";
The macro variable wouldn't resolve inside single quotes and without the periods added in red above you would get a macro error because it would think that you want to use &jcnameJC_2016-2017_ as a macro variable and without the second red period the period denoting the start of the file extension would have been resolved as concatenating &jcnamexls.
You likely also need something on the libname to indicate the engine to read the file.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well, firstly, no matter how properly you have defined the Excel files, or how similar they are, if you use a guessing import procedure (i.e. you don't specify data structure) from a bad data file format such as Excel, you will end up with data which is not the same. I am just mentioning this as you will encounter it going forward.
As for doing the code:
/* Import first one to get a base */ libname tmp excel "c:\folder\',JC_2016_jcA,'.xlsx"; data base' set tmp.JC_2016_jcA; run; data _null_; do f="JC_2016_jcb","JC_2016_jcc"...; call execute(cats('libname tmp excel "c:\folder\',f,'.xlsx"; proc append base=base data=tmp.',f,' force; run;')); end; run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you -- I think I had kind of subconsciously ignored the potential for format disagreement across files. Do you have a suggestion for how to account for that in a procedure similar to the one you propose below?
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Excel files are messy, unstructured heaps of data. Just as proc import, libname excel has to make guesses about the structure, and you will end up with all kinds of WARNINGs because of possibly truncated data.
Imagine that your first Excel file contains strings in column 'D' that have a maximum length of 12.
Now, your third Excel file has a maximum string in 'D' that's 25 bytes long, you'll lose 13 bytes of that, because the first guess used that.
Or the Excel engine goes the safe path and makes all strings 255 bytes long, wasting lots of space for nothing.
Therefore you should dump the Excel file format for such operations and switch to something that works. Have the data delivered in csv files, for instance. There you will define the structure in the data step, and it will work. Period.
Excel files are NOT, I repeat NOT, suited for such a task. Never ever. They may be nice for a one-shot test, but that'ts it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Your first sentence could not be truer! I can't stand using Excel for these purposes. Unfortunately, I work for an agency that doesn't exactly embrace change/efficiency/my sanity. Thank you, though!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The following code doesn't circumvent all the issues with Excel sources. What it does:
1. Create a list of all Excel files under a parent folder (files can be in sub-folders)
2. Import first sheet of all these Excels into separate SAS tables in Work
3. Use a SQL OUTER UNION JOIN to combine all the tables into a single table
- Using a SQL OUTER UNION will first analyse the length of all variables and choose the longest length per variable from all tables.
- This will help to avoid truncation issues.
%let sourcePath=C:\temp\JCs\JC Analysis 2016-2017\KC Data 2016-2017\Received from JCs;
%macro collectTable(file=,no=);
proc import
file="&file"
dbms=XLSX
replace
out=collectedTable&no
;
run;quit;
%mend;
/* cleanup Work from previous runs */
proc datasets lib=work nolist nowarn;
delete collectedTable:;
run;
quit;
/* import first tab from all Excel files found under &sourcePath into WORK */
filename files pipe "dir /b/s/a-d ""&sourcePath\*.xlsx""";
data _null_;
infile files truncover;
input thisFile $500.;
call execute(cats( '%collectTable(file=',thisFile,',no=',_n_,')' ));
run;
filename files;
/* consolidate imported data */
%let union_list=;
proc sql noprint;
/* 1. create macro var with SQL command for UNION Set operation for all tables */
select catx(' ','select * from',memname) into :union_list separated by ' outer union corr '
from dictionary.tables
where libname='WORK' and memname like 'COLLECTEDTABLE%'
;
/* 2. execute generated UNION SET operation */
create table consolidated as
&union_list;
quit;