Hello,
I have a directory with a bunch of zip files, and each of those zip files has multiple TSV files that I need to read and concatenate into one dataset. They all have the same format.
So I have DirectoryX with ZipA, which contains File1, File2, etc., and then there are ZipB, ZipC, etc., each containing File1, File2 (same names as in ZipA, but different data). They all need to end up in the same SAS dataset, ideally with a variable that identifies the zip file and the file each record came from (e.g. ZipAFile1). The final dataset will have information from ZipAFile1, ZipAFile2, ZipBFile1, etc.
I've done this with a zip file that contains just one file I need to read (I've only seen ways to read a particular file from a zip), and I've added a variable that scans the filename to create a file name variable, but I'm not sure how to do that in a nested way and concatenate all datasets from all the zip files in a directory. I believe I need some sort of macro to loop through all the zip files and every file within each zip file. The code below is what I used to add a txt_file_name variable.
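data allfiles;
length filename txt_file_name $256;
/*variables*/
retain txt_file_name;
/* read every file in the folder; FILENAME= holds the path of the file being read */
/* EOV= is set to 1 on the first record of each file after the first and must be reset by hand */
infile 'FileDirectory\*' eov=eov filename=filename truncover
ENCODING="WLATIN1"
DLM='09'x
DSD ;
input@;
/* first record of each file: keep the file name and skip the header row */
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -1, "\");
eov=0;delete;
end;
else input /*variables*/;
run;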
Hello,
You can start here:
Go to https://blogs.sas.com/content/
Then enter filename zip in the search field.
27 hits are returned.
Take a look at the blogs by Chris Hemedinger. They should get you going.
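As a minimal sketch of the FILENAME ZIP technique those posts cover (assuming SAS 9.4, where the ZIP access method is available), the steps below write a small test archive to the WORK folder and then list its members with DOPEN() and DREAD(), treating the archive like a directory. The names demoPath, demo.zip, inzip and members are hypothetical.

```sas
/* Build a small test archive in the WORK folder (hypothetical name demo.zip) */
%let demoPath = %sysfunc(pathname(work));
filename inzip zip "&demoPath./demo.zip" member="file1.txt";
data _null_;
  file inzip;
  put "1 2 3";
run;
filename inzip clear;

/* Point a fileref at the whole archive (no MEMBER=) and read it like a directory */
filename inzip zip "&demoPath./demo.zip";
data members;
  length memname $256;
  keep memname;
  did = dopen("inzip");
  if did > 0 then do;
    do i = 1 to dnum(did);      /* one entry per member of the archive */
      memname = dread(did, i);
      output;
    end;
    rc = dclose(did);           /* release the handle */
  end;
  else do;
    msg = sysmsg();             /* explain why the archive could not be opened */
    put msg;
  end;
run;
filename inzip clear;
```

Looping the same pattern over every zip file in a folder gives the full list of zip/member pairs to read.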
Thanks,
Koen
Try something like this:
/* prepare some data */
%let myPath = C:\Users\bart\Desktop\zipTest;
filename z zip "&myPath./zipTestA.zip" member="file1.txt";
data _null_;
file z;
put "1 2 3";
put "4 5 6";
run;
filename z zip "&myPath./zipTestA.zip" member="file2.txt";
data _null_;
file z;
put "10 20 30";
put "40 50 60";
run;
filename z zip "&myPath./zipTestB.zip" member="file1.txt";
data _null_;
file z;
put "100 200 300";
put "400 500 600";
run;
filename z zip "&myPath./zipTestB.zip" member="file2.txt";
data _null_;
file z;
put "1000 2000 3000";
put "4000 5000 6000";
run;
/* code to execute */
%macro readAll(path);
/* get the list of zip files from dir */
filename p "&path.";
data _null_;
did = dopen("p");
if did > 0 then do;
do i = 1 to dnum(did);
length name $ 64 zipList $ 1024;
name = DREAD(did, i);
if find(name, ".zip", "it") then
zipList = catx(" ", zipList, name);
end;
rc = dclose(did); /* release the directory handle */
end;
else do;
msg=sysmsg();
put msg;
end;
call symputx("zipList", zipList, "L");
run;
%do i = 1 %to %sysfunc(countw(&zipList.,%str( )));
filename z ZIP "&path./%scan(&zipList., &i., %str( ))" member = "*";
data test_zip_&i.;
infile z;
input a b c;
file = "%scan(&zipList., &i., %str( ))";
run;
filename z clear;
%end;
%mend;
%readAll(C:\Users\bart\Desktop\zipTest);
data test_All;
set test_zip:;
run;
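If only the combined TEST_ALL table is needed afterwards, the intermediate TEST_ZIP_n tables could be removed. A minimal sketch, assuming they all live in WORK and share the test_zip_ prefix used above:

```sas
/* Delete every WORK data set whose name starts with test_zip_ */
proc datasets library=work nolist;
  delete test_zip_:;
quit;
```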
Bart
TSV file? Does that mean a delimited text file that uses TAB as the delimiter?
You should be able to use DOPEN() and DREAD() to find the lists of files.
Then you could use that list to call a macro once for each file.
First let's make some dummy files for testing.
%let path=%sysfunc(pathname(work));
data _null_;
length filename zipfile $256 ;
do zipfile='zipa.zip','zipb.zip','zipc.zip';
zipfile=catx('/',"&path",zipfile);
do filename='file1.txt','file2.txt','file3.txt';
file out zip filevar=zipfile memvar=filename dsd dlm='09'x ;
do name='a','b','c'; put name @; end;
put;
retain i 0;
do i=i to i+3; put i @; end;
put;
end;
end;
run;
Now let's get the list of files.
data files;
length zipno 8 fileno 8 path zipfile filename $256 ;
keep zipno--filename;
path="&path";
rc=filename('dir',path);
did=dopen('dir');
do dnum=1 to dnum(did);
zipfile=dread(did,dnum);
if index(zipfile,'.') and upcase(scan(zipfile,-1,'.'))='ZIP' then do;
zipno+1;
rc=filename('zip',catx('/',path,zipfile),'zip');
did2=dopen('zip');
do dnum2=1 to dnum(did2);
filename=dread(did2,dnum2);
fileno+1;
output;
end;
rc=dclose(did2);
end;
end;
rc=dclose(did);
run;
Now let's create a macro that takes a zip file name and a member file name as input. Let's make the path a parameter too.
%macro read_one(zipfile,memname,path);
data next;
infile "&path/&zipfile" zip member="&memname" dsd dlm='09'x firstobs=2 truncover;
input a b c;
length zipfile filename $256 ;
zipfile="&zipfile";
filename="&memname";
run;
proc append data=next base=want force;
run;
%mend read_one;
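Before generating all the calls, the macro could be tried on a single member. A minimal sketch, assuming the dummy files and the &path macro variable created above; any rows it appends to WANT are cleared again by the PROC DELETE step that comes next.

```sas
/* Hypothetical single test call: read file1.txt from zipa.zip in the WORK folder */
%read_one(zipa.zip,file1.txt,&path)

proc print data=want;
run;
```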
Now let's use a data step to generate one macro call per file we found.
proc delete data=want; run;
data _null_;
set files;
call execute(cats
('%nrstr(%read_one)'
,'(',zipfile
,',',filename
,',',path
,')'
));
run;
Results:
Obs    a     b     c    zipfile     filename
  1    0     1     2    zipa.zip    file1.txt
  2    4     5     6    zipa.zip    file2.txt
  3    8     9    10    zipa.zip    file3.txt
  4   12    13    14    zipb.zip    file1.txt
  5   16    17    18    zipb.zip    file2.txt
  6   20    21    22    zipb.zip    file3.txt
  7   24    25    26    zipc.zip    file1.txt
  8   28    29    30    zipc.zip    file2.txt
  9   32    33    34    zipc.zip    file3.txt
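The question also asked for one identifier per source, such as ZipAFile1. A minimal sketch that derives it from the zipfile and filename columns of WANT, assuming the names contain no dots other than the one before the extension (the data set name want2 and the variable name source are hypothetical):

```sas
/* Combine zip and member names without extensions, e.g. zipa.zip + file1.txt -> zipafile1 */
data want2;
  length source $512;
  set want;
  source = cats(scan(zipfile, 1, '.'), scan(filename, 1, '.'));
run;
```

The same assignment could instead go inside %read_one, right after zipfile and filename are set.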