sashelppls
Calcite | Level 5

Hello,

 

I have a directory with a bunch of zip files, and each of those zip files has multiple TSV files that I need to read and concatenate into one dataset. They all have the same format.

 

So I have DirectoryX with ZipA, which contains File1, File2, etc., and then there are ZipB, ZipC, etc., which also contain File1, File2, etc. (same names as in ZipA but different data). They all need to end up in one SAS dataset, ideally with a variable that names the zip file and the file each record came from (e.g. ZipAFile1). The final dataset will have information that came from ZipAFile1, ZipAFile2, ZipBFile1, etc.

 

I've done this with a zip file that contains just one file I need to read (I've only seen ways to read a particular file from a zip), and I've added a variable that scans the filename to create a filename variable, but I'm not sure how to do that in a nested way and concatenate all datasets from all zip files I have in a directory. I believe I need some sort of macro to help loop through all the zip files and every file within the zip file. The below code is what I used to add a txt_file_name variable.

data allfiles;
  length filename txt_file_name $256;
  /*variables*/
  retain txt_file_name;

  infile 'FileDirectory\*' eov=eov filename=filename truncover
         ENCODING="WLATIN1"
         DLM='09'x
         MISSOVER
         DSD;

  input @;

  if _n_ eq 1 or eov then do;
    txt_file_name = scan(filename, -1, "\");
    eov = 0;
    delete;
  end;
  else input /*variables*/;
run;
sbxkoenk
SAS Super FREQ

Hello,

 

You can start here:

Go to https://blogs.sas.com/content/

Then enter this in the search field: filename zip

27 hits are returned.

Take a look at the blog posts by Chris Hemedinger. They should point you in the right direction.
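
As a rough illustration of the FILENAME ZIP access method those posts cover, here is a minimal sketch that writes one tab-delimited member into a zip file and then reads it back. The zip name demo.zip, the member name file1.tsv, the filerefs, and the variables a, b, c are all made up for the example; the WORK library path is used only so the sketch has somewhere to write.

```sas
/* Hypothetical location and names, for illustration only */
%let demoPath = %sysfunc(pathname(work));

/* Write one tab-delimited member into a zip file */
filename zout zip "&demoPath./demo.zip" member="file1.tsv";
data _null_;
  file zout dsd dlm='09'x;
  a = 1; b = 2; c = 3;
  put a b c;
run;
filename zout clear;

/* Read that single member back as a tab-delimited file */
filename zin zip "&demoPath./demo.zip" member="file1.tsv";
data one_member;
  infile zin dsd dlm='09'x truncover;
  input a b c;
run;
filename zin clear;
```

Reading many members from many zip files then comes down to repeating the second half for each zip and member name.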

 

Thanks,

Koen

yabwon
Onyx | Level 15

Try something like this:


/* prepare some data */
%let myPath = C:\Users\bart\Desktop\zipTest;

filename z zip "&myPath./zipTestA.zip" member="file1.txt";
data _null_;
  file z;
  put "1 2 3";
  put "4 5 6";
run;

filename z zip "&myPath./zipTestA.zip" member="file2.txt";
data _null_;
  file z;
  put "10 20 30";
  put "40 50 60";
run;

filename z zip "&myPath./zipTestB.zip" member="file1.txt";
data _null_;
  file z;
  put "100 200 300";
  put "400 500 600";
run;

filename z zip "&myPath./zipTestB.zip" member="file2.txt";
data _null_;
  file z;
  put "1000 2000 3000";
  put "4000 5000 6000";
run;



/* code to execute */


%macro readAll(path);

  /* get the list of zip files from dir */
  filename p "&path.";
  data _null_;
     did = dopen("p"); 
     if did > 0 then do; 
      do i = 1 to dnum(did);
        length name $ 64 zipList $ 1024;
        name = DREAD(did, i);
        if find(name, ".zip", "it") then  
          zipList = catx(" ", zipList, name);
      end;
      rc = dclose(did); /* release the directory handle */
     end; 
     else do; 
        msg=sysmsg(); 
        put msg; 
     end;
    call symputx("zipList", zipList, "L");
  run;

  %do i = 1 %to %sysfunc(countw(&zipList.,%str( )));
    filename z ZIP "&path./%scan(&zipList., &i., %str( ))" member = "*";

    data test_zip_&i.;
      infile z;
      input a b c;
      file = "%scan(&zipList., &i., %str( ))";
    run;

    filename z clear;
  %end;
%mend;

%readAll(C:\Users\bart\Desktop\zipTest);

data test_All;
  set test_zip:;
run;

Bart




Tom
Super User

TSV file? Does that mean a delimited text file that uses TAB as the delimiter?

You should be able to use DOPEN() and DREAD() to find the list of files.

Then you could use that list to call a macro once for each file.

First let's make some dummy files for testing.

%let path=%sysfunc(pathname(work));

data _null_;
  length filename zipfile $256 ;
  do zipfile='zipa.zip','zipb.zip','zipc.zip';
    zipfile=catx('/',"&path",zipfile);
    do filename='file1.txt','file2.txt','file3.txt';
       file out zip filevar=zipfile memvar=filename dsd dlm='09'x ;
       do name='a','b','c'; put name @; end;
       put;
       retain i 0;
       do i=i to i+3; put i @; end;
       put;
    end;
  end;
run;

Now let's get the list of files.

data files;
  length zipno 8 fileno 8 path zipfile filename $256 ;
  keep zipno--filename;
  path="&path";
  rc=filename('dir',path);
  did=dopen('dir');
  do dnum=1 to dnum(did);
     zipfile=dread(did,dnum);
     if index(zipfile,'.') and upcase(scan(zipfile,-1,'.'))='ZIP' then do;
       zipno+1;
       rc=filename('zip',catx('/',path,zipfile),'zip');
       did2=dopen('zip');
       do dnum2=1 to dnum(did2);
         filename=dread(did2,dnum2);
         fileno+1;
         output;
       end;
       rc=dclose(did2);
    end;
  end;
  rc=dclose(did);
run;

Now let's create a macro that can take as input a zip file name and a member file name. Let's make the path an option also.

%macro read_one(zipfile,memname,path);
data next;
  infile "&path/&zipfile" zip member="&memname" dsd dlm='09'x firstobs=2 truncover;
  input a b c;
  length zipfile filename $256 ;
  zipfile="&zipfile";
  filename="&memname";
run;
proc append data=next base=want force;
run;
%mend read_one;

Now let's use a data step to generate one macro call per file we found.


proc delete data=want; run;
data _null_;
  set files;
  call execute(cats
 ('%nrstr(%read_one)'
  ,'(',zipfile
  ,',',filename
  ,',',path
  ,')'
  ));
run;

Results:

Obs     a     b     c    zipfile     filename

 1      0     1     2    zipa.zip    file1.txt
 2      4     5     6    zipa.zip    file2.txt
 3      8     9    10    zipa.zip    file3.txt
 4     12    13    14    zipb.zip    file1.txt
 5     16    17    18    zipb.zip    file2.txt
 6     20    21    22    zipb.zip    file3.txt
 7     24    25    26    zipc.zip    file1.txt
 8     28    29    30    zipc.zip    file2.txt
 9     32    33    34    zipc.zip    file3.txt
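
The original question also asked for a single variable that combines the zip name and the file name (e.g. ZipAFile1). A minimal sketch of one way to derive it from the zipfile and filename variables, assuming each name contains only one dot; want_demo, want_labeled, and source are hypothetical names, and the rows are copied from the results above:

```sas
/* Hypothetical stand-in for the combined data set, using a few rows from the results */
data want_demo;
  length zipfile filename $256;
  input a b c zipfile $ filename $;
  datalines;
0 1 2 zipa.zip file1.txt
4 5 6 zipa.zip file2.txt
12 13 14 zipb.zip file1.txt
;

data want_labeled;
  set want_demo;
  length source $64;
  /* take the part of each name before the first dot, then join without a separator */
  source = cats(scan(zipfile, 1, '.'), scan(filename, 1, '.'));
run;
```

For the first row this gives source = zipafile1. Names with more than one dot would need a different way of dropping the extension.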

 
