AdamMadison
Fluorite | Level 6

It appears that my earlier post was identified as spam for some reason and deleted. So I am re-posting.

 

The work process I have in the following program takes a zip file (NSDQsh201012.zip), processes it, and outputs a data set (nasdaq.NSDQsh201012).

I would like to go through this process for many different zip files which are named by "date" (Year/Month). For example: there is NSDQsh201012.zip, NSDQsh201101.zip, NSDQsh201102.zip, .... NSDQsh202002.zip.

And the output datasets should be named accordingly. For example: nasdaq.NSDQsh201012, nasdaq.NSDQsh201101, nasdaq.NSDQsh201102, ... nasdaq.NSDQsh202002.

Ideally, I would construct a do loop that cycles through these "dates" (Year/Month) and changes the macro parameters on each iteration. 

Something like:

%do year=2010 to 2020

%do month=01 to 12

 

And I would have:  source=NSDQsh&year&month.zip.

But I am not sure how to accomplish this. Any assistance will be appreciated. 


/* create dataset with members in specific zip file */
%zipMemList(source=NSDQsh201012.zip, outds=memlist);



/* execute macro %ReadMemInZip() once per member in zip file */
data _null_;
  set memlist;
  cmd=cats('%ReadMemInZip(source=',zip,', member=',memname,', outds=want)');
  call execute(cmd);
run;

%convertdaily(source=want, outds=nasdaq.NSDQsh201012);

Patrick
Opal | Level 21

What's in a specific zip file? Just a single text file with the same name as the zip file but with the extension .txt? Or something else?

AdamMadison
Fluorite | Level 6
Inside the zip files are numerous txt files. Different name than the zip (they are named NSDQsh&year&month&day).
Reeza
Super User


Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

 

What about changing your process?

You're calling %zipMemList manually. Why not loop within a data step and use two CALL EXECUTE calls? Change the program that reads members into a macro as well.

 

EDIT: Small change to format the variable correctly to YYMMN. 

 

data have;
start_date = '01Jan2010'd;

do i=0 to 119;
date = intnx('month', start_date, i, 'b');
date_formatted = put(date, yymmn6.);

str1 = catt('%zipMemList(source=NSDQsh', date_formatted, '.zip, outds=memlist);');
call execute(str1);

*macro to read in members is called;
*change program that reads members to a macro;

end;

run;

 

 

Tom
Super User

If you want to generate names like NSDQsh201012.zip, NSDQsh201101.zip, NSDQsh201102.zip, ..., NSDQsh202002.zip, then just loop from '01DEC2010'd to '01FEB2020'd.

 

Much easier in DATA step code.

data _null_;
  start= '01DEC2010'd;
  stop='01FEB2020'd;
  do offset=0 to intck('month',start,stop);
     length name $100;
     name=cats('NSDQsh',put(intnx('month',start,offset),yymmn6.),'.zip');
 ...
  end;
run;

You could do it in macro code, but why?
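
For comparison, a rough sketch of what a macro-language version might look like. The macro name %list_names and its start=/stop= parameters are made up for illustration, and the macro only writes the generated names to the log. It has to convert the dates with %SYSFUNC at every step, and a plain %DO loop over month numbers would also need zero-padding and a doubled dot (&month..zip) after the macro variable, which is part of why the DATA step version is simpler.

```sas
/* Hypothetical helper: writes the monthly zip names to the log */
%macro list_names(start=01DEC2010, stop=01FEB2020);
  %local offset n date name;
  /* number of month boundaries between the two dates */
  %let n = %sysfunc(intck(month, "&start"d, "&stop"d));
  %do offset = 0 %to &n;
    /* SAS date value of the first day of the offset-th month */
    %let date = %sysfunc(intnx(month, "&start"d, &offset));
    /* format as YYYYMM and build the file name */
    %let name = NSDQsh%sysfunc(putn(&date, yymmn6.)).zip;
    %put &=name;
  %end;
%mend list_names;

%list_names()
```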

 

AdamMadison
Fluorite | Level 6
Tom: How would I then pass those name values (i.e. NSDQsh201012.zip, NSDQsh201101.zip, ...) iteratively into the macro parameters (i.e. source= and outds=)?
Reeza
Super User
CALL EXECUTE() the same way as in your original code.
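
A minimal sketch of how that could look when combined with the date loop. The macro %process_zip is a hypothetical stand-in that only writes a note to the log; in practice it would wrap the real per-file workflow. Wrapping the macro name in %nrstr() is an addition not shown earlier in the thread: it delays the macro call until after the generating data step has finished.

```sas
/* Hypothetical stand-in for the real per-file workflow */
%macro process_zip(source=, outds=);
  %put NOTE: would process &source into &outds;
%mend process_zip;

data _null_;
  length name $32 cmd $200;
  start = '01DEC2010'd;
  stop  = '01FEB2020'd;
  do offset = 0 to intck('month', start, stop);
    /* NSDQsh201012, NSDQsh201101, ... */
    name = cats('NSDQsh', put(intnx('month', start, offset), yymmn6.));
    /* %nrstr() keeps the macro from running while this step is still executing */
    cmd = cats('%nrstr(%process_zip)(source=', name, '.zip, outds=nasdaq.', name, ')');
    call execute(cmd);
  end;
run;
```

Each generated call passes the zip name as source= and the matching dataset name as outds=.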
Patrick
Opal | Level 21

I believe the following code does pretty much what you're asking for.

/** create sample data **/
%macro createSampleData(dir=%sysfunc(pathname(work)));
  %do month=1 %to 4;
    %let _month=%sysfunc(putn(&month,z2.));
    filename out zip "&dir/NSDQsh2020&_month..zip";
    %do day=1 %to 30;
      %let _day=%sysfunc(putn(&day,z2.));
      data _null_;
        file out("NSDQsh2020&_month.&_day..txt");
        put 
          "a,b,c,&day" /
          "x,y,z,&day"
        ;
        stop;
      run;
    %end;
    filename out clear;
  %end;
%mend;


/** macro definitions **/

/*
  list files in a directory. Code based on:
  https://go.documentation.sas.com/?docsetId=mcrolref&docsetTarget=n0js70lrkxo6uvn1fl4a5aafnlgt.htm&docsetVersion=9.4&locale=en
*/
%macro dirlist(dir,ext,result=dir_list);
  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));   

  proc datasets lib=%scan(work.&result,-2,.) nolist nowarn;
    delete %scan(&result,-1,.);
    run; 
  quit;

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be opened or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

      %if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then %do;
        /*%put &dir\&name;*/
        data _&result;
          length dir $200 file_name $100;
          dir="&dir";
          file_name="&name";
          output;
          stop;
        run;
        proc datasets lib=%scan(work.&result,-2,.) nolist nowarn; 
          append base=%scan(&result,-1,.) data=_%scan(&result,-1,.);
          run;
          delete _%scan(&result,-1,.);
          run;
        quit;
      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %dirlist(&dir\&name,&ext)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend dirlist;

/* list members in zip file */
%macro zipMemList(source=, outds=zip_mem_list);
  /* Assign a fileref with the ZIP method */
  filename inzip zip "&source";

  /* Read the "members" (files) from the ZIP file */
  data &outds(keep=zip memname);
   length zip $200 memname $200;
   zip="&source";
   fid=dopen("inzip");
   if fid=0 then
    stop;
   memcount=dnum(fid);
   do i=1 to memcount;
    memname=dread(fid,i);
    output;
   end;
   rc=dclose(fid);
  run;

  filename inzip clear;
%mend;


/* read member in zip file into SAS dataset */
%macro ReadMemInZip(source=, member=, outds=);
  /* Assign a fileref with the ZIP method */
  filename inzip zip "&source";

  /* Import a text file directly from the ZIP */
  data _tmp(compress=yes);
    infile inzip(&member) 
      firstobs=1 dsd dlm=',';
    input 
      (var1-var3) ($) var4;
    length source_file $ 150;
    source_file="&member";
  run;

  /* append to want dataset */
  proc append base=&outds(compress=yes) data=_tmp;
  run;quit;
  proc delete data=_tmp;
  run;quit;

  filename inzip clear;
%mend;

/* extract per zip file all the data */
%macro extract(zipfile,outds=want);
  /* extract list of members for a zip file */
  data _null_;
    length _cmd $1000;
    _cmd=cats('%',"zipMemList(source=&zipfile, outds=_zip_mem_list);");
    call execute(_cmd);
    stop;
  run;

  /* read all the members in zip file into SAS dataset */
  data _null_;
    set _zip_mem_list;
    length _cmd $1000;
    _cmd=cats('%ReadMemInZip(source=',zip,',member=',memname, ",outds=&outds);");
    call execute(_cmd);
  run;
%mend;


/** execution **/

/* define path where zip files reside */
%let source_dir=%sysfunc(pathname(work));

/* define target lib for result tables */
%let target_lib=nasdaq;
libname &target_lib "%sysfunc(pathname(work))";

/* define date range for zip file selection */
%let start_yyyymm=202002;
%let end_yyyymm=202003;

/* create sample zip files under this path */
%createSampleData(dir=&source_dir);

/* create SAS table with all zip files in folder path */
%dirlist(&source_dir,zip,result=_dir_list);

/* extract data. Create a table per source zip file */
%let start_dt=%sysfunc(inputn(&start_yyyymm,yymmn6.));
%let end_dt=%sysfunc(inputn(&end_yyyymm,yymmn6.));
data _null_;
  set _dir_list(
    where=(input(scan(scan(file_name,1,'.'),-1,,'kd'),yymmn6.) between &start_dt and &end_dt)
    );
  length _cmd $1000 _outds $41;
  _outds=catx('.',"&target_lib",scan(file_name,1,'.'));
  put _outds=;
  /* extract members list per zip file */
  _cmd=cats('%extract(zipfile=',dir,'/',file_name,',outds=',_outds,');');
  call execute(_cmd);
run;

AdamMadison
Fluorite | Level 6

 

Patrick: Your reply almost has me where I need to be. I have (I think) one last question.

 

I would like to, as a final step, sum up observations by group.

SAS.PNG

 

Using your sample data (I added an additional row of data to your code below to make my question functional), I get the above image as a final output data set. However, what I really want as a final output is a condensed version of this that sums "var4" by group "var1" and a date (one of the variables on my true data is a date. I'm just using source_file as a stand in here).

 

So my desired output would look like this for "Nsdqsh202002".

Obs  var1  source_file  sum_var4
1    a     20200201     2
2    x     20200201     1
3    a     20200202     4
4    x     20200202     2
5    a     20200203     6
6    x     20200203     3

 

 

Then do the same process for the next file "Nsdqsh202003".

I just cannot figure out the timing of the macros and CALL EXECUTE commands.

Thank you!

 

/** create sample data **/
%macro createSampleData(dir=%sysfunc(pathname(work)));
  %do month=1 %to 4;
    %let _month=%sysfunc(putn(&month,z2.));
    filename out zip "&dir/NSDQsh2020&_month..zip";
    %do day=1 %to 30;
      %let _day=%sysfunc(putn(&day,z2.));
      data _null_;
        file out("NSDQsh2020&_month.&_day..txt");
        put 
          "a,b,c,&day" /
          "x,y,z,&day" /
		  "a,t,s,&day"
        ;
        stop;
      run;
    %end;
    filename out clear;
  %end;
%mend;


/** macro definitions **/

/*
  list files in a directory. Code based on:
  https://go.documentation.sas.com/?docsetId=mcrolref&docsetTarget=n0js70lrkxo6uvn1fl4a5aafnlgt.htm&docsetVersion=9.4&locale=en
*/
%macro dirlist(dir,ext,result=dir_list);
  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));   

  proc datasets lib=%scan(work.&result,-2,.) nolist nowarn;
    delete %scan(&result,-1,.);
    run; 
  quit;

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be opened or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

      %if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then %do;
        /*%put &dir\&name;*/
        data _&result;
          length dir $200 file_name $100;
          dir="&dir";
          file_name="&name";
          output;
          stop;
        run;
        proc datasets lib=%scan(work.&result,-2,.) nolist nowarn; 
          append base=%scan(&result,-1,.) data=_%scan(&result,-1,.);
          run;
          delete _%scan(&result,-1,.);
          run;
        quit;
      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %dirlist(&dir\&name,&ext)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend dirlist;

/* list members in zip file */
%macro zipMemList(source=, outds=zip_mem_list);
  /* Assign a fileref with the ZIP method */
  filename inzip zip "&source";

  /* Read the "members" (files) from the ZIP file */
  data &outds(keep=zip memname);
   length zip $200 memname $200;
   zip="&source";
   fid=dopen("inzip");
   if fid=0 then
    stop;
   memcount=dnum(fid);
   do i=1 to memcount;
    memname=dread(fid,i);
    output;
   end;
   rc=dclose(fid);
  run;

  filename inzip clear;
%mend;


/* read member in zip file into SAS dataset */
%macro ReadMemInZip(source=, member=, outds=);
  /* Assign a fileref with the ZIP method */
  filename inzip zip "&source";

  /* Import a text file directly from the ZIP */
  data _tmp(compress=yes);
    infile inzip(&member) 
      firstobs=1 dsd dlm=',';
    input 
      (var1-var3) ($) var4;
    length source_file $ 150;
    source_file="&member";
  run;

  /* append to want dataset */
  proc append base=&outds(compress=yes) data=_tmp;
  run;quit;
  proc delete data=_tmp;
  run;quit;

  filename inzip clear;
%mend;

/* extract per zip file all the data */
%macro extract(zipfile,outds=want);
  /* extract list of members for a zip file */
  data _null_;
    length _cmd $1000;
    _cmd=cats('%',"zipMemList(source=&zipfile, outds=_zip_mem_list);");
    call execute(_cmd);
    stop;
  run;

  /* read all the members in zip file into SAS dataset */
  data _null_;
    set _zip_mem_list;
    length _cmd $1000;
    _cmd=cats('%ReadMemInZip(source=',zip,',member=',memname, ",outds=&outds);");
    call execute(_cmd);
  run;
%mend;


/** execution **/

/* define path where zip files reside */
%let source_dir=%sysfunc(pathname(work));

/* define target lib for result tables */
%let target_lib=nasdaq;
libname &target_lib "%sysfunc(pathname(work))";

/* define date range for zip file selection */
%let start_yyyymm=202002;
%let end_yyyymm=202003;

/* create sample zip files under this path */
%createSampleData(dir=&source_dir);

/* create SAS table with all zip files in folder path */
%dirlist(&source_dir,zip,result=_dir_list);

/* extract data. Create a table per source zip file */
%let start_dt=%sysfunc(inputn(&start_yyyymm,yymmn6.));
%let end_dt=%sysfunc(inputn(&end_yyyymm,yymmn6.));
data _null_;
  set _dir_list(
    where=(input(scan(scan(file_name,1,'.'),-1,,'kd'),yymmn6.) between &start_dt and &end_dt)
    );
  length _cmd $1000 _outds $41;
  _outds=catx('.',"&target_lib",scan(file_name,1,'.'));
  put _outds=;
  /* extract members list per zip file */
  _cmd=cats('%extract(zipfile=',dir,'/',file_name,',outds=',_outds,');');
  call execute(_cmd);
run;

 

 

AdamMadison
Fluorite | Level 6

After experimenting with different techniques, I unfortunately still can't get this solved. But I'm so close thanks to all of your help. 🙂

 

I can produce all the datasets and then subsequently summarize each one. However, the size of each produced dataset makes it impossible to have all of them on my hard drive at once. Therefore it is necessary to summarize each one before moving on to the next file.

 

I would think just adding an additional call execute (PROC SQL......) at the end of the program would do the trick. But the PROC SQL statement only runs after everything else is complete, which defeats the purpose.

How can I get the timing of the commands to correspond appropriately?

 

Thanks

Tom
Super User
Accepted solution

The basic design I think you are looking for is a macro that processes one file: it reads the file, summarizes it, and appends the summarized results to a master dataset of values. So something like this:

%macro run_one(filename);

/* read in the file */
data this_file;
  ....
run;

/* summarize the file: use proc summary or proc sql
   to produce this_file_summary */

/* append the summary to the master file */
proc append base=master_summary data=this_file_summary;
run;

/* clean up work files */
proc delete data=this_file this_file_summary;
run;
%mend run_one;

So once you get that working then the "looping" is to just use a dataset to generate one call to the macro per file.
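
A minimal sketch of this design, under stated assumptions: the read step is replaced by a stand-in that fabricates three rows per file, the variable names var1, var4 and source_file are borrowed from the sample data earlier in the thread, and the driver dataset files and its filename column are made up for illustration. The %nrstr() wrapper delays each macro call until the driver step has finished, so the files are processed one after another.

```sas
%macro run_one(filename);
  /* stand-in for reading the file: three made-up rows */
  data this_file;
    length var1 $8 source_file $32;
    source_file = "&filename";
    do var1 = 'a', 'x', 'a';
      var4 = 1;
      output;
    end;
  run;

  /* summarize: one row per var1 and source_file */
  proc summary data=this_file nway;
    class var1 source_file;
    var var4;
    output out=this_file_summary(drop=_type_ _freq_) sum=sum_var4;
  run;

  /* append the summary to the master dataset (created on first use) */
  proc append base=master_summary data=this_file_summary;
  run;

  /* remove the work datasets before the next file is read */
  proc delete data=this_file this_file_summary;
  run;
%mend run_one;

/* hypothetical driver dataset: one row per file to process */
data files;
  length filename $32;
  input filename;
  datalines;
NSDQsh20200201.txt
NSDQsh20200202.txt
;
run;

/* generate one macro call per row */
data _null_;
  set files;
  call execute(cats('%nrstr(%run_one)(', filename, ')'));
run;
```

Because PROC APPEND accumulates, master_summary should be deleted before the whole job is rerun.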

 

AdamMadison
Fluorite | Level 6

Tom: That is exactly what I want.

I have the middle part ready to go. 

The issue I have is calling the files to pass through into the macro.

 

I have .zip files, which themselves contain .txt files. How can I get the first .txt file in the first .zip to pass through into this. Then the second .txt file in the first .zip....... then the last .txt file in the last .zip.

 

Thinking this through, I think if I had a dataset that had a list of all the zip files and their corresponding .txt contents, then I could make something like this the first step of the macro.

 

%macro run_one(source= , member= );


* Assign a fileref with the ZIP method;
filename inzip zip "&source";


*Import a text file directly from the ZIP;
data _tmp(compress=yes);
   infile inzip(&member)
     firstobs=1 dsd dlm=',';
   input
     (var1-var3) ($) var4;
run;




* read in the file ;
data this_file;
  ....

/* summarize the file: use proc summary or proc sql
   to produce this_file_summary */

/* append the summary to the master file */
proc append base=master_summary data=this_file_summary;
run;

* clear up work file;
proc delete data=this_file this_file_summary;
run;
%mend run_one;

 

Do you think this could work? And if yes, do you know how I could create the dataset that has a list of all my zip files and their corresponding .txt contents?

 

 

Patrick
Opal | Level 21

Macro %ReadMemInZip() in the code I've posted earlier reads the data per zip file and then appends it to an output dataset which collects the results from all iterations.

The only thing you would have to change in this macro to only collect summary data, is to aggregate the data (using Proc Summary) and then append the aggregated data.

Patrick
Opal | Level 21

Below is the whole sample code, with the amended macro %ReadMemInZip() now collecting aggregated data.

/** create sample data **/
%macro createSampleData(dir=%sysfunc(pathname(work)));
  %do month=1 %to 4;
    %let _month=%sysfunc(putn(&month,z2.));
    filename out zip "&dir/NSDQsh2020&_month..zip";
    %do day=1 %to 30;
      %let _day=%sysfunc(putn(&day,z2.));
      data _null_;
        file out("NSDQsh2020&_month.&_day..txt");
        put 
          "a,b,c,&day" /
          "x,y,z,&day" /
		  "a,t,s,&day"
        ;
        stop;
      run;
    %end;
    filename out clear;
  %end;
%mend;


/** macro definitions **/

/*
  list files in a directory. Code based on:
  https://go.documentation.sas.com/?docsetId=mcrolref&docsetTarget=n0js70lrkxo6uvn1fl4a5aafnlgt.htm&docsetVersion=9.4&locale=en
*/
%macro dirlist(dir,ext,result=dir_list);
  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));   

  proc datasets lib=%scan(work.&result,-2,.) nolist nowarn;
    delete %scan(&result,-1,.);
    run; 
  quit;

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be opened or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

      %if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then %do;
        /*%put &dir\&name;*/
        data _&result;
          length dir $200 file_name $100;
          dir="&dir";
          file_name="&name";
          output;
          stop;
        run;
        proc datasets lib=%scan(work.&result,-2,.) nolist nowarn; 
          append base=%scan(&result,-1,.) data=_%scan(&result,-1,.);
          run;
          delete _%scan(&result,-1,.);
          run;
        quit;
      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %dirlist(&dir\&name,&ext)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend dirlist;

/* list members in zip file */
%macro zipMemList(source=, outds=zip_mem_list);
  /* Assign a fileref with the ZIP method */
  filename inzip zip "&source";

  /* Read the "members" (files) from the ZIP file */
  data &outds(keep=zip memname);
   length zip $200 memname $200;
   zip="&source";
   fid=dopen("inzip");
   if fid=0 then
    stop;
   memcount=dnum(fid);
   do i=1 to memcount;
    memname=dread(fid,i);
    output;
   end;
   rc=dclose(fid);
  run;

  filename inzip clear;
%mend;

/* read member in zip file into SAS dataset */
%macro ReadMemInZip(source=, member=, outds=);
  /* Assign a fileref with the ZIP method */
  filename inzip zip "&source";

  /* Import a text file directly from the ZIP */
  data _tmp(compress=yes);
    infile inzip(&member) 
      firstobs=1 dsd dlm=',';
    input 
      (var1-var3) ($) var4;
    length source_file $ 150;
    source_file="&member";
  run;

  proc sql;
    create table _tmp_sum as
      select 
        var1,
        source_file,
        sum(var4) as sum_var4
      from _tmp
      group by var1, source_file
      ;
  quit;

  proc delete data=_tmp;
  run;quit;

  /* append to want dataset */
  proc append base=&outds(compress=yes) data=_tmp_sum;
  run;quit;

  proc delete data=_tmp_sum;
  run;quit;

  filename inzip clear;
%mend;

/* extract per zip file all the data */
%macro extract(zipfile,outds=want);
  /* extract list of members for a zip file */
  data _null_;
    length _cmd $1000;
    _cmd=cats('%',"zipMemList(source=&zipfile, outds=_zip_mem_list);");
    call execute(_cmd);
    stop;
  run;

  /* read all the members in zip file into SAS dataset */
  data _null_;
    set _zip_mem_list;
    length _cmd $1000;
    _cmd=cats('%ReadMemInZip(source=',zip,',member=',memname, ",outds=&outds);");
    call execute(_cmd);
  run;
%mend;


/** execution **/

/* define path where zip files reside */
%let source_dir=%sysfunc(pathname(work));

/* define target lib for result tables */
%let target_lib=nasdaq;
libname &target_lib "%sysfunc(pathname(work))";

/* define date range for zip file selection */
%let start_yyyymm=202002;
%let end_yyyymm=202003;

/* create sample zip files under this path */
%createSampleData(dir=&source_dir);

/* create SAS table with all zip files in folder path */
%dirlist(&source_dir,zip,result=_dir_list);

/* extract data. Create a table per source zip file */
%let start_dt=%sysfunc(inputn(&start_yyyymm,yymmn6.));
%let end_dt=%sysfunc(inputn(&end_yyyymm,yymmn6.));
data _null_;
  set _dir_list(
    where=(input(scan(scan(file_name,1,'.'),-1,,'kd'),yymmn6.) between &start_dt and &end_dt)
    );
  length _cmd $1000 _outds $41;
  _outds=catx('.',"&target_lib",scan(file_name,1,'.'));
  put _outds=;
  /* extract members list per zip file */
  _cmd=cats('%extract(zipfile=',dir,'/',file_name,',outds=',_outds,');');
  call execute(_cmd);
run;

 

 

AdamMadison
Fluorite | Level 6

I finally got what I need using bits and pieces from literally everyone's suggestions. And I learned a lot in the process. So thank you all very much!

 

/** create sample data **/
%macro createSampleData(dir=%sysfunc(pathname(work)));
  %do month=1 %to 4;
    %let _month=%sysfunc(putn(&month,z2.));
    filename out zip "&dir/NSDQsh2020&_month..zip";
    %do day=1 %to 30;
      %let _day=%sysfunc(putn(&day,z2.));
      data _null_;
        file out("NSDQsh2020&_month.&_day..txt");
        put 
          "a,b,c,&day, 2020&_month.&_day" /
          "x,y,z,&day, 2020&_month.&_day" /
		  "a,t,s,&day, 2020&_month.&_day"
        ;
        stop;
      run;
    %end;
    filename out clear;
  %end;
%mend;

%macro dirlist(dir,ext,result=dir_list);
/*Gets the directory location and file name for every ZIP in the directory*/

  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));   

  proc datasets lib=%scan(work.&result,-2,.) nolist nowarn;
    delete %scan(&result,-1,.);
    run; 
  quit;

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be opened or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

      %if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then %do;
        /*%put &dir\&name;*/
        data _&result;
          length dir $200 file_name $100;
          dir="&dir";
          file_name="&name";
          output;
          stop;
        run;
        proc datasets lib=%scan(work.&result,-2,.) nolist nowarn; 
          append base=%scan(&result,-1,.) data=_%scan(&result,-1,.);
          run;
          delete _%scan(&result,-1,.);
          run;
        quit;
      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %dirlist(&dir\&name,&ext)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend dirlist;


%macro zipMemList(source=, outds=zip_mem_list);
/*Gets the file path names for all the ZIP files in the directory and the names of the .txt file contents of the ZIP files.
Produces dataset: _zip_mem_list which contains zip(the file path name) and memname (the txt contents).
*/

  /* Assign a fileref with the ZIP method */
  filename inzip zip "&source";

  /* Read the "members" (files) from the ZIP file */
  data _tmp(keep=zip memname);
   length zip $200 memname $200;
   zip="&source";
   fid=dopen("inzip");
   if fid=0 then
    stop;
   memcount=dnum(fid);
   do i=1 to memcount;
    memname=dread(fid,i);
    output;
   end;
   rc=dclose(fid);
  run;

  filename inzip clear;

/*Append to get info for every ZIP and every .txt file*/
   proc append base=&outds data=_tmp;
  run;quit;
  proc delete data=_tmp;
  run;quit;
%mend;



%macro processData(source= , member=, outds= );
/*Imports data from each txt file and summarizes. Then appends to create a master file of summary data over all txt files*/

filename inzip zip "&source";

*Import a text file directly from the ZIP;
data _tmp(compress=yes);
   infile inzip(&member)
     firstobs=1 dsd dlm=',';
   input
     (var1-var3) ($) var4 var5;
run;

*Summarize the data;
proc sql;
	create table _tmp2 as
	select var1, var5, sum(var4) as sum_4
	from _tmp
	group by  var1, var5;
quit;

*Append summarized dataset and delete the work datasets;
   proc append base=&outds data=_tmp2;
  run;quit;
  proc delete data=_tmp _tmp2;
  run;quit;

  filename inzip clear;
%mend processData;



/** execution **/

/* define path where zip files reside */
%let source_dir=%sysfunc(pathname(work));

/* define target lib for result tables */
%let target_lib=nasdaq;
libname &target_lib "%sysfunc(pathname(work))";


/* create sample zip files under this path */
%createSampleData(dir=&source_dir);

/* create dataset that contains directory path names and zip file names */
%dirlist(&source_dir,zip,result=_dir_list);



/*create dataset that contains zip file pathnames and .txt file contents*/
data _null_;
set _dir_list;
length _name $1000;
length _cmd $1000;
_name=cats(dir,'/',file_name);
call execute('%zipMemList(source=' ||strip(_name)|| ', outds=_zip_mem_list);');
run;



/*Import each txt file and summarize the data. Append to create master dataset of summarized data across all txt files in all ZIPs*/
data _null_;
set _zip_mem_list;
call execute('%processData(source=' ||strip(zip)|| ' ,member=' ||strip(memname)|| ',outds=summarized_data);');
run;
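
One caveat, offered as a suggestion rather than as part of the posted solution: PROC APPEND keeps adding rows to _zip_mem_list and summarized_data, so rerunning the program in the same session would duplicate them. A minimal sketch of a cleanup step, assuming both datasets live in WORK as in the code above, to run before the two DATA _NULL_ steps:

```sas
/* remove leftovers from a previous run; NOWARN suppresses the
   warning when the datasets do not exist yet */
proc datasets lib=work nolist nowarn;
  delete _zip_mem_list summarized_data;
run;
quit;
```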

 

 
