listing all files (of all types) from all subdirectories

Hi,

 

In a previous question I asked how to list all files (of a certain type) from a directory and all of its subdirectories. Now what I would like is to list all files of all types. I tried to do a small modification to Patrick's code:

 

 

%macro list_files(dir);
  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));      

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be open or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

     
        %if %sysfunc(findw(%qscan(&name,-1,\),.)) ne 0   %then %do;

        data _tmp;
          length dir $512 name $100;
          dir=symget("dir");
          name=symget("name");
        run;
        proc append base=want data=_tmp;
        run;quit;

      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %list_files(&dir\&name)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend list_files;

%list_files(C:\Documents and Settings\HP_Administrator\Desktop\files)

Here I omitted the extension parameter because I want all files of all extensions, and then I made a slight modification to the second %if:

 

%if %sysfunc(findw(%qscan(&name,-1,\),.)) ne 0   %then %do;

So I want to scan the name starting from the right and extract the substring up to the first '\', and within this substring I want to find a '.' (dot), which signals that there is an extension. But when I ran the macro I didn't get any table at all. I tried different approaches but still got nothing. Please correct me.

 

Thanks! 


All Replies
Super User
Posts: 17,831

Re: listing all files (of all types) from all subdirectories

The %IF condition is incorrect. To debug the macro, add some %PUT statements as checkpoints in your code and see how the logic is evaluating.

 

If you check the SAS 9.4 macro appendix it has an example of what you want. 

 

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
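
The checkpoints suggested above could look something like the following sketch. The value assigned to name is a made-up example, not taken from the thread. Inside the macro, the same %PUT lines would go right after the %let name= statement, so the log shows what each piece of the %IF condition resolves to for every directory entry.

```sas
/* Hypothetical entry, standing in for what DREAD might return */
%let name = report.txt;

/* Print each piece of the condition separately */
%put checkpoint: name=&name;
%put checkpoint: last token=%qscan(&name,-1,\);
%put checkpoint: findw result=%sysfunc(findw(%qscan(&name,-1,\),.));
```

If the findw result is 0 for a name that clearly contains a dot, the condition can never be true, which would explain why no table is created.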

Super Contributor
Posts: 413

Re: listing all files (of all types) from all subdirectories

Hi Reeza,

 

thanks for the links. In fact, the code that Patrick wrote to answer my previous question is a direct modification of Example1 - it is from this example that I was inspired to ask that question!

 

Now I am trying to find a modification to Patrick's code that will extract the names and paths of all files of all types from all subdirectories.

PROC Star
Posts: 7,363

Re: listing all files (of all types) from all subdirectories

Only one change has to be made to the original macro. The following has the change made (with the old code commented out):

 

%macro list_files(dir,ext);
  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));      

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be open or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

      %if %length/*%qupcase*/(%qscan(&name,-1,.)) gt 0 /*= %upcase(&ext)*/ %then %do;
        %put &dir\&name;

        data _tmp;
          length dir $512 name $100;
          dir=symget("dir");
          name=symget("name");
        run;
        proc append base=want data=_tmp;
        run;quit;

      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %list_files(&dir\&name,&ext)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend list_files;
%list_files(C:\SASUniversityEdition\myfolders,*)

HTH,

Art, CEO, AnalystFinder.com

Super Contributor
Posts: 413

Re: listing all files (of all types) from all subdirectories

Hi art297,

 

thanks for the code. I ran it, but it didn't go into the subdirectories.

 

From what I understand, the part of the code:

%if %length(%qscan(&name,-1,.)) gt 0 %then %do;

looks at the file name and takes the substring from the right up to the first "." it encounters. If such a substring exists, then there is a "." and therefore an extension, and therefore it is a file whose name and path can be extracted.

 

But what is strange is that I get the names of the second-level subfolders (although they do not contain a "."), but not the names of the files within these subfolders.

Super Contributor
Posts: 413

Re: listing all files (of all types) from all subdirectories

Hi again art297,

 

I think that I got what I wanted. In my original code I used %sysfunc(findw()), but what I should have used instead is the %index function.

 

With the following code:

 

%macro list_files(dir);
  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));      

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be open or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

   %if %index(%qscan(&name,-1,'\'),.) gt 0   %then %do;


        data _tmp;
          length dir $512 name $100;
          dir=symget("dir");
          name=symget("name");
        run;
        proc append base=want data=_tmp;
        run;quit;

      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %list_files(&dir\&name)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend list_files;
%list_files(C:\Documents and Settings\HP_Administrator\Desktop\files)

I got the names and paths of all files within all subdirectories. 
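
A minimal sketch of why %index works for this test: it returns the position of the first occurrence of the search string, or 0 when the string is not found. The two names below are invented for illustration.

```sas
/* Hypothetical file name: the dot is found, so the result is positive */
%put %index(report.txt,.);   /* writes 7 to the log */

/* Hypothetical folder name: no dot, so the result is 0 */
%put %index(subfolder,.);    /* writes 0 to the log */
```

Note that this check only looks for a dot anywhere in the name, so a folder whose name contains a dot would be treated as a file.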

Super User
Posts: 6,500

Re: listing all files (of all types) from all subdirectories

This is already answered in this thread.

https://communities.sas.com/t5/Base-SAS-Programming/listing-all-files-within-a-directory-and-subdire...

Note there is no need to use MACRO code to do this.

Super Contributor
Posts: 413

Re: listing all files (of all types) from all subdirectories

Hi Tom,

 

actually, that is my previous question!

 

I noticed 

Solution
‎02-18-2017 12:19 PM
PROC Star
Posts: 7,363

Re: listing all files (of all types) from all subdirectories

I was too quick in my earlier response. I think the following will do what you want:

 

%macro list_files(dir,ext);
  %local filrf rc did memcnt name i;
  %let rc=%sysfunc(filename(filrf,&dir));
  %let did=%sysfunc(dopen(&filrf));      

   %if &did eq 0 %then %do; 
    %put Directory &dir cannot be open or does not exist;
    %return;
  %end;

   %do i = 1 %to %sysfunc(dnum(&did));   

   %let name=%qsysfunc(dread(&did,&i));

/*      %if %qupcase(%qscan(&name,-1,.)) = %upcase(&ext) %then %do;*/
      %if %qscan(&name,2,.) ne %then %do;
        %put &dir\&name;

        data _tmp;
          length dir $512 name $100;
          dir=symget("dir");
          name=symget("name");
        run;
        proc append base=want data=_tmp;
        run;quit;

      %end;
      %else %if %qscan(&name,2,.) = %then %do;        
        %list_files(&dir\&name,&ext)
      %end;

   %end;
   %let rc=%sysfunc(dclose(&did));
   %let rc=%sysfunc(filename(filrf));     

%mend list_files;
%list_files(C:\SASUniversityEdition\myfolders,*)

HTH,

Art, CEO, AnalystFinder.com

Super Contributor
Posts: 413

Re: listing all files (of all types) from all subdirectories

That is even simpler!

 

So you take the name of the file, and if there is anything after the "." then it is a file name, so you extract it. If there is nothing after the "." (which here is the same thing as there being no "." at all), then it must be a folder, so you repeat the process for this folder.

 

Thanks!
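
The rule described above can be tried in isolation with a small sketch. The macro name classify and the sample names are hypothetical and are not part of the thread's code. The macro only applies the same %qscan(&name,2,.) test and reports the outcome.

```sas
/* Hypothetical helper: applies the same test as the accepted macro */
%macro classify(name);
  /* A second dot-delimited token means "has something after a dot" */
  %if %qscan(&name,2,.) ne %then %put &name looks like a file;
  %else %put &name looks like a folder;
%mend classify;

%classify(report.txt)   /* second token is txt, so: file */
%classify(subfolder)    /* no second token, so: folder */
%classify(README)       /* a file without an extension is also reported as a folder */
```

In the full macro, an entry like README would be passed to DOPEN, fail to open, and trigger the "cannot be open" message rather than being listed.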

Super User
Posts: 6,500

Re: listing all files (of all types) from all subdirectories

In general, data step code is much easier to understand than macro code, and certainly much easier to debug. The only complication with the data step that Roger posted is that it uses the MODIFY statement, which is not widely used and so could take a little effort to understand. Basically, MODIFY allows the data step to append records for all of the file names in any directories it finds, and those records are then processed later as the step works its way through the data set.

 

Let's take that data step and add some logic to capture the filename and extension. You could then subset the result later based on the extension, or even modify the data step to remove the records for files with extensions you don't care about.

 

The first part is to set up a dataset with the top level directory that you want to search from. Note that if you want to search multiple top level directories, you could make this dataset have more than one observation. Let's create a variable to indicate whether the record is for a directory or not (DIR), the full name (FULLPATH), the directory part (DIRNAME), the file within the directory (FILENAME), and the extension, if any (EXT). For this example we will just set a macro variable with the starting point. You can see how easy it would be to wrap this into a macro definition and make DIR the parameter to the macro.

%let dir=%sysfunc(pathname(work));
data want ;
  length dir 8 ext filename dirname $256 fullpath $512 ;
  call missing(of _all_);
  fullpath = "&dir";
run;

Now let's look at the guts of the logic that takes the top level node and scans the directory tree.

data want ;
  modify want ;
  sep='/';
  if "&sysscp"="WIN" then sep='\' ;
  rc=filename('tmp',fullpath);
  dir_id=dopen('tmp');
  dir = (dir_id ne 0) ;
  if dir then dirname=cats(fullpath,sep);
  else do;
    filename=scan(fullpath,-1,sep) ;
    dirname =substrn(fullpath,1,length(fullpath)-length(filename));
    if index(filename,'.')>1 then ext=scan(filename,-1,'.');
  end;
  replace;
  if dir then do;
    do i=1 to dnum(dir_id);
      fullpath=cats(dirname,dread(dir_id,i));
      output;
    end;
    rc=dclose(dir_id);
  end;
  rc=filename('tmp');
run;

A few things to note about using the MODIFY statement. (1) Any new variables introduced in the data step (like SEP, DIR_ID, RC) are not output. There is no need to add a DROP statement for them. (2) You can both replace an existing record (REPLACE statement) and add new records (OUTPUT statement).

 

I added some logic to split the path into DIRNAME, FILENAME and EXT components. Note that when calculating EXT it is important to first check whether the filename actually has a period in it. On Linux systems you can certainly create files without any extension, and you can also create files with a period as the first character of the filename. Note that these are considered hidden files by Linux, and it looks like the DNUM() and DREAD() functions ignore them.

 

Also I added some conditional logic to figure out if you are running on Windows or Linux and use the appropriate slash in the file names.

 

So the logic flow is to first create a fileref for the current observation and then try to open that file as a directory. We then set the variables (other than FULLPATH) based on whether it is a directory or not, and replace the record in the data set with these updated values. Then, if it IS a directory, we add new records for all of its entries (files and subdirectories). These will be appended to the end of the dataset so that they will later be processed by the top of the data step when the MODIFY statement reaches them.
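
As a sketch of the subsetting step mentioned earlier, the following assumes the WANT dataset has already been built by the two data steps above. The output dataset name sas_files and the extension value 'SAS' are illustrative choices, not something from the original post.

```sas
/* Keep only non-directory records with a given extension */
data sas_files;              /* hypothetical output dataset name */
  set want;
  /* DIR is 0 for files; UPCASE makes the comparison case-insensitive */
  where dir = 0 and upcase(ext) = 'SAS';
run;
```

Dropping the condition on EXT and keeping only dir = 0 would give the list of all files of all types, which is what the original question asked for.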

 
