Michele_E76
Fluorite | Level 6

I have a macro that I saw posted by @AllanBowe (the mp_dirlist macro, listed below). The macro works beautifully as intended to clean up files older than 90 days (I modified the original post's 5-day limit). However, I have a shared folder where we store SAS logs that has almost 300K files in it, and it takes forever just to run the macro to get the list of file names. Is there a way to limit the macro to only retrieve the files older than x days before we go to the next statement? That way I could delete all files older than 2 years, then older than 1 year, and work my way down, instead of trying to list everything all at once.

 

Here is what I have, courtesy of Allan, using the mp_dirlist macro located below:

 

/**
  @file
  @brief Returns all files and subdirectories within a specified parent
  @details Not OS specific (uses dopen / dread).  It does not appear to be
    possible to reliably identify unix directories, and so a recursive
    option is not available.
  usage:
      %mp_dirlist(path=/some/location,outds=myTable);
  @param path= for which to return contents
  @param outds= the output dataset to create
  @returns outds contains the following variables:
   - file_or_folder (file / folder)
   - filepath (path/to/file.name)
   - filename (just the file name)
   - ext (.extension)
   - msg (system message if any issues)
  @version 9.2
  @author Allan Bowe
**/

%macro mp_dirlist(path=%sysfunc(pathname(work))
    , outds=work.mp_dirlist
)/*/STORE SOURCE*/;
data &outds (compress=no keep=file_or_folder filepath filename ext msg);
  length filepath $500 fref $8 file_or_folder $6 filename $80 ext $20 msg $200;
  rc = filename(fref, "&path");
  if rc = 0 then do;
     did = dopen(fref);
     if did=0 then do;
        putlog "NOTE: This directory is empty - &path";
        msg=sysmsg();
        put _all_;
        stop;
     end;
     rc = filename(fref);
  end;
  else do;
    msg=sysmsg();
    put _all_;
    stop;
  end;
  dnum = dnum(did);
  do i = 1 to dnum;
    filename = dread(did, i);
    fid = mopen(did, filename);
    if fid > 0 then do;
      file_or_folder='file  ';
      ext = prxchange('s/.*\.(.*)/$1/', 1, filename);
      if filename = ext then ext = ' ';
      rc = fclose(fid); /* release the member handle opened by MOPEN */
    end;
    else do;
      ext='';
      file_or_folder='folder';
    end;
    filepath="&path/"!!filename;
    output;
  end;
  rc = dclose(did);
  stop;
run;
%mend;

 

I want to find a way to limit the results above before we get to the part below, especially for directories that have been saving log files since we migrated to Linux in 2020.

 

data _null_;
  set work.mp_dirlist;
  format modified_dttm datetime19.;
  rc=filename("fref",filepath);
  fid=fopen("fref");
  if fid>0 then do;
    modified=finfo(fid,"Last Modified");
    modified_dttm=input(modified,anydtdtm24.);
    rc=fclose(fid); /* close the file before any delete attempt */
  end;
  /* cutoff: 60 secs * 60 mins * 24 hours * 90 days */
  if modified_dttm>0 and datetime()-modified_dttm > (60*60*24*90) /* and ext='???'*/ then do;
    putlog 'deleting' filename;
    rc=fdelete("fref");
  end;
  rc=filename("fref"); /* deassign the fileref for the next row */
run;

 

Any thoughts would be greatly appreciated!

4 REPLIES
Kurt_Bremser
Super User

The DREAD function must be used to retrieve all entries this way; only after you have a name can you retrieve the file metadata and decide whether to continue with a given file.

Bite the bullet and wait for the macro to finish, if you must do it from within SAS.
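
If you do stay in SAS, here is a minimal sketch of the idea above (untested; the macro name and the days= parameter are mine, not part of @AllanBowe's macro): the same DOPEN/DREAD loop, but each name is checked with FOPEN/FINFO before it is output, so only files older than the cutoff land in the dataset. DREAD still has to touch every entry, so the scan itself is no faster, but the output stays small and the delete step has far less to re-open.

%macro mp_dirlist_old(path=, outds=work.mp_dirlist_old, days=90);
data &outds (keep=filepath filename modified_dttm);
  length filepath $500 dref fref $8 filename $80;
  format modified_dttm datetime19.;
  rc=filename(dref,"&path");
  did=dopen(dref);
  if did=0 then do;
    putlog "NOTE: Unable to open directory - &path";
    stop;
  end;
  do i=1 to dnum(did);
    filename=dread(did,i);
    filepath="&path/"!!filename;
    /* FOPEN fails on subdirectories, so they are skipped automatically */
    rc=filename(fref,filepath);
    fid=fopen(fref);
    if fid>0 then do;
      modified_dttm=input(finfo(fid,'Last Modified'),anydtdtm24.);
      rc=fclose(fid);
      /* keep only files older than &days days */
      if . < modified_dttm < datetime()-&days*60*60*24 then output;
    end;
    rc=filename(fref);
  end;
  rc=dclose(did);
  rc=filename(dref);
  stop;
run;
%mend;

Called once per cutoff (days=730, then 365, and so on), that would also support the staged cleanup you describe.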

A better way to handle such issues is a UNIX shell script using the find command. If the shared resource is located on a UNIX server, of course.

Such operations should always be done locally on a file server, not over a network mount.
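
For reference, the find command might look like this (untested; the path, fileref, and 90-day cutoff are placeholders, and -delete assumes GNU find). Piped into SAS it still runs over the mount, but the same command can be scheduled directly on the file server:

/* dry run: list files older than 90 days without touching them */
filename oldlist pipe "find /shared/logs -type f -mtime +90" lrecl=32767;

data old_logs;
  infile oldlist truncover;
  input filepath $char500.;
run;

/* once the list looks right, the same command deletes in one pass: */
/* find /shared/logs -type f -mtime +90 -delete                     */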

Michele_E76
Fluorite | Level 6

The files themselves are on a Windows shared drive, but we use Linux for SAS Grid to read/write to it. I know the initial phase of cleanup will take a while, and I am halfway through it now, so that's progress. However, I want a system in place that will scan daily, weekly, or monthly and clean the files up before that location gets too cluttered again. We store all our SAS logs here, and troubleshooting takes forever just searching through the files because there are so many. That is the main reason I am going about it in SAS instead of deleting them manually.

Kurt_Bremser
Super User

Doing the selection of files in a programming environment you are familiar with makes sense, so doing it in SAS is OK.

The performance will not be as good as anything you do natively/locally (a Windows shell script on the server hosting the share), but once the process is established and runs automatically in batch, that is not an issue.

Oligolas
Barite | Level 11

Hi,

 

it might be possible to retrieve the file list through a pipe, which would be much faster.

(untested)

%let path=; /* set this to the directory to list */
filename files pipe "ls -altr ""%unquote(&path.)"" " lrecl=32767;

data test;
  infile files truncover;
  input files $char1000.; /* one ls output line per record */
  put files=;
run;
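
Building on the find example further up (also untested; path, cutoff, and filerefs are placeholders): swapping ls for find with -mtime pushes the age filter into the pipe itself, so SAS only ever sees files that are old enough, and the delete can stay under SAS control with FDELETE as in the original post:

%let path=;   /* directory to clean, as above */
%let days=90; /* age cutoff in days */

filename oldpipe pipe "find ""%unquote(&path.)"" -type f -mtime +&days." lrecl=32767;

data _null_;
  infile oldpipe truncover;
  input filepath $char500.;
  rc=filename("f",filepath);
  putlog 'deleting ' filepath;
  rc=fdelete("f");
  rc=filename("f");
run;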
________________________

- Cheers -

