dmbuffum3
Fluorite | Level 6

Windows experts please help! Thanks in advance.

 

The following SAS code runs for more than a few days and fails to complete. It issues a recursive Windows DIR command with which I want to gather the following metadata on all the files on a large Windows fileserver.

 

  1. folder_path
  2. file_name
  3. file_extension
  4. file_owner
  5. file_size_bytes
  6. file_last_access_datetime
  7. file_last_written_datetime
  8. file_last_read_datetime

I estimate there are >5 million files on this fileserver based on the following results.

 

H:\>dir "\\fsprod109\MTKA Public\*" /a:-D /T:A /-C /Q /s | find /c /v ""

6548474

 

Here is some sample data produced by the DIR command.

 

Volume in drive \\fsprod109\MTKA Public is New32k
Volume Serial Number is 3477-FAD7

Directory of \\fsprod109\MTKA Public

01/31/2019 08:26 AM 38622 CORP\First.Last (580).webpage
<snip>
05/09/2019 04:16 PM 162 CORP\First.Last ~$EDIT MEMO FROM HABAND.docx
259 File(s) 6682831885 bytes

Directory of \\fsprod109\MTKA Public\.Trashes\1325560687\Recovered files\(A Document Being Saved By Word)

02/12/2019 04:44 PM 27372 CORP\First.Last ~WRD0000
1 File(s) 27372 bytes

Directory of \\fsprod109\MTKA Public\.Trashes\156827625\Recovered files\(A Document Being Saved By Word 44)

02/21/2019 04:03 PM 13035 CORP\First.Last ~WRL0760
1 File(s) 13035 bytes

Directory of \\fsprod109\MTKA Public\2-OfficeInstallers

06/26/2019 10:20 AM 754 CORP\First.Last Pictures - Shortcut.lnk
10/30/2017 12:48 PM 299 CORP\First.Last ServerInstallO365PPx64.xml
06/05/2017 09:40 AM 299 CORP\First.Last ServerInstallO365PPx86.xml
06/05/2017 09:40 AM 295 CORP\First.Last ServerInstallO365VPx64.xml
06/05/2017 09:40 AM 295 CORP\First.Last ServerInstallO365VPx86.xml
06/05/2017 09:40 AM 299 CORP\First.Last ServerInstallO365x64.xml
06/05/2017 09:40 AM 299 CORP\First.Last ServerInstallO365x86.xml
06/05/2017 09:40 AM 1078456 CORP\First.Last setup.exe
06/05/2017 09:40 AM 86 CORP\First.Last uninstall link.txt
9 File(s) 1081082 bytes

 

Here is the current code that I would like to run faster and complete. 

 

FILENAME pipedir pipe "dir ""\\fsprod109\MTKA Public\*"" /a:-D /T:A /-C /Q /s" lrecl=5000;

DATA WORK.fileextensions;
  DROP line regex regex_id;

  INFILE pipedir TRUNCOVER;

  IF _N_ = 1 THEN DO;
    RETAIN regex_id;
    
    regex = '/<DIR>|FILE\(S\)|DIR\(S\)/';
    
    regex_id = PRXPARSE(regex);
    IF MISSING(regex_id) THEN DO;
      PUTLOG "ERROR: Invalid regular expression " regex;
      STOP;
    END;
  END;

  INPUT line $char1000.;

  *PUT "INPUT LINE " _N_= line=;

  LENGTH directory $1000;
  RETAIN directory;

  IF line = ' '
  OR PRXMATCH(regex_id, UPCASE(line)) > 0
  OR UPCASE(LEFT(line)) IN:('VOLUME','TOTAL FILES LISTED:')
  THEN DO;
    *PUT "DIR/VOL/FILE/BYTE " _N_= line=;
    DELETE;
  END;

  ELSE IF UPCASE(LEFT(line))=:'DIRECTORY OF' THEN DO;
    directory = LEFT(SUBSTR(line,INDEX(UPCASE(line),'DIRECTORY OF')+12));
    *PUT "DIRECTORY OF " _N_= line= directory=;
    DELETE;
  END;

  ELSE DO;
   *INPUT @01 file_last_access_datetime ANYDTDTM21.
          @22 file_size_bytes                   17.
          @40 file_owner                       $23.
          @63 file_name                       $256.
    ;

    file_last_access_datetime = INPUT(SUBSTR(line, 1, 21),ANYDTDTM21.);
    file_size_bytes           = INPUT(SUBSTR(line,22, 17),        17.);
    file_owner                =       SUBSTR(line,40, 23)             ;
    file_name                 =       SUBSTR(line,63,256)             ;

    file_extension = LOWCASE(SCAN(file_name, -1, "."));

    *PUT "INPUT/OUTPUT " _ALL_;

    OUTPUT;
  END;

  FORMAT file_last_access_datetime E8601DT19.;
RUN;

 

6 REPLIES
Reeza
Super User
I don't know how to answer your question, but you've included a number of people's names in the post. Is that something you're allowed to post publicly?

How long does it take to get a full listing from the OS alone, without SAS? Do you need all of the switches on the DIR command, and have you tried dropping some of them to reduce the output?
dmbuffum3
Fluorite | Level 6
@Reeza Thanks for pointing out my mistake in posting some names. Now corrected.
SASKiwi
PROC Star

Where are you running your SAS program? On a PC or a remote SAS server? If it is your PC, check Windows Task Manager to see if there are any bottlenecks with CPU, memory, IO or network traffic usage. My suspicion is your network is what might be slowing you down but that is just a guess.

 

Also try running a test on a small directory of your file server so you can estimate more accurately how long it would take to expand the processing.

dmbuffum3
Fluorite | Level 6

The SAS workspace server is running on Windows Server 2016.

 

Perhaps it would make sense to execute a Windows DIR command, PowerShell script, or VBScript that creates an output CSV file on the server, which I could then read from SAS in a subsequent step.

 

Which Windows tool would be easier to prepare and tend to execute more efficiently?

  1. Windows DIR
  2. PowerShell
  3. VBScript
  4. Other?
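
As one possibility under "Other", here is a minimal Python sketch of the "write a CSV on the server, read it from SAS afterwards" idea. It assumes Python is installed on the server, which nothing in this thread confirms. The function name write_inventory and the column order are my own choices. file_owner is left out because the sketch uses only the standard library, which does not give a simple way to read the NTFS owner. Files without a dot in the name get an empty extension, which differs from the SCAN(file_name, -1, ".") logic in the SAS step.

```python
import csv
import os
from datetime import datetime


def write_inventory(root, csv_path):
    """Walk root recursively and write one CSV row per file; return the row count."""
    fields = ["folder_path", "file_name", "file_extension",
              "file_size_bytes", "file_last_access_datetime",
              "file_last_written_datetime"]
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(fields)
        # By default os.walk skips folders it cannot list instead of raising.
        for folder, _dirs, files in os.walk(root):
            for name in files:
                try:
                    info = os.stat(os.path.join(folder, name))
                except OSError:
                    continue  # file vanished or access was denied
                # Extension without the dot, lower-cased; empty if there is none.
                ext = os.path.splitext(name)[1].lstrip(".").lower()
                accessed = datetime.fromtimestamp(info.st_atime)
                written = datetime.fromtimestamp(info.st_mtime)
                writer.writerow([
                    folder, name, ext, info.st_size,
                    accessed.isoformat(timespec="seconds"),
                    written.isoformat(timespec="seconds"),
                ])
                count += 1
    return count
```

A call might look like write_inventory(r"\\fsprod109\MTKA Public", r"D:\temp\mtka_inventory.csv"), where the output path is hypothetical. Whether this runs any faster than DIR over the same network share is untested; the main gain is a delimited file that SAS can read without fixed-column parsing.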
Reeza
Super User

The SAS program is already running a DIR command.

 

ballardw
Super User

@dmbuffum3 wrote:

I estimate there are >5 million files on this fileserver based on the following results.

 

I might suggest breaking this into a few smaller jobs so that you have different root directories and save each to a different data set.

dir "\\fsprod109\MTKA Public\somefolder\*"

 

That way you may get some useable data without having to wait "days".
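
The split described above can be sketched in Python as a small helper that builds one DIR command per top-level subfolder, plus one non-recursive command for the files sitting directly in the root. This is only an illustration: the function name dir_commands is made up, the default switches are copied from the original post, and it assumes Python is available on a machine that can see the share.

```python
import os


def dir_commands(root, switches="/a:-D /T:A /-C /Q /s"):
    """Return one DIR command per top-level subfolder of root,
    preceded by a non-recursive command for root's own files."""
    # For the root itself, drop /s so the subfolders are not listed twice.
    top_switches = " ".join(s for s in switches.split() if s.lower() != "/s")
    commands = [f'dir "{root}\\*" {top_switches}']
    with os.scandir(root) as entries:
        subfolders = sorted(e.name for e in entries if e.is_dir())
    for name in subfolders:
        commands.append(f'dir "{root}\\{name}\\*" {switches}')
    return commands
```

Each returned string could then go into its own FILENAME ... PIPE statement and be saved to its own data set, so a failure in one folder does not cost the results from the others.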

Performance related to network drives may be fun to address. Bandwidth, permissions, priorities, and the ever-popular network security rules may all have an impact.
