Windows experts please help! Thanks in advance.
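The following SAS code runs for more than a few days and fails to complete. It issues a recursive Windows DIR command with which I want to gather the following metadata on all the files on a large Windows fileserver.
- folder_path
- file_name
- file_extension
- file_owner
- file_size_bytes
- file_last_access_datetime
- file_last_written_datetime
- file_last_read_datetime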
I estimate there are >5 million files on this fileserver based on the following results.
H:\>dir "\\fsprod109\MTKA Public\*" /a:-D /T:A /-C /Q /s | find /c /v ""
6548474
Here is some sample data produced by the DIR command.
Volume in drive \\fsprod109\MTKA Public is New32k
Volume Serial Number is 3477-FAD7
Directory of \\fsprod109\MTKA Public
01/31/2019 08:26 AM 38622 CORP\First.Last (580).webpage
<snip>
05/09/2019 04:16 PM 162 CORP\First.Last ~$EDIT MEMO FROM HABAND.docx
259 File(s) 6682831885 bytes
Directory of \\fsprod109\MTKA Public\.Trashes\1325560687\Recovered files\(A Document Being Saved By Word)
02/12/2019 04:44 PM 27372 CORP\First.Last ~WRD0000
1 File(s) 27372 bytes
Directory of \\fsprod109\MTKA Public\.Trashes\156827625\Recovered files\(A Document Being Saved By Word 44)
02/21/2019 04:03 PM 13035 CORP\First.Last ~WRL0760
1 File(s) 13035 bytes
Directory of \\fsprod109\MTKA Public\2-OfficeInstallers
06/26/2019 10:20 AM 754 CORP\First.Last Pictures - Shortcut.lnk
10/30/2017 12:48 PM 299 CORP\First.Last ServerInstallO365PPx64.xml
06/05/2017 09:40 AM 299 CORP\First.Last ServerInstallO365PPx86.xml
06/05/2017 09:40 AM 295 CORP\First.Last ServerInstallO365VPx64.xml
06/05/2017 09:40 AM 295 CORP\First.Last ServerInstallO365VPx86.xml
06/05/2017 09:40 AM 299 CORP\First.Last ServerInstallO365x64.xml
06/05/2017 09:40 AM 299 CORP\First.Last ServerInstallO365x86.xml
06/05/2017 09:40 AM 1078456 CORP\First.Last setup.exe
06/05/2017 09:40 AM 86 CORP\First.Last uninstall link.txt
9 File(s) 1081082 bytes
Here is the current code that I would like to run faster and complete.
FILENAME pipedir pipe "dir ""\\fsprod109\MTKA Public\*"" /a:-D /T:A /-C /Q /s" lrecl=5000;
DATA WORK.fileextensions;
  DROP line regex regex_id;
  INFILE pipedir TRUNCOVER;
  /* Compile the regular expression once, on the first iteration. */
  IF _N_ = 1 THEN DO;
    RETAIN regex_id;
    regex = '/<DIR>|FILE\(S\)|DIR\(S\)/';
    regex_id = PRXPARSE(regex);
    IF MISSING(regex_id) THEN DO;
      PUTLOG "ERROR: Invalid regular expression " regex;
      STOP;
    END;
  END;
  INPUT line $char1000.;
  *PUT "INPUT LINE " _N_= line=;
  LENGTH directory $1000;
  RETAIN directory;
  /* Skip blank lines, volume headers, and File(s)/Dir(s) summary lines. */
  IF line = ' '
     OR PRXMATCH(regex_id, UPCASE(line)) > 0
     OR UPCASE(LEFT(line)) IN:('VOLUME','TOTAL FILES LISTED:')
  THEN DO;
    *PUT "DIR/VOL/FILE/BYTE " _N_= line=;
    DELETE;
  END;
  /* Remember the current folder; it applies to the file lines that follow. */
  ELSE IF UPCASE(LEFT(line))=:'DIRECTORY OF' THEN DO;
    directory = LEFT(SUBSTR(line,INDEX(UPCASE(line),'DIRECTORY OF')+12));
    *PUT "DIRECTORY OF " _N_= line= directory=;
    DELETE;
  END;
  /* Everything else is a file line in fixed columns. */
  ELSE DO;
    /* Equivalent column-based INPUT statement, kept for reference:
    INPUT @01 file_last_access_datetime ANYDTDTM21.
          @22 file_size_bytes 17.
          @40 file_owner $23.
          @63 file_name $256.
    ;
    */
    file_last_access_datetime = INPUT(SUBSTR(line, 1, 21),ANYDTDTM21.);
    file_size_bytes = INPUT(SUBSTR(line,22, 17), 17.);
    file_owner = SUBSTR(line,40, 23) ;
    file_name = SUBSTR(line,63,256) ;
    file_extension = LOWCASE(SCAN(file_name, -1, "."));
    *PUT "INPUT/OUTPUT " _ALL_;
    OUTPUT;
  END;
  FORMAT file_last_access_datetime E8601DT19.;
RUN;
Where are you running your SAS program? On a PC or a remote SAS server? If it is your PC, check Windows Task Manager for bottlenecks in CPU, memory, disk I/O, or network usage. I suspect the network is what is slowing you down, but that is just a guess.
Also try running a test on a small directory of your file server so you can estimate more accurately how long the full run would take.
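To make the small-directory test concrete, here is a minimal sketch of the extrapolation. It is written in Python purely for illustration (Python is not used anywhere in this thread), and it assumes run time grows roughly linearly with the number of files, which may not hold on a busy network share. The sample figures (10,000 files in 120 seconds) are made up. The total of 6,548,474 is the DIR line count from the original post, which includes header and summary lines and so somewhat overstates the file count.

```python
def estimate_full_run_hours(sample_files: int, sample_seconds: float, total_files: int) -> float:
    """Linearly extrapolate a timed sample run to the whole file server.

    Assumes the per-file cost seen in the sample is representative.
    """
    if sample_files <= 0:
        raise ValueError("sample_files must be positive")
    seconds_per_file = sample_seconds / sample_files
    return total_files * seconds_per_file / 3600.0


if __name__ == "__main__":
    # Hypothetical sample timing; only total_files comes from the thread.
    hours = estimate_full_run_hours(sample_files=10_000,
                                    sample_seconds=120.0,
                                    total_files=6_548_474)
    print(f"Estimated full run: {hours:.1f} hours")
```

With these made-up sample numbers the estimate comes out at roughly 22 hours. If a real sample points to several days, that suggests the per-file cost (for example the owner lookup requested by /Q) is the thing to investigate.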
The SAS workspace server is running on Windows Server 2016.
Perhaps it would make sense to run a Windows DIR command, PowerShell script, or VBScript on the server that writes its output to a CSV file, which I could then read from SAS in a subsequent step.
Which Windows tool would be easier to prepare and tend to execute more efficiently?
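One way to picture the two-step idea: redirect the DIR output to a text file on the server, then convert that file to CSV in a separate pass. The sketch below does the conversion in Python, which is my own choice for illustration and not one of the tools named above. It assumes the same fixed column layout that the SAS code relies on (date and time in columns 1-20, size up to column 38, owner in columns 40-62, name from column 63) and the US date format shown in the sample output. The helper `_file_line` and the `SAMPLE` lines exist only to rebuild that layout for the demo.

```python
import csv
import io
from datetime import datetime

FIELDS = ["folder_path", "file_name", "file_extension",
          "file_owner", "file_size_bytes", "file_last_access_datetime"]


def parse_dir_listing(lines):
    """Yield one dict per file line of `dir /a:-D /T:A /-C /Q /s` output."""
    directory = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if not stripped:
            continue
        upper = stripped.upper()
        if upper.startswith("DIRECTORY OF"):
            directory = stripped[len("Directory of"):].strip()
            continue
        if upper.startswith(("VOLUME", "TOTAL FILES LISTED:")):
            continue
        try:
            # Assumed layout: timestamp in the first 20 characters, size in the next 18.
            accessed = datetime.strptime(line[0:20].strip(), "%m/%d/%Y %I:%M %p")
            size = int(line[20:38])
        except ValueError:
            continue  # summary lines such as "9 File(s) ... bytes" land here
        name = line[62:].strip()
        yield {
            "folder_path": directory,
            "file_name": name,
            # Unlike SCAN(file_name, -1, "."), a name without a dot gets no extension.
            "file_extension": name.rsplit(".", 1)[1].lower() if "." in name else "",
            "file_owner": line[39:62].strip(),
            "file_size_bytes": size,
            "file_last_access_datetime": accessed.isoformat(),
        }


def write_csv(rows, out):
    """Write parsed rows to an open text stream as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def _file_line(stamp, size, owner, name):
    # Hypothetical helper: rebuilds the assumed column layout for the demo.
    return f"{stamp:<20}{size:>18} {owner:<23}{name}"


SAMPLE = [
    r" Volume in drive \\fsprod109\MTKA Public is New32k",
    r" Directory of \\fsprod109\MTKA Public\2-OfficeInstallers",
    "",
    _file_line("06/05/2017  09:40 AM", 1078456, r"CORP\First.Last", "setup.exe"),
    _file_line("06/05/2017  09:40 AM", 86, r"CORP\First.Last", "uninstall link.txt"),
    "               9 File(s)        1081082 bytes",
]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv(parse_dir_listing(SAMPLE), buffer)
    print(buffer.getvalue())
```

Summary lines are skipped here because they fail to parse as a date, instead of by matching "File(s)" anywhere in the line, so a file whose name happens to contain "File(s)" is still kept. This only changes where the parsing happens. If the DIR command itself is the slow part, moving the parsing out of SAS will not shorten the run by much.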
The SAS program is already running a DIR command.
@dmbuffum3 wrote:
Windows experts please help! Thanks in advance.
The following SAS code runs for more than a few days and fails to complete. It issues a recursive Windows DIR command with which I want to gather the following metadata on all the files on a large Windows fileserver.
- folder_path
- file_name
- file_extension
- file_owner
- file_size_bytes
- file_last_access_datetime
- file_last_written_datetime
- file_last_read_datetime
I estimate there are >5 million files on this fileserver based on the following results.
I might suggest breaking this into a few smaller jobs, each starting from a different root directory and saving its results to its own data set.
dir "\\fsprod109\MTKA Public\somefolder\*"
That way you may get some useable data without having to wait "days".
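As a rough illustration of the splitting idea, the sketch below lists the immediate subfolders of a root and builds one recursive DIR command per subfolder, plus one non-recursive command for the files sitting directly in the root. It is written in Python only for illustration (not something this thread uses), and the function name and the `listing_NNNN.txt` output names are hypothetical. It only builds the command strings; it does not run them.

```python
import os

DIR_OPTIONS = "/a:-D /T:A /-C /Q"  # same switches as the original command


def build_dir_commands(root, out_dir):
    """Return (label, command) pairs that together cover every file under root.

    The first pair lists files directly in root (no /s); each remaining pair
    lists one immediate subfolder recursively and redirects to its own file.
    """
    with os.scandir(root) as entries:
        subfolders = sorted(e.name for e in entries
                            if e.is_dir(follow_symlinks=False))

    root_listing = os.path.join(out_dir, "listing_0000.txt")
    root_target = os.path.join(root, "*")
    commands = [(".", f'dir "{root_target}" {DIR_OPTIONS} > "{root_listing}"')]

    for index, name in enumerate(subfolders, start=1):
        target = os.path.join(root, name, "*")
        listing = os.path.join(out_dir, f"listing_{index:04d}.txt")
        commands.append((name, f'dir "{target}" {DIR_OPTIONS} /s > "{listing}"'))
    return commands
```

For example, `build_dir_commands(r"\\fsprod109\MTKA Public", r"D:\listings")` would give one command per top-level folder (the `D:\listings` path is made up). Each listing can then be read into its own data set, and a job that fails or stalls costs only that one folder.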
Performance related to network drives may be fun to address. Bandwidth, permissions, priorities, and the ever-popular network security rules may all have an impact.