All,
How can I find the full size of a folder through SAS code (on UNIX)?
I can't use a pipe because we are constrained by the SAS NOXCMD option.
The code below uses a pipe, so it does not work for us.
filename du pipe "du -q /data/team";
data work.diskusage;
infile du;
input @;
put _infile_;
if ( _infile_ =: 'Size:' ) then do;
sizeInBytes = input(scan(_infile_,2,' '), comma32.);
output;
end;
input;
run;
Thanks,
SS
You can try something like the following, which does not use the pipe symbol:
1) Use the X statement with a UNIX command such as du -h and the path of the directory whose size you need, and redirect the output to a text file with the > symbol.
2) Import the text file containing the sizes.
3) Use the SCAN function to extract the size into a separate variable.
x "du -h &path > &path/text.txt";
proc import datafile="&path/text.txt" out=have dbms=dlm replace;
getnames=no;
run;
data want;
set have;
size=scan(var1,1,'/');
run;
@Jagadishkatam When XCMD is disabled, the X statement will not be available.
Try this macro:
%macro size(directory);
%local did i name subdir fref fref2 size;
%let size = 0;
%let did = %sysfunc(filename(fref,&directory));
%let did = %sysfunc(dopen(&fref));
%if &did ne 0
%then %do;
%do i = 1 %to %sysfunc(dnum(&did));
%let name = &directory/%sysfunc(dread(&did,&i));
%let subdir = %sysfunc(filename(fref2,&name));
%let subdir = %sysfunc(dopen(&fref2));
%if &subdir ne 0
%then %do;
%let subdir=%sysfunc(dclose(&subdir));
%let size = %eval(&size + %size(&name));
%end;
%else %do;
%let fid = %sysfunc(fopen(&fref2));
%let size = %eval(&size + %sysfunc(finfo(&fid,Dateigröße (Byte))));
%let fid = %sysfunc(fclose(&fid));
%end;
%let subdir=%sysfunc(filename(fref2));
%end;
%let did=%sysfunc(dclose(&did));
%end;
%let did=%sysfunc(filename(fref));
&size
%mend;
%put size=%size(/folders/myfolders);
This code was tested on SAS UE on a Mac with a German locale. That's why the name of the file information item is "Dateigröße (Byte)", i.e. "file size (bytes)".
You need to determine the name of the information item in your locale by running:
data items;
length fref $8;
rc = filename(fref,'/name_of_an_existing_file');
fid = fopen(fref);
do i = 1 to foptnum(fid);
item = foptname(fid,i);
output;
end;
fid = fclose(fid);
rc = filename(fref);
run;
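If the position of the size item is known for your host, the localized name can be looked up at run time instead of being hard-coded. The following is only a sketch: it assumes that item 6 is the file size in bytes (which is what is reported for Unix further down in this thread; other operating systems may order the items differently), and the fileref sizedemo and the macro variable demo_bytes are made-up names for the demo.

```sas
/* Sketch: fetch a file's size by item position rather than by localized name. */
/* Assumption: item 6 is the size in bytes on this host (reported for Unix in  */
/* this thread; other operating systems may order the items differently).      */
filename sizedemo temp;                  /* hypothetical scratch file */

data _null_;
  file sizedemo;
  put 'hello';                           /* a few bytes so there is something to measure */
run;

data _null_;
  length itemname $64;
  fid = fopen('sizedemo');               /* open the file through its fileref */
  if fid > 0 then do;
    itemname = foptname(fid, 6);         /* localized name of item 6 */
    bytes = input(finfo(fid, itemname), 32.);
    put itemname= bytes=;                /* show what was found in the log */
    call symputx('demo_bytes', bytes);   /* keep the value for later use */
    fid = fclose(fid);
  end;
run;

filename sizedemo clear;
```

Check the log once on your own system to confirm that item 6 really is the size before relying on the position.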
Something similar to this should work: You provide the "physical-name" of your folder. It should start at a mount point. This should get all of the directory properties.
data diropts;
length optname $ 12 optval $ 40;
keep optname optval;
rc=filename("mydir", "physical-name");
did=dopen("mydir");
numopts=doptnum(did);
do i=1 to numopts;
optname=doptname(did, i);
optval=dinfo(did, optname);
output;
end;
run;
The DOPEN function opens a directory, DOPTNAME returns the name of an option or characteristic, and DINFO returns the value of that option. There really should be a DCLOSE after the options have been read.
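A variant with the missing DCLOSE might look like the sketch below. To keep it runnable without editing, it points at the WORK library directory via PATHNAME('work'); that choice, and the longer variable lengths, are assumptions for the demo rather than part of the code above.

```sas
/* Sketch: list directory options, then close the handle and clear the fileref. */
/* Assumption: the WORK library path stands in for your own "physical-name".    */
data diropts;
  length optname $ 30 optval $ 200;
  keep optname optval;
  rc = filename("mydir", pathname("work"));
  did = dopen("mydir");
  if did > 0 then do;                    /* only read options if the open succeeded */
    do i = 1 to doptnum(did);
      optname = doptname(did, i);
      optval = dinfo(did, optname);
      output;
    end;
    rc = dclose(did);                    /* release the directory handle */
  end;
  rc = filename("mydir");                /* deassign the fileref */
run;
```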
On a UNIX system, this does not return any size at all, not even the size of the directory file itself:
data diropts;
length optname $ 30 optval $ 40;
keep optname optval;
rc=filename("mydir", "/folders/myfolders");
did=dopen("mydir");
numopts=doptnum(did);
do i=1 to numopts;
optname=doptname(did, i);
optval=dinfo(did, optname);
output;
end;
run;
Resulting dataset:
     optname               optval
1    Verzeichnis           /folders/myfolders
2    Besitzername          root
3    Gruppenname           vboxsf
4    Zugriffsberechtigung  drwxrwx---
5    Zuletzt geändert      21. April 2020 17.09 Uhr
@Tom wrote:
The SIZE of a directory is pretty meaningless. It is just a reflection of how many bytes it takes to store the list of filenames and links.
You need to check each individual file in the directory using FINFO() function.
That's exactly what my macro does. Run it for a test.
The macro will work well when you just want to get the one number out.
To get the full list of files you should stick with data step code. Here is a method that uses the MODIFY statement to process all of the entries in one or more directory trees, based on this post:
First you make a dataset with the structure you want and any variables you are going to keep. Then, in a second step, use MODIFY to check whether each observation is a directory. For directories you append the directory entries; for non-directories you gather the information you want.
On Unix, information items 5 and 6 are the last-modified date and the size. The item numbers could be different on other operating systems.
data filelist;
length dname filename $256 dir level 8 lastmod size 8;
format lastmod datetime20.;
input dname;
retain filename ' ' level 0 dir 1;
cards4;
~/temp
;;;;
data filelist;
modify filelist;
rc1=filename('tmp',catx('/',dname,filename));
rc2=dopen('tmp');
dir = not not rc2;
if not dir then do;
fid=fopen('tmp','i',0,'b');
lastmod=input(finfo(fid,foptname(fid,5)),datetime24.);
size=input(finfo(fid,foptname(fid,6)),32.);
fid=fclose(fid);
end;
else do;
dname=catx('/',dname,filename);
filename=' ';
end;
replace;
if dir;
level=level+1;
do i=1 to dnum(rc2);
filename=dread(rc2,i);
output;
end;
rc3=dclose(rc2);
run;
Now you can use SAS to summarize the data.
proc summary data=filelist ;
class dname ;
var lastmod size;
output out=summary
sum(size)=total_size
max(lastmod)=max_ts
;
run;
proc print width=min;
format total_size comma20.;
run;
Obs  dname                    _TYPE_  _FREQ_  total_size  max_ts
1                             0       85      12,055,127  10JUL2019:11:34:31
2    ~/temp                   1       43       9,885,263  10JUL2019:11:34:31
3    ~/temp/autocutsel-0.9.0  1       38       1,728,684  11NOV2013:16:05:57
4    ~/temp/drg               1       4          441,180  22FEB2014:10:37:17
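If all you need from the file list is a single number for the whole tree, as in the original question, one possible way is to sum SIZE over the non-directory rows and store the result in a macro variable. This is only a sketch: the small filelist_demo dataset and the macro variable total_bytes are made up so the step can run on its own, and you would point the query at the real FILELIST instead.

```sas
/* Hypothetical stand-in for FILELIST so the sketch runs on its own. */
data filelist_demo;
  input dir size;
  datalines;
1 .
0 100
0 250
;
run;

/* Sum the sizes of the rows that are files (DIR=0). */
proc sql noprint;
  select coalesce(sum(size), 0) format=32.   /* wide format avoids rounded or E-notation text */
    into :total_bytes trimmed
    from filelist_demo
    where dir = 0;
quit;

%put NOTE: total size is &total_bytes bytes;
```

The explicit format is there because a numeric value stored through INTO is otherwise written with a short default format, which can mangle large byte counts.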
Great stuff, @Tom !
May I use your code as an example in my paper (it was planned for #SASGF2020, but I intend to present a revised and expanded version at #SASGF2021), as an alternative to the macro?
And I have found a useful modification to make it work in different locales:
lastmod=input(finfo(fid,foptname(fid,5)),NLDATMW.);
This worked at least with my German UE.
Actually you don't need the W version. But the length matters. NLDATM100 works best. Out of the 135 values for LOCALE I found it could read the strings generated for 87 of them. Much better than DATETIME or ANYDTDTM.
The MEANS Procedure

Variable    N    N Miss
-----------------------
datetime    3       132
NLDATM     87        48
anydtdtm   33       102
-----------------------
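Since no single informat reads every locale's string, you could try NLDATM100. first and fall back to ANYDTDTM when it returns a missing value. This is a rough sketch, not something tested across locales: whether the fallback recovers any additional cases is an open question, item 5 is assumed to be the last-modified timestamp as on Unix, and the fileref datedemo and the macro variable lastmod_raw are made-up names.

```sas
/* Sketch: read the last-modified string with NLDATM first, then ANYDTDTM. */
/* Assumption: item 5 is the last-modified timestamp on this host.         */
filename datedemo temp;                  /* hypothetical scratch file */

data _null_;
  file datedemo;
  put 'x';
run;

data _null_;
  length raw $100;
  format lastmod datetime20.;
  fid = fopen('datedemo');
  if fid > 0 then do;
    raw = finfo(fid, foptname(fid, 5));          /* locale-dependent text */
    lastmod = input(raw, ?? nldatm100.);         /* ?? suppresses invalid-data messages */
    if missing(lastmod) then
      lastmod = input(raw, ?? anydtdtm40.);      /* second attempt with a different informat */
    put raw= lastmod=;
    call symputx('lastmod_raw', raw);            /* keep the raw text for inspection */
    fid = fclose(fid);
  end;
run;

filename datedemo clear;
```

If LASTMOD is still missing after both attempts, the raw string in the log shows what the locale produced.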
I have found another improvement to your code:
data filelist;
modify filelist;
rc1=filename('tmp',catx('/',dname,filename));
rc2=dopen('tmp');
dir = not not rc2;
if not dir then do;
fid=fopen('tmp','i',0,'b');
lastmod=input(finfo(fid,foptname(fid,5)),NLDATM100.);
size=input(finfo(fid,foptname(fid,6)),32.);
fid=fclose(fid);
end;
else do;
dname=catx('/',dname,filename);
filename=' ';
lastmod=input(dinfo(rc2,doptname(rc2,5)),NLDATM100.);
end;
replace;
if dir;
level=level+1;
do i=1 to dnum(rc2);
filename=dread(rc2,i);
output;
end;
rc3=dclose(rc2);
run;
It will now record the modification timestamp for directories. The only thing that is missing is the size of the directory file itself, which cannot be determined with SAS functions; for this one would need the external command.