sathya66
Barite | Level 11

All,

How can I find the full size of a folder through SAS code (on UNIX)?

We can't use a pipe because we are constrained by the SAS NOXCMD option.

The code below uses a pipe, so it does not work for us:

filename du pipe "du -q /data/team";
data work.diskusage;
infile du;
input @;
put _infile_;
if ( _infile_ =: 'Size:' ) then do;
    sizeInBytes = input(scan(_infile_,2,' '), comma32.);
    output;
end;
input;
run;

Thanks,

SS

 

Jagadishkatam
Amethyst | Level 16

You can try something like the steps below, which avoid the pipe:

1) Use the X command to run a UNIX command such as du -h on the directory whose size you need, and redirect the output to a text file with >.

2) Import the text file containing the size.

3) Use the SCAN function to extract the size into a separate variable.

 

/* &path: macro variable assumed to hold the directory path */
x "du -h &path > &path/text.txt";

proc import datafile="&path/text.txt" out=have dbms=dlm replace;
getnames=no;
run;

data want;
set have;
size=scan(var1,1,'/');
run;
Thanks,
Jag
Kurt_Bremser
Super User

Try this macro:

 

%macro size(directory);
%local did i name subdir fref fref2 fid size;
%let size = 0;
%let did = %sysfunc(filename(fref,&directory));
%let did = %sysfunc(dopen(&fref));
%if &did ne 0
%then %do;
  %do i = 1 %to %sysfunc(dnum(&did));
    %let name = &directory/%sysfunc(dread(&did,&i));
    %let subdir = %sysfunc(filename(fref2,&name));
    %let subdir = %sysfunc(dopen(&fref2));
    %if &subdir ne 0
    %then %do;
      %let subdir=%sysfunc(dclose(&subdir));
      %let size = %eval(&size + %size(&name));
    %end;
    %else %do;
      %let fid = %sysfunc(fopen(&fref2));
      %let size = %eval(&size + %sysfunc(finfo(&fid,Dateigröße (Byte))));
      %let fid = %sysfunc(fclose(&fid));
    %end;
    %let subdir=%sysfunc(filename(fref2));
  %end;
  %let did=%sysfunc(dclose(&did));
%end;
%let did=%sysfunc(filename(fref));
&size
%mend;
%put size=%size(/folders/myfolders);

This code is tested on SAS UE on a Mac with a German locale. That's why the name of the file information item is "Dateigröße (Byte)".

You need to determine the name of the information item in your locale by running

data items;
length fref $8;
rc = filename(fref,'/name_of_an_existing_file');
fid = fopen(fref);
do i = 1 to foptnum(fid);
  item = foptname(fid,i);
  output;
end;
fid = fclose(fid);
rc = filename(fref);
run;

 

ballardw
Super User

Something similar to this should work. You provide the "physical-name" of your folder, which should start at a mount point. This should get all of the directory properties.

data diropts;
   length optname $ 12 optval $ 40;
   keep optname optval;
   rc=filename("mydir", "physical-name");
   did=dopen("mydir");
   numopts=doptnum(did);
   do i=1 to numopts;
      optname=doptname(did, i);
      optval=dinfo(did, optname);
      output;
   end;
   run;

The DOPEN function opens a directory, DOPTNAME returns the name of an option or characteristic, and DINFO returns the value of that option. There really should be a DCLOSE after the options have been read.
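As a minimal sketch of the same step with the DCLOSE added: the check on did and the final FILENAME call that clears the fileref are additions for illustration, and "physical-name" is still a placeholder for your own folder.

```sas
data diropts;
   length optname $ 30 optval $ 256;
   keep optname optval;
   /* "physical-name" is a placeholder, as in the post above */
   rc = filename("mydir", "physical-name");
   did = dopen("mydir");
   if did > 0 then do;
      /* one observation per directory information item */
      do i = 1 to doptnum(did);
         optname = doptname(did, i);
         optval  = dinfo(did, optname);
         output;
      end;
      rc = dclose(did);       /* release the directory handle */
   end;
   else put "WARNING: could not open the directory.";
   rc = filename("mydir");    /* clear the fileref */
run;
```

If DOPEN fails, the data set is simply empty and a warning is written to the log.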

Kurt_Bremser
Super User

On a UNIX system, this does not return any size at all, not even the size of the directory file itself:

data diropts;
   length optname $ 30 optval $ 40;
   keep optname optval;
   rc=filename("mydir", "/folders/myfolders");
   did=dopen("mydir");
   numopts=doptnum(did);
   do i=1 to numopts;
      optname=doptname(did, i);
      optval=dinfo(did, optname);
      output;
   end;
run;

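Resulting dataset:

Obs  optname               optval
 1   Verzeichnis           /folders/myfolders
 2   Besitzername          root
 3   Gruppenname           vboxsf
 4   Zugriffsberechtigung  drwxrwx---
 5   Zuletzt geändert      21. April 2020 17.09 Uhr

(The option names are German for Directory, Owner name, Group name, Access permission, and Last modified.)
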
Tom
Super User
The SIZE of a directory is pretty meaningless. It is just a reflection of how many bytes it takes to store the list of filenames and links.
You need to check each individual file in the directory using the FINFO() function.
Kurt_Bremser
Super User

@Tom wrote:
The SIZE of a directory is pretty meaningless. It is just a reflection of how many bytes it takes to store the list of filenames and links.
You need to check each individual file in the directory using the FINFO() function.

That's exactly what my macro does. Run it for a test.

Tom
Super User (accepted solution)

The macro will work well when you just want to get the one number out.

To get the full list of files you should stick with data step code. Here is a method that uses the MODIFY statement to process all of the entries in one or more directory trees, based on this post:

https://communities.sas.com/t5/SAS-Programming/listing-all-files-within-a-levelectory-and-sublevelec...

 

First you make a dataset with the structure you want and any variables you are going to keep. Then, in a second step, use MODIFY to check whether each observation is a directory. For directories you append the directory entries, and for non-directories you gather the information you want.

On Unix, info items 5 and 6 are the last-modified date and the size. The item numbers could be different on other operating systems.

data filelist;
  length dname filename $256 dir level 8 lastmod size 8;
  format lastmod datetime20.;
  input dname;
  retain filename ' ' level 0 dir 1;
cards4;
~/temp
;;;;

data filelist;
  modify filelist;
  rc1=filename('tmp',catx('/',dname,filename));
  rc2=dopen('tmp');
  dir = not not rc2;
  if not dir then do;
    fid=fopen('tmp','i',0,'b');
    lastmod=input(finfo(fid,foptname(fid,5)),datetime24.);
    size=input(finfo(fid,foptname(fid,6)),32.);
    fid=fclose(fid);
  end;
  else do;
    dname=catx('/',dname,filename);
    filename=' ';
  end;
  replace;
  if dir;
  level=level+1;
  do i=1 to dnum(rc2);
    filename=dread(rc2,i);
    output;
  end;
  rc3=dclose(rc2);
run;

Now you can use SAS to summarize the data.

proc summary data=filelist ;
  class dname ;
  var lastmod size;
  output out=summary
   sum(size)=total_size
   max(lastmod)=max_ts
 ;
run;
proc print width=min;
 format total_size comma20.;
run;
Obs    dname                      _TYPE_    _FREQ_    total_size          max_ts

 1                                   0        85      12,055,127    10JUL2019:11:34:31
 2     ~/temp                        1        43       9,885,263    10JUL2019:11:34:31
 3     ~/temp/autocutsel-0.9.0       1        38       1,728,684    11NOV2013:16:05:57
 4     ~/temp/drg                    1         4         441,180    22FEB2014:10:37:17
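To get back to the original question (one number for the whole folder), the FILELIST dataset can be reduced to a grand total. This is only a sketch: the macro variable name total_bytes is made up for illustration, and it assumes the second data step above has already run, so that dir is 0 for ordinary files.

```sas
proc sql noprint;
  /* add up the sizes of all non-directory entries */
  select sum(size) format=20.
    into :total_bytes trimmed
    from filelist
    where dir = 0;
quit;

%put NOTE: Total size of all files: &total_bytes bytes;
```

Directory observations have a missing size anyway, so the WHERE clause mainly documents the intent.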
ghosh
Barite | Level 11
Thanks @Tom, this is really useful for getting a list of directory contents with the corresponding lastmod date, which is not shown in SAS EG 7.x. I just stuck a PROC PRINT after the second data step. Just to let you know, the first data step throws a warning, so I included lastmod size 0 in the retain line.
Tom
Super User
I would use missing instead of zero. Zero is a valid datetime (1960), and zero is also a valid file size.
You can use CALL MISSING() if you are worried about uninitialized variable informational notes.
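For example, the first data step could look like the sketch below. Only the CALL MISSING line is new; everything else is copied from the earlier step, and ~/temp is still just the example start directory.

```sas
data filelist;
  length dname filename $256 dir level 8 lastmod size 8;
  format lastmod datetime20.;
  input dname;
  retain filename ' ' level 0 dir 1;
  /* explicitly set both variables to missing so they count as initialized */
  call missing(lastmod, size);
cards4;
~/temp
;;;;
```

Both variables stay missing until the MODIFY step fills them in for ordinary files.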
Kurt_Bremser
Super User

Great stuff, @Tom !

May I use your code as an example in my paper (it was planned for #SASGF2020, but I intend to present a revised and expanded version at #SASGF2021), as an alternative to the macro?

Kurt_Bremser
Super User

And I have found a useful modification to make it work in different locales:

    lastmod=input(finfo(fid,foptname(fid,5)),NLDATMW.);

This worked at least with my German UE.

Tom
Super User

Actually you don't need the W version. But the length matters: NLDATM100. works best. Out of the 135 values for LOCALE, I found it could read the strings generated for 87 of them. That is much better than DATETIME or ANYDTDTM.

The MEANS Procedure

                      N
Variable      N    Miss
-----------------------
datetime      3     132
NLDATM       87      48
anydtdtm     33     102
-----------------------

 

Kurt_Bremser
Super User

I have found another improvement to your code:

data filelist;
  modify filelist;
  rc1=filename('tmp',catx('/',dname,filename));
  rc2=dopen('tmp');
  dir = not not rc2;
  if not dir then do;
    fid=fopen('tmp','i',0,'b');
    lastmod=input(finfo(fid,foptname(fid,5)),NLDATM100.);
    size=input(finfo(fid,foptname(fid,6)),32.);
    fid=fclose(fid);
  end;
  else do;
    dname=catx('/',dname,filename);
    filename=' ';
    lastmod=input(dinfo(rc2,doptname(rc2,5)),NLDATM100.);
  end;
  replace;
  if dir;
  level=level+1;
  do i=1 to dnum(rc2);
    filename=dread(rc2,i);
    output;
  end;
  rc3=dclose(rc2);
run;

It will now record the modification timestamp for directories. The only thing that is missing is the size of the directory file itself, which cannot be determined with SAS functions; for this one would need the external command.
