Augusto
Obsidian | Level 7

Hello guys,

I need a big help.

Here at work, we identified some ways to optimize our disk space.


Below I have listed what has been identified and what needs to be done. I would like to know how to do each of these.

1)

a) Identified: SAS data files that are not accessed by any user

b) Needed: A list identifying the SAS data files with no access.


2)

a) Identified: Unused filesystems

b) Needed: A list identifying the unused filesystems.


3)

a) Identified: Existence of compacted SAS data files

b) Needed: A list identifying the compacted SAS data files and the processes that use them.


4)

a) Identified: Existence of SAS data files with the same name

b) Needed: A list identifying the SAS data files that have the same name but are in different directories.


5)

a) Identified: Compression of SAS data files

b) Needed: A list of SAS data files that could be compressed.


6)

a) Identified: Shared filesystems

b) Needed: A list of shared filesystems.


If you have any suggestions for best practices, please feel free to share them.


Thanks in advance.

1 ACCEPTED SOLUTION

jakarman
Barite | Level 11

Augusto, your list looks like a set of checkpoints chosen by an auditor.

1) Data without access by users

*) Needed: List identifying the SAS data files with no access.

This means monitoring all activity on all data at the OS level. With ARM log4j logging active, SAS can do a lot of this.

You can only get a list of the data that is used; unused data causes no events. Compare that list with the list of all data that is present (see 3, 4, 5).

2) Unused space / Filesystems

*) Needed: List with identification of unused Filesystems.

This is an OS task (it needs XCMD). The df command will show all filesystems. With a decent setup you should be able to see what is stored where and how much is free.

The du command will list the space in use.
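As a rough sketch of the df approach, the output can be read into a data set through a pipe. This assumes a UNIX host, XCMD enabled, and a df that supports the POSIX -k and -P options (one line per filesystem, sizes in KB); the fileref and data set names are made up for illustration.

```sas
/* Sketch only: needs XCMD and a POSIX-style df (-k = KB, -P = one line each). */
filename dfcmd pipe 'df -kP';

data filesystems;
  infile dfcmd firstobs=2 truncover;      /* firstobs=2 skips the header line */
  length filesystem $200 mounted_on $200;
  /* Columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on */
  input filesystem $ blocks_kb used_kb avail_kb pct_used :percent5. mounted_on $;
  format pct_used percent6.;
run;

/* Filesystems with the most free space first */
proc sort data=filesystems;
  by descending avail_kb;
run;
```

Mount points containing blanks would be cut at the first blank, and pseudo filesystems that report "-" instead of a number will produce invalid-data notes in the log.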

    

3) Existence of compacted sas data file

*) Needed: List identifying the compacted SAS data files and the processes that use them.

Make a list of all directories containing SAS data sets. The attributes of all data sets can be gathered into a new data set. One of those attributes is COMPRESS.

http://support.sas.com/documentation/cdl/en/proc/67916/HTML/default/viewer.htm#p1sy9ca8n2tv03n1savk4...

Also available as a SASHELP.v----- view member. 

Files that are compacted with zip or another tool have to be analyzed at the OS level. Make a list at OS level (ls command via XCMD) of all OS files (a recursive option is available).
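The attribute gathering described for point 3 could look roughly like this with PROC CONTENTS. The libref and path are hypothetical; repeat the step (or loop over it) for every directory that holds SAS data sets.

```sas
/* Hypothetical library; point it at one of your own directories. */
libname proj '/data/project';

/* OUT= gets one row per variable; COMPRESS is repeated on every row of a member. */
proc contents data=proj._all_ noprint
              out=work.ds_attrs (keep=libname memname compress nobs);
run;

/* Reduce to one row per data set */
proc sort data=work.ds_attrs nodupkey;
  by libname memname;
run;

/* Data sets stored with any form of SAS compression */
data work.compressed;
  set work.ds_attrs;
  where upcase(compress) ne 'NO';
run;
```

This only covers SAS-level compression (the COMPRESS= data set option). It says nothing about which processes use the data sets, and nothing about files zipped at OS level.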

    

4) Existence of sas data file with the same name

*) Needed: List identifying the SAS data files with the same name but in different directories.

Once you have a list of all OS files, split each entry into path name and file name. Simply sorting by file name will give you the answer.

    

5)  Compression sas data file

*) Needed: A list of sas data file which may be compressed

Once you have a list of all SAS data sets with all their attributes, look for the ones that are not compressed and that have long records (many variables) and many observations.
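A minimal sketch of that selection, using the PROC SQL dictionary tables, might be the following. It only sees libraries that are currently assigned, and the two thresholds are arbitrary illustrative values, not recommendations.

```sas
proc sql;
  create table work.compress_candidates as
  select libname,
         memname,
         nobs,
         nvar,
         obslen,
         nobs * obslen as approx_bytes      /* rough uncompressed size */
  from dictionary.tables
  where memtype = 'DATA'
    and libname not in ('WORK', 'SASHELP', 'SASUSER')
    and upcase(compress) = 'NO'
    and obslen >= 500                       /* illustrative: "long records" */
    and nobs   >= 100000                    /* illustrative: "many observations" */
  order by approx_bytes desc;
quit;
```

Whether compression actually saves space depends on the data (long character variables with trailing blanks tend to benefit most), so it is worth testing on a copy of a few of the largest candidates first.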

 
6) Shared filesystems

*) Needed: A list of shared filesystems.

Shared filesystems are an OS matter. On Windows it is very common to share them out to desktops. On a *nix system this is usually "not done" because of too many possible security holes (mostly forbidden).

With a SAS Grid setup they could be present. Check with the storage admin / system admin.

---->-- ja karman --<-----


4 REPLIES 4
Kurt_Bremser
Super User

4) (UNIX)

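filename oscmd pipe 'find / -name \*.sas7bdat -print';

/* Read every *.sas7bdat path and derive the lower-cased file name */
data files;
  infile oscmd lrecl=500 truncover;
  length
    path $500
    fname $200
  ;
  input path $char500.;   /* formatted input keeps paths that contain blanks */
  fname = lowcase(scan(path,-1,'/'));
run;

proc sort data=files;
  by fname;
run;

/* Keep the file names that occur more than once */
data int (keep=fname);
  set files;
  by fname;               /* needed for first.fname / last.fname */
  retain count;
  if first.fname then count=1; else count+1;
  if last.fname and count > 1 then output;
run;

/* Bring back the full paths for the duplicated names */
data result;
  merge
    files
    int (in=_int)
  ;
  by fname;
  if _int;
run;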

5)

Get the attributes from sashelp.vtable. Make sure that ALL directories containing SAS files have a libname assigned, as sashelp.vtable is built dynamically from all currently defined and assigned libraries.
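One possible way to get those libnames assigned, sketched here under the assumption that the data set files from the code for 4) exists (one row per .sas7bdat path), is to derive the distinct directories and call the LIBNAME function for each. The generated librefs (L1, L2, ...) and the data set name dirs are made up for illustration.

```sas
/* Derive the directory part of each path found in step 4) */
data dirs (keep=dir);
  set files;
  length dir $500;
  n = length(path) - length(scan(path, -1, '/')) - 1;
  if n > 0 then dir = substr(path, 1, n);
  else dir = '/';                       /* file sits directly under the root */
run;

proc sort data=dirs nodupkey;
  by dir;
run;

/* Assign one libref per directory: L1, L2, ... */
data _null_;
  set dirs;
  length libref $8;
  libref = cats('L', _n_);
  rc = libname(libref, dir);
  /* A non-zero rc is not always fatal; check the log for the actual message. */
  if rc ne 0 then put 'Check assignment: ' libref= dir= rc=;
run;
```

After this step the dictionary views cover every directory that could be assigned. Directories the SAS session cannot read will still be missing.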

LinusH
Tourmaline | Level 20

I think that Jaap covered the bullets quite extensively.

A general thought: it sounds like your users have quite some freedom to create data. Instead of chasing ghosts, you could centralise your data management and allow data to be stored only in specific locations.

Data never sleeps
