Architecting, installing and maintaining your SAS environment

How do I find the stats of the largest dataset created by a user?

Reply
New User
Posts: 1

How do I find the stats of the largest dataset created by a user?

As a new SAS Admin, how do I go about doing an audit to find what the largest dataset uploaded and/or created was (dataset size, user whom created it, and when it was created)?

Frequent Contributor
Posts: 134

Re: How do I find the stats of the largest dataset created by a user?

Not easy to tell if you don't specify which product you are in charge of.

There was a collection of reports called "SAS Audit and Performance and Measurement" to be implemented to your EBI installment ,now installed with the SAS Platform :

SAS Audit, Performance and Measurement

I am not sure this reporting tool included a metric about Dataset size (measured in Bytes and/or in Number of observations).

Then there is the ARM logging API provided with SAS Foundation. Using this, you can enable your own logging but it requires some further coding & parameter changes :

35165 - Using restricted options to create dynamic Application Response Measurement (ARM) log files ...

https://support.sas.com/rnd/scalability/tools/arm/arm.html

SAS® 9.4 Interface to Application Response Measurement (ARM) :: SAS(R) 9.4 Interface to Application ...

I've become quite familiar with the ARM interface (on Windows, but this applies also broadly speaking to Unix/Linuw as well); A few guidelines :

- be careful if you enable the whole set of ARM subsystems (eg ARM_ALL)  : your ARM log files shall grow very quickly in size;

it's better to select the minimal subsystem which gives you the so-called 'metrics' (performances measures) you need & only thoses ones

- with ARM_DSIO + ARM_PROC, you could trace the SAS table size in bytes (source table and target table)

- the ARM refererence documentation (see above) itself does not cover everything; in particular, not only OLAP cube requests but SAS Data set size metrics are also recorded by ARM

in addition to the volume of memory (RAM) used, for instance. I had to build my own technical doc in order to understand this

Another possibility if you already have SAS batch sessions logs files at your disposal :

enable FULLSTIMER option and parse the log files to store the relevant information.

One last trick :you can request on-the-fly at any time the largest SAS table created in your session using the TABLES dictionary view

proc sql;

select max(filesize) as maxsize,libname,memname

from dictionary.tables

where not(libname in ('SASHELP','MAPS'))

group by memname, libname

order by maxsize descending

;

quit;

But, obviously, this is of a limited usage if you are looking for the largest SAS table created/accessed during every SAS session.

Valued Guide
Posts: 2,177

Re: How do I find the stats of the largest dataset created by a user?

I would have thought as a unix admin, you would run something better than analysing the output from

ls -lR /*.sas7* > ~/sasfile_list.txt .

Parsing that for the large tables by user account is the approach I take because I don't know better.

Super User
Posts: 7,809

Re: How do I find the stats of the largest dataset created by a user?

Actually, I'd run that command periodically as superuser, so the output is complete regardless of any permissions set on files/subdirs. The report file would end up probably somewhere in /var, to be picked up by a SAS program. Both actions can be scheduled with crontab.

When I come to think of it, I've just given a description how I solve exactly the same problem around here.

- create a complete file listing of all interesting directory trees

- read that into a SAS dataset

- from that, create HTML reports for tree and user, where every user receives a listing sorted by name/size/modification time

- create a .htaccess file that gives access only to users having at least one entry in a given tree

Of course, that only renders a report on the current contents. Temporary files are not covered.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Moderator
Posts: 252

Re: How do I find the stats of the largest dataset created by a user?

Ronan's suggestions is a good "SAS" method but only works on currently assigned libraries, not on all SAS tables on the file system.

PeterC's suggestion would work but assumes that the user has the ability to pipe O/S commands, typically disabled for end-users, possibly available to SAS admins - just depends on the relationship between the O/S admins and the application admins (e.g, SAS).

Frequent Contributor
Posts: 134

Re: How do I find the stats of the largest dataset created by a user?

Posted in reply to AndrewHowell

Good point, Andrew. I interpreted - perhaps completely off the mark - the need in terms of usage statistics with complete coverage. Launching, on a regular basis , system commands on the filesystem storing SAS tables would give the information, of course, but only part of it : that kind of reports would display the size of permanent SAS tables; let's figure out a counter-example : if, between two audits (daily, weekly, monthly ?), the user removes or shrinks any SAS table permanently stored  then the corresponding information would be missing altogether. That's why it all depends on the admin's needs or expectations : the ARM API for instance, properly used, could provide extensive coverage of any SAS tables accessed , not only the SAS tables permanently stored but any version of it stored at any time and even the temporary tables used in SASWORK and deleted in the ensuing process.

I know this distinction sounds a little bit like hair-splitting but, sometimes, admin experience SASWORK saturation caused by some reckless users who don't spare any space in their SAS EG session or even caused by concurrent SAS batch sessions consuming too much SASWORK put together.

The complete coverage approach relates more to tracking & disk footprint measurement than simple auditing, sure.

In that regard, being able to measure up the maximal footprint of a SAS session seems very useful for the admin though difficult to get. I know only of the ESM developed & sold by UK based Boemska company which gives this kind of (precious) details :

https://boemskats.com/esm/

Good point also for reminding us about ALLOWXCMD. In the meanwhile, system commands can be replaced with SAS code for getting SAS tables sizes, even recursive research on large folder hierarchies :

SAS Filesystem Toolbox - sasCommunity

Trusted Advisor
Posts: 3,214

Re: How do I find the stats of the largest dataset created by a user?

Yep Ronan, It is the madness of i/ disabling XCMD and than ii/ giving all functions to rebuild those in own code iii/ disabling it again introducing locked states iv/ adding the information in WH approach using  arm/eventmanager VM tools. Somebody must have lost the vision and direction with IT insights and now going around like a dog for his own tail.

At the same time other tools are getting on to the market thiese can be the SIEM big data analytics area (many tools). These are implementing all the auditing at the OS level. We could see the days monitoring of SAS usage an other tools at OS level and doing the analytics with any of those leaders (all not SAS) http://securityintelligence.com/gartner-2014-magic-quadrant-siem-security/#.VVJsgKUcSUl

http://securityintelligence.com/gartner-2014-magic-quadrant-siem-security/#.VVJsI6UcSUk

---->-- ja karman --<-----
Ask a Question
Discussion stats
  • 6 replies
  • 823 views
  • 1 like
  • 6 in conversation