As a new SAS admin, how do I go about doing an audit to find the largest dataset uploaded and/or created (dataset size, the user who created it, and when it was created)?
Not easy to tell if you don't specify which product you are in charge of.
There is a collection of reports called "SAS Audit, Performance and Measurement" that used to be implemented on top of your EBI installation and now ships with the SAS Platform:
I am not sure whether this reporting tool includes a metric for dataset size (measured in bytes and/or in number of observations).
Then there is the ARM logging API provided with SAS Foundation. Using this, you can enable your own logging, but it requires some further coding and parameter changes:
I've become quite familiar with the ARM interface (on Windows, but broadly speaking this applies to Unix/Linux as well). A few guidelines:
- be careful if you enable the whole set of ARM subsystems (e.g. ARM_ALL): your ARM log files will grow very quickly in size; it's better to select the minimal subsystems that give you the 'metrics' (performance measures) you need, and only those
- with ARM_DSIO + ARM_PROC, you can trace the SAS table size in bytes (source table and target table)
- the ARM reference documentation (see above) does not cover everything; in particular, not only OLAP cube requests but also SAS data set size metrics are recorded by ARM, in addition to the volume of memory (RAM) used, for instance. I had to build my own technical documentation in order to understand this
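As a minimal sketch of the guideline above, enabling only the DSIO and PROC subsystems might look like this (the log path is an example; check the ARM system option documentation for your release):

```sas
/* Enable ARM logging for the DSIO and PROC subsystems only,   */
/* keeping the log small while still capturing table sizes.    */
/* The log location below is an assumption - adjust per site.  */
options armagent=log4sas
        armloc="/sas/logs/arm_perf.log"
        armsubsys=(arm_dsio arm_proc);
```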
Another possibility, if you already have SAS batch session log files at your disposal:
enable the FULLSTIMER option and parse the log files to store the relevant information.
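A rough sketch of that idea, assuming a hypothetical log path (point it at your own batch logs):

```sas
/* Turn on detailed resource statistics in the batch log */
options fullstimer;

/* Sketch: scan an existing batch log for FULLSTIMER lines.  */
/* The path is hypothetical; the filter is deliberately      */
/* crude - refine the parsing for your own log layout.       */
data stimer_lines;
  infile "/sas/logs/batch_job.log" truncover;
  input line $char200.;
  if index(line, "real time") or index(line, "memory") then output;
run;
```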
One last trick: you can request on the fly, at any time, the largest SAS table created in your session using the TABLES dictionary view:
proc sql;
  select max(filesize) as maxsize, libname, memname
    from dictionary.tables
    where libname not in ('SASHELP','MAPS')
    group by libname, memname
    order by maxsize desc;
quit;
But, obviously, this is of limited use if you are looking for the largest SAS table created/accessed during every SAS session.
Actually, I'd run that command periodically as superuser, so the output is complete regardless of any permissions set on files/subdirectories. The report file would probably end up somewhere in /var, to be picked up by a SAS program. Both actions can be scheduled with crontab.
Come to think of it, I've just described how I solve exactly the same problem around here:
- create a complete file listing of all interesting directory trees
- read that into a SAS dataset
- from that, create HTML reports per tree and per user, where every user receives a listing sorted by name/size/modification time
- create a .htaccess file that gives access only to users having at least one entry in a given tree
Of course, that only renders a report on the current contents. Temporary files are not covered.
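The first two steps of that approach might be sketched as follows. All paths and the listing format are assumptions: the listing file is produced beforehand with something like find /sasdata -name "*.sas7bdat" -printf "%s %u %T@ %p\n" > /var/sas_file_listing.txt, scheduled via crontab.

```sas
/* Sketch: read a pre-generated file listing into a dataset.  */
/* Paths and columns are hypothetical - match your own find   */
/* (or ls -lR) output format.                                 */
data file_audit;
  infile "/var/sas_file_listing.txt" truncover;
  input size :best12. owner :$32. modified :best20. path :$256.;
run;

/* Largest datasets first, rendered as a simple HTML report */
proc sort data=file_audit;
  by descending size;
run;

ods html file="/var/sas_file_report.html";
proc print data=file_audit(obs=20);
run;
ods html close;
```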
Ronan's suggestion is a good "SAS" method, but it only works on currently assigned libraries, not on all SAS tables on the file system.
PeterC's suggestion would work, but it assumes the user has the ability to pipe O/S commands, which is typically disabled for end users and possibly available to SAS admins - it just depends on the relationship between the O/S admins and the application admins (e.g., SAS).
Good point, Andrew. I interpreted the need (perhaps completely off the mark) in terms of usage statistics with complete coverage. Launching system commands on a regular basis against the filesystem storing SAS tables would give the information, of course, but only part of it: that kind of report displays the size of permanent SAS tables only. Consider a counter-example: if, between two audits (daily, weekly, monthly?), a user removes or shrinks a permanently stored SAS table, then the corresponding information is missing altogether. That's why it all depends on the admin's needs and expectations: the ARM API, for instance, properly used, can provide extensive coverage of any SAS table accessed - not only the permanently stored SAS tables but any version stored at any time, and even the temporary tables used in SASWORK and deleted in the ensuing process.
I know this distinction sounds a little like hair-splitting, but sometimes admins experience SASWORK saturation caused by reckless users who don't spare any space in their SAS EG sessions, or by concurrent SAS batch sessions that, put together, consume too much SASWORK.
The complete-coverage approach relates more to tracking and disk-footprint measurement than to simple auditing, sure.
In that regard, being able to measure the maximal footprint of a SAS session seems very useful for the admin, though difficult to get. The only product I know of that gives this kind of (precious) detail is ESM, developed and sold by the UK-based company Boemska:
Good point also for reminding us about ALLOWXCMD. In the meantime, system commands can be replaced with SAS code for getting SAS table sizes, even with recursive searches over large folder hierarchies:
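For a single folder, that XCMD-free approach can be sketched with the SAS file-management functions (the folder path is an example, and the FINFO item name can vary by operating system and release; extend with a directory stack for a recursive walk):

```sas
/* Sketch: list SAS table sizes in one folder without XCMD,   */
/* using DOPEN/DREAD/FOPEN/FINFO. Path is an example only.    */
data table_sizes(keep=name filesize);
  length name $128 fpath $256;
  rc  = filename("dref", "/sasdata");          /* example root */
  did = dopen("dref");
  if did > 0 then do i = 1 to dnum(did);
    name = dread(did, i);
    if lowcase(scan(name, -1, ".")) = "sas7bdat" then do;
      fpath = catx("/", "/sasdata", name);
      rc  = filename("fref", fpath);
      fid = fopen("fref");
      if fid > 0 then do;
        /* Info item name may differ per OS - check FOPTNAME */
        filesize = input(finfo(fid, "File Size (bytes)"), best32.);
        output;
        rc = fclose(fid);
      end;
    end;
  end;
  rc = dclose(did);
run;
```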
Yep Ronan, it is the madness of i/ disabling XCMD, then ii/ providing all the functions needed to rebuild those capabilities in your own code, iii/ disabling that again, introducing locked states, and iv/ adding the information in a data-warehouse approach using ARM/event-manager VM tools. Somebody must have lost vision and direction on IT insights and is now chasing their own tail like a dog.
At the same time, other tools are coming onto the market, such as those in the SIEM big-data analytics area (many tools). These implement all the auditing at the OS level. We could see the day when monitoring of SAS usage and other tools happens at the OS level, with the analytics done in any of those leading products (none of them SAS): http://securityintelligence.com/gartner-2014-magic-quadrant-siem-security/#.VVJsgKUcSUl