04-17-2014 08:02 AM
We are running SAS 9.3 on a Linux server and we have about 250 DataMarts spread across multiple file systems/directories. We have approximately 3,000 users running ad hoc queries against these DataMarts. These DataMarts contain multiple tables and views. Is there a way to determine what tables are being used and what columns are being used? We are in the process of "consolidating/house-cleaning". We want to identify "what" is being used and "what" is NOT being used at the COLUMN level.
04-17-2014 09:13 AM
What do you mean by 'what is/is not being used at the column level'? Do you mean what is being accessed at any specific point in time, or what is utilized in the queries? If it is file access, I'm afraid I haven't got a clue, especially at the column level. At the file level you could check the file handles to see whether a file is being accessed.
If it is at the query level, then those queries will be stored somewhere (text files, the metadata server, etc.). You could read those and process the SELECT statements to find out which tables/columns are used in each query.
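If the queries are available as text, a rough first pass over the SELECT statements could look something like the Python sketch below. The regex is deliberately naive (it will miss aliases, joins, subqueries, and `select *`), so treat it as a starting point only; the function name and behavior are my own illustration, not an existing tool.

```python
import re

# Naive parser: pull table and column names out of simple
# "select col1, col2 from table" statements. Real SQL (joins,
# aliases, subqueries, "select *") needs a proper SQL parser.
SELECT_RE = re.compile(
    r"select\s+(?P<cols>.+?)\s+from\s+(?P<table>[\w.]+)",
    re.IGNORECASE | re.DOTALL,
)

def tables_and_columns(query_text):
    """Return {table: set of column names} found in the query text."""
    usage = {}
    for m in SELECT_RE.finditer(query_text):
        cols = {c.strip() for c in m.group("cols").split(",")}
        usage.setdefault(m.group("table").lower(), set()).update(cols)
    return usage
```

Running `tables_and_columns("select name, city from mart.customer")` would report `mart.customer` with columns `name` and `city`; aggregating those dictionaries across all saved queries gives a first picture of column usage.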
04-17-2014 09:51 AM
To simplify, let's say I have 3 SAS data sets called CUSTOMER under 3 different directory paths. One has 50 columns, one has 60 columns, and one has 90 columns. Is there any way to identify who is accessing these tables via SAS DATA steps or SAS PROCs, and which columns are being requested (read)? We are in the early stages of trying to consolidate "like" data and remove columns that are not being used. I know that in Netezza there is a system table that contains all SQL calls to a table. I was wondering if anything similar exists for SAS.
04-17-2014 10:25 AM
I am by no means an expert on this, but I think you would have to save all the SAS logs and post-process them. I don't know of a magic switch you can turn on to monitor a SAS data set in this fashion.
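As a sketch of what that post-processing might look like: SAS writes a NOTE to the log each time a data set is read, so scanning saved logs for that wording gives a table-level (not column-level) usage count. This assumes logs are saved as text in the default English-locale wording; the function itself is my own illustration.

```python
import re
from collections import Counter

# SAS writes a note like this whenever a data set is read:
#   NOTE: There were 42 observations read from the data set MART.CUSTOMER.
READ_NOTE = re.compile(
    r"NOTE: There were \d+ observations read from the data set ([\w.]+)\."
)

def count_dataset_reads(log_lines):
    """Tally how often each LIBREF.DATASET appears as read in a SAS log."""
    reads = Counter()
    for line in log_lines:
        m = READ_NOTE.search(line)
        if m:
            reads[m.group(1)] += 1
    return reads
```

Feeding every archived log through this and merging the counters would show which data sets are touched at all, which is a useful first cut before any column-level analysis.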
My comment is from a SAS Foundation, not a SAS solution, perspective. I assume the question is regarding SAS Foundation.
Message was edited by: Jeff Bailey
04-18-2014 06:48 AM
- They are SAS Datasets and SAS DataViews. They are NOT defined in Metadata.
- Users are using BATCH SAS, Enterprise Guide, SAS/Connect (rsubmit/endrsubmit).
- Parsing logs would be considered, but not logs from 3,000 users. Is there a more GLOBAL approach to turning "logging" on? The audit package seems to capture access rights and when tables are updated (tables/columns), not so much reads (I may be wrong). I came across RTRACE last night and am still researching it. Am I wasting my time with this? Thanks for all the comments.
04-18-2014 07:32 AM
Without metadata, you could only audit table (file) access via the audit subsystem of the operating system (if you have one; I come from AIX) or with the RTRACE/RTRACELOC options. Be aware that (at least in 9.2) the RTRACELOC option in sasv9.cfg does not support dynamically generated filenames the way the LOG option does, so you may have to set it in the script that invokes SAS, using the -rtraceloc command-line option.
Access of columns could only be audited by working through the logs, IMO.
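To turn a pile of per-user RTRACE logs into something usable, a tally script helps. The sketch below assumes each trace log contains roughly one accessed file path per line (worth verifying against your own RTRACE output) and that only `.sas7bdat` files are of interest; the path pattern is a hypothetical example.

```python
import glob
from collections import Counter

def tally_rtrace(pattern):
    """Count how often each .sas7bdat path appears across RTRACE logs.

    Assumes each log line is a file path (check this against real
    RTRACE output); non-dataset lines are skipped.
    """
    hits = Counter()
    for logfile in glob.glob(pattern):
        with open(logfile) as fh:
            for line in fh:
                path = line.strip()
                if path.endswith(".sas7bdat"):
                    hits[path] += 1
    return hits
```

Pointing it at something like `/sasaudit/rtrace_*.log` (one file per user session, generated by the invoking script as described above) would rank datasets by access count, which directly answers the "what is NOT being used" question at the table level.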
04-18-2014 08:48 AM
There is not much way to capture VARIABLE (aka column) usage without analyzing the SAS code (or logs). When you write SAS code to read variable X from data set Y, at the operating system level SAS will need to access the whole data set and read the complete data vector into memory to find variable X.
04-23-2014 04:14 PM
Just to add one other comment on this, SAS is currently developing capabilities that will make this sort of distributed auditing of access to SAS data sets simple and automated. This is coming in a future release.
Also, I'd like to mention that our Federation Server product can be leveraged to deliver access control to SAS data sets if security is a concern as well.
04-18-2014 10:08 AM
That's going to make things very tricky. If only SQL were used to access the data, logging might give you some insight. But if I'm accessing a data set with a SET statement, SAS has no way of knowing which variables in the data set I want to look at.
My feeling is that there's no good solution to this problem. I might suggest opening a case with Tech Support. They have the expertise to give you a definitive answer.
04-17-2014 10:55 AM
Auditing is about knowing what is used.
My comment is from an EIP perspective (metadata server, workspace server, etc.).
04-18-2014 07:20 AM
As long as you are using the EIP approach (workspace servers, SAS/CONNECT services, batch services), you have the option to activate all that logging on the server side.
It is a lesser-known area of configuration. See SAS(R) 9.3 Logging: Configuration and Programming Reference (it reports library access audit information).
The RTRACE option could also be helpful in determining whether you have licensed parts that are not used: http://www2.sas.com/proceedings/sugi30/216-30.pdf - http://support.sas.com/resources/papers/proceedings14/1877-2014.pdf
If you have control over the autoexec/SAS macro/LIBNAME behavior, you could build your own logging using ARM statements.
The dirtiest approach is to replace the LIBNAME statement with your own autocall-macro equivalent.
04-24-2014 02:32 AM
@Mike, if you are referring to the APM and Event Manager developments, then sorry..
SAS is missing a holistic view of what the market normally requires for auditing/monitoring.
If your only object type is SAS data sets, that is a very limited view of all the (possibly sensitive) information: flat files, HTML, XML, and any type of unstructured data that has metadata.
This kind of limitation would imply that SAS is not applicable as a Statistical Analysis System. With the big-data hype at its peak, that would be a big misalignment.
For the holistic view I can refer to http://csrc.nist.gov/publications/nistpubs/800-137/SP800-137-Final.pdf.
It is a difficult situation, as this is a federal guideline: how could tools that are unable to align with this kind of requirement be used in that kind of environment?
04-24-2014 11:33 AM
Work is underway at SAS to develop the kind of auditing and monitoring that you mention. It is not limited to SAS data sets, and it certainly includes big data environments as well as many others.
Thank you for taking the time to make this comment and to allow me to respond more fully.