09-29-2014 10:45 AM
I've searched high and low and found nothing on this question.
I work for a large clinical study that posts analytical datasets for investigators. We've discovered that posting a dataset and a parallel data dictionary is fine, but some things fall through the cracks. For instance, we need to post a new file that lists participants' medication use, but the new file is a specific update that has been manipulated. We have to maintain our naming protocol, so the variables contained in the data file and the name of the file itself must remain constant. Ideally, I'd like to create a log message that displays when the data file is read into SAS. In the same way that variable labels are built into the data file, I'd like to embed a brief message in that data file so that the user will see a short change history and contact information for the updated file.
I see the PUTLOG statement in SAS(R) 9.2 Language Reference: Concepts, Second Edition, but as I understand it, that only writes to the log when a DATA step runs on my own machine. I'd like the message to be written for anyone using the data file that I've created.
Thanks in advance for any help you can offer.
09-29-2014 10:59 AM
Other than sending a metadata file with the data, I don't see how you would do this programmatically. The only field I can think of is the dataset label.
I see you need the filename to remain constant, which implies to me that there is version control on it. If so, I suggest that your log/audit history would be best placed in the version control software. That is, if I push a file out, I would then commit it to SVN with a note indicating what the update is for. The audit trail on the file within the version control system will show who committed it, when, and the note explaining the new file.
09-29-2014 11:36 AM
If I were working in an industrial setting I'm sure that we would have version control software in place! Clinical research occasionally bumps into things that were solved by industry best practices, BUT we tend to have very little funding and no industry experience; our approach to version control is to post only the most recent dataset. I'll look into the dataset label as an approach. Thanks for pointing that out.
09-29-2014 12:59 PM
Note that the label has a limited length (40 chars). What about posting a metadata file that contains the history and description, and then, in the autoexec.sas file, creating a libname to that area and using the metadata file to print some information to the log?
I'm afraid your best bet, though, is version control.
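A minimal sketch of the autoexec.sas idea described above, assuming the change history is posted as a plain text file next to the data. The libref `study`, the directory, and the file name `meds_history.txt` are all hypothetical placeholders, not anything from the study itself.

```sas
/* Hypothetical libref and path: point these at the shared posting area. */
libname study '/shared/study/data' access=readonly;

/* Echo the posted change-history text file to the log, one line at a time. */
data _null_;
   infile '/shared/study/data/meds_history.txt' truncover;
   input line $char200.;
   putlog 'NOTE: ' line;
run;
```

This only helps users whose sessions actually run the autoexec.sas file (or who %INCLUDE the same statements), so it depends on every site adopting it.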
09-29-2014 02:22 PM
It looks like Proc Datasets lets you use more than 40 characters now. The code below works:
proc datasets library=in1 nolist;
title 'Updated medications field, spelling corrected by Dr Jacobs at Temple';
quit;
I have an existing dataset that had no title and this titled it as written. The title only appears when the user runs a Proc Contents but I guess that's something! I don't think that the autoexec.sas approach would work because we have at least one hundred researchers at twenty clinical sites running their own analyses.
Thanks for your help RW9!
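One caveat on the code above: TITLE is a global statement that sets the title for output in the current SAS session, so it is not saved inside the data file and other users will not see it. The dataset label suggested earlier is stored in the file. A minimal sketch, assuming the data set is named `meds` (a hypothetical name; only the library `in1` comes from the post above):

```sas
/* Hypothetical data set name MEDS; library IN1 is taken from the post above. */
proc datasets library=in1 nolist;
   /* LABEL= is stored in the data set header and travels with the file. */
   modify meds (label='Updated medications field, spelling corrected by Dr Jacobs at Temple');
quit;

/* Any user who runs PROC CONTENTS on the posted file sees the label. */
proc contents data=in1.meds;
run;
```

A data set label can hold up to 256 characters, which is enough for a short change note but not for a full history or for a message that appears automatically in the log.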
09-29-2014 03:18 PM
I would look into the SAS generation data sets, as well as simply adding a field in the database that shows the last updated date. The date is then linked to a metadata table somewhere else.
Or create the table as a view in which a column is automatically populated with the last modified date and reason, though the value would be duplicated for every observation.
Given the size of your data set this may or may not be feasible.
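The generation data sets suggested above could look roughly like this. It is a sketch under assumptions: the data set names `meds` and `meds_updated` are hypothetical, and the choice of three generations is arbitrary.

```sas
/* Hypothetical names: library IN1 (from the thread), data sets MEDS and MEDS_UPDATED. */
/* Turn on generations: keep the current version plus up to two historical ones. */
proc datasets library=in1 nolist;
   modify meds (genmax=3);
quit;

/* Replacing the data set now rolls the old copy into a historical generation. */
data in1.meds;
   set work.meds_updated;
run;

/* Read the previous version by relative generation number. */
proc print data=in1.meds (gennum=-1 obs=5);
run;
```

The file name and variable names stay the same for users, while older versions remain retrievable with GENNUM=. It does not by itself tell users what changed.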
09-29-2014 04:39 PM
For the short term, using generation data sets (GDGs; see SAS(R) 9.4 Language Reference: Concepts, Third Edition) would be a reliable way to control updates.
Version control for data is not the same as version control for software, and it does not necessarily mean using version control applications. A word can have many meanings, and there are many dogs called Max. You have to be clear about your goal and think about how to achieve it. You are talking about release management, change management, and so on.
That includes a good security design that is traceable and auditable (your question!). Updates to the data must be controlled and logged.
If you upgrade to 9.4 (no additional license cost), you could also use extended attributes; see SAS(R) 9.4 Language Reference: Concepts, Third Edition.
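A rough sketch of the extended attributes approach, assuming SAS 9.4. The data set name `meds` and the attribute names `ChangeNote` and `Contact` are hypothetical, and the contact value is a placeholder to fill in.

```sas
/* Requires SAS 9.4. Hypothetical names: data set MEDS, attributes ChangeNote and Contact. */
proc datasets library=in1 nolist;
   modify meds;
   /* Data-set-level extended attributes are stored with the file. */
   xattr add ds ChangeNote="Updated medications field, spelling corrected"
                Contact="(study contact details)";
quit;

/* PROC CONTENTS should list the extended attributes along with the usual metadata. */
proc contents data=in1.meds;
run;
```

As with the label, the user still has to look at the data set's metadata to see the note; nothing is written to the log when the file is simply read.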