03-02-2016 12:40 AM
I have built a SAS program which will execute on a quarterly basis: it imports all the new quarter's data files from a PC folder into SAS, then performs the calculations and produces the required output dataset (and then exports the details to the PC folder).
But prior to this process, I'd like to check whether the output dataset is already present; if yes, then take a backup of it and store it in a different location (with the new name DataSetName_DateTimeStamp). Furthermore, if need be, I'll have to check this backup file for re-run activity.
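A minimal sketch of this check-and-backup step, assuming the output dataset is WORK.RESULTS and backups go to a libref BKP (both names are illustrative, not from the original post):

```sas
/* If the output dataset already exists, copy it to a backup library  */
/* under a name suffixed with a datetime stamp before re-creating it. */
%macro backup_if_exists(ds=, backuplib=);
  %if %sysfunc(exist(&ds)) %then %do;
    /* B8601DT15. gives e.g. 20160302T004000 - no colons, so it is   */
    /* safe to embed in a dataset name                                */
    %let stamp = %sysfunc(datetime(), B8601DT15.);
    data &backuplib..%scan(&ds, -1, .)_&stamp;
      set &ds;
    run;
  %end;
%mend backup_if_exists;

/* Example call (librefs assumed to be assigned already) */
%backup_if_exists(ds=work.results, backuplib=bkp);
```

Note the 32-character limit on SAS dataset names: the timestamp suffix uses 16 of them, so the base name must stay short enough.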
Can anybody advise what process or steps I should follow to get this done?
Thanks, SAS NewBee..
03-02-2016 04:31 AM
@LinusH is quite correct: there are plenty of software solutions already available, both free and paid, which are designed to do such things. For example, whilst TortoiseSVN is really for text files, you can set it up to keep version history on all your files; you can then have multiple branches for the various imports and, when done, merge everything back into the main development area.
Whilst you could write a long program which goes through the checking (using %sysfunc(fileexist()), for example) and then DATA steps copying to new libnames and such like, it becomes unwieldy: just managing all the different versions is a job in itself, and the programming becomes more complicated.
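For reference, FILEEXIST checks an external file on disk, while EXIST checks a SAS dataset; a quick sketch (the path and dataset name here are made up):

```sas
/* Hypothetical path and dataset name, for illustration only */
%let csvpath = C:\data\q1_2016.csv;

%put External file exists: %sysfunc(fileexist(&csvpath));
%put SAS dataset exists:   %sysfunc(exist(work.results));
```

Both functions return 1 if the object exists and 0 otherwise, so they drop straight into %IF conditions.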
If you don't want to, or can't, use version control software (and to be honest, in this day and age it should be used by everyone regardless of role or type of work), then why not make a collected dataset, with each quarter appended to the data already there? You could then have an additional variable, QUARTER, which identifies each block of data, so you only have one dataset but can still subset out each quarter's data based on this variable. Far simpler to implement and maintain, but the data may get big!
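The collected-dataset approach above can be sketched as follows; the dataset names (PERM.ALL_DATA, WORK.IMPORTED_QUARTER) are assumed, not from the original post:

```sas
/* Tag the freshly imported quarter's rows, then append them to the */
/* single collected dataset.                                        */
data work.new_import;
  set work.imported_quarter;      /* the freshly imported data       */
  length quarter $6;
  quarter = '2016Q1';             /* or derive from a date variable  */
run;

proc append base=perm.all_data data=work.new_import;
run;

/* Subset one quarter back out whenever it is needed */
data work.q1_only;
  set perm.all_data;
  where quarter = '2016Q1';
run;
```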
A final methodology would be to emulate a cube structure by putting the quarter in the dataset name, e.g. MY_QTR1, MY_QTR2. By doing this you always have all the data there, and it takes up the same space on the drive as one big dataset, but it would be quicker to access. Programming becomes a bit more difficult, though that is negated if you know the quarter up front, of course.
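A sketch of the quarter-in-the-name approach, again with assumed names (PERM.MY_QTR&amp;QTR, WORK.IMPORTED_QUARTER):

```sas
/* Write each quarter to its own dataset, with the quarter number   */
/* embedded in the name.                                            */
%let qtr = 1;                      /* known up front in this sketch */
data perm.my_qtr&qtr;
  set work.imported_quarter;
run;

/* When the quarter is NOT known up front, a colon name-prefix list */
/* on the SET statement reads all of them back in one go:           */
data work.all_quarters;
  set perm.my_qtr: ;               /* MY_QTR1, MY_QTR2, ...          */
run;
```

The colon wildcard is what keeps this workable when you need to process every quarter at once rather than a single known one.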
Finally, to go back to my original point: recreating software that is already available and field tested is not a good idea. Even if you followed an SDLC process, documented everything, tested fully, then lifecycled the whole thing, you would still end up with a product which is a) harder for users to understand, as it's not widely used; b) harder to maintain, as you have to do it yourself; and c) buggier.