I have a project where sas data sets are accumulated daily for many years up to over 1 TB. Now we are trying to trim the data to only containing what is absolutely necessary say e.g. 4 years at most if no impact to all calculations etc. Now my question is with many many interwining sas datasets and lots and lots of very very long sas programs, how can I efficiently find out what data is involved in what sas programs such that I can confirmed which data sets I can trim down and not impact any results? Please assist.
I tried to chase how each dataset is created and what is involved etc but it is very hard as the sas programs is very very long. Any one has experience or know a systematic way to do it which would work?
We tried to ask our users they have no idea how calculations are derived or how old data are involved etc...
Thanks.
It is advisable to abandon the idea of changing a program in an environment where testing is not possible.
Whether it is Oracle or SAS, some problems can occur, so a situation where there is no test environment is not very desirable.
First, consider creating a test environment.
I think the most effective way to deal with the current situation is to understand not overview but details of the program.
You haven't explained what your Oracle / DataStage project is going to do. Will it replace what you are currently doing in SAS? If so then you need to understand your current SAS processing to have any hope of reproducing that in Oracle. Once you understand that, then build the same process in Oracle.
Run both SAS and Oracle / DataStage processes, then compare outputs. If outputs are the same then you can switch over to Oracle and no longer run SAS.
I would start by identifying the SAS datasets of interest - the ones from the last four years. Then I would search the SAS programs to see where they are created. From where they are created I would track back to find all of the inputs required to create these SAS datasets. I find Notepad++ is a very good tool to search and navigate SAS programs as they are just text files after all. It's free too.
This does mean you will need to have an understanding of the SAS processes that produce these datasets. If you don't have sufficient SAS experience to understand what the code is doing then engage someone who does. There is no methodology you can employ to overcome limited SAS knowledge or experience.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.