- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have a project where sas data sets are accumulated daily for many years up to over 1 TB. Now we are trying to trim the data to only containing what is absolutely necessary say e.g. 4 years at most if no impact to all calculations etc. Now my question is with many many interwining sas datasets and lots and lots of very very long sas programs, how can I efficiently find out what data is involved in what sas programs such that I can confirmed which data sets I can trim down and not impact any results? Please assist.
I tried to chase how each dataset is created and what is involved etc but it is very hard as the sas programs is very very long. Any one has experience or know a systematic way to do it which would work?
We tried to ask our users they have no idea how calculations are derived or how old data are involved etc...
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And I think some trial and error is in order.
1. Decide "what is absolutely necessary".
2. Try trimming dataset.
3. Submit program step by step and compare trimed dataset with whole dataset to check the impact on results.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It is advisable to abandon the idea of changing a program in an environment where testing is not possible.
Whether it is Oracle or SAS, some problems can occur, so a situation where there is no test environment is not very desirable.
First, consider creating a test environment.
I think the most effective way to deal with the current situation is to understand not overview but details of the program.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You haven't explained what your Oracle / DataStage project is going to do. Will it replace what you are currently doing in SAS? If so then you need to understand your current SAS processing to have any hope of reproducing that in Oracle. Once you understand that, then build the same process in Oracle.
Run both SAS and Oracle / DataStage processes, then compare outputs. If outputs are the same then you can switch over to Oracle and no longer run SAS.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would start by identifying the SAS datasets of interest - the ones from the last four years. Then I would search the SAS programs to see where they are created. From where they are created I would track back to find all of the inputs required to create these SAS datasets. I find Notepad++ is a very good tool to search and navigate SAS programs as they are just text files after all. It's free too.
This does mean you will need to have an understanding of the SAS processes that produce these datasets. If you don't have sufficient SAS experience to understand what the code is doing then engage someone who does. There is no methodology you can employ to overcome limited SAS knowledge or experience.