BookmarkSubscribeRSS Feed
HeatherNewton
Quartz | Level 8

I have a project where sas data sets are accumulated daily for many years up to over 1 TB. Now we are trying to trim the data to only containing what is absolutely necessary say e.g. 4 years at most if no impact to all calculations etc. Now my question is with many many interwining sas datasets and lots and lots of very very long sas programs, how can I efficiently find out what data is involved in what sas programs such that I can confirmed which data sets I can trim down and not impact any results? Please assist. 

 

I tried to chase how each dataset is created and what is involved etc but it is very hard as the sas programs is very very long. Any one has experience or know a systematic way to do it which would work?

 

We tried to ask our users they have no idea how calculations are derived or how old data are involved etc...

 

Thanks.

5 REPLIES 5
japelin
Rhodochrosite | Level 12
I don't think you can do anything without understanding overview of each process in the program.
And I think some trial and error is in order.

1. Decide "what is absolutely necessary".
2. Try trimming dataset.
3. Submit program step by step and compare trimed dataset with whole dataset to check the impact on results.
HeatherNewton
Quartz | Level 8
Oh the problem is we cannot test either. We are in the process of moving the project to oracle and using data stage as part of a big integration project. . At this stage nothing has been created yet in data stage and therefore not possible for any testing. Also our goal is to find out how much data to keep before we stsrt building the program out of SAS. What could work under this circumstances?
japelin
Rhodochrosite | Level 12

It is advisable to abandon the idea of changing a program in an environment where testing is not possible.
Whether it is Oracle or SAS, some problems can occur, so a situation where there is no test environment is not very desirable.
First, consider creating a test environment.


I think the most effective way to deal with the current situation is to understand not overview but details of the program.

SASKiwi
PROC Star

You haven't explained what your Oracle / DataStage project is going to do. Will it replace what you are currently doing in SAS? If so then you need to understand your current SAS processing to have any hope of reproducing that in Oracle. Once you understand that, then build the same process in Oracle.

 

Run both SAS and Oracle / DataStage processes, then compare outputs. If outputs are the same then you can switch over to Oracle and no longer run SAS. 

SASKiwi
PROC Star

I would start by identifying the SAS datasets of interest - the ones from the last four years. Then I would search the SAS programs to see where they are created. From where they are created I would track back to find all of the inputs required to create these SAS datasets. I find Notepad++ is a very good tool to search and navigate SAS programs as they are just text files after all. It's free too.

 

This does mean you will need to have an understanding of the SAS processes that produce these datasets. If you don't have sufficient SAS experience to understand what the code is doing then engage someone who does. There is no methodology you can employ to overcome limited SAS knowledge or experience.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 879 views
  • 0 likes
  • 3 in conversation