In SAS DI Studio how to archive sas datasets?

Reply
Contributor SYN
Contributor
Posts: 39

In SAS DI Studio how to archive sas datasets?


I have three sas datasets. Please give me some solution. Thank you.

1st dataset Requirement: I need to store 3 years (36 months) data initially. Every month new data will come (37 months). I need to archive old month data every month to keep always 36 months data alive.

2nd dataset Requirement: I need to store 1 month data initially. Every month new data will come (2 months). I need to archive old month data every month to keep always 1 month data alive.

3rd dataset Requirement: Same like 2nd dataset requirement

Respected Advisor
Posts: 3,887

Re: In SAS DI Studio how to archive sas datasets?

What volumes are you dealing with? What DI version are you using?

There is no standard transformation for rolling months. It actually would be a nice extension for the table loader as your requirement is not that rare.

There is also no standard transformation able to selectively delete rows in a table.

I use to implement such things as separate housekeeping jobs to be run at the very end of an ETL process - or if time is sparse then to be run on another day when there is enough time for it. I don't think it should matter if a table contains 37 months of data for a time. If this is an issue for end-users then you can always also create a view which does the selection on the current 36 months - and it's the view end-users have to use.

If the target tables live in a database like Oracle then I would suggest to have the 36month table partitioned along months (a DBA needs to set that up). I would then ask the DBA to archive old partitions once a month. On the SAS DIS end you wouldn't have to care about archiving at all.

For SAS tables what you could do is: You have a ETL job which just appends data to the target. Then with a housekeeping job have your 36month+1month table as source, use a splitter, create one output with 36 months and one output with 1 month to archive, use table loaders to target (re-creating the target tables).

For the archive table: If you need to create a separate table every month then use in table metadata for the physical name of the table a macro variable and set in the pre-code of the housekeeping job this macrovariable to the physical table name of the archive table you want to create.

For your 2nd and 3rd requirement: Simply first load the existing data into an archive table (same story with macro var for the table name), then load the new data into the target table using "replace". If the archiving area is on the same disk then instead of loading the "old" data into an archive table via table loader simply moving/renaming the table would be considerably faster (as data needs not to be re-written). The downside of this aproach is that it needs to be coded instead of using standard transformations.

Ask a Question
Discussion stats
  • 1 reply
  • 494 views
  • 0 likes
  • 2 in conversation