I have to create an analysis data mart based on a complex query and retrieve the last 20 months of data. The information also has to be split by month.
So my question is whether to create 20 different data sets (i.e. dsnameYYYYMM) or a single one with 19 historical generations using the GENNUM option.
I am keen on the GENNUM approach, but since I am working with Enterprise Guide and the V9 engine I am not sure whether this is the best option (I have heard generation data sets will be phased out at some point, but I am not sure whether that applies to the V9 engine).
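For reference, here is a minimal sketch of what I have in mind with generation data sets; the library and data set names are just examples, not my real ones:

```sas
/* Sketch only: GENMAX=20 keeps the current version plus up to
   19 historical generations of the same data set. */
data mylib.sales (genmax=20);
   set work.monthly_extract;   /* each monthly load replaces the base version */
run;

/* Read a specific generation with the GENNUM= option:
   gennum=0 is the current version, negative values count backwards. */
proc print data=mylib.sales (gennum=-1);
run;
```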
Store the data in a single dataset with a month indicator variable. Then you can use a parameter (e.g. a macro variable) to subset to the particular analysis set you want. That is much easier than managing 20 datasets.
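A minimal sketch of that pattern, assuming a single dataset MART.ANALYSIS with a numeric YYYYMM-style MONTH_ID variable (both names are illustrative):

```sas
/* Parameter selecting the analysis month (hypothetical value). */
%let month = 202403;

/* Subset the single master dataset to the requested month.
   A WHERE on an indexed MONTH_ID variable avoids reading every row. */
proc means data=mart.analysis;
   where month_id = &month;
run;
```

An index on the month variable is worth considering at these row counts, so the WHERE clause does not have to scan the whole table.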
Within Base SAS, I humbly disagree, especially given possible data/observation-count scaling issues. You may want to consider PROC DATASETS with AGE processing to create and maintain a defined set of "cycles". Of course, if the cycle location of an observation (for a given MONTH, or whatever selection variable you use) is undetermined, then AGE/cycles will likely not work effectively, and a single file filtered with a WHERE statement will be more applicable.
Regardless, this type of "master" file/cycle approach probably warrants frequent backup copies, in case a recovery/restore situation ever arises.
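A hedged sketch of the AGE approach, with example library and member names (MART, DST01-DST20):

```sas
/* AGE renames dst01 -> dst02, dst02 -> dst03, ..., dst19 -> dst20,
   and deletes the original dst20 (the oldest cycle).
   All listed members are expected to exist in the library. */
proc datasets library=mart nolist;
   age dst01-dst20;
quit;

/* Then load the newest month's data into the now-vacant dst01. */
data mart.dst01;
   set work.latest_month;
run;
```

Run once per monthly load; dst01 is always the newest month and dst20 the oldest retained one.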
So the AGE statement does the same kind of job as GENNUM, but rather than #001 to #020 I will have dst01, dst02, ..., dst20. The AGE statement shifts the data so that the last 20 months are kept: dst20, which held the 21st month of data, is deleted, every dataset is shifted by one, and afterwards dst01 holds the most recent month while dst20 holds the 20th month.
Am I right?
I forgot to mention that each data set is about 10 million observations on average; that is why I was thinking of the GENNUM option.
Are there any SUGI/SGF papers you would recommend?