09-05-2012 01:38 PM
I'm fairly new to using SAS software, and would appreciate some general directions on a very basic problem.
Imagine that you have a cube with a star schema that stores data on customer relations, and that you're asked to provide end-users with history/snapshots of this data that they can analyze by months and by quarters. The program must provide snapshots of the past 10 months or so, and capture a new snapshot by the end of each future month. All snapshots must be from the 30th/31st, whichever is the last day of the month.
How would you typically approach such a problem? And would you make two programs: one for capturing historical data 10 months back, and one for automatically capturing future data as it becomes available? I guess those would be two separate data flows whose tables would be parsed, in the same ETL job.
I'd especially appreciate advice on what kind of code I might use for the WHERE statement of the first Extract transformation in such a situation, in order to get the correct monthly snapshots from the source table.
Definitely need some more experience in order to wrap my head around this. In any case, thanks for your attention.
09-05-2012 03:13 PM
One of the keys will be understanding how the INTNX function works. It returns the first day of a time period. If we were dealing with SAS data sets, and variables stored as SAS dates, this would be a possibility:
where (intnx('MONTH', "&sysdate9"d, -12) <= snapshot_date < intnx('MONTH', "&sysdate9"d, 0));
In this example, &SYSDATE9 is an automatic macro variable (text string) representing the date the program began, and "&SYSDATE9"d translates that value into SAS's date scale. The INTNX function identifies the first day of the month from 12 months ago, and the second INTNX function identifies the first day of the current month.
This will have to be adapted to tables (which typically contain values that are not on the SAS date scale (either date times or text strings as snapshot dates). And rather than using the current date you may want to let the user enter a date. But either way, this concept would be a good place to start.
09-08-2012 01:53 PM
I've set up a job that successfully creates a snapshot from the end of a single month, using the INTNX function. The month is determined by means of a parameter in the job properties. If the parameter is set to -1, the job creates a snapshot of the previous month (August); if set to -2 it creates a snapshot of July, etc.
I have a problem deciding what to do next, though. I figure I should create an iterative job that places my job inside a loop and runs it until there's no more historical data available (February 2009), and that I should end up with a data set where each monthly snapshot has been appended together. Does that sound correct?
TomKari: Thanks, that makes sense, though if I understand correctly someone else will handle things like hierarchies and MDX after I've finished this part. I'm working exclusively in DI Studio now.
09-05-2012 06:22 PM
Typically, cube / OLAP environments are used for exploratory analysis, in other words the person requiring information uses a tool such as the OLAP viewer in Enterprise Guide or Excel to dynamically drill around the cube looking for information. The language used to do this is MDX.
If I understand your request correctly, you would like to extract data on a regular basis, using criteria such as "extract monthly summaries for dates between x and y". I would suggest the following approach:
1. Using Excel or Enterprise Guide, explore your cube to get exactly the information on the screen that you would like in your extract.
2. Capture the MDX code that retrieves that information. In EG, you "create a slice". If you examine the properties of the slice, you'll see the MDX code.
3. Create a new library for your cube, and send your MDX code to it as a "pass-through" query. That should return your desired data in the form of a SAS dataset, which can then be analyzed, reported on, distributed, etc.
4. Once all of this works, change the variable parts of your query (such as start and end dates) into parameters / macro variables, that can then be created as part of your scheduled ETL process.