Yu,
No problem - there's a lot to consider. Have you read "The Data Warehouse Toolkit", by any chance? If you haven't, I'd highly recommend it since it addresses most of the type of questions you're asking as well as the implications of possible alternatives.
I misread your post - I shouldn't have recommended a "month dimension". You would of course use a Date Dimension that has one record for each day in your time period. This generally includes many descriptive columns like month, year, quarter, etc to support summarization. I've got several programs I've used to generate such tables in the past (you usually do this programmatically), so I'll dig one up and forward it along.
For your other questions:
- I'm not sure what you mean about "prd_mth". In this situation, you generally only keep the sales date in the fact table. If there is a different date besides the sales date, please describe it and I can give further advice.
- A "degenerate dimension" is not a separate table. This just means keep the actual ORDER_ID column as a column in the fact table. This because most other columns in the Order table are probably factored out into the fact or other dimension tables (order date is in the fact table, item price, cost in the fact table, item descriptive info in a "Product Dimension", etc). You still need the order ID to be able to group items of the same order, but since there's only one column, you don't create a separate table for it, but keep it as a column in the fact table.
- How you store the dimension is determined on a dimension-by-dimension basis. While you could just store current value and overwrite, you often need to "version" your dimension records when specific columns change so that you have a history of these attribute values. This is referred to as "Slowly Changing Dimensions" (either Type 2 or Type 3) . This is an extensive topic and I'd refer you to the "Data Warehouse Toolkit" for specifics. DI Studio contains the Type 2 SCD Loader to assist you with maintaining this kind of model.
- I was referring to where you store your dimension/fact tables, yes. Choice of database for a data mart is a HUGE decision and you should take the time to consider many factors. I can't do a complete discussion of this here, but there are many differences between SAS datasets and an RDBMS like Oracle, including security, partitioning, horizontal scalability, bulkloading, query optimization, indexing techniques, etc, etc. The list goes on and on. I'd recommend that you describe your application to a DBA and a SAS expert and get some of their opinions to assist you with your decision.
- Cube design is another fairly extensive topic. I can tell you that, yes, you can use the Star Schema we've been discussing as the basis for one or more cubes. When designing a cube, however, you generally DO NOT want to put the lowest level of granualarity in the cube. Cubes are about summarizing data for quick (sub-second) analysis and drill down. If you do put low level granualarity in a cube (like dimension primary keys) you'll have bad performance during build and query. You want to understand the summarization levels your users require for analysis and they generally don't care for analysis purposes about individual records (1 among millions). When they do need to drill down to an individual record, there are other techniques to employ.
As you said, there's a lot to learn. I'd definitely recommend consulting some reference texts like "Data Warehouse Toolkit" and some experts in your organization to aid your decision.
Thanks,
Tim Stearn