11-12-2014 12:14 PM
Typically, my organisation's SAS database design is a mix of approaches depending on the developer who designed it.
I've been searching for a long time about the preferred style of physical table design within DI which will marry up with the use of the BI tools.
I feel that there is too much focus on how things were done about 5 years ago to navigate around problems with I/O and space constraints.
Currently, many of the tables that are used by analysts with my organisation are monthly tables with a date suffix at the end of either YYYYMM or MON_YYYY. I feel that these don't naturally progress through into the BI tools - would people agree? This also leads to problems with (that is unable to) opening tables within DI as the preferences of tables have macro variables within them.
Please can people share what they think is the best physical table structure within DI which will then allow users to get maximum benefit from the BI tool (OLAP cubes, Web Reports etc).
I personally think that it should be....
Reception Layer contains the unformatted information from either raw files or SQL database connections.
Q: Do people retain the information within SAS? If so, how? Date suffixed tables? (I ask as a SAS form of the raw data might need to be retained in circumstances and doubt people would store in dimensions etc?)
Foundation Layer contains a star schema type approach of dimensions and fact tables using custom transformations (as SCD works slowly).
Exploitation Layer would be tables the analysts frequently use with single table containing multiple observation points allowing trend analysis etc.
Are there any good SAS papers out there for this type of topic?
Many thanks in advance.
11-14-2014 10:31 AM
Hard to give some concrete advice without knowing more about your solution, constraints, requirements.
But I'll try to give some best practice advice, you decide how they apply to your site or not.
It seems that you are in data mart heaven/hell.
It also sounds like you lack of a detailed warehouse architecture, including development and naming standards. That could be a starting point.
Personally, I'm not too found of having the detail layer using dimensional models, they are better suited for the data mart layer. Problems could be handling conformed dimensions together with disparate fact tables which is different levels of granularity.
A more complete architecture would consist of
There is a load of papers and books on these issues.
And no, don't have any suffix/prefix on tables that is connected to some period. That work poorly with standard BI tools and metadata. Perhaps in the information mart layer (where you can rely of some application logic to take care of dynamic naming). The other exception if you store the data in DBMS that can partition by period - and the hide the period part of the table name. These include SAS SPD Server (Dynamic Clusters) or most other RDBMS.
11-14-2014 02:20 PM
Many thanks Linus for taking the time to reply.
I agree that we have a lack of architecture and development standards as each one was developed in its own right.
I'll have a read up about the 3NF and Data Vault styles.
Can you recommend and books or papers on these topics?
11-16-2014 05:45 PM
When it' comes to 3NF modelling, there are tons of books, from Date/Codd until now. Even if it's not solely about modelling, Inmons "Building The Data Warehouse" is still seen as a foundation for most DW architectures.
Data Vault was founded by Dan Linstedt, and he has some good intro papers on Data Vault Series 1 - Data Vault Overview. He also wrote the book "Super Charge Your Data Warehouse" which you got get at http://learndatavault.com/.
There is another more book I wish push for, Hans Hultgrens “Modeling the Agile Data Warehouse with Data Vault”, which is more comprehensive and a bit more pedagogic, on Amazon.
11-14-2014 06:57 PM
If your target data base allows for partitioning then that's the way to go and having dates in table names would be a bad practice.
Partitioning of base SAS tables (SAS files) is not possible so there and if it's about having something like monthly summery tables I sometimes take the approach of having a date in the table name. I then would also create views over these tables - eg. table_current - which get re-created in the same process that creates the monthly summaries. You then use these views with stable names for registering metadata.
You will find a multitude of books and papers around the subject of data modelling and data architecture. There is one site I really like - not for learning the theory but as a starting point when I have to come up with a data model: http://www.databaseanswers.org/data_models/index.htm
Under the above link there is one model I love specifically Father of all Data Models Data Model
It's not a model one would ever seriously implement as loading or querying such a model would be hell. It's a great model though to explain to customer why requirements like "fully flexible and extensible data model" are unreasonable.
11-16-2014 06:49 PM
Plenty of good advice here. However I would disagree that date-suffixed datasets are not best practice under any circumstances.
If you are creating data marts and reporting systems whose entire focus are monthly, for example, monthly-suffixed datasets are worth considering for their simplicity, processing efficiency, easy maintainability and high productivity. They are easy to agreggate across multiple months as well.
I would agree though, that if you require greater flexibility in reporting periods and other dimensions then other architectures, such as those discussed above, are the way to go.
It is worth pointing out that an over-engineeered data warehouse can be as bad as one that is too simplistic and inflexible.
11-17-2014 02:56 AM
If you need a bicycle, don't buy a Ferrari. But with time you might be covering a larger area, and the bike won't do the job no more.
https://communities.sas.com/people/%40thesasgeek describes an under-engineered environment. They tend to be person dependant and cost more to administer in the long run. Also they lack of audibility and traceability and the it's hard to get the "one version of the truth".
11-19-2014 09:12 PM
Like the analogy. In our case the bicycle is more like a Mini (Cooper S?), agile and fit-for-purpose. The Ferrari is available but it has no comprehensive workshop and user manuals. The mechanics therefore don't understand how it works, have difficulty keeping it well-maintained and struggle with repairs.