Gerhard Svolba, SAS Austria
Creating a "one-row-per-subject" data mart is a fundamental task when preparing data for data mining. To answer
the underlying business question, the analyst or data-mart programmer is challenged to distill the relevant
information from various data sources.
Creating a data mart involves more than reading columns from a source table to the data mart. It also includes the
aggregation or transposition of observations from "multiple-row-per-subject" tables like transactional tables and time
histories. This process is a critical success factor for being able to answer the business question or to have good
predictors available for a target event or target value.
This paper shows the details of input data structures for a "one-row-per-subject" data mart. It also discusses the
"one-row-per-subject" paradigm from a technical and a business point of view, and shows how data, some of which
includes hierarchical dependencies, is aggregated into a single row per subject. A comprehensive example shows
how a "one-row-per-subject" data mart is created from various data sources.
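To make the idea of aggregating hierarchical data more concrete, here is a minimal sketch in Python (used for illustration only; the paper itself works in SAS). The customer, account, and transaction hierarchy, the field names, and the derived variables are all hypothetical and are not taken from the paper. The sketch first aggregates transactions to the account level and then aggregates accounts to the subject level.

```python
from collections import defaultdict

# Hypothetical hierarchy: customer -> account -> transaction.
transactions = [
    {"cust_id": "A", "account_id": "A-1", "amount": 100.0},
    {"cust_id": "A", "account_id": "A-1", "amount": 50.0},
    {"cust_id": "A", "account_id": "A-2", "amount": 20.0},
    {"cust_id": "B", "account_id": "B-1", "amount": 75.0},
]

# Step 1: aggregate transactions to the account level.
account_totals = defaultdict(float)
account_owner = {}
for t in transactions:
    account_totals[t["account_id"]] += t["amount"]
    account_owner[t["account_id"]] = t["cust_id"]

# Step 2: aggregate account totals to the customer (subject) level.
per_customer = defaultdict(list)
for account_id, total in account_totals.items():
    per_customer[account_owner[account_id]].append(total)

# One row (here: one dict) per customer with derived variables.
mart = {
    cust: {
        "n_accounts": len(totals),
        "total_amount": sum(totals),
        "max_account_total": max(totals),
    }
    for cust, totals in per_customer.items()
}
```

Which derived variables are worth keeping at each level (counts, sums, maxima, shares) is a business decision; the three shown here are only examples.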
This SAS Global Forum paper has also been published as a number of presentations in my Data Preparation for Data Science webinar series.
https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/078-31.pdf
This paper shows that data preparation is a key success factor for analytical projects. Besides multiple-row-per-subject data marts and longitudinal data marts, the one-row-per-subject data mart is frequently used and is central to statistical and data mining analyses.
In many cases, the available source data have a one-to-many relationship to the subject itself. Therefore, the data need to be transposed or aggregated. The choice of transposition and aggregation methods is driven primarily by business considerations about which derived variables will be meaningful.
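The difference between aggregating and transposing can be sketched as follows. This is a minimal illustration in Python rather than SAS, and the table, the column names (`cust_id`, `month`, `minutes`), and the chosen statistics are assumptions made up for this example, not the paper's data. Aggregation condenses the repeated rows into summary values per subject, while transposition keeps each repeated measurement as its own column.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical multiple-row-per-subject table: one row per customer and month.
usage = [
    {"cust_id": 1, "month": 1, "minutes": 120},
    {"cust_id": 1, "month": 2, "minutes": 90},
    {"cust_id": 1, "month": 3, "minutes": 150},
    {"cust_id": 2, "month": 1, "minutes": 30},
    {"cust_id": 2, "month": 3, "minutes": 60},
]

def aggregate(rows):
    """Collapse repeated rows into summary statistics per subject."""
    by_subject = defaultdict(list)
    for row in rows:
        by_subject[row["cust_id"]].append(row["minutes"])
    return {
        cust: {"n_months": len(v), "minutes_mean": mean(v), "minutes_max": max(v)}
        for cust, v in by_subject.items()
    }

def transpose(rows, months=(1, 2, 3)):
    """Turn each month into its own column; months without data stay None."""
    wide = defaultdict(lambda: {f"minutes_m{m}": None for m in months})
    for row in rows:
        wide[row["cust_id"]][f"minutes_m{row['month']}"] = row["minutes"]
    return dict(wide)

aggregated = aggregate(usage)
transposed = transpose(usage)

# Combine both sets of derived variables into one row per subject.
mart = {cust: {**aggregated[cust], **transposed[cust]} for cust in aggregated}
```

Customer 2 has no row for month 2, so the transposed column `minutes_m2` stays empty for that subject. How such gaps should be treated (left missing, set to zero, or imputed) is again a business question rather than a purely technical one.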
This paper shows you ways to create a one-row-per-subject data mart from a conceptual and a coding point of view.
Anyone who is interested in more details on data preparation for analytics is referred to my book Data Preparation for Analytics, which will be published by SAS Press (Book# 60502) in September 2006.