Efficient “One-Row-per-Subject” Data Mart Construction for Data Mining

Started 01-23-2022
Modified 04-18-2022
Authors

Gerhard Svolba, SAS Austria
Paper 078-31 - SUGI 31 in San Francisco, CA, 2006
 

Abstract

Creating a "one-row-per-subject" data mart is a fundamental task when preparing data for data mining. To answer
the underlying business question, the analyst or data-mart programmer is challenged to distill the relevant
information from various data sources.
Creating a data mart involves more than reading columns from a source table to the data mart. It also includes the
aggregation or transposition of observations from "multiple-row-per-subject" tables like transactional tables and time
histories. This process is a critical success factor for being able to answer the business question or to have good
predictors available for a target event or target value.
This paper shows the details of input data structures for a "one-row-per-subject" data mart. It also discusses the
"one-row-per-subject" paradigm from a technical and a business point of view, and shows how data, some of which
includes hierarchical dependencies, is aggregated into a single row per subject. A comprehensive example shows
how a "one-row-per-subject" data mart is created from various data sources.
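
The aggregation and transposition described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than SAS and is not taken from the paper; the table `transactions`, its columns `cust_id`, `product` and `amount`, and the function `build_one_row_per_subject` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical multiple-row-per-subject table: one row per transaction.
transactions = [
    {"cust_id": 1, "product": "A", "amount": 100.0},
    {"cust_id": 1, "product": "B", "amount": 50.0},
    {"cust_id": 1, "product": "A", "amount": 25.0},
    {"cust_id": 2, "product": "B", "amount": 80.0},
]


def build_one_row_per_subject(rows, key="cust_id", category="product", value="amount"):
    """Collapse transactional rows into one row per subject.

    Aggregation: count, sum and mean of `value` per subject.
    Transposition: one column per level of `category`, holding the
    summed `value` for that level (0 if the subject has no such rows).
    """
    categories = sorted({r[category] for r in rows})

    # Group the observations by subject.
    grouped = defaultdict(list)
    for r in rows:
        grouped[r[key]].append(r)

    mart = []
    for subject in sorted(grouped):
        obs = grouped[subject]
        values = [o[value] for o in obs]
        row = {
            key: subject,
            "n_obs": len(values),
            f"{value}_sum": sum(values),
            f"{value}_mean": sum(values) / len(values),
        }
        # Transpose the category levels into columns.
        for c in categories:
            row[f"{value}_{c}"] = sum(o[value] for o in obs if o[category] == c)
        mart.append(row)
    return mart


mart = build_one_row_per_subject(transactions)
for row in mart:
    print(row)
```

Which aggregates and which transposed columns are worth deriving is a business decision; the count, sum, mean and per-product totals here are only placeholders.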

Watch the presentation

This SAS Global Forum paper has been published as a number of presentations in my Data Preparation for Data Science webinar series.

DOWNLOAD THE FULL PAPER

https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/078-31.pdf

 

DOWNLOAD THE SLIDES

 

CONCLUSION

This paper shows that data preparation is a key success factor for analytical projects. Besides multiple-row-per-subject data marts and longitudinal data marts, the one-row-per-subject data mart is frequently used and is central to statistical and data mining analyses.
In many cases the available source data have a one-to-many relationship to the subject itself. Therefore, the data need to be transposed or aggregated. The selection of the transposition and aggregation methods is primarily driven by business considerations of what will give meaningful derived variables.
This paper shows you ways to create a one-row-per-subject data mart from a conceptual and a coding point of view.
Anyone who is interested in more details on Data Preparation for Analytics is referred to my book of the same name, which will be published by SAS Press (Book# 60502) in September 2006.

 

 

Recommended Reading

 

ASK THE EXPERT SEMINAR (in German)

 

SAS Press Books
