Efficient “One-Row-per-Subject” Data Mart Construction for Data Mining

Authors

Gerhard Svolba, SAS Austria

Paper 078-31 - SUGI 31 in San Francisco, CA, 2006

Abstract

Creating a "one-row-per-subject" data mart is a fundamental task when preparing data for data mining. To answer
the underlying business question, the analyst or data-mart programmer is challenged to distill the relevant
information from various data sources.
Creating a data mart involves more than reading columns from a source table to the data mart. It also includes the
aggregation or transposition of observations from "multiple-row-per-subject" tables like transactional tables and time
histories. This process is a critical success factor for being able to answer the business question or to have good
predictors available for a target event or target value.
This paper shows the details of input data structures for a "one-row-per-subject" data mart. It also discusses the
"one-row-per-subject" paradigm from a technical and a business point of view, and shows how data, some of which
include hierarchical dependencies, are aggregated into a single row per subject. A comprehensive example shows
how a "one-row-per-subject" data mart is created from various data sources.

Watch the presentation

This SAS Global Forum paper has been published as a number of presentations in my Data Preparation for Data Science webinar series.

DOWNLOAD THE FULL PAPER

https://support.sas.com/resources/papers/proceedings/proceedings/sugi31/078-31.pdf

DOWNLOAD THE SLIDES

Slide Collection: Slide Decks #121 and #122 at https://github.com/gerhard1050/DataScience-Presentations-By-Gerhard
The original presentation from SUGI 31 (2006) is in the attachment. The reworked presentation is available at the link above.

CONCLUSION

This paper shows that data preparation is a key success factor for analytical projects. Besides multiple-row-per-subject data marts or longitudinal data marts, the one-row-per-subject data mart is frequently used and is central for statistical and data mining analyses.
In many cases the available source data have a one-to-many relationship to the subject itself. Therefore data needs to be transposed or aggregated. The selection of the transposition and aggregation methods is primarily driven by business considerations of what will give meaningful derived variables.
This paper shows you ways to create a one-row-per-subject data mart from a conceptual and a coding point of view.
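As an illustration of the transposition and aggregation steps described above, here is a minimal sketch in plain Python rather than the SAS code used in the paper. The table layout, the column names (`cust_id`, `month`, `amount`), the function name, and the choice to fill months without transactions with 0.0 are all assumptions made for this example and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical multiple-row-per-subject table: one row per transaction.
transactions = [
    {"cust_id": 1, "month": "2006-01", "amount": 100.0},
    {"cust_id": 1, "month": "2006-02", "amount": 150.0},
    {"cust_id": 1, "month": "2006-02", "amount": 50.0},
    {"cust_id": 2, "month": "2006-01", "amount": 80.0},
]


def build_one_row_per_subject(rows):
    """Aggregate and transpose transaction rows into one dict per customer."""
    months = sorted({r["month"] for r in rows})
    monthly_sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)

    # Aggregation: sum amounts per customer and month, count transactions.
    for r in rows:
        monthly_sums[r["cust_id"]][r["month"]] += r["amount"]
        counts[r["cust_id"]] += 1

    mart = []
    for cust_id in sorted(monthly_sums):
        per_month = monthly_sums[cust_id]
        row = {
            "cust_id": cust_id,
            "n_trans": counts[cust_id],
            "total_amount": sum(per_month.values()),
        }
        # Transposition: each month becomes its own column.
        # Assumption: a month without transactions is filled with 0.0;
        # whether 0 or "missing" is appropriate is a business decision.
        for m in months:
            row["amount_" + m.replace("-", "_")] = per_month.get(m, 0.0)
        mart.append(row)
    return mart


data_mart = build_one_row_per_subject(transactions)
for row in data_mart:
    print(row)
```

Each customer ends up with exactly one row holding aggregated columns (`n_trans`, `total_amount`) and transposed columns (one `amount_<month>` per month). Which aggregates and which transposed columns to derive depends on the business question, as noted above.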
Anyone who is interested in more details on Data Preparation for Analytics is referred to my book with the same name, which will be published by SAS Press (Book# 60502) in September 2006.

ASK THE EXPERT SEMINAR (in German)

SAS Press Books
