Rule-Based Codebook Generation for Exploratory Data Analysis
A codebook summarizes a collection of data by reporting its significant features, and can be used to gain insight into that collection. Survey data are often assembled into codebooks for rapid understanding of the questions asked. Such codebooks are typically limited to variable names, variable labels, categorical variable values with their frequency counts, and simple descriptive statistics for continuous variables.
A codebook is more valuable to a statistician or data miner when it presents variables in a format suited to exploratory data analysis (EDA). The %CODEBOOK macro described in this paper applies the SAS® Enterprise Miner heuristic rules for producing metadata to classify each variable as nominal, ordinal, or interval. It then presents each variable in a form appropriate to its measurement scale, and may be used as an EDA tool for a first look at a set of data.
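To make the idea of rule-based classification concrete, here is a minimal Python sketch of the general approach. It is not the %CODEBOOK macro and it does not reproduce the Enterprise Miner rules: the function names and the `MAX_ORDINAL_LEVELS` cutoff are hypothetical, chosen only for illustration. It guesses a measurement scale from the observed values and then picks a summary suited to that scale.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical cutoff, for illustration only: a numeric variable with this
# many distinct values or fewer is treated as ordinal instead of interval.
MAX_ORDINAL_LEVELS = 10


def classify_scale(values, max_ordinal_levels=MAX_ORDINAL_LEVELS):
    """Guess a measurement scale from the non-missing values of one variable."""
    present = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if not is_numeric:
        return "nominal"   # character data: unordered categories
    if len(set(present)) <= max_ordinal_levels:
        return "ordinal"   # numeric with few distinct levels
    return "interval"      # numeric with many distinct values


def summarize(values):
    """Build a summary appropriate to the guessed measurement scale."""
    present = [v for v in values if v is not None]
    scale = classify_scale(values)
    summary = {
        "scale": scale,
        "n": len(present),
        "missing": len(values) - len(present),
    }
    if scale == "interval":
        # Continuous variables get simple descriptive statistics.
        summary.update(
            min=min(present),
            max=max(present),
            mean=mean(present),
            median=median(present),
        )
    else:
        # Nominal and ordinal variables get frequency counts per level.
        summary["frequencies"] = dict(Counter(present))
    return summary


def codebook(columns):
    """Map each variable name to its scale-appropriate summary."""
    return {name: summarize(values) for name, values in columns.items()}


if __name__ == "__main__":
    data = {
        "region": ["north", "south", "south", None],
        "satisfaction": [1, 3, 3, 5],
        "income": [float(x) for x in range(20, 32)],
    }
    for name, entry in codebook(data).items():
        print(name, entry)
```

A real tool would also need to handle dates, identifiers, and formatted values, and would take its thresholds from the rules it is emulating instead of a single fixed cutoff.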
I am sharing the %CODEBOOK macro with the SAS community to assist with the EDA phase of a project. I hope that you will find it useful in your work.