Non-negative Matrix Factorization (NMF) is a linear algebra technique that decomposes a non-negative matrix into two lower-rank non-negative matrices, capturing underlying patterns in data and enabling feature engineering. It is a powerful dimensionality reduction and feature extraction method widely used in machine learning and data analysis.
NMF aims to approximately represent a nonnegative data matrix as the product of two low-rank nonnegative factor matrices. Given a dataset with n observations and p variables, NMF seeks to express X ≈ WH, where X is the n×p nonnegative data matrix, and W and H are low-rank nonnegative factor matrices of dimensions n×r and r×p, respectively. Here r is the rank of the factorization. The matrix W is usually called the features (or basis) matrix, while H is commonly referred to as the weights (or coefficients) matrix.
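The dimensions above can be made concrete with a minimal NumPy sketch using made-up numbers (n = 6, p = 4, r = 2 are arbitrary choices for illustration): the product of two nonnegative factors is itself a nonnegative n×p matrix.

```python
import numpy as np

# Hypothetical sizes: n = 6 observations, p = 4 variables, rank r = 2.
rng = np.random.default_rng(0)
W = rng.random((6, 2))   # n x r features (basis) matrix, nonnegative
H = rng.random((2, 4))   # r x p weights (coefficients) matrix, nonnegative

X_approx = W @ H         # n x p reconstruction of the data matrix
print(X_approx.shape)          # (6, 4)
print((X_approx >= 0).all())   # True: nonnegative factors give a nonnegative product
```

Note that only n×r + r×p numbers are stored in the factors, versus n×p in the full matrix, which is where the compression comes from when r is small.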
Where Can NMF be Applied, and What Types of Data Is It Most Suitable For?
NMF is applied to non-negative data because it inherently enforces a parts-based, additive representation. Unlike other factorization methods that allow negative values and rely on cancellation effects, NMF decomposes data into non-negative components, making it particularly useful for applications where negative values have no meaningful interpretation.
Some common data types and their corresponding applications include text data (term-document matrices for topic modeling), user-item ratings (recommender systems), audio spectrograms (source separation), and gene expression data (bioinformatics).
Beyond these industry-specific applications, NMF is also valuable for clustering, as its factorized components can be interpreted as cluster representations. This makes it useful for detecting outliers and anomalies in various datasets, such as fraud detection in financial transactions.
A Feature Engineering Technique
NMF is a feature engineering technique commonly used in machine learning and data analysis. It supports dimensionality reduction (each observation is summarized by r latent components instead of p original variables), feature identification (each row of H shows which original variables define a component), and feature extraction (the rows of W serve as new, compact features for downstream models).
How do we Control Approximation Accuracy?
In NMF, the rank (r) is the number of latent features, or components, used to approximate the original matrix. You must specify the rank, which must be a positive integer.

The choice of r determines the complexity and quality of the approximation: it sets how many latent components define the two low-rank nonnegative factor matrices. A small rank yields a more compressed representation but may lose important details; a large rank provides a more faithful approximation but may lead to overfitting. Choosing the optimal rank is crucial and is often done empirically or with techniques such as cross-validation, the elbow method, or domain knowledge.
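The elbow method mentioned above can be sketched in a few lines of Python using scikit-learn's NMF implementation (an illustration only, not PROC NMF; the data here is random and purely hypothetical): fit several ranks and watch where the reconstruction error stops improving quickly.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy nonnegative data; in practice X would be term counts, ratings, etc.
rng = np.random.default_rng(1)
X = rng.random((100, 20))

# Elbow-method sketch: error typically drops sharply at first, then flattens;
# the "elbow" of the curve suggests a reasonable rank.
errs = []
for r in [1, 2, 4, 8]:
    model = NMF(n_components=r, init="nndsvda", max_iter=500, random_state=1)
    model.fit(X)
    errs.append(model.reconstruction_err_)
    print(f"r={r}: reconstruction error = {errs[-1]:.3f}")
```

Plotting `errs` against the candidate ranks and looking for the bend in the curve is the usual way to read off the elbow.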
An Optimization Algorithm
NMF is essentially an optimization algorithm because it seeks two lower-rank nonnegative matrices whose product best approximates the original matrix. This involves minimizing a reconstruction error, typically measured with a loss function such as the Frobenius norm or the Kullback-Leibler divergence, through iterative updates of the factor matrices. SAS Viya implements two optimization methods for performing NMF: Alternating Proximal Gradient (APG) and Compressed Alternating Proximal Gradient (CAPG) using Random Projections.
How is Data Prepared for the NMF Procedure?
The NMF procedure performs nonnegative matrix factorization in SAS Viya and requires input data to follow a specific structure. Sparse matrices typically contain a large number of zero values. To save memory, such data is commonly stored in the COO (Coordinate List) format, which records only the non-zero entries using triplets of the form (row, column, value). This compact representation significantly reduces storage requirements compared to storing the full matrix. For example, in text data, rows may represent terms, columns represent documents, and the values are term counts. Similarly, in recommender systems, rows represent users, columns represent items (e.g., movies), and values represent ratings.
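The COO layout described above is easy to see with SciPy's sparse-matrix support; the term/document IDs and counts below are purely hypothetical.

```python
from scipy.sparse import coo_matrix

# Hypothetical term-document counts as (row, column, value) triplets.
rows = [0, 0, 2, 3]    # term IDs
cols = [1, 3, 0, 2]    # document IDs
vals = [5, 2, 1, 4]    # counts

X = coo_matrix((vals, (rows, cols)), shape=(4, 4))
print(X.nnz)           # only the 4 nonzero entries are stored,
                       # not all 16 cells of the 4x4 matrix
```

The storage saving grows with sparsity: a matrix that is 99% zeros needs only about 1% of the triplets that a dense grid would store as cells.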
However, the NMF procedure does not support input in COO format. It requires input as a dense matrix, where all values (including zeros) are stored in a contiguous two-dimensional array. In this grid-like representation, values such as counts or ratings are stored across multiple columns within each row.
Therefore, data stored in COO format must first be converted to dense format before applying NMF. This conversion can be done in several ways: with a SAS DATA step for small datasets, with the PYTHON procedure to run Python code within SAS, with the FEDSQL procedure, which supports high-performance, ANSI SQL:1999-compliant queries across diverse data sources, and more.
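As one example of the COO-to-dense conversion, a pandas pivot (the kind of code that could run inside the PYTHON procedure) turns long-form triplets into the grid-like layout the NMF procedure expects; the users, items, and ratings below are hypothetical.

```python
import pandas as pd

# Hypothetical ratings in COO-style "long" form: one (user, item, rating) per row.
long_df = pd.DataFrame({
    "user":   ["u1", "u1", "u2"],
    "item":   ["m1", "m2", "m1"],
    "rating": [4, 5, 3],
})

# Pivot to a dense user-by-item grid; unobserved pairs are filled with 0.
dense = long_df.pivot(index="user", columns="item", values="rating").fillna(0)
print(dense)
```

The row index and column labels of the pivoted table play the role of the row and column metadata discussed next, so it is worth saving them alongside the dense values.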
Additionally, from an implementation perspective, working solely with IDs can be challenging. It is often useful to maintain row and column labels, which can be stored separately in row metadata and column metadata files to aid interpretation and downstream analysis.
The upcoming posts in this series — Nonnegative Matrix Factorization (Part 2): Discovering Topics from Documents and Nonnegative Matrix Factorization (Part 3): Making Recommendations Using Matrix Completion — will demonstrate the implementation of PROC NMF for topic modeling on text data and for building a recommender system using user-item ratings, respectively.
Concluding Remarks
Nonnegative Matrix Factorization (NMF) offers interpretable, parts-based representations by enforcing nonnegativity, often making it more intuitive than methods like PCA or SVD. It effectively reduces dimensionality while preserving structure, making it valuable in fields like text mining, audio separation, and bioinformatics. Although the factorization itself is linear, the nonnegativity constraint encourages sparse, local patterns, which aids clustering. However, NMF has limitations: solutions are not unique, computations can be intensive, the optimization may get stuck in local minima, and results are sensitive to initialization. Choosing the right rank r also requires care, often calling for domain expertise or validation.
Find more articles from SAS Global Enablement and Learning here.