In earlier installments of this series, Nonnegative Matrix Factorization (Part 1): Understanding Its Importance and Applications, we introduced the concept of non-negative matrix factorization, highlighting its benefits and diverse applications. Then, in Nonnegative Matrix Factorization (Part 2): Discovering Topics from Documents, we focused on the practical implementation of NMF, demonstrating how it can be applied to topic modeling using the NMF procedure.
We will now use Non-negative Matrix Factorization (NMF) on the rating matrix commonly found in recommender systems. A recommender system suggests relevant items to users by analyzing their preferences, behavior, or similarities with other users. One common example is a movie recommendation system, like the ones used by streaming platforms such as Netflix. These systems analyze user ratings of movies—such as giving a film 4 out of 5 stars—to learn about individual tastes. By comparing the ratings and viewing patterns of many users, the system can recommend movies that a user is likely to enjoy but hasn't seen yet. Here we assume that people who agreed in the past will agree in the future, and that they will like similar types of items as they liked in the past. So, in our movie analogy, the recommended movies come from people like you who also enjoyed movie A and movie B. This helps users discover new content tailored to their interests, enhancing their overall experience.
Shown here is a simple example of ratings data. There are two categorical variables: u for users and i for movies. All possible combinations of users and movies, with their corresponding ratings, are displayed in a matrix.
As expected, most of the cells in the matrix are missing, because most users have rated only a few movies and most movies have been rated by only a few users. The challenge is to estimate a user's rating for a movie they haven't seen. The intuition that NMF uses to solve this problem is that there should be some latent features that determine how a user rates a movie.
The IMPUTE statement in PROC NMF invokes low-rank matrix completion by using nonnegative matrix factorization to recover missing entries of the input data table. But, how?
A key advantage of low-rank matrices is that their essential information—the degrees of freedom—is much smaller than the total number of entries. This makes it possible to recover the entire matrix even from a limited number of observations. Without a rank constraint, recovering missing entries becomes ambiguous, since infinitely many completions can match the known values. The low-rank assumption provides the structure needed to make this recovery feasible.
Consider a simple example from Nguyen, Kim, and Shim (2019): a 2×2 matrix M with one unknown entry denoted by x. If M is full rank (that is, the rank of M is two), then any value except 10 can be assigned to x. But if M is a low-rank matrix (rank one in this trivial example), the two columns differ only by a constant factor, and the unknown element x can be determined from the linear relationship between the columns (x = 10). This example is obviously simple, but the fundamental principle behind recovering a large matrix is not much different, and the low-rank constraint plays a pivotal role in recovering the unknown entries.
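The original 2×2 matrix appears only as an image, so the sketch below uses hypothetical entries chosen to be consistent with the text (the known values force x = 10 in the rank-one case):

```python
import numpy as np

# Hypothetical rank-1 matrix with one unknown entry x:
#     M = [[2,  4],
#          [5,  x]]
# If rank(M) = 1, the second column must be a constant multiple
# of the first: 4 / 2 = 2, so x = 5 * 2 = 10.
known_col = np.array([2.0, 5.0])
ratio = 4.0 / 2.0            # scale factor linking the two columns
x = known_col[1] * ratio     # -> 10.0

# With x = 10 the matrix has rank 1; any other value makes it rank 2.
M = np.array([[2.0, 4.0], [5.0, x]])
print(x, np.linalg.matrix_rank(M))   # 10.0 1
```

Any other choice for x breaks the proportionality between the columns and pushes the rank back up to two, which is exactly why the low-rank assumption pins down the missing entry.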
In a recommender system application, NMF factorizes a user-item rating matrix. As shown here, we decompose the user-item matrix X into two matrices: W, which represents each user in terms of the latent features, and H, which represents each item in terms of the features it contains.
Here the features generated by NMF are typically referred to as latent features or latent factors. These are hidden patterns or characteristics inferred from user-item interaction data, such as viewing or rating history. These features are not directly observable but are discovered during matrix factorization.
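A minimal numeric sketch of this idea (toy factors, not from the MovieLens data): once X has been decomposed into W and H, every cell of the completed matrix, observed or not, is the dot product of a user's row of W with a movie's column of H:

```python
import numpy as np

# Toy factors: 4 users x 2 latent features (W), 2 features x 3 movies (H).
W = np.array([[1.0, 0.0],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.0, 1.0]])
H = np.array([[5.0, 4.0, 1.0],
              [1.0, 2.0, 5.0]])

# The completed rating matrix: a value for every user-movie pair.
X_hat = W @ H

# Predicted rating of user 2 (0-based) for movie 2:
pred = W[2] @ H[:, 2]    # 0.1*1 + 0.9*5 = 4.6
print(X_hat.shape, pred)  # (4, 3) 4.6
```

Because both factors are non-negative, each prediction is an additive blend of feature contributions, which is what makes the latent features interpretable.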
A recommender system built this way helps in multiple ways: it predicts how a user would rate items they haven't seen, tailors content to individual tastes, and helps users discover new items they might otherwise miss.
Here we will illustrate how you can use the NMF procedure to make recommendations using matrix completion from ratings data.
To build a simple recommender system using NMF, we begin by loading the movie ratings data using PROC PYTHON. The data is reshaped into a dense matrix format, where each row corresponds to a user and each column to a movie. If a user hasn’t rated a movie, the value is missing. This matrix is then stored as a SAS data table.
Next, we apply PROC NMF with the IMPUTE statement to perform low-rank matrix completion. This step helps us fill in the missing ratings—essentially predicting how a user might rate movies they haven’t seen. The results are saved to a new table.
In the third step, we use PROC PYTHON again to pull the predicted ratings from the output table. We convert the SAS data into a Pandas DataFrame so we can analyze it and then we bring in additional movie metadata.
Finally, we use the predicted ratings to generate recommendations, such as the top 10 movies for a specific user and a summary of the top recommendations for each of the first 10 users.
Demonstration Example
This use case example is adapted from the SAS documentation. The data are derived from the MovieLens data set, which was developed by the GroupLens project at the University of Minnesota and is available at http://grouplens.org/datasets/movielens. This example uses the MovieLens 100K version. You can download the compressed archive file from the website at http://files.grouplens.org/datasets/movielens/ml-100k.zip and use any third-party unzip tool to extract all the files in the archive to the destination directory of your choice. The file that contains the movie ratings is u.data, which lists four columns:
| User ID | Item ID | Rating | Timestamp |
|---------|---------|--------|-----------|
| 196 | 242 | 3 | 881250949 |
| 186 | 302 | 3 | 891717742 |
| 22 | 377 | 1 | 878887116 |
| 244 | 51 | 2 | 880606923 |
| 166 | 346 | 1 | 886397596 |
| ... | ... | ... | ... |
Overall, the dataset includes 943 users and 1,682 movies. It is highly sparse because most combinations of users and movies are not rated.
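Those counts make the sparsity easy to quantify: 100,000 ratings spread over 943 × 1,682 possible user-movie pairs means only about 6% of the matrix is observed:

```python
# Counts from the MovieLens 100K data set described above.
n_users, n_movies, n_ratings = 943, 1682, 100_000

total_cells = n_users * n_movies     # 1,586,126 possible user-movie pairs
density = n_ratings / total_cells    # fraction of pairs actually rated
print(f"{density:.1%} observed, {1 - density:.1%} missing")
# 6.3% observed, 93.7% missing
```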
The PYTHON procedure is used to load data from a directory, convert it into a dense matrix format, and then transfer it to a SAS data table named mylib.ratings within your CAS session. Within PROC PYTHON, the SAS.df2sd method is employed to move data from a Python Pandas DataFrame to a SAS data set. The resulting mylib.ratings table includes the columns UserID, M1, M2, M3, and so on—each representing a different movie. Each row corresponds to a user's ratings for those movies. If a user did not rate a particular movie, the associated cell contains a missing value.
cas;
libname mylib cas;
proc python;
submit;
import pandas as pd
import numpy as np

# load data from the file
colname = ['userID', 'movieID', 'rating']
colpick = [0, 1, 2]
df = pd.read_csv('/path/to/your/directory/u.data', delimiter='\t',
                 usecols=colpick, names=colname)

# store data in dense matrix format
nrow = max(df.loc[:, 'userID'])
ncol = max(df.loc[:, 'movieID']) + 1
mat = np.full((nrow, ncol), np.nan)
for i in range(0, nrow):
    mat[i, 0] = i + 1
for idx, rowSeries in df.iterrows():
    val = rowSeries.values
    mat[val[0] - 1, val[1]] = val[2]

# transfer data to a SAS data table
cols = ['UserID'] + ['M%d' % i for i in range(1, ncol)]
matdf = pd.DataFrame(mat, columns=cols)
SAS.df2sd(matdf, 'mylib.ratings')
endsubmit;
run;
We then specify the IMPUTE statement in the NMF procedure to enable low-rank matrix completion to fill in missing values in the mylib.ratings table. The code below runs PROC NMF, outputting the imputed results to mylib.outX. Since the genre file (u.genre) lists 19 genres, a rank of 19 is used to generate 19 feature vectors. Given the sparsity of mylib.ratings, the procedure sets maximum iterations to 600 for the APG method and applies L2-norm regularization to improve convergence and reduce sensitivity to initialization. The IMPUTEDROWSONLY and PREDONLY options ensure that only rows with imputed values are included in mylib.outX, with original ratings in those rows set to missing.
/* perform low-rank matrix completion and output the imputed movie ratings */
proc nmf data=mylib.ratings rank=19 seed=6789
method=apg(maxiter=600) reg=L2(alpha=5 beta=5);
var m:;
impute out=mylib.outX imputedRowsOnly predOnly copyvar=UserID;
run;
The Model Information table displays basic information about the model, including the input data table, the number of VAR statement variables, the target rank of factor matrices, the factorization method, and the stopping criterion used in the computation. It also displays information about the factorization method, including the maximum number of iterations, the number of matrix updates at each iteration, the convergence tolerance, the random number seed, whether to scale input data or not, the way to handle the missing values, and the coefficient for extrapolation weight.
The Regularization Information table displays the regularization method and the regularization weight values.
The Imputation Summary table displays the degrees of freedom of the recovered data matrix, the number of observed values in the data matrix, the number of imputed values, and the proportion of imputed values. The degrees of freedom of an m × n matrix with rank r is r(m + n − r), which is a lower bound on the number of observed values needed to ensure exact recovery of the rank-r matrix.
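For an m × n matrix of rank r, the degrees of freedom are r(m + n − r). A quick check for this example (m = 943 users, n = 1,682 movies, r = 19) shows the bound sits comfortably below the 100,000 observed ratings, so exact recovery is at least plausible:

```python
# Dimensions and rank from the PROC NMF example above.
m, n, r = 943, 1682, 19

dof = r * (m + n - r)        # degrees of freedom of a rank-r m x n matrix
observed = 100_000           # number of known ratings in MovieLens 100K
print(dof, observed >= dof)  # 49514 True
```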
The Iteration Results table displays the matrix factorization accuracy information, which includes the number of iterations, the relative error at which the iteration stops, and the stopping criterion.
Now invoke PROC PYTHON again to fetch the first 10 observations from the mylib.outX table, which contain predicted ratings for the first 10 users. The SAS.sd2df method transfers this data into a Pandas DataFrame. Then load movie information from the u.item file, produce the top 10 recommended movies for the 9th user, and generate a table that contains the top 5 recommended movies for each of the first 10 users:
proc python;
submit;
import pandas as pd
import numpy as np
import csv

# fetch the first 10 observations
df = SAS.sd2df('mylib.outX(obs=10)')

# load information about the movies
movieDict = {}
csvFile = csv.reader(open('/path/to/your/directory/u.item', encoding='latin-1'),
                     delimiter='|')
for row in csvFile:
    key = 'M' + row[0]
    movieDict[key] = row[1]

# top 10 recommended movies for a single user
row = 8                       # 0-based index of the 9th user
uid = df.iloc[row, 0]
rating = df.iloc[row, 1:].sort_values(ascending=False, inplace=False)
colUid = [uid] * 10
colRank = np.arange(1, 11).tolist()
colMid = rating.index.tolist()[0:10]
colRate = rating.values.tolist()[0:10]
colTitle = []
for i in range(0, 10):
    colTitle.append(movieDict[rating.index[i]])
cols = ['UserID', 'Rank', 'MovieID', 'Title', 'PredictedRating']
topRating = pd.DataFrame(list(zip(colUid, colRank, colMid, colTitle, colRate)),
                         columns=cols)
SAS.df2sd(topRating, 'topRating')

# top 5 recommended movies for each of the 10 users
movies = []
for idx, rowSeries in df.iterrows():
    uid = rowSeries.pop('UserID')
    rowSeries.sort_values(ascending=False, inplace=True)
    row = [uid]
    for i in range(0, 5):
        row.append(movieDict[rowSeries.index[i]])
    movies.append(row)
cols = ['UserID', '_1_', '_2_', '_3_', '_4_', '_5_']
topRecom = pd.DataFrame(movies, columns=cols)
SAS.df2sd(topRecom, 'topRecom')
endsubmit;
run;
The topRating output data shown here contains the top 10 recommended movies along with the predicted ratings for the 9th user:
Also, the topRecom output data shown here contains the top 5 recommended movies (sorted by descending predicted ratings) for each of the 10 users:
Concluding Remarks
Non-negative Matrix Factorization (NMF) is a powerful technique used in recommender systems to uncover latent relationships between users and items. By decomposing the user-item rating matrix into lower-dimensional, non-negative feature matrices, NMF captures underlying patterns in user preferences and item characteristics. These learned features enable the system to predict missing ratings and generate personalized recommendations, even in sparse datasets. Its interpretability and effectiveness make NMF a valuable tool for building robust recommendation engines. In SAS, you can also use the FACTMAC procedure to predict movie ratings. For a demonstration, check out the video "Factorization Machines Made Easy with SAS Visual Interfaces" on the SAS Users YouTube channel.
Find more articles from SAS Global Enablement and Learning here.