Antti_Heino
SAS Employee

Santa needs to decide what presents everyone gets. That is a momentous task, one that might call for an efficient machine learning (ML) model. Factorization machines fit this description because they can handle large datasets in which a high proportion of values is missing (sparse data).

 

A factorization machine can learn people's preferences and use them to recommend products such as movies or books. The goal is to predict how a person would rate each product in a set, and then recommend the products the person is likely to prefer most.

 

Let's look at an example about books, as they are a common Christmas present. Typically, people are more likely to buy a book if their predicted rating for it is 4.5 or higher. Therefore, we predict ratings for multiple books and recommend the ones that the person is likely to rate 4.5 or above. If rating data were not available to Santa, other indicators of popularity, such as page views in online stores, could be used instead.

 

Santa's database could look like the table below. Each person has read and rated some books, but many cells are empty. This type of data can be challenging when there are many products to choose from, because the more products there are, the more empty cells there are. However, a factorization machine is built for this type of sparse data and can predict ratings for unrated books based on the available data.

 

          Book A  Book B  Book C  Book D
Person 1    3       5       -       -
Person 2    -       3       1       -
Person 3    4       -       3       5
Person 4    5       3       -       -
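To make the sparse layout concrete, here is a minimal Python sketch (not the SAS procedure itself) that stores the table above and keeps only the observed ratings as (person, book, rating) rows. The variable names are made up for illustration; one row per observed rating is a common way to feed this kind of data to a recommender model.

```python
# Ratings from the table above; None marks a book the person has not rated.
books = ["Book A", "Book B", "Book C", "Book D"]
table = {
    "Person 1": [3, 5, None, None],
    "Person 2": [None, 3, 1, None],
    "Person 3": [4, None, 3, 5],
    "Person 4": [5, 3, None, None],
}

# Keep only the observed ratings, one (person, book, rating) row each.
ratings = [
    (person, book, rating)
    for person, row in table.items()
    for book, rating in zip(books, row)
    if rating is not None
]

# Share of person/book combinations that have no rating.
empty_share = 1 - len(ratings) / (len(table) * len(books))
print(len(ratings), "observed ratings, share of empty cells:", empty_share)
```

Only the observed rows need to be stored, so the size of the data grows with the number of ratings rather than with the number of person/book combinations.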

 

The Factorization Machine procedure estimates factors for each of the predictors (Book and Person), in addition to estimating a global bias and a bias for each level of those predictors (that is, for each individual person and each individual book). After specifying the target variable (Book Rating), the procedure computes the biases and factors by using the stochastic gradient descent algorithm, which iteratively adjusts them to reduce the prediction error during training.
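The idea can be illustrated with a small Python sketch. This is not the SAS procedure, only a simplified stand-in under stated assumptions: with two nominal predictors, the predicted rating is taken to be the global bias, plus the person's bias, plus the book's bias, plus the dot product of the person's and the book's factor vectors. The number of factors, learning rate, penalty, and number of passes below are assumed values chosen for illustration, and the global bias is simply fixed at the mean rating.

```python
import random

# Observed (person, book, rating) rows from the example table.
ratings = [
    ("Person 1", "Book A", 3), ("Person 1", "Book B", 5),
    ("Person 2", "Book B", 3), ("Person 2", "Book C", 1),
    ("Person 3", "Book A", 4), ("Person 3", "Book C", 3),
    ("Person 3", "Book D", 5),
    ("Person 4", "Book A", 5), ("Person 4", "Book B", 3),
]

N_FACTORS = 2      # assumed number of factors per level
LEARN_RATE = 0.05  # assumed SGD step size
REG = 0.02         # assumed L2 penalty
EPOCHS = 500       # assumed number of passes over the data

rng = random.Random(0)
levels = sorted({p for p, _, _ in ratings} | {b for _, b, _ in ratings})
# Simplification: the global bias is fixed at the mean observed rating.
global_bias = sum(r for _, _, r in ratings) / len(ratings)
bias = {lvl: 0.0 for lvl in levels}
factors = {lvl: [rng.gauss(0, 0.1) for _ in range(N_FACTORS)] for lvl in levels}


def predict(person, book):
    """Global bias + level biases + dot product of the two factor vectors."""
    interaction = sum(vp * vb for vp, vb in zip(factors[person], factors[book]))
    return global_bias + bias[person] + bias[book] + interaction


def rmse():
    """Root mean squared error on the observed ratings."""
    total = sum((predict(p, b) - r) ** 2 for p, b, r in ratings)
    return (total / len(ratings)) ** 0.5


rmse_before = rmse()
for _ in range(EPOCHS):
    rng.shuffle(ratings)
    for person, book, rating in ratings:
        err = rating - predict(person, book)
        # One stochastic gradient step on the squared error of this rating.
        bias[person] += LEARN_RATE * (err - REG * bias[person])
        bias[book] += LEARN_RATE * (err - REG * bias[book])
        for k in range(N_FACTORS):
            vp, vb = factors[person][k], factors[book][k]
            factors[person][k] += LEARN_RATE * (err * vb - REG * vp)
            factors[book][k] += LEARN_RATE * (err * vp - REG * vb)
rmse_after = rmse()

print(f"training RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
print(f"Person 1 / Book C prediction: {predict('Person 1', 'Book C'):.2f}")
```

Because every person and every book has its own bias and factor vector, the model can produce a prediction for any person/book pair, including the ones that are empty in the table.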

 

As a result of training the model, Santa gets statistics on how well the predicted ratings match the actual values overall. The trained model seems to be quite good at predicting the ratings, so it can now be used to predict the ratings a person would give to the books Santa has available this year. Santa can then easily select the books that have the highest predicted ratings and are not yet owned by the person.

 

Figure: Predicted vs Actual Ratings
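Santa's final selection step could look roughly like the Python sketch below. The predicted ratings and the set of owned books are hypothetical values invented for this illustration, not output from the model in this post. Only the 4.5 cut-off comes from the text.

```python
THRESHOLD = 4.5  # recommend books with a predicted rating of 4.5 or above

# Hypothetical model output for one person: book -> predicted rating.
predicted = {"Book A": 3.1, "Book B": 4.8, "Book C": 4.6, "Book D": 4.9}
# Hypothetical set of books the person already owns.
owned = {"Book B"}

# Keep books that are not yet owned and clear the threshold.
candidates = {
    book: rating
    for book, rating in predicted.items()
    if book not in owned and rating >= THRESHOLD
}

# Best predicted rating first.
ranked = sorted(candidates, key=candidates.get, reverse=True)
print(ranked)
```

With these made-up numbers, Book B is skipped because it is already owned and Book A falls below the cut-off, so Book D and Book C remain as gift candidates.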