2. An introduction to Recommender Systems - Basic Models
Having introduced the goals of recommender systems in my previous article, let's carry on talking about the basic models used to develop recommender systems.
The basic models for recommender systems work with two kinds of data:
- User-Item interactions, such as ratings or buying behaviour
- Attribute information about the users and items
Methods that use the former are referred to as collaborative filtering methods, whereas methods that use the latter are referred to as content-based methods.
Collaborative Filtering Models
Collaborative filtering (CF) models use the ratings provided by many different users to make recommendations. The biggest challenge in designing CF methods is that the rating matrix they rely on is generally sparse: most users have rated only a small fraction of the large universe of available items, so most of the entries in the rating matrix are unknown.
The basic idea of CF methods is that these unknown ratings can be inferred (predicted) because the known ratings are often highly correlated across various users and items. For example, consider two users who have very similar tastes. If the ratings that both have specified are similar, an algorithm can identify that similarity, and it is likely that the ratings that only one of them has specified would also be similar for the other. This similarity can be used to make predictions about unknown ratings. Most CF models leverage either inter-item correlations or inter-user correlations to generate predictions. Furthermore, some models use optimization and machine learning techniques to train a model, which is then used to predict the missing values in the matrix.
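To make the idea of correlated ratings concrete, here is a minimal sketch that measures how similar two users are using only the items both have rated. The toy ratings, the dictionary layout, and the choice of Pearson correlation as the similarity function are all assumptions made for illustration; other similarity measures are equally possible.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
# A missing key is an unknown rating, i.e. an empty cell in the rating matrix.
ratings = {
    "Alice": {"Alien": 5, "Predator": 4, "Terminator": 5, "Titanic": 1},
    "Bob":   {"Alien": 5, "Predator": 5, "Titanic": 2},
    "Carol": {"Alien": 1, "Predator": 2, "Terminator": 1, "Titanic": 5},
}


def pearson(a, b):
    """Pearson correlation computed over the items that both users rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0  # not enough overlap to say anything
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # one user gave identical ratings: correlation undefined
    return cov / sqrt(var_a * var_b)


print(pearson(ratings["Bob"], ratings["Alice"]))  # strongly positive
print(pearson(ratings["Bob"], ratings["Carol"]))  # strongly negative
```

In this toy data Bob's known ratings track Alice's and run opposite to Carol's, so Alice's rating of Terminator is a far better hint about Bob's missing rating than Carol's is.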
There are two types of methods that are commonly used in CF:
- Memory-based methods: These methods are also referred to as neighborhood-based collaborative filtering methods and can be defined in one of two ways:
1. User-based CF:
The basic idea is to determine a peer group of users who are similar to the active (target) user A, and to predict A's missing ratings by computing weighted averages of the ratings of this peer group. If Alice and Bob have rated movies in a similar way in the past, then one can use Alice's known rating of the movie Terminator to predict Bob's missing rating of the same movie. With users as the rows of the rating matrix and items as its columns, similarity functions are computed between rows to discover similar users.
2. Item-based CF:
To predict user A's rating of a target item B, the first step is to determine a set S of items that are most similar to B. The ratings that A has specified for the items in S are then used to predict whether A will like B. For example, Bob's ratings of similar science fiction movies like Alien and Predator can be used to predict his rating of Terminator. With the same layout (users as rows, items as columns), similarity functions are computed between columns to discover similar items.
Memory-based techniques are easy to implement and the resulting recommendations are often easy to explain. On the other hand, memory-based algorithms do not work very well with sparse rating matrices, because it can be hard to find enough similar users (or items) with ratings in common.
- Model-based methods: In model-based methods, machine learning and data mining methods are used in the context of predictive models. A model of the data is created up front, with supervised or unsupervised machine learning methods. Therefore, the training (or model building) phase is separate from the prediction phase. These methods often predict missing ratings more accurately than memory-based methods and can therefore produce higher-quality recommendations. However, these recommendations are more difficult to explain to the user.
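The two memory-based variants described above can be sketched in a few lines. This is an illustrative sketch rather than a production implementation: the toy ratings, the helper names (`pearson`, `transpose`, `weighted_average`, `predict_user_based`, `predict_item_based`), the use of Pearson correlation, the neighbourhood size `k`, and the decision to ignore peers with non-positive similarity are all assumptions.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "Alice": {"Alien": 5, "Predator": 4, "Terminator": 5, "Titanic": 1},
    "Bob":   {"Alien": 5, "Predator": 5, "Titanic": 2},
    "Carol": {"Alien": 1, "Predator": 2, "Terminator": 1, "Titanic": 5},
}


def pearson(a, b):
    """Pearson correlation over the keys that both rating dicts share."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def transpose(table):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, user_ratings in table.items():
        for item, rating in user_ratings.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item


def weighted_average(peers, k):
    """Average the ratings of the k most similar peers, weighted by similarity."""
    top = sorted(peers, reverse=True)[:k]
    if not top:
        return None  # no usable peer, so no prediction
    return sum(sim * r for sim, r in top) / sum(sim for sim, _ in top)


def predict_user_based(ratings, user, item, k=2):
    """Use the ratings that similar users gave to the target item."""
    peers = [(pearson(ratings[user], other), other[item])
             for name, other in ratings.items()
             if name != user and item in other]
    return weighted_average([(s, r) for s, r in peers if s > 0], k)


def predict_item_based(ratings, user, item, k=2):
    """Use the ratings that the target user gave to similar items."""
    by_item = transpose(ratings)
    target = by_item.get(item, {})
    peers = [(pearson(target, by_item[other]), r)
             for other, r in ratings[user].items()
             if other != item]
    return weighted_average([(s, r) for s, r in peers if s > 0], k)


print(predict_user_based(ratings, "Bob", "Terminator"))
print(predict_item_based(ratings, "Bob", "Terminator"))
```

With this toy data both functions predict a rating close to 5 for Bob on Terminator: the user-based variant leans on Alice, whose tastes match Bob's, while the item-based variant leans on Bob's own high ratings of Alien and Predator. A model-based method would instead fit a model to the known ratings in a separate training phase and query it at prediction time.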
Content-Based Recommender Systems
Collaborative filtering methods use the correlations in rating patterns across users to make recommendations, but they don't take item attributes into account when generating predictions. This can seem rather wasteful; after all, if a user likes the science fiction movie Alien, there is a good chance that they will also like another movie from a similar genre. In these cases, the ratings from other users may not be needed to make meaningful recommendations. Content-based systems are designed to exploit situations where items can be described using sets of attributes. In such cases, the user's own ratings and actions on other items are sufficient to discover meaningful recommendations.
For example, consider a situation where a user has rated the movie Terminator highly, but we don't have access to the ratings of other users. Therefore, collaborative filtering methods are ruled out. However, the movie Terminator can be described using a set of attributes (like genre keywords extracted from the movie's description) that will be common to other science fiction movies, such as Alien and Predator. In such cases, these movies can be recommended to the user.
This approach is particularly useful when the item is new, and there are few ratings available for that item. This is because other items with similar attributes might have been rated by the active user.
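As a rough sketch of this idea, the example below recommends unrated movies whose attributes overlap with a movie the user rated highly. The attribute sets, the `like_threshold` cut-off, the function names, and the use of Jaccard similarity are illustrative assumptions; real systems typically extract and weight keywords from item descriptions in more elaborate ways.

```python
# Hypothetical item attributes, e.g. genre keywords taken from descriptions.
items = {
    "Terminator": {"sci-fi", "action", "robots"},
    "Alien":      {"sci-fi", "horror", "aliens"},
    "Predator":   {"sci-fi", "action", "aliens"},
    "Titanic":    {"romance", "drama"},
}

# The only ratings available are the active user's own.
user_ratings = {"Terminator": 5}


def jaccard(a, b):
    """Share of attributes two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(items, user_ratings, like_threshold=4, top_n=2):
    """Rank unrated items by their best attribute overlap with a liked item."""
    liked = [i for i, r in user_ratings.items() if r >= like_threshold]
    scores = {}
    for candidate, attrs in items.items():
        if candidate in user_ratings:
            continue  # already rated, nothing to recommend
        score = max((jaccard(attrs, items[l]) for l in liked), default=0.0)
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


print(recommend(items, user_ratings))
```

No other user's ratings appear anywhere in this sketch, which is why the approach still works for an item that nobody has rated yet, as long as its attributes are known.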
Content-based methods do have some disadvantages as well:
- They tend to provide obvious recommendations. If a user has never viewed or purchased an item with a particular set of attributes, such an item has no chance of being recommended. This is because the model is specific to the user at hand, and the community knowledge from similar users is not leveraged. This reduces recommendation diversity, which is undesirable.
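- Even though content-based methods are effective at providing recommendations for new items, they are not effective at providing recommendations for new users. This is because the model for a user is built from that user's own rating history, and a new user has little or none.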
Sometimes content-based models can leverage users' preferences/interests that can be specified via relevant keywords in their own profiles. These profiles can be matched with item descriptions in order to make recommendations. Such an approach does not use ratings in the recommendation process, and it is, therefore, useful in cold-start scenarios. However, these methods are often viewed as a distinct category of recommendation systems, known as knowledge-based systems.
Each of the aforementioned methods has pros and cons, so recommendation systems nowadays tend to be hybrid systems that combine the strengths of various types of recommendation systems to create algorithms that perform robustly in a wide variety of situations.
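One simple way to combine two recommenders, shown here purely as an illustration, is to blend their scores with a weight and fall back to whichever score is available. The function name `hybrid_score`, the default weight of 0.7, and the assumption that both scores are on the same scale are hypothetical choices, not something prescribed by any particular system.

```python
def hybrid_score(cf_score, content_score, cf_weight=0.7):
    """Blend a collaborative and a content-based score on the same scale.

    Either score may be None, for example when a new item has no ratings
    yet (no CF score) or when an item has no usable attributes.
    """
    if cf_score is None and content_score is None:
        return None
    if cf_score is None:
        return content_score  # e.g. new item: rely on content alone
    if content_score is None:
        return cf_score
    return cf_weight * cf_score + (1 - cf_weight) * content_score


print(hybrid_score(4.0, 2.0, cf_weight=0.5))
print(hybrid_score(None, 3.5))
```

The fallback branches are what give a hybrid its robustness: when one method has nothing to say, the other still produces a score.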
This article is written by Riccardo Saccomandi, Co-founder and CTO of Kickdynamic.