“People you may know.” “Other products you may like.” “Customers Who Bought This Item Also Bought…” We’ve all seen these suggestions when browsing the web, be it on Facebook or Amazon or some other platform. But how do the sites come up with these recommendations? Sometimes they seem very far off (why should I become friends with someone when we only have one mutual friend?), and sometimes they are eerily tailored (how did you know my favorite band!?!?). These suggestions are the work of recommender systems.
Recommender systems are a relatively new and active area of research in data mining and machine learning. There are two main ways that recommender systems produce a list of recommendations for a user: collaborative filtering and content-based filtering. Collaborative filtering uses a user’s past behavior (items that the user previously viewed or purchased, in addition to any ratings the user gave those items) together with similar decisions made by other users to create a model. This model then predicts items that the user may find interesting. In content-based filtering, the model uses a series of discrete characteristics of an item in order to recommend additional items with similar properties. One of the benefits of recommender systems over search algorithms is that recommender systems help users discover items that they might not have otherwise found.
Collaborative filtering methods are based on collecting and analyzing a large amount of data about users’ behaviors, activities, or preferences and predicting what users will like based on their similarity to other users. One advantage of this approach is that a machine can give good recommendations about complex things (such as movies) without needing to understand the items themselves. These types of methods are also called user-based nearest neighbor recommendations and rest on the assumption that users who had similar tastes in the past will continue to have similar tastes in the future. For these methods to be useful and accurate, information must be accumulated in a profile for each user. This can be done explicitly (by asking a user to rate an item, to rank a set of items, or to choose which of two items he/she prefers) or implicitly (by observing the items a user views, by analyzing item/user viewing times, or by keeping a record of the items that a user has purchased previously). This information can be continuously updated to keep the user’s profile current. Once the data is collected, the collaborative filtering method identifies other users who had similar preferences to those of the current user in the past and calculates a list of recommended items for the user. One common similarity measure is the Pearson correlation coefficient (the covariance of two users’ ratings divided by the product of their standard deviations).
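To make the user-based nearest neighbor idea concrete, here is a minimal Python sketch. The users, items, and ratings are made up for illustration, and the prediction rule (the user's own average plus a similarity-weighted average of the neighbors' deviations from their averages) is one common textbook choice, not the method of any particular site.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4, "D": 4},
    "bob":   {"A": 3, "B": 1, "C": 2, "D": 3, "E": 3},
    "carol": {"A": 4, "B": 3, "C": 4, "D": 3, "E": 5},
    "dave":  {"A": 1, "B": 5, "C": 5, "D": 2, "E": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(u, v):
    """Pearson correlation computed over the items both users rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    xs = [ratings[u][i] for i in common]
    ys = [ratings[v][i] for i in common]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a user who rated everything the same carries no signal
    return cov / (sx * sy)

def predict(user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    scored = sorted(
        ((pearson(user, other), other)
         for other in ratings
         if other != user and item in ratings[other]),
        reverse=True,
    )[:k]
    # Keep only neighbors whose tastes correlate positively with the user's.
    neighbors = [(s, other) for s, other in scored if s > 0]
    if not neighbors:
        return None
    user_mean = mean(ratings[user].values())
    num = sum(s * (ratings[o][item] - mean(ratings[o].values()))
              for s, o in neighbors)
    den = sum(s for s, _ in neighbors)
    return user_mean + num / den

print(predict("alice", "E"))
```

In this toy data, bob and carol rate items in the same pattern as alice, while dave's tastes run opposite, so only bob and carol influence the prediction for item E.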
One of the most famous examples of collaborative filtering is an algorithm popularized by Amazon’s recommender system. The idea behind their method is item-to-item collaborative filtering. That is, if many users who bought item x also bought item y, then a user who buys item x will (probably) be interested in item y as well. Facebook, LinkedIn, and other social networking sites examine the network of connections between a user and his/her friends and then implement collaborative filtering algorithms to suggest new groups the user might be interested in joining or people that the user may know.
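The item-to-item idea can be sketched in a few lines of Python. This is a simplified illustration and not Amazon's actual implementation: the purchase data is invented, and the similarity used here (co-purchase count divided by the square root of the product of each item's purchase count, which is the cosine similarity of binary purchase vectors) is an assumption chosen for simplicity.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical purchase histories: user -> set of items bought
purchases = {
    "u1": {"book", "lamp", "pen"},
    "u2": {"book", "pen"},
    "u3": {"book", "lamp"},
    "u4": {"pen", "mug"},
}

def item_similarities(purchases):
    """Cosine similarity between items, based on who bought them together."""
    count = defaultdict(int)      # how many users bought each item
    together = defaultdict(int)   # how many users bought both items of a pair
    for basket in purchases.values():
        for item in basket:
            count[item] += 1
        for x, y in combinations(sorted(basket), 2):
            together[(x, y)] += 1
    sims = defaultdict(dict)
    for (x, y), n in together.items():
        s = n / sqrt(count[x] * count[y])
        sims[x][y] = s
        sims[y][x] = s
    return sims

def also_bought(item, sims, n=2):
    """Return up to n items most similar to the given item."""
    ranked = sorted(sims[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]

sims = item_similarities(purchases)
print(also_bought("book", sims))
```

Because the similarities depend only on items, they can be computed ahead of time and looked up when a user views a product, which is part of the appeal of the item-to-item approach.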
No algorithm is perfect, and collaborative filtering approaches often suffer from three main problems:
- Cold Start: Collaborative filtering systems must build a profile for each user, but this takes a large amount of data before the recommendations become meaningful and accurate, so a new user with little history receives poor recommendations at first.
- Scalability: Collaborative filtering systems are used in areas where there are lots of choices. For example, with Amazon there are millions of products from which to choose. A large amount of computational power is often needed to calculate recommendations.
- Sparsity: Again, thinking about Amazon, there are millions of products available (which is true for other major e-commerce sites, as well). Even the most active users will only have rated a small subset of the overall database of products. This means that a particular item (even if it is extremely popular) will have ratings from only a small fraction of users.
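A quick back-of-the-envelope calculation shows how severe the sparsity problem can be. The numbers below are hypothetical round figures chosen for illustration, not statistics from any real site.

```python
def density(num_users, num_items, num_ratings):
    """Fraction of all possible user-item pairs that actually have a rating."""
    return num_ratings / (num_users * num_items)

# Hypothetical catalog: 1,000,000 users, 1,000,000 items,
# and an average of 100 ratings per user.
users = 1_000_000
items = 1_000_000
total_ratings = users * 100

d = density(users, items, total_ratings)
print(f"{d:.4%} of the user-item matrix is filled in")
```

Under these assumptions, only about one user-item pair in ten thousand has a rating, so most pairs of users share few or no rated items to compare.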
Another popular approach to implementing recommender systems is content-based filtering. In content-based filtering a user profile is built to provide information about the types of items that the user likes based on keywords used to describe the items. In these systems a recommendation is made by presenting similar items to those the user liked in the past (or items that are similar to what the user is currently looking at). This approach has roots in information retrieval and information filtering research.
The content-based filtering method creates a profile for each item (based on a set of discrete attributes and features) which is used to characterize the item in the system. The system then creates a content-based profile for the user based on a weighted vector of item features (from items the user has previously rated or purchased and from items the user is currently viewing). The weights denote the importance of each feature to the user. There are many possible ways of computing these weights, from Bayesian classifiers to cluster analysis to decision trees to artificial neural networks. Whatever the calculation technique, the goal of the weighted vector is the same: to estimate the probability that the user is going to like a suggested item.
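As a rough illustration of a weighted feature vector, the Python sketch below builds a user profile as the rating-weighted average of the feature vectors of items the user has rated, then ranks unrated items by cosine similarity to that profile. The feature names, songs, and ratings are invented, and this simple averaging is just one assumption standing in for the more sophisticated techniques listed above. The resulting score is a similarity, not a calibrated probability.

```python
from math import sqrt

# Hypothetical item profiles: 1 means the item has the feature.
FEATURES = ["rock", "jazz", "acoustic", "vocals"]
items = {
    "song_a": [1, 0, 0, 1],
    "song_b": [1, 0, 1, 1],
    "song_c": [0, 1, 1, 0],
    "song_d": [0, 1, 0, 0],
    "song_e": [1, 0, 1, 0],
}

# Hypothetical ratings (1-5) the user has already given.
user_ratings = {"song_a": 5, "song_c": 1}

def build_profile(user_ratings, items):
    """Rating-weighted average of the feature vectors of rated items."""
    total = sum(user_ratings.values())
    profile = [0.0] * len(FEATURES)
    for name, rating in user_ratings.items():
        for i, value in enumerate(items[name]):
            profile[i] += rating * value / total
    return profile

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def recommend(user_ratings, items, n=2):
    """Rank items the user has not rated by similarity to the profile."""
    profile = build_profile(user_ratings, items)
    candidates = [name for name in items if name not in user_ratings]
    return sorted(candidates,
                  key=lambda name: (-cosine(profile, items[name]), name))[:n]

print(recommend(user_ratings, items))
```

Here the user's high rating for a rock song with vocals pulls the profile toward those features, so songs sharing them rank above the jazz track.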
One example of content-based filtering is Pandora Radio. When a user goes to the Pandora website, he/she is prompted to “Enter artist, genre or composer to create a station.” Pandora then uses content-based filtering to find music with similar qualities to the song, artist, or genre that the user provided.
The main issue with content-based filtering is that the system is unable to apply user preferences about one type of item to another item type. For example, how do a user’s music preferences impact his or her preference in reading material? When the system is limited to recommending content of the same type, the value of the recommendations is limited.
So now you know how websites make suggestions tailored to your likes and dislikes. As research into recommender systems progresses, the suggestions will just keep getting better and better!