Recommendation systems improve both customer experience and sales. A recommendation system is a must-have for modern e-commerce, and a simple one can be built in less than an hour.
Why Python Recommendations Matter for eCommerce Success
You may not always realize it, but so many of the websites you use on a daily basis have built-in Recommendation Systems that are driving your experience — as well as nudging you towards purchases.
They are a must-have feature for any e-commerce website. We recently built one for a major apparel retailer which increased conversion rate by 1% and improved average order value by 5.55%. It helps if you have an expert team behind your implementation, but we believe that most people can get a handle on the concepts and even have a go at building their own simple systems. Let’s get down to business.
Understanding Collaborative Filtering
The fundamental idea used in recommendation systems — Collaborative Filtering — works on the assumption that if two (or more) users rate common items the same way, they probably have similar taste. It is a mathematical equation with many unknowns — and the bigger the database of users and items, the more it sprawls towards infinity. But don’t let the math scare you off.
Sudoku is a mathematical equation with nine unknowns. You can do it the nerdy way, reducing it to nine linked equations, but it takes a lot of work before you get down to real business. In fact, the quickest way to complete the puzzle is usually through logical thinking and risk (guessing). Filtering uses a similar mix of math and intuition.
A Practical Example for Movie Recommendations
Let’s imagine that cinephiles Tom and Ben both use our movie website. They don’t know each other but were equally excited by Gal Gadot running through no-man’s land in Wonder Woman, both rated Harry Potter as more “Accio” than just okay, both loved Avatar, and agreed that Godzilla kind of sucked.
However, when it came to the new Star Wars movie, things went a different way. Ben rated it first and loved it. We’d assume from what we know so far that Tom would feel the same, but he wasn’t into it at all. In fact, the algorithm now thinks that Tom’s preferences are more in line with those of another user, Caitlyn.
The system would not be wrong to recommend Star Wars to Tom based on Ben’s rating. All the data suggests that Tom will like it, but there is always an element of guessing, as there is no real accounting for taste. The system will never be right 100% of the time but, with enough data, we can find full or partial taste matches for people, learning as much from where their reactions are the same as from where they differ. It can collect people in pairs or groups and make the best possible guess with the information at hand.
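The Tom and Ben story can be sketched in a few lines of Python. This is a minimal illustration of user-based collaborative filtering, not the method used by any particular product: the star ratings are invented, and the helper names `cosine_similarity` and `predict` are hypothetical. It covers the moment before Tom has rated Star Wars.

```python
from math import sqrt

# Hypothetical 1-5 star ratings, invented for illustration only.
ratings = {
    "Tom": {"Wonder Woman": 5, "Harry Potter": 4, "Avatar": 5, "Godzilla": 2},
    "Ben": {"Wonder Woman": 5, "Harry Potter": 4, "Avatar": 5, "Godzilla": 2,
            "Star Wars": 5},
    "Caitlyn": {"Harry Potter": 1, "Godzilla": 5, "Star Wars": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity of two users over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    weighted_sum = total_weight = 0.0
    for other, their_ratings in ratings.items():
        if other == user or movie not in their_ratings:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += sim * their_ratings[movie]
        total_weight += sim
    return weighted_sum / total_weight if total_weight else None

sim_ben = cosine_similarity(ratings["Tom"], ratings["Ben"])
sim_caitlyn = cosine_similarity(ratings["Tom"], ratings["Caitlyn"])
tom_star_wars = predict("Tom", "Star Wars")
print(sim_ben, sim_caitlyn, tom_star_wars)
```

Because Tom and Ben agree on every movie they share, Ben's opinion dominates the guess for Tom. Once Tom actually rates Star Wars, his real rating replaces the guess and his similarity to Ben drops.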
Importance of Data in Python-Based Recommendation Systems
We can use two different types of customer feedback to create data. The first, ‘explicit’ feedback, is when users provide clear, affirmative information through actions like rating or buying a product or watching a film on a service like Netflix. These are obvious choices, but human activity is often more subtle.
‘Implicit’ feedback is when a user gives us a suggestion of their interest by perhaps watching a trailer or reading a review. A user might click on a product but not buy it. They have signaled intent but not committed an action.
When building recommendation systems, we need to decide whether explicit or implicit feedback is of most value to us, and also how it should be weighted. Can we learn as much from intention as we can from a completed action? And how do we factor in negative implicit feedback like a user watching only the first few seconds of a movie trailer? It’s a complex area that we will debate in another post. For our current example, we can assume that rating a movie is sufficient user feedback.
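One way to picture the weighting question is to turn every event into a number. The sketch below is only an assumption about how this could be done: the event names and the weights in `EVENT_WEIGHTS` are hypothetical, and choosing sensible values is exactly the hard part.

```python
# Hypothetical weights: completed actions count more than signals of intent,
# and abandoning a trailer early counts slightly against the item.
EVENT_WEIGHTS = {
    "purchase": 5.0,
    "trailer_watched": 1.0,
    "review_read": 1.0,
    "product_click": 0.5,
    "trailer_abandoned": -0.5,
}

def interest_scores(events):
    """Sum the weighted events into one score per (user, item) pair."""
    scores = {}
    for user, item, event in events:
        key = (user, item)
        # Unknown event types contribute nothing.
        scores[key] = scores.get(key, 0.0) + EVENT_WEIGHTS.get(event, 0.0)
    return scores

# Invented event log for illustration.
events = [
    ("tom", "avatar", "trailer_watched"),
    ("tom", "avatar", "purchase"),
    ("tom", "godzilla", "trailer_abandoned"),
    ("ben", "avatar", "product_click"),
]
scores = interest_scores(events)
print(scores)
```

The resulting scores could then stand in for ratings wherever no explicit rating exists.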
Handling the Cold-Start Problem
A brand-new user has no ratings yet, so the system has nothing to compare with anyone else's tastes: this is known as the cold-start problem. To get the ball rolling, we might make some educated guesses or ask new users a few questions when they sign up to start feeding data into the algorithm. On Netflix, you are asked to choose a few titles you like to help “jump-start” your recommendations as a new user. If you choose none, you’ll be shown a generic choice of popular titles and your activity from that point will be the basis for the process.
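The sign-up flow described above could be sketched as follows. This is a simplified assumption rather than how Netflix actually does it: the data is invented, and `most_popular` and `starting_point` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical ratings from existing users, invented for illustration.
existing_ratings = {
    "Tom": {"Avatar": 5, "Wonder Woman": 5, "Godzilla": 2},
    "Ben": {"Avatar": 5, "Wonder Woman": 5},
    "Caitlyn": {"Avatar": 3},
}

def most_popular(ratings, n=2):
    """Rank titles by how many users rated them, a crude stand-in for popularity."""
    counts = Counter(title for user in ratings.values() for title in user)
    return [title for title, _ in counts.most_common(n)]

def starting_point(onboarding_picks, ratings, n=2):
    """Return seed ratings for a new user, or popular titles as a fallback."""
    if onboarding_picks:
        # Assumption: a title picked at sign-up counts as a top rating.
        return {"seed_ratings": {title: 5 for title in onboarding_picks}}
    return {"popular_titles": most_popular(ratings, n)}

with_picks = starting_point(["Avatar"], existing_ratings)
without_picks = starting_point([], existing_ratings)
print(with_picks, without_picks)
```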
Interpreting User Data
If we arrange the ratings in a table with one row per user and one column per movie, we get a matrix called the rating matrix (R). It has some missing elements because, in real life, nobody has seen every movie. At this point, we can define what our recommendation system should do. We want the system to guesstimate how Caitlyn would rate Wonder Woman and Avatar so it can then recommend one or both to her if it decides that she would give them a high rating.
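A rating matrix with gaps can be represented in plain Python like this. The numbers are invented for illustration (only the two gaps in Caitlyn's row come from the example), and `missing_entries` is a hypothetical helper.

```python
# Hypothetical ratings on a 1-5 scale; None marks a movie the user has not rated.
movies = ["Wonder Woman", "Harry Potter", "Avatar", "Godzilla", "Star Wars"]
R = {
    "Tom":     [5,    4, 5,    2, 1],
    "Ben":     [5,    4, 5,    2, 5],
    "Caitlyn": [None, 2, None, 4, 1],
}

def missing_entries(matrix, columns):
    """List the (user, movie) cells the recommender has to estimate."""
    return [
        (user, columns[j])
        for user, row in matrix.items()
        for j, value in enumerate(row)
        if value is None
    ]

to_predict = missing_entries(R, movies)
print(to_predict)
```

In a real catalogue almost every cell is empty, which is why the matrix is usually stored as a list of known (user, item, rating) triples instead of a full table.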
To do so, we can apply a technique called matrix factorization, more specifically, SVD (Singular Value Decomposition). It is a method of grouping items from the original matrix R into abstract concepts. It breaks the matrix down into factors, setting aside labels such as names and movie titles, to leave purely numerical results. These describe how strongly each user, and each movie, relates to each abstract concept. With this information, the system can try to predict the missing fields in R by combining a user's preferences with a movie's profile across those concepts. Of course, ours is only a simplification of what is actually a much more complex, automated process.
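To make the idea more concrete, here is a toy factorization trained with stochastic gradient descent. It is a sketch under stated assumptions, not a textbook SVD and not the code of any library: the ratings are invented, the model is simply "overall mean plus the dot product of a user vector and a movie vector", and the hyperparameters (`n_factors`, `n_epochs`, `lr`, `reg`) are arbitrary.

```python
import random
from math import sqrt

# Hypothetical (user, movie, rating) triples, invented for illustration.
ratings = [
    ("Tom", "Wonder Woman", 5), ("Tom", "Harry Potter", 4), ("Tom", "Avatar", 5),
    ("Tom", "Godzilla", 2), ("Tom", "Star Wars", 1),
    ("Ben", "Wonder Woman", 5), ("Ben", "Harry Potter", 4), ("Ben", "Avatar", 5),
    ("Ben", "Godzilla", 2), ("Ben", "Star Wars", 5),
    ("Caitlyn", "Harry Potter", 2), ("Caitlyn", "Godzilla", 4),
    ("Caitlyn", "Star Wars", 1),
]

def factorize(ratings, n_factors=2, n_epochs=1000, lr=0.01, reg=0.02, seed=0):
    """Fit rating ~ mean + p[user] . q[item] by stochastic gradient descent."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    # Small random starting vectors, one per user and one per movie.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u, _, _ in ratings}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for _, i, _ in ratings}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(a * b for a, b in zip(p[u], q[i])))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Nudge both vectors to shrink the error; reg keeps them small.
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q

def predict(model, user, item):
    """Estimated rating for a (user, movie) pair, known or missing."""
    mean, p, q = model
    return mean + sum(a * b for a, b in zip(p[user], q[item]))

def rmse(predict_fn):
    """Root mean squared error over the known ratings."""
    return sqrt(sum((r - predict_fn(u, i)) ** 2 for u, i, r in ratings) / len(ratings))

model = factorize(ratings)
baseline_rmse = rmse(lambda u, i: model[0])        # always guess the mean
model_rmse = rmse(lambda u, i: predict(model, u, i))
caitlyn_avatar = predict(model, "Caitlyn", "Avatar")
print(baseline_rmse, model_rmse, caitlyn_avatar)
```

The two numbers in each vector play the role of the abstract concepts: nobody names them, the training loop discovers whatever pattern best explains the known ratings and then reuses it to fill in the gaps.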
More than 80 per cent of the TV shows and movies people watch on Netflix are discovered through the platform’s recommendation system.
Josephina Blattmann, UX Planet
Implementing SVD for Robust Python Recommendation Systems
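from surprise import SVD
from surprise import Dataset
# Load the MovieLens 100k dataset that ships with the Surprise library
data = Dataset.load_builtin('ml-100k')
# Use every available rating for training
trainset = data.build_full_trainset()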
Of course, this is just an example; in real life we won’t be using MovieLens. The Surprise documentation provides a nice tutorial for loading custom datasets.
The library comes with the SVD technique we discussed earlier straight out of the box:
svd = SVD()
svd.fit(trainset)
# Every (user, item) pair that has no known rating
testset = trainset.build_anti_testset()
predictions = svd.test(testset)
[Prediction(uid='196', iid='302', r_ui=3.52, est=3.99, details={'was_impossible': False}),
Prediction(uid='196', iid='377', r_ui=3.52, est=2.75, details={'was_impossible': False}),
Prediction(uid='196', iid='51', r_ui=3.52, est=3.73, details={'was_impossible': False}),
Prediction(uid='196', iid='346', r_ui=3.52, est=3.50, details={'was_impossible': False}),
Prediction(uid='196', iid='474', r_ui=3.52, est=4.16, details={'was_impossible': False}),
Prediction(uid='196', iid='265', r_ui=3.52, est=3.76, details={'was_impossible': False}),
…]
This list might look overwhelming, but we are only interested in three fields:
- uid: the ID of the user for whom we are making predictions
- iid: the item ID (here we treat movies as items)
- est: the rating we estimate the user would give the item
The actual recommendation happens when we display the items with the highest estimated ratings to the user as something they might be interested in. For a detailed guide, refer to the Surprise documentation.
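Picking the top-rated items per user could look roughly like this. To keep the sketch runnable without Surprise installed, it defines a stand-in `Prediction` namedtuple with the same fields as the output shown earlier; the `top_n` helper is hypothetical, and the sample values are copied from that output.

```python
from collections import defaultdict, namedtuple

# Stand-in with the same field names as the predictions listed earlier.
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

def top_n(predictions, n=3):
    """Map each user ID to their n items with the highest estimated rating."""
    by_user = defaultdict(list)
    for pred in predictions:
        by_user[pred.uid].append((pred.iid, pred.est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in by_user.items()
    }

ok = {"was_impossible": False}
predictions = [
    Prediction("196", "302", 3.52, 3.99, ok),
    Prediction("196", "377", 3.52, 2.75, ok),
    Prediction("196", "51", 3.52, 3.73, ok),
    Prediction("196", "346", 3.52, 3.50, ok),
    Prediction("196", "474", 3.52, 4.16, ok),
    Prediction("196", "265", 3.52, 3.76, ok),
]
recommendations = top_n(predictions, n=3)
print(recommendations["196"])
```

For user 196, this keeps items 474, 302 and 265, in that order. The same function should work on the list returned by the library, as long as each prediction exposes `uid`, `iid` and `est`.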
Discover More Insights for Building a Powerful Recommendation System
This article is part of a series. For more insights about building a recommendation system, see also:
- Everything You Need to Know Before Building a Recommendation System
- The Difference Between Implicit and Explicit Data for Business
In future posts, we’ll talk about how to display recommendations more effectively and how to choose the right data for your system. If you want to learn a little more right now, these links are a pretty good place to start:
- Surprise library
- Understanding Matrix Factorization for Recommendation
- LightFM — a hybrid recommendation system helping with cold start problem
Mirumee guides clients through their digital transformation by providing a wide range of services from design and architecture, through business process automation, to machine learning. We tailor services to the needs of organizations as diverse as governments and disruptive innovators on the ‘Forbes 30 Under 30’ list. Find out more by visiting our services page.