Discover how hybrid recommendations, combining user data with product attributes, effectively solve the cold-start problem. This approach ensures personalized suggestions for both new users and products.
In our recent series of blog posts on recommendation systems, we covered the basic concept and gave some examples of what happens behind the scenes. If you're curious about how these systems work, you'll find these posts quite interesting:
- Everything You Need to Know Before Building a Recommendation System
- Building Python Recommendation Systems that Work
- The Difference Between Implicit and Explicit Data for Business
This time around, let’s consider a more practical, albeit hypothetical, scenario taken from the e-commerce sector.
Addressing the Cold-Start Problem and Hybrid Recommendations
Let’s assume that our online store already has an established customer base and a fixed product range in which every product has been ordered at least once. Some products are best-sellers, some sell moderately well, and some sell poorly. Our recommendation system would perform quite well under such conditions, as it is designed to take into account the interactions between users and the shop’s products.
At some point we would need to refresh our product range, adding new products that only partly resemble the previous best-sellers or average sellers. After a while, despite the apparent similarities to existing products, we may notice the recommendation system failing to recommend the new range, leading to these products selling poorly or not at all.
In machine learning terminology we call such behavior of simple recommendation systems a cold-start problem.
Is It Worth Worrying About Cold-Start Problems?
The answer is a definitive yes. When the recommendation system is directly related to your revenue and business, you should pay attention to its possible shortcomings.
As in the example above, when launching a brand-new product line we would want the recommendation system to suggest relevant products from the new line alongside other well-matched products, giving the new line more exposure and visibility. This is where the problem lies.
With a simple recommendation system, the cold-start problem makes it nearly impossible to surface new products as valid recommendations. Because collaborative filtering relies on past interactions, the recommendation engine tends to rate popular products higher than new products, regardless of how well a new product matches a user’s preferences. Typically, products with higher visibility sell better than products which are hardly ever recommended.
This creates a feedback loop in which the recommendation system keeps promoting products that are already popular, often including ones that do not suit the user. A similar mechanism occurs for new users who have not bought anything yet. These two scenarios are known as product cold-start and user cold-start.
What Can Minimize the Effects of Cold-Start Problems?
Before we start looking for a solution to the problem, it’s worth looking into what is causing it. As previously outlined, a cold start occurs when we introduce new products or when new users appear. The main reason is that, without any interaction history, the system has no point of reference linking the newcomer to existing products and users.
The Basics of Recommendation Systems
Simple recommendation systems make use of interaction data by capturing which user bought which products. In reality, we also have information about who our users are (their age, location and so on) and product data (category, description, or any other attribute). Leveraging this data allows us to find similarities between users and products (example below).
In general, recommendation systems that combine different approaches are called hybrids. Combining external features that describe products and users with any implementation of matrix factorization falls squarely into the group of hybrid recommendation algorithms.
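One way to picture this combination is to represent each product as the sum of latent vectors belonging to its attributes, and to score a user-product pair with a dot product, as plain matrix factorization does. The sketch below is only an illustration of that idea: the attribute names, the `embed` and `score` helpers, and the vector values are all assumptions made up for this example, and in a real model the vectors would be learned from interaction data.

```python
# Hypothetical 2-dimensional latent vectors, one per product attribute.
# The numbers are invented for illustration; a real model would learn them.
feature_vectors = {
    "color:purple": [0.9, 0.1],
    "color:white": [0.1, 0.2],
    "category:t-shirt": [0.2, 0.7],
    "category:socks": [0.3, 0.4],
    "collection:saleor": [0.8, 0.3],
}


def embed(features):
    """Represent a product as the sum of its attribute vectors."""
    dims = len(next(iter(feature_vectors.values())))
    total = [0.0] * dims
    for name in features:
        for i, value in enumerate(feature_vectors[name]):
            total[i] += value
    return total


def score(vec_a, vec_b):
    """Dot product of two latent vectors, as in matrix factorization."""
    return sum(a * b for a, b in zip(vec_a, vec_b))


# A brand-new product with zero purchases still gets a vector,
# because its attributes are already known.
new_item = embed(["color:purple", "category:t-shirt", "collection:saleor"])
old_item = embed(["color:purple", "category:socks", "collection:saleor"])
similarity = score(new_item, old_item)
```

Because the new product shares two attributes with the existing one, their vectors point in a similar direction even though the new product has no sales yet.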
Leveraging Product Data
Let’s begin with the data that is related to the product. Products can be described by categories, collections, descriptions, or product-specific characteristics such as size, model, or color. Taking such characteristics into account should give us an idea of what the new product is and how it relates to the well-selling products already on offer.
Collecting and Using User Data
Obtaining data for products is relatively easy, but things get a little more complicated when gathering data to improve user features. First of all, this is information that users leave in the system as they use the platform, so its completeness varies depending on how much each user has provided. Examples of such data include user location, login frequency, previous logins, and typical amounts spent. It gets slightly more complicated when we try to collect behavioral information such as products viewed, type of device, length of time spent browsing a product’s details page, and average session duration.
Hybrid Recommendations vs. Collaborative Filtering
To illustrate the differences between the two methods of recommendation, consider the simplified example below. It is intentionally basic, so it should not be taken literally.
We have an online shop which currently has several products on offer. Today we have released a new product (let’s say it’s a Saleor t-shirt). In theory, this could be a fragment of our product list.
Based on the above example, we can see some product similarities through their attributes: products 1, 3, and 5 share the same color; products 2 and 3 share the same category; products 1 and 3 share both color and collection.
Let’s now take a look at the table below. It should be read as follows: a value of 1 for a user-product pair means the user has already bought the product, and a value of 0 means the user has not (yet) bought it.
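The attribute comparison described above can be sketched in a few lines. The product table here is an assumption reconstructed to match the similarities listed in the text; the identifiers, the `shared_attributes` helper, and values such as the color of the jeans or the category names are hypothetical.

```python
# Hypothetical product list, ordered as products 1 to 5.
# Attribute values are assumptions chosen to match the similarities
# described in the text.
products = {
    "saleor-socks": {"color": "purple", "category": "socks", "collection": "saleor"},
    "white-t-shirt": {"color": "white", "category": "t-shirt", "collection": None},
    "saleor-t-shirt": {"color": "purple", "category": "t-shirt", "collection": "saleor"},
    "jeans": {"color": "blue", "category": "trousers", "collection": None},
    "sunglasses": {"color": "purple", "category": "accessories", "collection": None},
}


def shared_attributes(a, b):
    """Return the attribute names on which two products have the same value."""
    return sorted(
        key
        for key, value in products[a].items()
        # A missing value (None) never counts as a match.
        if value is not None and products[b][key] == value
    )


socks_vs_new_shirt = shared_attributes("saleor-socks", "saleor-t-shirt")
shirt_vs_new_shirt = shared_attributes("white-t-shirt", "saleor-t-shirt")
```

Note that the new Saleor t-shirt can be compared with every existing product immediately, since this comparison needs no purchase history.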
On the basis of the above tables, we will try to analyze what the recommendations for Caitlyn might be using the CF and hybrid models. The model based solely on collaborative filtering will likely propose jeans to Caitlyn. This is because Caitlyn bought saleor-socks, just like Tom and Ben, and Tom and Ben also bought jeans. Sunglasses would likely be proposed in second place, based on Tom’s other purchase. The Saleor t-shirt would come last, as no one has bought it yet.
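The reasoning above can be reproduced with a very small user-based collaborative filtering sketch. The purchase sets follow the purchases mentioned in the text; the product identifiers and the `cf_scores` helper are hypothetical, and counting shared purchases as the similarity weight is a simplifying assumption rather than how a production model would work.

```python
# Purchases as described in the text (identifiers are hypothetical).
purchases = {
    "Caitlyn": {"saleor-socks", "white-t-shirt"},
    "Tom": {"saleor-socks", "jeans", "sunglasses"},
    "Ben": {"saleor-socks", "jeans"},
}
catalog = ["saleor-socks", "white-t-shirt", "saleor-t-shirt", "jeans", "sunglasses"]


def cf_scores(user):
    """Score unseen products by the purchases of users with overlapping baskets."""
    own = purchases[user]
    scores = {product: 0 for product in catalog if product not in own}
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(own & items)  # number of products both users bought
        for product in items - own:
            scores[product] += overlap
    return scores


# Highest score first.
ranking = sorted(cf_scores("Caitlyn").items(), key=lambda pair: -pair[1])
```

The Saleor t-shirt ends up with a score of zero: nobody has bought it, so an interaction-only model has nothing to base a recommendation on.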
How the Hybrid Model Enhances Recommendation Accuracy
The hybrid model will consider the problem of recommendation in two ways.
First of all, the model checks Caitlyn’s preferences based on her purchases. So far she has bought purple Saleor-branded socks and a random white t-shirt. Given this, candidate products are ranked by how many attribute values they share with those purchases: the Saleor t-shirt comes first (two of its features match the socks and one matches the white t-shirt), then the sunglasses (their color matches the color of the socks), and finally the jeans (they have no features in common with Caitlyn’s purchases).
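This attribute-matching step can be sketched as a simple count of shared attribute values. As before, the product table is an assumption built to agree with the description in the text, and `content_score` is a hypothetical helper, not part of any particular library.

```python
# Hypothetical product attributes, consistent with the example in the text.
products = {
    "saleor-socks": {"color": "purple", "category": "socks", "collection": "saleor"},
    "white-t-shirt": {"color": "white", "category": "t-shirt", "collection": None},
    "saleor-t-shirt": {"color": "purple", "category": "t-shirt", "collection": "saleor"},
    "jeans": {"color": "blue", "category": "trousers", "collection": None},
    "sunglasses": {"color": "purple", "category": "accessories", "collection": None},
}
bought = ["saleor-socks", "white-t-shirt"]  # Caitlyn's purchases


def content_score(candidate):
    """Count attribute values the candidate shares with each purchased product."""
    total = 0
    for owned in bought:
        for key, value in products[candidate].items():
            if value is not None and products[owned][key] == value:
                total += 1
    return total


candidates = [name for name in products if name not in bought]
ranking = sorted(candidates, key=content_score, reverse=True)
```

Here the Saleor t-shirt scores 3 (two matches with the socks, one with the white t-shirt), the sunglasses score 1, and the jeans score 0.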
Secondly, the model will attempt to replicate the user’s taste according to the behavior of the CF-based model presented earlier. The next step is to mix CF-based information with information based on the user’s taste preferences.
We can observe that the two approaches give completely different results. So what will the end result likely be? It is difficult to say, because each hybrid model combines its data in its own way, and the outcome depends on the specific model and its hyperparameters. It is worth noting, however, that new or not yet popular products are still presented to the prospective customer rather than simply ignored by the recommendation system.
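To make the dependence on hyperparameters concrete, here is a minimal sketch of one possible way to mix the two signals: a weighted average controlled by a single weight `alpha`. The scores are illustrative values that follow the two rankings described in the text, and both `alpha` and the `blend` helper are assumptions. Real hybrid models usually combine the signals inside the model rather than after the fact.

```python
# Illustrative per-product scores for Caitlyn (assumed values that follow
# the rankings described in the text, not output from a real model).
cf_scores = {"jeans": 2, "sunglasses": 1, "saleor-t-shirt": 0}
content_scores = {"saleor-t-shirt": 3, "sunglasses": 1, "jeans": 0}


def normalize(scores):
    """Scale scores to the 0..1 range so both signals are comparable."""
    top = max(scores.values()) or 1
    return {name: value / top for name, value in scores.items()}


def blend(alpha):
    """Weighted mix: alpha=1 is pure CF, alpha=0 is purely attribute-based."""
    cf = normalize(cf_scores)
    content = normalize(content_scores)
    mixed = {name: alpha * cf[name] + (1 - alpha) * content[name] for name in cf}
    return sorted(mixed, key=mixed.get, reverse=True)


content_heavy = blend(alpha=0.3)  # attribute similarity dominates
cf_heavy = blend(alpha=0.8)       # purchase history dominates
```

With `alpha=0.3` the new Saleor t-shirt ranks first, while with `alpha=0.8` the jeans do. In both cases the new product keeps a non-zero score, which is the point of the hybrid approach.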
Hybrid Recommendations Solve Cold-Start Problems with Enhanced Product and User Data
When a recommendation system is implemented with care and precision, it can be hugely beneficial for your e-commerce business.
Each type of recommendation system improves user experience; however, simple recommendation systems based solely on user interaction with products will likely fail when new products and users appear. Hybrid recommendation systems significantly reduce the impact of cold-start problems thanks to the additional information that describes the properties of users and products.
If you want to learn more about hybrid recommendation systems, these links provide more technical details: