Online shopping has been a staple for a while, but it is even more popular now because of the pandemic. As a result, it is critical that we can trust a platform’s product rankings. After all, no one wants to give bad gifts. Unfortunately, rankings can be based on fraudulent data like fake clicks, purchases, and reviews.
The problem stems from the race for visibility. The sheer number of products available on online platforms, combined with consumers’ limited attention, has created a race among sellers to be listed among the top-ranked items. As a result, “click farms” employ fake users to click on products, boosting their apparent popularity and misleading the platform into ranking them in top positions.
Being in those top positions is critical for sellers. Think about the last time you shopped on Amazon for a product like slippers. Even if you narrow it down by color and size, the site has endless slipper options displayed across seemingly endless pages. Most people do not have the time or patience to look through many pages of products. As a seller, if your slippers are not on the first page, there is a good chance they will not be viewed at all.
So, sellers are incentivized to artificially inflate their position. It has been reported that some Amazon sellers pay $10,000 a month to “black hat” companies in order to be ranked in top positions.
The current algorithms used by platforms to rank products are vulnerable to these fraudulent activities. Fake clicks mislead the algorithms into making poor decisions, so the most visible positions end up occupied by unpopular products. This can harm customer engagement and other metrics for the platform.
Many recent studies have explored how position bias can prevent platforms from accurately inferring customer preferences. My colleagues Vahideh Manshadi, Jon Schneider, and Shreyas Sekar, and I wanted to go beyond those studies and develop constructive solutions for the problem of fraudulent reviews. We wanted to know: Can an online platform efficiently learn the optimal product ranking in the presence of fake users? Further, can the platform learn the optimal ranking without knowing the identity or number of fake users?
The answer is yes to both questions. We designed algorithms that can efficiently learn the optimal product ranking in the presence of fake users, even though they are completely blind to the identity and number of fake users.
Our work presents a number of insights on how to design methods for uncertain environments that remain robust in the face of manipulation. These include being more conservative both when inferring key parameters and when changing decisions based on limited data; employing parallelization and randomization to limit the damage caused by fake users; and augmenting a conservative approach via cross-learning.
Our approach keeps parallel copies of a learning algorithm, each with a different level of conservatism. Being conservative makes an algorithm harder to manipulate with fraudulent data. The parallel copies communicate with each other to learn the right level of conservatism and to decide how to rank products. The algorithms are not trying to determine quality scores of products, but rather whether a product should be ranked above or below another similar product. This makes it a simpler problem.
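To make the idea of parallel copies with different conservatism levels more concrete, here is a minimal Python sketch. It is an illustration only, not the algorithm from our paper: the class `PairwiseLearner`, the `fake_budget` parameter, the confidence threshold, and the rule of trusting the most conservative copy that has reached a verdict are all simplifying assumptions made for this example. Each copy compares two products, A and B, and only declares a winner once the gap in clicks is too large to be explained by noise plus the number of fake clicks that copy is prepared to tolerate.

```python
import math
import random


class PairwiseLearner:
    """One copy of the learner (hypothetical). It tolerates up to
    `fake_budget` fake clicks before it trusts a gap between A and B."""

    def __init__(self, fake_budget):
        self.fake_budget = fake_budget
        self.views = 0
        self.clicks = {"A": 0, "B": 0}

    def record(self, clicked):
        """Record one customer who saw both products.
        `clicked` is "A", "B", or None (no click)."""
        self.views += 1
        if clicked is not None:
            self.clicks[clicked] += 1

    def verdict(self, delta=0.05):
        """Return "A", "B", or None if the evidence is not yet strong enough."""
        if self.views == 0:
            return None
        # Uncertainty from limited data (a Hoeffding-style width, assumed here).
        noise = math.sqrt(math.log(2 / delta) / (2 * self.views))
        # Extra caution: up to `fake_budget` of the clicks may be fake.
        slack = self.fake_budget / self.views
        gap = (self.clicks["A"] - self.clicks["B"]) / self.views
        threshold = 2 * noise + slack
        if gap > threshold:
            return "A"
        if gap < -threshold:
            return "B"
        return None


def pick_learner(learners, rng=random):
    """Route each customer to a random copy, so that fake users cannot
    concentrate all of their clicks on a single copy (assumed uniform here)."""
    return rng.choice(learners)


def combined_verdict(learners):
    """Simplified stand-in for cross-learning: trust the most conservative
    copy that has reached a verdict, falling back to less conservative ones."""
    for learner in sorted(learners, key=lambda l: l.fake_budget, reverse=True):
        decision = learner.verdict()
        if decision is not None:
            return decision
    return None
```

In this sketch, a copy with a small `fake_budget` decides quickly but can be fooled, while a copy with a large budget needs more data but is harder to manipulate. Note that each copy only answers the pairwise question of whether A should rank above B; it never estimates a quality score.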
This is good news because learning algorithms are a low-cost way to mitigate the problem of fake data. Companies already have the necessary data, and they do not need to know how many of their data points are fraudulent. The parallel learning algorithms remove that requirement, as they communicate with each other to make accurate estimates about product rankings.
This new line of research on corrupted data can serve as a starting point for designing robust data-driven algorithms to tackle other operational challenges. More immediately, it can be used by platforms to help us make better buying decisions.