Making the best first impression: Using machine learning to optimize photo selection
The first photo a traveler sees when looking for a hotel, restaurant, or experience can make or break a booking decision. As a key partner to business owners and tour operators across the globe, we want to ensure that when multiple images are equally relevant, the one shown first helps them put their best image forward, while still surfacing authentic traveler perspectives. Our model doesn’t suppress unflattering or critical images; they remain fully visible in the gallery and can rank highly based on quality and relevance. When multiple images are equally relevant (e.g., multiple bedroom shots), we prioritize the one that is most visually engaging as the primary image.
We recently launched our Primary Photo Service, a machine learning system that automatically selects the most compelling primary photos — the first photo that represents a hotel, restaurant, or attraction on Tripadvisor — to enable faster visual decision-making for our partners.
This end-to-end solution combines computer vision, pairwise learning, and large-scale infrastructure to scale high-quality photo selection. While simple on its surface, it processes around 12,000 read requests per second during peak hours. And since launch, we’ve seen significant increases in click-through rates and bookings, all without adding a layer of complexity to the lives of our partners.
Why primary photos matter
For any online platform, across categories, photos are business drivers. According to a 2022 report from the Journal of Business Research, high-quality, visually rich images can significantly increase user engagement across digital platforms. Images are a central component of traveler engagement at Tripadvisor, with about 350M images published to locations and an average year-over-year growth in image uploads of about 33% since 2010.
When travelers browse our site, the primary photo is often their first impression of a property. A gorgeous exterior shot of a hotel with just the right light can bring in clicks. A generic, dark photo of a basic room will keep people scrolling. As one of our travelers stated plainly in the title of their review, “[I] booked because of the location, photos, and reviews.”
But acting on that knowledge historically has been difficult. Manual curation of photos, amongst the millions of images uploaded by owners, operators, and travelers alike, isn’t a feasible, scalable solution. And while our partners have a deluge of images to choose from, they vary greatly in perspective and quality.
Our methodology: Using AI to find patterns in attraction
To find a solution, we leaned into a key insight: “attractiveness” may be subjective, but patterns exist. There are commonalities to be found in popular photos across categories and cultures: the right lighting, the perfect perspective, the exposition of unique features. We set out to develop an approach that combines two key metrics, visual appeal and relevance, to identify photos that are both eye-catching and accurately representative of what travelers want to see when they’re in discovery mode.
We built a set of core machine learning models and heuristics:
- The Attractiveness Score Model, which assesses the visual appeal of each photo.
- The Primary Photo Selection Logic, which combines attractiveness scores with business rules to pick the best thumbnail for each item.
The Attractiveness Score Model
Attractiveness is inherently subjective, depending on technical quality, relevance to the specific context, and aesthetic quality. We had to lean into computer vision: training an AI model to look at and evaluate images the same multifaceted way we do as people.
Rather than defining beauty in absolute terms, we used a “pairwise” learning approach. Instead of asking, “Is this photo attractive?” we asked, “Which of these two photos is more attractive?” This relative comparison proved far more reliable and consistent because it focuses on preferences between images rather than on arbitrary absolute metrics, which can lead to inconsistent results.
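To make the pairwise idea concrete, here is a minimal sketch in Python. It is not our production model: it assumes each photo has already been reduced to a couple of hypothetical numeric features (brightness and sharpness) and fits a simple linear scorer with a logistic, Bradley-Terry-style loss on the score difference between the preferred photo and the other one. The function names, features, and training pairs are all illustrative.

```python
import math
import random


def score(weights, features):
    """Linear attractiveness score for one photo's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise(pairs, n_features, lr=0.1, epochs=200, seed=0):
    """Fit weights from (preferred_features, other_features) pairs.

    Minimizes -log(sigmoid(score(preferred) - score(other))) with plain SGD.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for preferred, other in pairs:
            diff = score(weights, preferred) - score(weights, other)
            # Modeled probability that the preferred photo wins the comparison.
            p = 1.0 / (1.0 + math.exp(-diff))
            # Gradient step: push the preferred photo's score above the other's.
            for i in range(n_features):
                weights[i] += lr * (1.0 - p) * (preferred[i] - other[i])
    return weights


# Hypothetical features per photo: [brightness, sharpness], both in [0, 1].
pairs = [
    ([0.9, 0.8], [0.2, 0.3]),
    ([0.7, 0.9], [0.4, 0.2]),
    ([0.8, 0.6], [0.3, 0.5]),
]
weights = train_pairwise(pairs, n_features=2)
bright_sharp = score(weights, [0.9, 0.9])
dark_blurry = score(weights, [0.1, 0.1])
print(bright_sharp > dark_blurry)
```

The scorer never sees an absolute "attractiveness" label. It only learns from which photo won each comparison, and the resulting scores are meaningful relative to one another rather than on any fixed scale.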
Our model defines a high-attractiveness image as a high-quality photo that encourages travelers to click and explore the property further. These scores are used to sort through potentially thousands of images for any given location, weighted by heuristics, to identify the best candidates for the primary photo.
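As a sketch of how scores and rules might combine, the Python example below picks a primary photo from a small candidate set. Everything in it is an assumption made for illustration: the `Photo` fields, the minimum-resolution rule, and the choice to rank by relevance first and break ties with the attractiveness score. The actual heuristics and their weighting are not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Photo:
    photo_id: str
    relevance: int          # hypothetical relevance tier, higher is more relevant
    attractiveness: float   # score from the attractiveness model
    width: int
    height: int


# Hypothetical business rule: a primary photo needs a minimum resolution.
MIN_WIDTH = 800
MIN_HEIGHT = 600


def is_eligible(photo: Photo) -> bool:
    """Return True if the photo may be considered for the primary slot."""
    return photo.width >= MIN_WIDTH and photo.height >= MIN_HEIGHT


def select_primary(photos: List[Photo]) -> Optional[Photo]:
    """Pick the most relevant eligible photo, breaking ties by attractiveness."""
    eligible = [p for p in photos if is_eligible(p)]
    if not eligible:
        return None
    return max(eligible, key=lambda p: (p.relevance, p.attractiveness))


photos = [
    Photo("a", relevance=2, attractiveness=0.61, width=1600, height=1200),
    Photo("b", relevance=2, attractiveness=0.87, width=1600, height=1200),
    Photo("c", relevance=1, attractiveness=0.95, width=1600, height=1200),
    Photo("d", relevance=2, attractiveness=0.99, width=320, height=240),
]
primary = select_primary(photos)
print(primary.photo_id if primary else None)
```

In this sketch, photos "a" and "b" are equally relevant, so the higher attractiveness score decides between them. Photo "d" is only excluded from the primary slot; nothing here removes it from the gallery.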