How Airbnb leverages ML to derive guest interest from unstructured text data and provide personalized recommendations to Hosts
At Airbnb, we endeavor to build a world where anyone can belong anywhere. We strive to understand what our guests care about and match them with Hosts who can provide what they’re looking for. What better source for guest preferences than the guests themselves?
We built a system called the Attribute Prioritization System (APS) to listen to our guests’ needs in a home: What are they requesting in messages to Hosts? What are they commenting on in reviews? What are common requests when calling customer support? And how do these differ by the home’s location, property type, and price, as well as by guests’ travel needs?
With this personalized understanding of which home amenities, facilities, and location features (i.e. “home attributes”) matter most to our guests, we advise Hosts on which home attributes to acquire, merchandise, and verify. We can also show guests the home attributes that are most relevant to their destination and needs.
We do this through a scalable, platformized, and data-driven engineering system. This blog post describes the science and engineering behind the system.
What do guests care about?
First, to determine what matters most to our guests in a home, we look at what guests most often request, comment on, and contact customer support about. Are they asking a Host whether they have wifi, free parking, a private hot tub, or access to the beach?
To parse this unstructured data at scale, Airbnb built LATEX (Listing ATtribute EXtraction), a machine learning system that can extract home attributes from unstructured text data such as guest messages and reviews, customer support tickets, and listing descriptions. LATEX accomplishes this in two steps:
- A named entity recognition (NER) module extracts key phrases from unstructured text data
- An entity mapping module then maps these key phrases to home attributes
The named entity recognition (NER) module uses textCNN (a convolutional neural network for text) and is trained and fine-tuned on human-labeled text data from various data sources within Airbnb. In the training dataset, we label each phrase that falls into one of the following five categories: Amenity, Activity, Event, Specific POI (e.g. “Lake Tahoe”), or Generic POI (e.g. “post office”).
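To make the five-category labeling concrete, here is a minimal sketch of how labeled phrases could be turned into per-token training targets for a token-level NER model. The BIO (begin/inside/outside) tagging scheme, the category identifiers, and the helper `spans_to_bio` are assumptions for illustration; the post does not describe LATEX’s actual label format.

```python
# Hypothetical label schema: the five categories from the post, encoded with the
# common BIO scheme. LATEX's real training format is not described in the post.
CATEGORIES = ("AMENITY", "ACTIVITY", "EVENT", "SPECIFIC_POI", "GENERIC_POI")


def spans_to_bio(tokens, spans):
    """Convert labeled phrase spans into one BIO tag per token.

    tokens: list of token strings
    spans:  list of (start, end, category); token indices, end exclusive
    """
    tags = ["O"] * len(tokens)  # every token starts as "outside" any entity
    for start, end, category in spans:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span: {(start, end)}")
        tags[start] = f"B-{category}"  # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = f"I-{category}"  # remaining tokens of the phrase
    return tags


tokens = "Is the hot tub close to Lake Tahoe ?".split()
spans = [(2, 4, "AMENITY"), (6, 8, "SPECIFIC_POI")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```

Token-level tags like these are a typical target for a sequence labeler such as a textCNN with a per-token output layer, but other schemes (e.g. BILOU) would work equally well.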
The entity mapping module uses an unsupervised learning approach to map these phrases to home attributes. To achieve this, we compute the cosine distance between the candidate phrase and each attribute label in the fine-tuned word embedding space. We consider the closest attribute to be the referenced one, and can calculate a confidence score for the mapping.
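The nearest-attribute lookup can be sketched with plain cosine similarity (cosine distance is simply one minus the similarity, so the closest attribute is the one with the highest similarity). The toy 3-dimensional vectors, the attribute names, and the `min_similarity` cutoff below are hypothetical; real embeddings would come from the fine-tuned embedding model, and using the similarity itself as the confidence score is just one simple choice.

```python
import math

# Hypothetical toy embeddings; real ones would be high-dimensional and learned.
ATTRIBUTE_VECTORS = {
    "hot_tub": [0.9, 0.1, 0.0],
    "wifi": [0.1, 0.9, 0.1],
    "free_parking": [0.0, 0.2, 0.95],
}


def cosine_similarity(u, v):
    """Cosine similarity; cosine distance is 1 - this value."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def map_phrase(phrase_vec, attribute_vecs, min_similarity=0.5):
    """Return (closest attribute or None, similarity used as a confidence score).

    Returns None for the attribute when nothing is close enough, which is one way
    to flag phrases that may refer to attributes not yet in the catalog.
    """
    scores = {name: cosine_similarity(phrase_vec, vec) for name, vec in attribute_vecs.items()}
    best = max(scores, key=scores.get)
    if scores[best] < min_similarity:
        return None, scores[best]
    return best, scores[best]


phrase = [0.85, 0.15, 0.05]  # hypothetical embedding of the phrase "jacuzzi"
print(map_phrase(phrase, ATTRIBUTE_VECTORS))
```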
We then calculate how frequently an entity is referenced in each text source (i.e. messages, reviews, customer service tickets) and aggregate the normalized frequencies across text sources. Home attributes with many mentions are considered more important.
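A minimal sketch of the per-source normalization and aggregation, assuming each source’s counts are normalized by that source’s total mentions (so a high-volume source such as messages does not drown out the others) and that sources are averaged with equal weights. Both the normalization and the weighting are assumptions, and the counts are made up.

```python
from collections import defaultdict


def aggregate_normalized_frequency(counts_by_source, weights=None):
    """Aggregate attribute mention counts across text sources.

    counts_by_source: {source: {attribute: mention_count}}
    weights: optional {source: weight}; defaults to equal weights (an assumption).
    Returns {attribute: score}, sorted from most to least mentioned.
    """
    sources = list(counts_by_source)
    if weights is None:
        weights = {s: 1.0 / len(sources) for s in sources}
    scores = defaultdict(float)
    for source, counts in counts_by_source.items():
        total = sum(counts.values())
        if total == 0:
            continue  # skip empty sources rather than divide by zero
        for attribute, n in counts.items():
            # Share of this source's mentions, weighted by the source's weight.
            scores[attribute] += weights[source] * n / total
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Made-up counts for illustration only.
counts = {
    "messages": {"wifi": 60, "parking": 30, "hot_tub": 10},
    "reviews": {"hot_tub": 5, "wifi": 10, "parking": 5},
    "support_tickets": {"parking": 8, "wifi": 2},
}
print(aggregate_normalized_frequency(counts))
```

Note that parking ends up ahead of wifi even though wifi has more raw mentions, because parking dominates the smaller support-ticket source once each source is normalized.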
With this approach, we’re able to gain insight into what guests are interested in, even surfacing new entities that we may not yet support. The scalable engineering system also allows us to improve the model by onboarding additional data sources and languages.
What do guests care about for different types of homes?
What guests look for in a mountain cabin is different from what they look for in an urban apartment. Gaining a more complete understanding of guests’ needs in an Airbnb home enables us to provide more personalized guidance to Hosts.
To achieve this, we calculate a unique ranking of attributes for each home. Based on the characteristics of a home (location, property type, capacity, luxury level, etc.), we predict how frequently each attribute will be mentioned in messages, reviews, and customer service tickets. We then use these predicted frequencies to calculate a customized importance score that is used to rank all possible attributes of a home.
For example, consider a mountain cabin that can host six people at an average daily price of $50. To determine what is most important to potential guests, we learn from what is most often mentioned for other homes that share these same characteristics. The result: hot tub, fire pit, lake view, mountain view, grill, and kayak. In contrast, the most important attributes for an urban apartment are parking, restaurants, grocery stores, and subway stations.
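The post does not say how predicted frequencies become an importance score. One plausible choice, assumed here, is lift: the home’s predicted mention frequency divided by the platform-wide frequency, so that attributes mentioned everywhere (like wifi) do not top every home’s list. The function name and all numbers are hypothetical.

```python
def rank_attributes(predicted, baseline, top_k=None):
    """Rank attributes by lift = predicted frequency for this home / baseline frequency.

    predicted: {attribute: predicted mentions per stay for this home}
    baseline:  {attribute: platform-wide mentions per stay}
    Attributes without a positive baseline are skipped.
    """
    lifts = {
        attr: freq / baseline[attr]
        for attr, freq in predicted.items()
        if baseline.get(attr, 0.0) > 0.0
    }
    ranked = sorted(lifts, key=lifts.get, reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Made-up numbers for illustration only (mentions per stay).
cabin_predicted = {"hot_tub": 0.12, "fire_pit": 0.08, "wifi": 0.20, "parking": 0.05}
platform_baseline = {"hot_tub": 0.04, "fire_pit": 0.02, "wifi": 0.25, "parking": 0.10}
print(rank_attributes(cabin_predicted, platform_baseline, top_k=2))
```

Here wifi has the highest raw predicted frequency for the cabin but ranks low, because it is mentioned even more often on average across the platform; ranking by raw frequency instead would put it first.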
We could directly aggregate keyword frequencies among similar homes. But this approach would run into issues at scale: the number of home segments could grow exponentially large, leaving very sparse data in highly specific segments. Instead, we built an inference model that uses the raw keyword frequency data to infer the expected frequency for a segment. This inference approach scales as we use more, and finer-grained, dimensions to characterize our homes, which allows us to help our Hosts best highlight their unique and diverse collection of homes.
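The post does not describe its inference model. As a stand-in that illustrates why a model beats direct segment aggregation, here is a naive log-additive sketch: each dimension value (e.g. property type, region) gets its own smoothed effect on an attribute’s mention rate, and effects are combined multiplicatively, so a segment never seen as a whole can still get a prediction. The independence assumption, the pseudo-count smoothing, the "mentions per stay" rate, and all data are hypothetical.

```python
import math
from collections import defaultdict


def fit_marginal_effects(rows, prior_strength=10.0):
    """Fit per-dimension log effects on one attribute's mention rate.

    rows: list of (features: {dimension: value}, mentions, stays)
    Each dimension value's rate is shrunk toward the global rate with
    `prior_strength` pseudo-stays, which keeps sparse values from overfitting.
    """
    total_mentions = sum(m for _, m, _ in rows)
    total_stays = sum(s for _, _, s in rows)
    global_rate = total_mentions / total_stays
    agg = defaultdict(lambda: [0.0, 0.0])  # (dim, value) -> [mentions, stays]
    for features, mentions, stays in rows:
        for dim, value in features.items():
            agg[(dim, value)][0] += mentions
            agg[(dim, value)][1] += stays
    effects = {}
    for key, (m, s) in agg.items():
        smoothed = (m + prior_strength * global_rate) / (s + prior_strength)
        effects[key] = math.log(smoothed / global_rate)
    return global_rate, effects


def predict_rate(global_rate, effects, features):
    """Predict the mention rate for any combination of dimension values.

    Unseen values contribute no effect, falling back toward the global rate.
    """
    log_rate = math.log(global_rate)
    log_rate += sum(effects.get((d, v), 0.0) for d, v in features.items())
    return math.exp(log_rate)


# Made-up "hot tub" mention data; note there is no (apartment, mountain) segment.
rows = [
    ({"property_type": "cabin", "region": "mountain"}, 30, 100),
    ({"property_type": "apartment", "region": "city"}, 2, 200),
    ({"property_type": "cabin", "region": "city"}, 5, 50),
]
global_rate, effects = fit_marginal_effects(rows)
print(predict_rate(global_rate, effects, {"property_type": "apartment", "region": "mountain"}))
```

Adding a dimension adds only a handful of parameters here instead of multiplying the number of segments, at the cost of ignoring interactions between dimensions; a production model would likely capture those too.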
How can guests’ preferences help Hosts improve?
Now that we have a granular understanding of what guests want, we can help Hosts showcase what guests are looking for by:
- Recommending that Hosts acquire an amenity guests often request (e.g. a coffee maker)
- Merchandising an existing home attribute that guests tend to comment favorably on in reviews (e.g. a patio)
- Clarifying popular facilities that may otherwise end up in requests to customer support (e.g. the privacy of, and ability to access, a pool)
But to make these recommendations relevant, it’s not enough to know what guests want. We also need to be sure about what’s already in the home. This turns out to be trickier than simply asking the Host, because of the 800+ home attributes we collect. Most Hosts aren’t able to immediately and accurately add all of the attributes their home has, especially since amenities like a crib mean different things to different people. To fill in some of the gaps, we leverage guest feedback on amenities and facilities they’ve seen or used. In addition, some home attributes are available from trusted third parties, such as real estate or geolocation databases that can provide square footage, bedroom count, or whether the home overlooks a lake or beach. By combining data from our Hosts, guests, and trusted third parties, we’re able to build a much more complete picture of a home.
We use several different models, including a Bayesian inference model whose confidence increases as more guests confirm that the home has an attribute. We also leverage WiDeText, a supervised neural network model that uses features of the home to predict the probability that the next guest will confirm the attribute’s existence.
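A Beta-Bernoulli model is a textbook way to get a belief whose confidence grows with each guest confirmation; it is shown here as an illustration, not as Airbnb’s actual model. The class name, the uniform Beta(1, 1) default prior, and the `from_prior` helper (one hypothetical way a model-predicted probability, such as a WiDeText output, could seed the prior) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class AttributeBelief:
    """Beta(alpha, beta) belief that a home has a given attribute."""

    alpha: float = 1.0  # pseudo-count of guest confirmations
    beta: float = 1.0   # pseudo-count of guest denials

    @classmethod
    def from_prior(cls, p, strength=10.0):
        """Hypothetical: seed the belief from a model-predicted probability p,
        worth `strength` pseudo-observations."""
        if not 0.0 < p < 1.0 or strength <= 0:
            raise ValueError("need 0 < p < 1 and strength > 0")
        return cls(alpha=p * strength, beta=(1.0 - p) * strength)

    def update(self, confirmed: bool) -> None:
        """Conjugate update after one guest confirms or denies the attribute."""
        if confirmed:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior probability that the attribute exists."""
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        """Posterior variance; shrinks as evidence accumulates."""
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1.0))


belief = AttributeBelief()
for _ in range(8):  # eight guests confirm, none deny
    belief.update(True)
print(belief.mean, belief.variance)
```

Starting from Beta(1, 1), eight confirmations move the posterior mean from 0.5 to 0.9 while the variance drops by roughly a factor of ten, which is the "increasing confidence" behavior described above.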
By combining our estimate of how important each attribute is for a home with the probability that the attribute already exists or needs clarification, we’re able to give Hosts personalized and relevant recommendations on what to acquire, merchandise, and clarify when promoting their home on Airbnb.
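As a rough illustration of how the two signals could be combined, the sketch below maps an importance score and an existence probability to one of the three actions: unlikely to exist means acquire, likely to exist means merchandise, and uncertain means clarify. The thresholds, the 0-to-1 scales, and the function names are hypothetical; the post does not describe the actual decision logic.

```python
def recommend(importance, p_exists, *, min_importance=0.5, low=0.2, high=0.8):
    """Map (importance, probability the attribute exists) to an action or None.

    All thresholds are hypothetical placeholders.
    """
    if importance < min_importance:
        return None  # not important enough for this home to recommend anything
    if p_exists <= low:
        return "acquire"      # guests want it, the home probably lacks it
    if p_exists >= high:
        return "merchandise"  # guests want it, the home probably has it
    return "clarify"          # guests want it, but whether it exists is uncertain


def recommend_all(candidates, **thresholds):
    """candidates: list of (attribute, importance, p_exists).

    Returns [(attribute, action)], most important first.
    """
    recs = []
    for name, importance, p_exists in candidates:
        action = recommend(importance, p_exists, **thresholds)
        if action is not None:
            recs.append((name, action, importance))
    recs.sort(key=lambda r: r[2], reverse=True)
    return [(name, action) for name, action, _ in recs]


# Made-up scores for illustration only.
candidates = [
    ("coffee_maker", 0.9, 0.1),
    ("patio", 0.7, 0.95),
    ("pool", 0.8, 0.5),
    ("iron", 0.2, 0.3),
]
print(recommend_all(candidates))
```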
This is the first time we’ve known which attributes our guests want down to the individual home level. What matters varies greatly by home location and trip type.
This full-stack prioritization system has allowed us to give more relevant and personalized advice to Hosts, to merchandise what guests are looking for, and to accurately represent popular and contentious attributes. When Hosts accurately describe their homes and highlight what guests care about, guests can find their ideal vacation home more easily.
We’re currently experimenting with highlighting the amenities that are most important for each type of home (e.g. a kayak for a mountain cabin, parking for an urban apartment) on the home’s product description page. We believe we can leverage this knowledge to improve search and to determine which home attributes are most important for different categories of homes.
On the Host side, we’re expanding this prioritization methodology to include more recommendations and insights into how Hosts can make their listings even more desirable. This includes actions like opening up popular nights, offering discounts, and adjusting settings. By leveraging unstructured text data to help guests connect with their ideal Host and home, we hope to foster a world where anyone can belong anywhere.
If this type of work interests you, check out some of our related positions at Careers at Airbnb!
It takes a village to build such a robust full-stack platform. Special thanks to (alphabetical by last name) Usman Abbasi, Dean Chen, Guillaume Guy, Noah Hendrix, Hongwei Li, Xiao Li, Sara Liu, Qianru Ma, Dan Nguyen, Martin Nguyen, Brennan Polley, Federico Ponte, Jose Rodriguez, Peng Wang, Rongru Yan, Meng Yu, and Lu Zhang for their contributions, dedication, expertise, and thoughtfulness!