So, how does this change the way we approach new problems?
A few years ago, we found that certain types of queries were becoming more common.
"Japanese toys in San Francisco,"
"live lobster in Kissimmee,"
"vegan donuts near me."
These are hard queries, local queries.
People are not looking for websites,
but for actual businesses on a map.
Well, we could write rules for each of these,
but it becomes unwieldy rather quickly.
So, let's see how we approach it from a machine learning perspective.
We start by thinking about how to collect the data to make it
an ML problem. Let's look at an example.
The query: "coffee near me."
The idea behind machine learning is to take a bunch of
examples and convert the knowledge in them into future predictions.
When you search for "coffee near me," what
are the examples that we're collecting, and what knowledge are we converting them into?
What is the future prediction?
The prediction is quite straightforward:
which of two options should we send you to?
Bill's Diner carries coffee and it's only three minutes away.
However, there's a gourmet coffee shop just two minutes further on,
and we rather think you'd prefer the coffee shop to the diner.
On the other hand,
if the gourmet coffee shop is across the bridge,
we probably will send you to the diner instead.
Or if the diner typically takes 10 minutes to serve coffee,
or doesn't do takeaway coffee so that you have to sit down and eat,
then perhaps the 15-minute walk to the coffee shop is what you'd prefer.
How far is too far?
How much do the rating of the restaurant and the time it takes to serve you matter?
How much weight should each of these factors carry?
Rather than guessing and maintaining a whole bunch of rules,
we'd rather have users tell us.
So, we look at a bunch of data and work out the trade-offs:
distance versus quality of coffee,
service time versus quality of coffee, et cetera.
But let's now just consider distance.
Where do you get this data?
As an AI-first company,
we might start with heuristics,
but we do so with the mindset that we're going to throw away
the heuristics just as soon as we have enough data about user preferences.
What we need are examples.
Remember, example equals labelled data.
Here, the input is the distance to the shop and
the label is whether the user likes the result or not.
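To make that concrete, here is a minimal sketch of what such labelled examples could look like as data; the field names and numbers are assumptions for illustration, not the real feature set.

```python
# Hypothetical labelled examples: the input is the distance to the shop
# (in kilometres) and the label records whether the user liked the result.
# All names and values here are made up for illustration.
examples = [
    {"distance_km": 0.2, "liked": 1},
    {"distance_km": 1.0, "liked": 1},  # this user will walk a kilometre for good coffee
    {"distance_km": 3.0, "liked": 0},  # this user won't go that far
    {"distance_km": 5.0, "liked": 0},
]
```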
So, we take an example of a shop one kilometer away and the user says,
"Great, I'll go one kilometer for a great coffee."
And then we ask another user whether they'd go three kilometers and they say,
"I don't even like gourmet coffee."
So, we aggregate a bunch of different examples
until eventually we realize it's so far away that nobody wants to go.
And then, we try to fit our model.
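As a rough sketch of that fitting step (assuming scikit-learn and toy data like the examples above, not the actual production model), one could fit a simple logistic regression so that the model learns how the chance of a user accepting a result falls off with distance, instead of hand-tuning a distance cutoff.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Aggregated (distance_km, liked) pairs -- made-up data for illustration.
distances = np.array([[0.2], [0.5], [1.0], [1.5], [2.0], [3.0], [4.0], [5.0]])
liked = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Fit a model that maps distance to the probability a user accepts the result.
model = LogisticRegression()
model.fit(distances, liked)

# The learned curve replaces a hand-written "if distance > X" rule.
print(model.predict_proba([[1.0]])[0, 1])  # P(user likes a shop 1 km away)
print(model.predict_proba([[3.0]])[0, 1])  # P(user likes a shop 3 km away)
```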
So, machine learning is about collecting the appropriate data
and then finding the right balance between generalizing well and trusting the examples.