In this article at OpenGenus, we will become familiar with the flow of a product-based interview round.
When applying for a data science role at a product-based company such as Facebook, Uber, Google or Netflix, we generally face a product-based interview round in which we are tested on our familiarity with the company's products, the algorithms and interfaces used to build them, and the design of the product. These interviews are mostly in the form of a discussion in which one or two problems are explored at length, and they are often followed by a take-home assignment.
Interviewer: What are key performance metrics for Uber?
There are a number of key performance indicators (KPIs) that drive Uber's business. Uber aims to grow each of the metrics below on a quarterly basis and monitors them closely to ensure that the business is on the right path. Some of Uber's KPIs are:
- Gross Bookings is the measure of the total value of all rides booked through the Uber platform.
- Net Revenue is the share of gross bookings that Uber keeps as revenue, after deducting driver earnings and incentives.
- Active Riders measures the number of unique riders who have taken an Uber ride in a given period of time.
- Free Cash Flow denotes the cash generated by Uber's business, after deducting capital expenditures.
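To make the relationship between these KPIs concrete, here is a minimal Python sketch over a few hypothetical trip records. The `Trip` class, its fields and the sample values are illustrative assumptions, and net revenue is simplified to gross bookings minus driver payouts; Uber's reported figures follow its own accounting rules, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Trip:
    rider_id: str
    trip_date: date
    fare: float           # total amount the rider paid for the trip
    driver_payout: float  # portion of the fare paid out to the driver (hypothetical field)


def gross_bookings(trips):
    """Total value of all rides booked on the platform."""
    return sum(t.fare for t in trips)


def net_revenue(trips):
    """Simplified net revenue: gross bookings minus what drivers were paid."""
    return sum(t.fare - t.driver_payout for t in trips)


def active_riders(trips, start, end):
    """Number of unique riders with at least one trip between start and end (inclusive)."""
    return len({t.rider_id for t in trips if start <= t.trip_date <= end})


def free_cash_flow(operating_cash_flow, capital_expenditures):
    """Free cash flow: cash from operations minus capital expenditures."""
    return operating_cash_flow - capital_expenditures


trips = [
    Trip("r1", date(2023, 1, 5), 20.0, 15.0),
    Trip("r2", date(2023, 1, 9), 12.5, 9.5),
    Trip("r1", date(2023, 2, 2), 30.0, 22.0),
]
print(gross_bookings(trips))  # 62.5
print(net_revenue(trips))     # 16.0
print(active_riders(trips, date(2023, 1, 1), date(2023, 1, 31)))  # 2
```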
Interviewer: Describe how you would build a model to predict if the ride request will be accepted by the driver or not.
In such questions, make sure you clarify any doubts you have before you start answering. This question is meant to be a discussion, and it alone can test all of your skills; sometimes the whole interview revolves around just this one question. Below is the flow in which the question can be answered.
Candidate: We are specifically looking at this from the driver's point of view, not the rider's, right?
Interviewer: Yes. Based on how Uber works, the rider books a ride, the request is sent out to individual drivers within a certain radius, and then a driver accepts the request.
Candidate: What information is shown to Uber drivers so that they can make the decision in a split second?
Interviewer: As a data scientist at Uber, let us say that this is a feature you can customize, i.e. you control what is and is not shown to the driver.
Candidate: Can I assume that the driver can accept a new trip either at the end of the current trip or in the middle of it?
Interviewer: Yes, go ahead.
Candidate: I think the distance to the rider's pickup point should be shown. If it is small, the driver is likely to accept the ride request. This is the first feature that comes to mind.
Before I get into the modeling phase, I wanted to understand the user experience and journey; the next thing I want to do is exploratory data analysis.
Interviewer: What is your goal behind doing exploratory data analysis?
Candidate: My main goal is to understand the data Uber has about its drivers and riders and to explore the relationships between those features.
So, coming back to distance, there are two types of distance: the distance to the rider and the trip distance. Those are the two distance features that come to mind. Is there anything I should add?
Interviewer: I think that is a great way to frame it. Along the lines of distance, we could add the duration of the trip, which depends on the distance but also on other factors. We should also keep in mind that this is a multi-sided marketplace. Trip characteristics and price would also be valuable features.
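The two distance features, plus a rough duration estimate, could be computed along these lines. This is a sketch under stated assumptions: it uses the haversine (great-circle) distance as a proxy for road distance, and a hypothetical constant `avg_speed_kmh` in place of a real traffic-aware ETA; the function names and coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_features(driver, pickup, dropoff, avg_speed_kmh=30.0):
    """Pickup distance, trip distance and a naive trip duration estimate.

    driver, pickup and dropoff are (lat, lon) tuples. avg_speed_kmh is a
    hypothetical placeholder; a real system would use routing and live traffic.
    """
    pickup_km = haversine_km(*driver, *pickup)
    trip_km = haversine_km(*pickup, *dropoff)
    return {
        "pickup_distance_km": pickup_km,
        "trip_distance_km": trip_km,
        "est_trip_minutes": trip_km / avg_speed_kmh * 60,
    }


feats = distance_features(driver=(37.7749, -122.4194),
                          pickup=(37.7849, -122.4094),
                          dropoff=(37.8049, -122.2711))
print(feats)
```

Straight-line distance always underestimates road distance, so it is best treated as a cheap first feature that a routing-based distance can later replace.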
Candidate: Yes, the multi-sided marketplace brings up something I hadn't thought of yet. For example, a driver who is ending a trip might pick up Uber Eats orders on the way.
Interviewer: I think we should keep the discussion centered on the driver marketplace.
Candidate: Based on our discussion, I've grouped the factors into a few categories. Each category is a broad umbrella covering many individual features, and we can dive deeper into any of them. They are:
- Vehicle - the vehicle's condition and the number of passengers it can accommodate.
- Trip - the expected quality of the trip, which we could even estimate with a separate regressor or classifier.
- Driver - factors such as how much the driver has earned that day and how many hours they have worked.
- Rider - the time of day the ride is booked. The rider's neighborhood may also influence the decision, since drivers may tend to avoid neighborhoods they perceive as unsafe.
- Traffic - traffic plays an important role, since both the route and the duration of the trip depend on it.
- Special events - if a special event such as a baseball game is happening nearby, there will be a rush, and drivers may tend not to accept such rides.
Does this feel good? Do you have any questions or any additions?
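One way to pin these categories down is a typed feature record that a model can consume. The field names, scales and the `to_vector` helper below are hypothetical, chosen only to mirror the categories listed by the candidate.

```python
from dataclasses import dataclass, asdict


@dataclass
class RideRequestFeatures:
    # Vehicle
    vehicle_condition_score: float  # hypothetical 0-1 rating
    vehicle_capacity: int
    # Trip
    pickup_distance_km: float
    trip_distance_km: float
    est_fare: float
    # Driver
    driver_earnings_today: float
    driver_hours_today: float
    # Rider
    request_hour: int               # 0-23, local time
    pickup_neighborhood_id: int     # categorical: passed through as-is here; a real
                                    # model would one-hot or target-encode it
    # Traffic
    traffic_index: float            # hypothetical congestion score
    # Special events
    special_event_nearby: bool

    def to_vector(self):
        """Flatten the record into a list of floats, in field declaration order."""
        return [float(v) for v in asdict(self).values()]


example = RideRequestFeatures(0.8, 4, 1.2, 8.5, 14.0, 95.0, 5.5, 18, 42, 0.6, False)
print(example.to_vector())
```

Keeping the category comments in a single record makes it easy to trace each model input back to its group when we later inspect which features matter.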
Interviewer: As you know, drivers' historical acceptance rates play a major role in our model. For new drivers or riders, we do not have all of this information, so some of the features you mentioned will have missing values. How would you propose we handle that?
Candidate: Assuming Uber has already clustered its drivers and riders, we can use the little information available to guess which cluster a new driver or rider belongs to and fill the missing values with that cluster's mean or median. If that assumption does not hold, we can fill the missing values with the mean or median of that column across the whole dataset.
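The cluster-based imputation described here can be sketched in plain Python. The sketch assumes each row already carries a `cluster` label (the candidate's own assumption) and uses the median, which is less sensitive to outliers than the mean; rows with no cluster, or whose cluster has no observed values, fall back to the column-wide median. The field names and values are hypothetical.

```python
from statistics import median


def impute_by_cluster(rows, column, cluster_key="cluster"):
    """Return a copy of `rows` where None values in `column` are filled with the
    median of the row's cluster, or with the column-wide median when the row has
    no cluster or its cluster has no observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    column_median = median(observed)  # raises StatisticsError if nothing is observed

    # Collect observed values per cluster, then take each cluster's median.
    by_cluster = {}
    for r in rows:
        if r[column] is not None and r.get(cluster_key) is not None:
            by_cluster.setdefault(r[cluster_key], []).append(r[column])
    cluster_median = {c: median(vals) for c, vals in by_cluster.items()}

    filled = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not mutated
        if r[column] is None:
            r[column] = cluster_median.get(r.get(cluster_key), column_median)
        filled.append(r)
    return filled


# Hypothetical driver records: driver 4 is new but assigned to cluster "A";
# driver 5 is new and could not be assigned to any cluster.
drivers = [
    {"driver_id": 1, "cluster": "A", "acceptance_rate": 0.9},
    {"driver_id": 2, "cluster": "A", "acceptance_rate": 0.7},
    {"driver_id": 3, "cluster": "B", "acceptance_rate": 0.4},
    {"driver_id": 4, "cluster": "A", "acceptance_rate": None},
    {"driver_id": 5, "cluster": None, "acceptance_rate": None},
]
filled = impute_by_cluster(drivers, "acceptance_rate")
for row in filled:
    print(row)
```

Swapping `median` for `statistics.mean` gives the mean-based variant the candidate also mentions.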
Interviewer: That's a good way to put it. Now let us discuss the model.
Candidate: In its simplest form, this is a binary classification problem where we want to predict whether or not the driver accepts the ride. A more nuanced version would be a multi-class model with predictions such as 'very likely to accept' and 'not at all likely to accept'.
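If we start from a binary model that outputs an acceptance probability, one simple way to obtain ordinal classes like these is to bucket that probability; the same bucketing could be applied to historical acceptance rates to create multi-class training labels. The cut-offs and the two middle label names below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut-offs; in practice they would be chosen from the data or product needs.
CUTOFFS = [0.2, 0.5, 0.8]
LABELS = [
    "not at all likely to accept",
    "unlikely to accept",
    "likely to accept",
    "very likely to accept",
]


def acceptance_bucket(p):
    """Map an acceptance probability in [0, 1] to an ordinal class label."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    # bisect_right counts how many cut-offs are <= p, which indexes the label.
    return LABELS[bisect_right(CUTOFFS, p)]


for p in (0.05, 0.3, 0.5, 0.95):
    print(p, acceptance_bucket(p))
```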
I would also try a logistic regression model, which gives us a probability of acceptance, and then choose a threshold on that probability (initially an arbitrary one) to decide whether a request counts as accepted. This is a good place to start, because in the initial stages understanding the features matters more than the model's performance, and with logistic regression we can always look back at which features drove the probability of acceptance.
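A minimal logistic regression baseline might look like this. It assumes scikit-learn and NumPy are available and trains on synthetic data generated from a made-up rule, so the feature names, the label rule and the 0.5 threshold are all illustrative; for brevity it predicts on the training data, whereas a real evaluation would use held-out data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the feature names are hypothetical.
feature_names = ["pickup_distance_km", "trip_distance_km", "est_fare", "driver_hours_today"]
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.uniform(0.1, 10, n),  # pickup distance
    rng.uniform(1, 40, n),    # trip distance
    rng.uniform(5, 80, n),    # fare
    rng.uniform(0, 12, n),    # hours already driven today
])
# Made-up label rule (assumption): closer pickups and higher fares raise acceptance,
# long shifts lower it, and trip distance has no effect.
logit = 2.0 - 0.6 * X[:, 0] + 0.05 * X[:, 2] - 0.2 * X[:, 3]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

# Standardize first so coefficient magnitudes are roughly comparable.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

proba = model.predict_proba(X)[:, 1]  # P(driver accepts); training data used only for brevity
THRESHOLD = 0.5                       # arbitrary starting point, to be tuned later
predicted_accept = proba >= THRESHOLD

# Sign and size of each standardized coefficient show how the feature moves acceptance.
coefs = model.named_steps["logisticregression"].coef_[0]
for name, c in sorted(zip(feature_names, coefs), key=lambda t: -abs(t[1])):
    print(f"{name:>20s}: {c:+.3f}")
```

Sorting the standardized coefficients by magnitude gives the quick "which features drove the probability" view the candidate is after.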
Once we have trimmed down the number of features and removed highly correlated ones, we could also try decision trees and ensemble methods such as random forests and XGBoost, since ensemble models can help improve the model's performance.
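The feature-trimming and model-comparison step could be sketched as below, again assuming scikit-learn and NumPy and using synthetic data in which trip duration is made nearly collinear with trip distance. The 0.9 correlation cut-off is an arbitrary choice, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency; `xgboost.XGBClassifier` exposes the same `fit`/`predict_proba` interface and could be added to the `models` dictionary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def drop_correlated(X, names, threshold=0.9):
    """Greedily keep features in order, dropping any whose absolute Pearson
    correlation with an already-kept feature exceeds `threshold`."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]


# Synthetic data (assumption): trip duration is almost a linear function of trip distance.
rng = np.random.default_rng(1)
n = 800
pickup_km = rng.uniform(0.1, 10, n)
trip_km = rng.uniform(1, 40, n)
trip_min = 2.0 * trip_km + rng.normal(0, 1, n)     # nearly collinear with trip_km
fare = 3 + 1.5 * trip_km + rng.normal(0, 12, n)    # correlated, but below the cut-off
X = np.column_stack([pickup_km, trip_km, trip_min, fare])
names = ["pickup_distance_km", "trip_distance_km", "trip_minutes", "est_fare"]
logit = 2.0 - 0.8 * pickup_km + 0.04 * fare
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X_reduced, kept = drop_correlated(X, names)
print("kept features:", kept)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
# Compare models with 5-fold cross-validated ROC AUC on the reduced feature set.
for name, clf in models.items():
    auc = cross_val_score(clf, X_reduced, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:>20s}: mean ROC AUC = {auc:.3f}")
```

Keeping logistic regression in the comparison gives a baseline, so any gain from the tree ensembles can be weighed against the loss of direct interpretability.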
Interviewer: Great! I'd now like to take this time to do a retrospective of the question. What do you think about the question itself?
Candidate: I think that, from the metrics point of view, this is a great question for assessing how holistically the candidate thinks about Uber's business, rather than only through the lens of the model or the data we have access to.
This is, of course, a heavily scaled-down version of the actual interview. At the end, we would probably be given a take-home assignment with a submission deadline.