Back in 2013, Yelp was a nine-year-old company built on a set of internal systems.
It was coming to the realization that running its own data centers might not be the most efficient way to run a business that was continuing to grow.
At the same time, the company understood that the tech world had changed dramatically since it launched in 2004, and it needed to transform
the underlying technology to a more modern approach.
That's a lot to take on in one bite, but it wasn't something that happened willy-nilly or
overnight, says Jason Yellen, SVP of engineering at Yelp.
The vast majority of the company's data was being processed in a massive Python repository that was getting bigger all the time.
The conversation about shifting to a microservices architecture began in 2012.
The company was also running the massive Yelp application
inside its own data centers, and as it grew, it was increasingly limited by the long lead times required to procure and set up new hardware.
It saw this as unsustainable over the long term and began transforming from a huge monolithic application running
on-premises to one built on microservices running in the cloud.
It was quite a journey.
The data center conundrum
Yellen described the classic scenario of a company that could benefit from a shift to the cloud.
Yelp had a small operations team dedicated to setting up new machines.
When engineering anticipated a new resource requirement, it had to give the operations team enough lead time to order new servers and
get them up and running. That was certainly not the most efficient way to deal with a resource problem, and it is one the cloud would
have easily solved.
"We kept running into a bottleneck. I was running a chunk of the search team [at the time] and I had to project capacity out to […]
Then it would take a few months to order machines and another few months to set them up," Yellen explained.
He emphasized that the team charged with getting these machines going was working hard, but there were too few people and too many demands,
and something had to give.
"We were on this cusp.
We could have scaled up that team dramatically and gotten [better] at building data centers and buying servers and doing that really fast,
but we were hearing a lot about AWS and the advantages there," Yellen explained.
To the cloud!
Yelp looked at the cloud market landscape in
2013, and AWS was the clear leader technologically.
That meant moving some of its operations to EC2.
Unfortunately, that exposed a new problem: how to manage this new infrastructure in the cloud.
This was before the notion of cloud-native computing even existed.
Sure, Google was operating in a cloud-native fashion in-house, but that approach was not really an option for most companies without a huge team of
engineers.
Yelp needed to explore new ways of managing operations in a hybrid cloud environment, where some of its applications and data
lived in the cloud and some lived in its own data center.
It was not an easy problem to solve in 2013, and Yelp had to be creative to make it work.
That meant keeping one foot in the public
cloud and the other in a private data center.
One tool that helped ease the transition was AWS Direct Connect, which had launched in 2011 and enabled Yelp to connect its
data center directly to the cloud.
Laying the groundwork
Around this time, as Yelp was figuring out how AWS worked, another revolutionary
technological change was under way: Docker emerged and began bringing the notion of containerization into the mainstream.
"That's another thing that's been revolutionary.
We could suddenly decouple the context of the running program from the machine it's running on.
Docker gives you this container, and it's much lighter weight than virtualization and running full operating systems on a machine," Yellen
explained.
Another development was the emergence of Mesos, an open source data center operating system that offered
a way to treat the data center as a single pool of resources.
Yelp could apply this notion wherever its data and applications lived.
The Mesos ecosystem also offered a container orchestration framework called Marathon, in the days before Kubernetes emerged as a popular way of dealing with
this same issue.
"We liked Mesos as a resource allocation framework.
It abstracted away the fleet of machines.
Mesos abstracts many machines and controls programs across them.
Marathon holds guarantees about what containers are running where.
We could stitch it all together into this clear opinionated interface," he said.
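To make the two ideas in that quote concrete (pooling a fleet of machines, and guaranteeing that a declared number of containers keeps running), here is a toy Python sketch. It is not Mesos or Marathon code and launches nothing: the names `Machine`, `AppSpec`, `running` and `reconcile` are hypothetical, and placement uses naive first-fit rather than Mesos's offer-based allocation.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """One host in the fleet with a fixed CPU and memory capacity."""
    name: str
    cpus: float
    mem_mb: int
    # Each running task is recorded as (app_id, cpus, mem_mb).
    tasks: list = field(default_factory=list)

    def free(self):
        """Return the (cpus, mem_mb) not yet claimed by running tasks."""
        used_cpus = sum(t[1] for t in self.tasks)
        used_mem = sum(t[2] for t in self.tasks)
        return self.cpus - used_cpus, self.mem_mb - used_mem


@dataclass
class AppSpec:
    """Hypothetical desired state for one containerized service."""
    app_id: str
    instances: int
    cpus: float
    mem_mb: int


def running(fleet, app_id):
    """Count instances of app_id across the whole pool of machines."""
    return sum(1 for m in fleet for t in m.tasks if t[0] == app_id)


def reconcile(fleet, spec):
    """Start or stop tasks until exactly spec.instances copies run.

    The fleet is treated as a single pool: an instance may land on any
    machine with enough free CPU and memory (naive first-fit placement).
    """
    # Scale up: place each missing instance on the first machine that fits.
    while running(fleet, spec.app_id) < spec.instances:
        for m in fleet:
            free_cpus, free_mem = m.free()
            if free_cpus >= spec.cpus and free_mem >= spec.mem_mb:
                m.tasks.append((spec.app_id, spec.cpus, spec.mem_mb))
                break
        else:
            raise RuntimeError(f"not enough capacity for {spec.app_id}")
    # Scale down: stop surplus instances, one at a time.
    while running(fleet, spec.app_id) > spec.instances:
        for m in fleet:
            mine = [t for t in m.tasks if t[0] == spec.app_id]
            if mine:
                m.tasks.remove(mine[0])
                break


if __name__ == "__main__":
    fleet = [Machine("host-a", 2.0, 4096), Machine("host-b", 2.0, 4096)]
    search = AppSpec("search", instances=3, cpus=0.5, mem_mb=1024)
    reconcile(fleet, search)
    print(running(fleet, "search"))  # 3

    # Simulate losing host-a: reconciling again restores the count elsewhere.
    fleet = [m for m in fleet if m.name != "host-a"]
    reconcile(fleet, search)
    print(running(fleet, "search"), fleet[0].name)  # 3 host-b
```

The service owner only declares "three copies of search"; which machine runs them is the pool's concern, which is the decoupling Yellen describes. In real Mesos, the master instead offers spare resources to frameworks such as Marathon, and the framework decides where to launch.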
Pulling it all together
While all this was happening, Yelp
began exploring how to move to the cloud and use a Platform as a Service approach for the software layer.
The problem was that, at the time it started, there wasn't really any viable way to do this.
In the buy-versus-build decision-making that goes on in large transformations like this one, the company felt it had little choice but to build
that platform layer itself.
In late 2013, Yelp began to pull together the idea of building this platform on top of Mesos and Docker,
giving it the name PaaSTA, an internal joke that stood for Platform as a Service, Totally Awesome.
It became known simply as Pasta.
The project had the ambitious goal of making Yelp's infrastructure work
as a single fabric, in a cloud-native fashion, before almost anyone outside of Google was using that term.
Pasta developed slowly, with the first developer piece coming online in August 2014 and the first production service following later that year.
The company open sourced the technology the following year.
"Pasta gave us the interface between the applications and development.
Operations had to make sure Pasta is up and running, while Development was responsible for implementing containers that implemented the
interface," Yellen said.
Moving deeper into the public cloud
While Yelp was busy building these internal systems, AWS wasn't sitting still.
It was also improving its offerings with new instance types, new functionality, and better APIs and tooling.
Yellen says this helped immensely as Yelp began a more complete move to the cloud.
He says there were a couple of tipping points as Yelp
moved more and more of the application to AWS, eventually including the master database.
This all happened in more recent years, as the team better understood how to use Pasta to control processes wherever they lived.
What's more, he said, tighter integration between the in-house data centers and AWS made it possible to adopt other
AWS services.
The first tipping point came around 2016, when all new services were being configured for the cloud.
He said the team began to get much better at managing applications and infrastructure in AWS, and its thinking shifted from how to migrate to
AWS to how to operate and manage it.
Perhaps the biggest step in this years-long transformation came last summer, when Yelp moved its master
database from its own data center to AWS.
"This was the last thing we needed to move over.
As of 2018, we are serving zero production traffic through physical data centers," he said.
While Yelp still has two data centers, it is getting to the point where they hold only the minimum hardware required to run the network
backbone.
Yellen said that before all this was in place, it took two weeks to a month to get a service up and running; now it takes just a couple of […]
He says any loss of control from moving to the cloud has been easily offset by the convenience of using cloud infrastructure.
"We get to focus on the things where we add value," he said. And that's the goal of every company.