Regardless of what you may think of Facebook as a platform, they run a massive operation, and when you reach their level of scale you have
to get more creative in how you handle every aspect of your computing environment.
Engineers quickly reach the limits of human ability to
track information, to the point that checking logs and analytics becomes impractical and unwieldy on a system running thousands of services.
This is a perfect scenario to implement machine learning, and that is precisely what Facebook has done.
The company published a blog post
today about a self-tuning system they have dubbed Spiral.
This is pretty nifty, and what it does is essentially flip the idea of system tuning on its head.
Instead of looking at some data and coding what you want the system to do, you teach the system the right way to do it and it does it for
you, using the massive stream of data to continually teach the machine learning models how to push the systems to be ever better.
In the
blog post, the Spiral team described it this way: "Instead of looking at charts and logs produced by the system to verify correct and
efficient operation, engineers now express what it means for a system to operate correctly and efficiently in code.
Today, rather than specify how to compute correct responses to requests, our engineers encode the means of providing feedback to a
self-tuning system."
They say that coding in this way is akin to declarative code, like using SQL statements to tell the database what you
want it to do with the data, but the act of applying that concept to systems is not a simple matter.
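To make the contrast in that quote concrete, here is a minimal sketch in Python. It is not Spiral's API, which the blog post does not show; the cache-admission scenario, the `Item` type, the thresholds and both function names are hypothetical, chosen only to illustrate the difference between coding a decision and coding feedback about a decision.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical description of something a service might cache."""
    size_bytes: int
    age_seconds: float


# Hand-coded approach: the engineer specifies HOW to decide.
def should_cache_handcoded(item: Item) -> bool:
    # Illustrative thresholds of the kind picked by reading charts and logs;
    # they go stale whenever traffic patterns shift.
    return item.size_bytes < 64_000 and item.age_seconds < 300


# Feedback approach: the engineer specifies only WHAT a good outcome was.
def cache_feedback(was_cached: bool, hits_after_decision: int) -> bool:
    """Return True if the earlier caching decision turned out to be correct."""
    # Caching was right if the item was read again;
    # skipping was right if it was never read again.
    return was_cached == (hits_after_decision > 0)
```

In the second style there are no thresholds to maintain. The decision rule itself would be learned from the stream of feedback values.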
"Spiral uses machine learning to create
data-driven and reactive heuristics for resource-constrained real-time services.
The system allows for much faster development and hands-free maintenance of those services, compared with the hand-coded alternative," the
Spiral team wrote in the blog post.
Given the sheer number of services running on Facebook, and the number of users trying to
interact with those services at any given time, sophisticated automation is a necessity, and that is what Spiral provides.
The system takes
the log data and processes it through Spiral, which is connected with just a few lines of code.
It then sends commands back to the server based on the declarative coding statements written by the team.
To ensure those commands are always being fine-tuned, at the same time, the data gets sent from the server to a model for further adjustment
in a lovely virtuous cycle.
This process can be applied locally or globally.
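The cycle described above (predict, act, observe, retrain) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Facebook's implementation: the blog post does not say which models Spiral uses, so a tiny online logistic regression stands in for them, and `OnlineClassifier`, `run_loop`, the single "item size" feature and the simulated ground truth are all hypothetical.

```python
import math
import random


class OnlineClassifier:
    """Tiny logistic-regression model trained one example at a time (SGD)."""

    def __init__(self, n_features, lr=0.3):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        # One gradient step on the log loss for a single (x, label) pair.
        err = self.predict_proba(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def run_loop(steps=5000, seed=0):
    """Simulate the cycle; return accuracy over the final 1000 decisions."""
    rng = random.Random(seed)
    model = OnlineClassifier(n_features=1)
    correct_recent = 0
    for step in range(steps):
        size = rng.random()  # normalized item size in [0, 1)
        x = [size]
        # 1. The model's prediction becomes the "command" sent to the server.
        admit = model.predict_proba(x) >= 0.5
        # 2. The server reports what actually happened. Assumption for this
        #    simulation: only items smaller than 0.5 were worth caching.
        #    In a real service this feedback would arrive later, not instantly.
        worth_caching = size < 0.5
        if step >= steps - 1000:
            correct_recent += int(admit == worth_caching)
        # 3. The feedback flows back into the model, closing the loop.
        model.update(x, 1.0 if worth_caching else 0.0)
    return correct_recent / 1000


if __name__ == "__main__":
    print(f"accuracy over last 1000 decisions: {run_loop():.2f}")
```

The untrained model admits everything, which is right only about half the time in this simulation. The loop improves on that without anyone editing a threshold by hand, which is the behavior the Spiral team describes.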
The tool was developed by the team operating in Boston, and is only available internally.
It took lots of engineering to make it happen, the kind of scope that only Facebook could apply to a problem like this (mostly because
Facebook is one of the few companies that would actually have a problem like this).