OpenStack: The Issue of Staying Close to Upstream in Production and Pushing Code Back to the Community

Managing a large-scale infrastructure, such as OVH’s OpenStack-based Public Cloud, requires constant maintenance while still meeting customer needs and expectations. What’s more, OpenStack regularly releases new versions, and customers demand more and more features. This is a huge challenge that requires smart management from the provider. Here are the main points from today’s talk at the OpenStack Summit in Boston, given by Adam Kijak, a DevOps engineer at OVH.

Patching OpenStack: Adapting It to the Public Cloud Use Case

OpenStack was not originally built for Public Cloud, so Public Cloud providers have to contend with a number of OpenStack issues specific to this use case. For example, North-South network traffic is intense and highly critical, and two years ago the solutions OpenStack offered for it were not satisfactory. To improve this, we started to “hack” OpenStack to remove the network node from this path, and added an important patch to Neutron. We did this on Juno for our beta version. We then put our Public Cloud offering into production and kept adding more and more patches, both small and large, to our production branch, for our own needs as well as in response to specific customer requests.

The issue of staying close to upstream while in production

Now we have to take a step back and look at what has happened after two years spent meeting customer feature requests and the internal needs of large-scale infrastructure management. In effect, we have been forking OpenStack so that we can provide a version that is perfect for our customers. Features and API compatibility are unchanged, but the code keeps drifting away from upstream. Every time we make a patch, we move further away from the project’s upstream repositories.

As a consequence, we were not able to integrate some recent features into our own production branch, or to contribute our own changes back to the community. In addition, the same feature was sometimes developed twice, once at OVH and once in the community, with different code.

With this model, contributing upstream is a long and difficult process.

At OVH, we are more Ops than Dev

There are several reasons behind the model we chose, including time constraints. Sometimes we only have a few hours to respond to a particular need. When we then push the resulting patch to the community, most of the time it is not accepted. For example, we developed a patch for Swift to manage incoming requests. It took us three days to develop, test and deploy the patch to production. We pushed it to the community, and three months later the patch is still under discussion. Meanwhile, we have been running it in production for those three months because our customers need it!

So, what causes this kind of situation? The answer is actually pretty simple. Like us, most OpenStack techies in the community work as DevOps; the difference is one of emphasis. At OVH we are more Ops than Dev, which means we prioritize operational constraints, whereas the OpenStack community is more Dev than Ops and prioritizes architecture and overall design constraints.

What if, instead of a problem, this was actually an opportunity to improve our workflow?

While the community focuses on the overall system for a well-designed cloud solution, we focus largely on building solutions that "just work". This gives us a unique opportunity to work together. After all, isn't that the strength of the open-source community: taking advantage of different approaches to work towards a common goal?

On the OVH side, we decided to work systematically on our own OVH master branch, which we keep very close to the upstream master branch by updating it daily. All our patches are developed from this starting point. Working this way means that we are always ready to push patches to the community repository easily. It doesn’t really matter whether the patch is approved; that’s not the main goal. The goal is to present a use case and provide one possible solution. The community is then free to reject it if it is too specific, to rework it more deeply if our implementation doesn't fit the project's current direction, or simply to accept it.

To use a patch on our production branch, we just need to import it from our master branch into our production code base. This is a real, new advantage for us: we can use our current master branch whenever we need to, because all our changes have already been integrated into it.
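
The branch workflow above can be illustrated with a toy, in-memory model of three branches. This is a sketch under stated assumptions, not OVH's tooling: commits are plain string identifiers, and all branch and patch names (`ovh/master`, `swift-request-patch`, and so on) are hypothetical. The daily sync replays OVH-only patches on top of the latest upstream history, roughly what `git rebase` does, and importing a patch into production plays the role of `git cherry-pick`. The post does not say which git commands OVH actually uses, and merge conflicts are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Branch:
    """A branch modeled as an ordered list of commit IDs (toy model, no real git)."""
    name: str
    commits: list[str] = field(default_factory=list)


def divergence(branch: Branch, upstream: Branch) -> list[str]:
    """Commits on `branch` that upstream lacks: the 'fork' we want to keep small."""
    return [c for c in branch.commits if c not in upstream.commits]


def sync_with_upstream(ovh_master: Branch, upstream: Branch) -> None:
    """Daily sync: replay local-only patches on top of the latest upstream history.

    Roughly analogous to rebasing onto upstream master. Patches that upstream
    has already accepted (same ID in this toy) are dropped, so divergence
    shrinks. Conflicts are not modeled.
    """
    local_patches = divergence(ovh_master, upstream)
    ovh_master.commits = list(upstream.commits) + local_patches


def import_patch(production: Branch, ovh_master: Branch, patch: str) -> None:
    """Copy one patch from the OVH master branch into production (like a cherry-pick)."""
    if patch not in ovh_master.commits:
        raise ValueError(f"{patch!r} must be developed on {ovh_master.name} first")
    if patch not in production.commits:
        production.commits.append(patch)


# Usage with hypothetical commit names.
upstream = Branch("upstream/master", ["u1", "u2"])
ovh_master = Branch("ovh/master", ["u1", "u2"])
production = Branch("ovh/production", ["u1"])  # production runs an older base

ovh_master.commits.append("swift-request-patch")  # patch written on top of fresh upstream
upstream.commits.append("u3")                     # upstream keeps moving
sync_with_upstream(ovh_master, upstream)
print(ovh_master.commits)                # ['u1', 'u2', 'u3', 'swift-request-patch']
print(divergence(ovh_master, upstream))  # ['swift-request-patch']

import_patch(production, ovh_master, "swift-request-patch")
print(production.commits)                # ['u1', 'swift-request-patch']

upstream.commits.append("swift-request-patch")  # community accepts the patch unchanged
sync_with_upstream(ovh_master, upstream)
print(divergence(ovh_master, upstream))  # []
```

The toy recognizes an accepted patch only by its identifier; real tooling would also have to cope with patches that upstream reworks or re-commits under a different ID.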

Ideally, this workflow should be used as much as possible in any open-source project where you run a production environment and need to modify the code.