Up to 10 times more IOPS with the OVH Public Cloud

Good news for everyone who uses our Public Cloud to store large amounts of data: we have deployed hardware and software optimisations that will ultimately deliver improved performance across all instances and regions. The gain is far from minor: on a B2-7 instance, for example, we go from 2,000 input/output operations per second (IOPS) to 20,000 IOPS – ten times as many!

Before we look at these optimisations in detail, let's take a quick look back. At first, the virtual machines (VMs) hosted on the Public Cloud were connected to remote storage. It was the logical solution from the point of view of pooling resources, but network latency limited performance too severely. So it wasn't long before we decided to switch to local storage, and then added redundancy. With local SSD RAID, we thought we had reached a fully acceptable level of performance.

However, feedback from some of our customers painted a slightly different picture: for them, we offered the best value for money on the market in terms of CPU, bandwidth and so on, but the results for applications with particularly demanding storage requirements dragged down the final score. The conclusion was obvious: with NVMe SSDs arriving, some of our software choices needed a second look.

Finding the right combination

Our tests highlighted a first area of optimisation: Qcow, the disk image format used for virtual machine storage. The copy-on-write technique has advantages in a virtual environment, but it also means that every read or write must first go through the image's allocation mapping before reaching the physical disk, which is a significant waste of time.
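To see why that indirection costs time, here is a deliberately simplified Python sketch – not the real qcow2 on-disk format, just the shape of the problem: a copy-on-write image must consult an allocation table on every access, while a RAW image maps guest offsets straight to host offsets.

```python
# Toy illustration of copy-on-write indirection vs. RAW.
# NOT the real qcow2 format: cluster size and structures are
# simplified, and IO is assumed to stay within one cluster.

CLUSTER = 4096  # illustrative cluster size

class RawImage:
    """RAW: guest offset maps 1:1 to host offset - no lookup step."""
    def __init__(self, size):
        self.data = bytearray(size)

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

class CowImage:
    """Copy-on-write: clusters are allocated on first write, so every
    access must first consult the mapping table - the extra step that
    RAW avoids."""
    def __init__(self):
        self.table = {}  # guest cluster index -> cluster buffer

    def write(self, offset, payload):
        cluster, start = divmod(offset, CLUSTER)
        buf = self.table.setdefault(cluster, bytearray(CLUSTER))
        buf[start:start + len(payload)] = payload

    def read(self, offset, length):
        cluster, start = divmod(offset, CLUSTER)  # lookup on every IO
        buf = self.table.get(cluster)
        if buf is None:
            return b"\x00" * length  # unallocated clusters read as zeros
        return bytes(buf[start:start + length])
```

The mapping lookup is cheap in this toy, but in a real qcow2 image it can mean extra metadata reads and writes on the physical disk, which is where the latency comes from.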

Alongside this, there was the question of the storage layer. LVM gave excellent results in synthetic benchmarks, but its performance was far less impressive in application-level tests reflecting real-world conditions. In real life, our customers who make intensive use of Redis, MongoDB or Hadoop do not limit themselves to reading and writing neatly aligned 4K blocks. So we needed to conduct a new round of benchmarks with more representative tools.
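fio is a common tool for this kind of test. As a sketch of what "more representative" might mean compared with a pure aligned-4K run, a mixed random read/write job could look like the fragment below – the parameters are illustrative, not the ones used in our benchmarks:

```ini
; illustrative fio job: mixed random read/write with varied block
; sizes, closer to database-style IO than a fixed aligned-4K test
[global]
ioengine=libaio
direct=1          ; bypass the page cache to measure the disk itself
runtime=60
time_based=1
group_reporting=1

[mixed-rw]
rw=randrw
rwmixread=70      ; 70% reads / 30% writes, a common database-like mix
bsrange=4k-64k    ; varied block sizes instead of a fixed 4K
iodepth=32
numjobs=4
filename=/dev/vdb ; placeholder device on a test VM - adjust to yours
```

Varying the block size and mixing reads with writes is what separates a benchmark like this from the synthetic aligned-4K runs where LVM looked so good.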


Comparison of different storage formats in a benchmark test carried out by the OVH Metrics team. Red line: PCI RAW with io=native on NVMe. Orange line: LVM. Yellow line: VMs before optimisation. Green line: dedicated servers, the reference we are trying to match. Lower is better.

As shown in the benchmark test above, the results achieved with RAW were very similar to those of the dedicated servers.
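In libvirt/KVM terms, the combination measured above (RAW format with io=native) is typically expressed in the guest's disk definition. The fragment below is an illustrative sketch with a placeholder device path, not our actual production configuration:

```xml
<disk type='block' device='disk'>
  <!-- raw image format, native Linux AIO, no host page cache -->
  <driver name='qemu' type='raw' io='native' cache='none'/>
  <!-- placeholder NVMe device path -->
  <source dev='/dev/nvme0n1p1'/>
  <target dev='vda' bus='virtio'/>
</disk>
```

Pairing io='native' with cache='none' is the usual way to let the guest's IO reach the NVMe device with as little host-side buffering as possible.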

At the end of the process, we asked some customers to test what we believed was the right combination: a migration from Qcow to RAW, and a file system based on an optimised version of Ext4. Good news – the first test customers unanimously measured IOPS up to 10 times higher, as shown in the benchmark tests below.

[Benchmark charts: customer-measured IOPS before and after the Qcow-to-RAW migration]

Migration

Now comes the second phase: large-scale deployment, to make these improvements available to as many users as possible. The work will take a little time, because the necessary hardware and its configuration require the latest version of OpenStack, Newton, to which all our infrastructures are gradually migrating. Customers will be pleased to know that these optimisations have no impact on the price or naming of our Public Cloud instances: they are simply integrated into the existing offer. In fact, you may already be benefiting from them!

If IOPS are a key criterion in your cloud decisions, we invite you to get a concrete idea of the proposed performance upgrades in just a few clicks. All you have to do is launch a B2 (General Purpose) VM hosted in the GRA5 region.

And then?

We firmly believe that IO-intensive workloads have a place in our Public Cloud. In parallel with the ongoing migration, we are therefore preparing the next step, which will increase performance by yet another factor of 10. Without going into too much detail about an offer that's still a work in progress, imagine that your VM will one day be able to access, via PCI passthrough, a cluster of NVMe SSDs mounted in the RAID configuration of your choice and dedicated to your needs...
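For the curious: PCI passthrough means handing a physical device directly to the guest, bypassing the hypervisor's storage stack entirely. In libvirt terms it is typically declared as below – an illustrative sketch with placeholder PCI addresses, not a description of the future offer:

```xml
<!-- give the guest direct access to the NVMe controller at this
     (placeholder) host PCI address -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

Because the guest then talks to the NVMe controller itself, the per-IO virtualisation overhead largely disappears, which is what makes another order-of-magnitude gain plausible.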