Journey to next-gen Ceph storage at OVHcloud with LXD

Introduction

My name is Filip Dorosz. I’ve been working at OVHcloud since 2017 as a DevOps Engineer. Today I want to tell you a story of how we deployed next-gen Ceph at OVHcloud. But first, a few words about Ceph: Ceph is a software-defined storage solution that powers OVHcloud’s additional Public Cloud volumes as well as our product Cloud Disk Array. But I won’t bore you with the marketing stuff – let the story begin!

This looks like an interesting task…

One and a half years ago we started a very familiar sprint. Aside from the usual stuff that we have to deal with, there was one task that looked a little more interesting. The title read: “Evaluate whether we can run newer versions of Ceph on our current software”. We needed newer versions of Ceph and Bluestore to create a next-gen Ceph solution with all-flash storage.

Our software solution (which we call a legacy solution) is based on Docker. It sounds really cool, but we run Docker a bit differently from its intended purpose. Our containers are very stateful. We run a full-blown init system inside the container as the Docker entrypoint. That init system then starts all the software we need inside the container, including Puppet, which we use to manage the “things”. It sounds like we’re using Docker containers similarly to LXC containers, doesn’t it?…

Our legacy Ceph infrastructure (allegory)

It quickly turned out that it is not possible to run newer Ceph releases in our in-house solution because newer versions of Ceph make use of systemd and in our current solution we don’t run systemd at all – not inside the containers and not on the hosts that host them.

The hunt for solutions began. One possibility was to package Ceph ourselves and get rid of systemd, but that’s a lot of work with little added value. The Ceph community provides tested packages that we wanted to take advantage of, so that option was off the table.

The second option was to run Ceph with supervisord inside the Docker container. While it sounds like a plan, even the supervisord docs clearly state that supervisord “is not meant to be run as a substitute for init as ‘process id 1’.” So that was clearly not an option either.

We needed systemd!

At this point, it was clear that we needed a solution that would enable us to run systemd inside the container as well as on the host. It sounded like a perfect time to switch to a brand new solution – a solution that was designed to run a full OS inside the container. As our Docker used the LXC backend, it was a natural choice to evaluate LXC. It had all the features we needed, but with LXC we would have to code all the container-related automation ourselves. But could all this additional work be avoided? It turns out it could…

As I had used LXD in my previous project, I knew it was capable of managing images, networks, block devices and all the nice features that are needed to set up a fully functional Ceph cluster.

So I reinstalled my developer servers with an Ubuntu Server LTS release and installed LXD on them.

Ubuntu & LXD

LXD has everything that was needed to create fully functional Ceph clusters:

  • it supports ‘fat’ stateful containers,
  • it supports systemd inside the container,
  • it supports container images so we can prepare customized images and use them without hassle,
  • it supports passing whole block devices to containers,
  • it supports passing ordinary directories to containers,
  • it makes it easy to start, stop and restart containers,
  • it has a REST API, which will be covered in later parts of the article,
  • it supports multiple network interfaces within containers using macvlan.

After just a few hours of manual work I had a Ceph cluster running the Mimic release inside LXD containers. I typed ceph health and I got ‘HEALTH_OK’. Nice! It worked great.

How do we industrialize that?

To industrialize it and plug it into our Control Plane, we needed a Puppet module for LXD so that Puppet could manage all the container-related elements on the host. No existing module provided the functionality we needed, so we had to write it ourselves.

The LXD daemon exposes a handy REST API that we used to create our Puppet module. You can talk to the API locally over a unix socket, or over the network if you configure LXD to expose it. For usage within the module it was really convenient to use the lxc query command, which works by sending raw queries to LXD over the unix socket. The module is now open source on GitHub, so you can download it and play with it. It enables you to configure basic LXD settings as well as create containers, profiles, storage pools, etc.
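
As a rough illustration of the two local access paths mentioned above, here is a minimal sketch that wraps lxc query and a plain HTTP request over the unix socket. The function names are made up for this example and are not part of LXD or the Puppet module. The socket path is an assumption that matches the deb-packaged LXD on Ubuntu 18.04; other packagings keep the socket elsewhere.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Socket path of the deb-packaged LXD (assumption; override via LXD_SOCKET).
LXD_SOCKET="${LXD_SOCKET:-/var/lib/lxd/unix.socket}"

# GET an API path through the lxc client, e.g. /1.0/containers
lxd_get() {
    lxc query --request GET "$1"
}

# The same request without the lxc client: plain HTTP over the unix socket.
# The host name in the URL ("lxd") is only a placeholder.
lxd_get_raw() {
    curl --silent --unix-socket "$LXD_SOCKET" "http://lxd$1"
}

# Usage (on a host where LXD is running):
#   lxd_get /1.0/containers
#   lxd_get_raw /1.0/profiles/default
```

Both calls return JSON, which is what makes the API convenient to consume from a configuration management tool.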

The module allows you to create resources as well as manage their state. Just change your manifests, run puppet agent, and it will do the rest.

Open source LXD Puppet Module, available on GitHub

At the time of writing, the LXD Puppet module provides the following defines:

  • lxd::profile
  • lxd::image
  • lxd::storage
  • lxd::container

For full reference please check out its GitHub page.

Manual setup VS Automatic setup with Puppet

I will show you a simple example of how to create the exact same setup manually and then again automatically with Puppet. For the purpose of this article I created a new Public Cloud instance with Ubuntu 18.04, one additional disk and an already configured bridge device br0. Let’s assume there is also a DHCP server listening on the br0 interface.

It’s worth noting that generally you don’t need to create your own image; you can just use the upstream ones with built-in commands. But for the purpose of this article, let’s create a custom image that will be exactly like upstream. To create such an image you just have to type a few commands to repack the upstream image into a Unified Tarball.

root@ubuntu:~# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64-root.tar.xz
root@ubuntu:~# wget https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64-lxd.tar.xz
root@ubuntu:~# mkdir -p ubuntu1804/rootfs
root@ubuntu:~# tar -xf bionic-server-cloudimg-amd64-lxd.tar.xz -C ubuntu1804/
root@ubuntu:~# tar -xf bionic-server-cloudimg-amd64-root.tar.xz -C ubuntu1804/rootfs/
root@ubuntu:~# cd ubuntu1804/
root@ubuntu:~/ubuntu1804# tar -czf ../ubuntu1804.tar.gz *

You will end up with a ubuntu1804.tar.gz image that can be used with LXD. For the purpose of this article I’ve put this image in a directory reachable through HTTP, for example: http://example.net/lxd-images/
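
Before uploading the tarball it can be worth checking that the repack produced the expected layout, with metadata.yaml and rootfs/ at the top level of the archive. The helper below is a hypothetical sketch (the function name is made up for this article) and only inspects the file listing, not the contents.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a gzip-compressed tarball has
# metadata.yaml and a rootfs/ directory at its top level.
check_unified_tarball() {
    # $1 = path to the .tar.gz file
    listing=$(tar -tzf "$1") || return 1
    # Entries may or may not start with "./" depending on how tar was invoked.
    echo "$listing" | grep -E '^(\./)?metadata\.yaml$' >/dev/null || return 1
    echo "$listing" | grep -E '^(\./)?rootfs/' >/dev/null || return 1
}

# Usage:
#   check_unified_tarball ubuntu1804.tar.gz && echo "layout looks fine"
```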

Manual setup

First things first: let’s install LXD.

root@ubuntu:~# apt install lxd

During package install you will be greeted with the message: “To go through the initial LXD configuration, run: lxd init” but we will just do the steps manually.

The next step is to add a new storage pool.

root@ubuntu:~# lxc storage create default dir source=/var/lib/lxd/storage-pools/default
Storage pool default created

Next, create a custom profile that will have: the environment variable http_proxy set to an empty string, a 2GB memory limit, the rootfs on the default storage pool, and an eth0 interface attached to the bridge br0.

root@ubuntu:~# lxc profile create customprofile
Profile customprofile created
root@ubuntu:~# lxc profile device add customprofile root disk path=/ pool=default
Device root added to customprofile
root@ubuntu:~# lxc profile device add customprofile eth0 nic nictype=bridged parent=br0
Device eth0 added to customprofile
root@ubuntu:~# lxc profile set customprofile limits.memory 2GB

Let’s print out the whole profile to check that it’s OK:

root@ubuntu:~# lxc profile show customprofile
config:
  environment.http_proxy: ""
  limits.memory: 2GB
description: ""
devices:
  eth0:
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: customprofile
used_by: []

Then let’s fetch the LXD image in the Unified Tarball format:

root@ubuntu:~# wget -O /tmp/ubuntu1804.tar.gz http://example.net/lxd-images/ubuntu1804.tar.gz

And import it:

root@ubuntu:~# lxc image import /tmp/ubuntu1804.tar.gz --alias ubuntu1804
Image imported with fingerprint: dc6f4c678e68cfd4d166afbaddf5287b65d2327659a6d51264ee05774c819e70

Once we have everything in place, let’s create our first container:

root@ubuntu:~# lxc init ubuntu1804 container01 --profile customprofile
Creating container01

Now let’s add a host directory to the container.
Please note that you have to set the proper owner of the directory on the host!

root@ubuntu:~# mkdir /srv/log01
root@ubuntu:~# lxc config device add container01 log disk source=/srv/log01 path=/var/log/
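
What the proper owner is depends on the container’s id mapping. As a sketch only: in an unprivileged container, root inside the container is mapped to an unprivileged uid on the host, and that range commonly comes from an entry in /etc/subuid and /etc/subgid. The helper below is hypothetical and simply reads the first id of such an entry. Which entry your LXD version actually uses is an assumption here, so check the real mapping of your container before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the first host id delegated to a user in a
# subuid-style file (lines look like "name:first_id:count").
subid_base() {
    # $1 = user name, $2 = file to read (defaults to /etc/subuid)
    awk -F: -v user="$1" '$1 == user { print $2; exit }' "${2:-/etc/subuid}"
}

# Usage (as root on the host); assumes an unprivileged container whose root
# user maps to the first ids of root's subuid/subgid ranges:
#   uid=$(subid_base root /etc/subuid)
#   gid=$(subid_base root /etc/subgid)
#   chown "$uid:$gid" /srv/log01
```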

And as a final touch, add one of the host’s partitions to the container:

root@ubuntu:~# lxc config device add container01 bluestore unix-block source=/dev/sdb1 path=/dev/bluestore

/dev/sdb1 will be available inside the container as /dev/bluestore. We use this to pass devices for Ceph’s Bluestore to the container.

The container is ready to be started.

root@ubuntu:~# lxc start container01

Voila! The container is up and running. We set up our containers very similarly to the above.

Although it was quite easy to set up the above by hand, for a massive deployment you need to automate things. So now let’s do the above using our LXD Puppet module.

Automatic setup with Puppet

To make use of the module, download it to your Puppet server and place it in the module path.

Then, create a new class or add it to one of the existing classes; whatever suits you.

I plugged it into my Puppet server. Please note that I am using the bridge device br0 that was prepared earlier by other modules, and that the LXD images are hosted on a web server at http://example.net/lxd-images/ as Unified Tarballs.

A full example module that makes use of the LXD Puppet module:

class mymodule {
 
    class {'::lxd': }
 
    lxd::storage { 'default':
        driver => 'dir',
        config => {
            'source' => '/var/lib/lxd/storage-pools/default'
        }
    }
 
    lxd::profile { 'exampleprofile':
        ensure  => 'present',
        config  => {
            'environment.http_proxy' => '',
            'limits.memory'          => '2GB',
        },
        devices => {
            'root' => {
                'path' => '/',
                'pool' => 'default',
                'type' => 'disk',
            },
            'eth0' => {
                'nictype' => 'bridged',
                'parent'  => 'br0',
                'type'    => 'nic',
            }
        }
    }
 
    lxd::image { 'ubuntu1804':
        ensure      => 'present',
        repo_url    => 'http://example.net/lxd-images/',
        image_file  => 'ubuntu1804.tar.gz',
        image_alias => 'ubuntu1804',
    }
 
    lxd::container { 'container01':
        state    => 'started',
        config   => {
            'user.somecustomconfig' => 'My awesome custom env variable',
        },
        profiles => ['exampleprofile'],
        image    => 'ubuntu1804',
        devices  => {
            'log'  => {
                'path'   => '/var/log/',
                'source' => '/srv/log01',
                'type'   => 'disk',
            },
            'bluestore' => {
                'path'   => '/dev/bluestore',
                'source' => '/dev/sdb1',
                'type'   => 'unix-block',
            }
        }
    }
}

Now the only thing left to do is to run puppet agent on the machine. It will apply the desired state:

root@ubuntu:~# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Retrieving locales
Info: Loading facts
Info: Caching catalog for ubuntu.openstacklocal
Info: Applying configuration version '1588767214'
Notice: /Stage[main]/Lxd::Install/Package[lxd]/ensure: created
Notice: /Stage[main]/Lxd::Config/Lxd_config[global_images.auto_update_interval]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Storage[default]/Lxd_storage[default]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Profile[exampleprofile]/Lxd_profile[exampleprofile]/ensure: created
Notice: /Stage[main]/Mymodule/Lxd::Image[ubuntu1804]/Exec[lxd image present http://example.net/lxd-images//ubuntu1804.tar.gz]/returns: executed successfully
Notice: /Stage[main]/Mymodule/Lxd::Container[container01]/Lxd_container[container01]/ensure: created
Notice: Applied catalog in 37.56 seconds

In the end you will have a new container up and running:

root@ubuntu:~# lxc ls
+-------------+---------+--------------------+------+------------+-----------+
|    NAME     |  STATE  |        IPV4        | IPV6 |    TYPE    | SNAPSHOTS |
+-------------+---------+--------------------+------+------------+-----------+
| container01 | RUNNING | 192.168.0.5 (eth0) |      | PERSISTENT | 0         |
+-------------+---------+--------------------+------+------------+-----------+

Because you can attach custom configuration keys (such as user.somecustomconfig above) and environment variables to a container, this opens up a lot of possibilities for configuring new containers.
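
As a small sketch of how such a key can be consumed, the hypothetical helpers below read a user.* key either from the host, through the lxc client, or from inside the container, through the guest API socket that LXD exposes at /dev/lxd/sock. The function names are made up for this article, and the in-container variant assumes curl is installed in the image.

```shell
#!/bin/sh
# Hypothetical helpers for reading a user.* key set on a container.

# From the host: ask LXD for the key through the lxc client.
container_user_key() {
    # $1 = container name, $2 = key name without the "user." prefix
    lxc config get "$1" "user.$2"
}

# From inside the container: query the guest API socket.
# The host name in the URL ("lxd") is only a placeholder.
guest_user_key() {
    # $1 = key name without the "user." prefix
    curl --silent --unix-socket /dev/lxd/sock "http://lxd/1.0/config/user.$1"
}

# Usage:
#   container_user_key container01 somecustomconfig   # on the host
#   guest_user_key somecustomconfig                   # inside container01
```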

How good is that!?

I encourage everyone to contribute to the module or give it a star on GitHub if you find it useful.

Plans for the future

After extensive testing, we were sure that everything worked as intended and confident that we could go to production with the new solution: Ceph on all-flash storage, without HDDs.

In the future, we plan to migrate all our legacy infrastructure to the new LXD-based solution. It will be a mammoth migration project, with over 50PB sitting on over 2000 dedicated servers, but that’s a story for another time.