Web hosting: how to host 3 million websites?

In 2018, we launched one of the largest projects in OVH’s history: migrating the 3 million websites hosted in our Paris datacentre. If you want to discover the reasons behind this titanic project, we took a look at them in this post.

It is now time to explain how we proceeded with this project. We have already talked about our operational constraints, such as ensuring there was no impact on the websites, even though we do not control their source code, and limiting downtime. We also had many technical constraints, related to the architecture of our services. We’ll take a close look at our different migration scenarios in the next article, but today, we will explain how the infrastructure hosting your websites works.

The anatomy of web hosting

To work, websites and web applications usually need two things:

  • The source code, responsible for executing the behaviour of the website
  • The data used by the source code to customise the experience

For a site to be operational, the source code is run on one or more servers whose environment has been configured with the tools related to the programming languages used. PHP currently dominates the market, but the choice is not limited to this language.

To remain in operational condition, these servers must be maintained and updated, and their operation must be continuously monitored. This role is usually different to that of developers, who are responsible for creating the source code of the website. If you’re interested in these roles, we’d recommend learning more about system administrators and DevOps, and their specific skills.

For a website owner, monitoring the infrastructure and its operations can be very expensive: you need to have a team large enough to maintain a presence 24 hours a day, 365 days a year. With such a team in place, developers are only needed for one-off service requests, when the website owner wants to make changes.

It is this level of management that we offer through our web hosting solution. Instead of recruiting a system administrator team for each website, we have recruited a team of technical experts who take care of all the websites we host. The same goes for the data, which is stored in specific servers, and operated by those same teams of experts.

This way, each website does not need to utilise the resources of an entire server. With our solutions, we pool resources on our infrastructures, so individual servers can be used to run several websites at once.

All of these economies of scale allow us to offer a low-cost web hosting solution (from €1.49/month), while maintaining a high level of quality. We will explain very shortly how we calculate our quality of service, but if you do not want to wait, watch the conference talk that our developers gave at FOSDEM 2019.

To reach the required level of quality while managing so many websites, we have adapted our infrastructure.

The technical architecture of our web hosting solutions

The classic approach to building a web hosting service is to install a server, configure the databases and a source code execution environment, and then place new clients there until the server is full, before moving on to the next server.

This is very effective, but there are some disadvantages with this type of architecture:

  • In the event of a breakdown, recovery can take a long time. Unless you have real-time system replication, which is very expensive, you have to recover your data from the backup and then migrate it to a new machine before reopening the service to the clients. Although the probability of hardware or software failure is low, this is a common operation on large infrastructures.
  • To deal with new technologies, it would be preferable to only introduce them on new servers during the first phase, and then take the time to progressively deploy them on existing servers in future phases.
    However, in a market as fast-moving as the internet, it becomes difficult to maintain a heterogeneous infrastructure. Technical teams must then adapt, and work with multiple different configurations, increasing the risk of regression or breakdown. And for support teams, it becomes very difficult to memorise all the different configurations and keep track of the ongoing changes.
  • Each software brick is inherently complex, so to achieve the best performance and quality of service, we have created teams that specialise in each technology: database, PHP environment, storage systems, network… Having all these teams interact on the same server can be difficult, and lead to misunderstandings regarding the availability of websites.
  • It is difficult to isolate a customer who consumes more resources than the average. If you’re lucky, you won’t be on the same server as this customer, but if you are, the performance of your website may suffer as a result of their resource consumption.
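
To make the classic approach concrete, here is a minimal Python sketch of the "fill a server, then move on to the next one" placement described above. It is an illustration only: the `Server` class, the `place_sites` function and the per-server capacity are hypothetical names and values, not a description of any real provisioning tool.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    capacity: int  # hypothetical limit: maximum number of sites on one server
    sites: list[str] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.sites) >= self.capacity


def place_sites(sites: list[str], capacity: int) -> list[Server]:
    """Place each new site on the current server, opening a new one when it is full."""
    servers: list[Server] = []
    for site in sites:
        if not servers or servers[-1].is_full():
            servers.append(Server(name=f"server-{len(servers) + 1}", capacity=capacity))
        servers[-1].sites.append(site)
    return servers


# Five hypothetical sites, two sites per server: three servers are needed.
servers = place_sites([f"site-{i}.example" for i in range(5)], capacity=2)
for server in servers:
    print(server.name, server.sites)
```

In this model, every site on a given server shares that server's database, files and CPU, which is why one failure or one greedy customer affects all of its neighbours.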

As you can see, at our scale, we chose to take a different approach, in the form of a well-known pattern for applications that must deal with the load: an n-tier architecture.

N-tier architecture

High-load sites can leverage n-tier architectures to provide more resources for each software brick by distributing them across several servers. If we go further with our code/data division, we therefore need three servers for running a website:

  • A server responsible for the execution of the source code
  • Two frequently-used storage servers: a database server and a file server

N-tier architecture

File servers: Filerz

These are the servers on which we store the files that make up the website (usually the website’s source code). They are equipped with specific hardware to ensure fast data access and durability. We also use a specialist file system called ZFS, which allows us to easily manipulate local and remote backups.

They are managed by a dedicated Storage team, who also provide this type of service to several other teams at OVH (web hosting, emails…). If this is of interest to you, they also offer NAS-HA, which you will find in the public catalogue of our dedicated servers.

In case of failure, we keep a stock of parts and servers readily available, to restore service as quickly as possible.

Database servers

Database servers are typically used by websites to host dynamic site data. They use a database management system (DBMS), MySQL in our solutions, which structures the data and makes it accessible through a query language (SQL in our case).

These servers also require specific hardware for storing data, specifically high RAM, in order to take advantage of DBMS cache systems and respond faster to search queries.

These machines are managed by a dedicated Database team, who are responsible for this infrastructure. The Database team offers these services to the public through the CloudDB offer, and also handles the private SQL.

In Paris, these servers are hosted on a Private Cloud (SDDC), which allows us to switch virtual machines on the fly from one physical machine to another in case of problems. This helps reduce downtime during maintenance, and recovery times in the event of failure.

Web servers

These are the servers that receive the requests and execute the website source code. They get their data from the files and databases provided by the previously-described Filerz and database servers. The main requirement for these servers is good CPU resources, which are needed for executing the source code.

Since these web servers are stateless (i.e. they do not store any data locally), it is possible to add more of them and distribute the load across several machines. This allows us to spread websites across different servers and avoid uneven usage across the infrastructure, as the load is dynamically distributed across all the servers of the farm.

If one of the servers goes down, other servers in the farm are available to maintain the traffic. This allows us to use a wide range of the servers available in our inventory, provided they have good CPU capabilities.
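
As a rough illustration of why stateless servers make this possible, the Python sketch below models a farm in which any healthy server can take the next request. The round-robin rotation, the `WebFarm` class and the server names are assumptions made for the example; the article does not say which distribution algorithm is actually used.

```python
from itertools import cycle


class WebFarm:
    """A farm of interchangeable (stateless) web servers."""

    def __init__(self, servers: list[str]) -> None:
        self.healthy = {name: True for name in servers}
        self._ring = cycle(servers)  # simple round-robin rotation (an assumption)

    def mark_down(self, name: str) -> None:
        self.healthy[name] = False

    def pick(self) -> str:
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self.healthy)):
            name = next(self._ring)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy web server left in the farm")


farm = WebFarm(["web-1", "web-2", "web-3"])
first_round = [farm.pick() for _ in range(3)]

farm.mark_down("web-2")  # simulate a failure
after_failure = [farm.pick() for _ in range(4)]
print(first_round, after_failure)
```

Because no server holds local data, removing `web-2` from the rotation requires no data recovery: the remaining servers simply absorb its share of requests.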

Load balancing

Queries don’t arrive at the intended web server by magic. For this, you need an entry point that sends the requests to the right server. This technology is called ‘load balancing’.

Our Paris datacentre features servers whose hardware is fully dedicated to load balancing. In our new Gravelines infrastructure, however, we use a public brick: IPLBs.

Website traffic arrives at a few IP addresses (https://docs.ovh.com/en/hosting/list-addresses-ip-clusters-and-web-hostings/#cluster-025) that we dedicate to our web hosting service. These IP addresses are managed by our load balancers. They are therefore the entry point of our infrastructures. We have also implemented the very best anti-DDoS technology, to protect our clients’ websites.

These load balancers work perfectly for high volumes of website traffic, spread across multiple web servers, as queries are distributed fairly, via load balancing algorithms. But our requirements are different. We wish to have several different sites on a single server, and change the allocation based on several criteria: the customer’s hosting package (as more premium offers involve fewer websites per server), and the resources required to continuously distribute the load.

We also offer solutions where resources are guaranteed, such as Performance Hosting, or even fully dedicated, like Cloud Web.

In fact, the load distribution is very strongly tied to our customers. We have replaced the distribution system with an OVH-specific brick, named predictor, which chooses the web server according to the website targeted by the request. The predictors adapt to our infrastructure’s metrics, and the information provided by our information system.
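
The Python sketch below gives one possible reading of what such a predictor does: look up the website targeted by the request, restrict the choice to the servers reserved for that site's hosting package, and pick the least loaded one. Everything here is hypothetical (the `WebServer` fields, the `predict` function, the offer labels and the load metric); the real predictor is not described in more detail in this article.

```python
from dataclasses import dataclass


@dataclass
class WebServer:
    name: str
    offer: str   # hypothetical label for the hosting package this server is reserved for
    load: float  # hypothetical load metric between 0.0 and 1.0


def predict(host: str, site_offers: dict[str, str], servers: list[WebServer]) -> WebServer:
    """Choose a web server for a request based on the website it targets."""
    offer = site_offers[host]  # which package the site belongs to
    candidates = [server for server in servers if server.offer == offer]
    if not candidates:
        raise LookupError(f"no web server available for offer {offer!r}")
    # Prefer the least loaded server among those reserved for this offer.
    return min(candidates, key=lambda server: server.load)


servers = [
    WebServer("web-1", "standard", 0.7),
    WebServer("web-2", "standard", 0.3),
    WebServer("web-3", "premium", 0.9),
]
site_offers = {"blog.example": "standard", "shop.example": "premium"}

print(predict("blog.example", site_offers, servers).name)
print(predict("shop.example", site_offers, servers).name)
```

Unlike a generic load balancing algorithm, the decision depends on the site itself, so a premium site is never sent to a standard server even if that server is idle.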

Web hosting architecture with Load Balancer and Predictor

All this makes our infrastructure a bit more complex than normal, although we won’t go much further into the details, in order to keep things simple and within the scope of this blog post. This should have provided enough of an overview to explain the possible migration scenarios.

By adding load balancing, as well as multiple database servers and file storage, this architecture allows us to host an incredibly large number of different websites. But as all infrastructure administrators know, “Shit happens!”. One day or another, failure will happen. It is therefore necessary to know how to react in such cases, in order to minimise the impact.

Fault domains

One of the techniques for reducing the impact of failures is to limit their perimeter by creating fault domains. Outside the world of computer science, we see similar concepts in forest management, with the use of empty parcels as firebreaks, or in the building industry, with fire doors.

In our business, it’s about dividing the infrastructure into pieces, and distributing customers across different clusters. We therefore divided the Paris infrastructure into 12 identical clusters. In each cluster, we find the load balancer, the web servers and the Filerz. If one of the clusters goes down, only 1/12 of the customers with sites hosted at that datacentre are affected.
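
The arithmetic behind this can be checked with a short Python sketch. It assumes, for simplicity, that sites are spread perfectly evenly across the 12 clusters; the function names and the round-robin assignment are hypothetical and only serve the illustration.

```python
from fractions import Fraction

CLUSTER_COUNT = 12  # number of clusters in the Paris infrastructure


def assign_clusters(sites: list[str], cluster_count: int = CLUSTER_COUNT) -> dict[int, list[str]]:
    """Spread sites evenly across clusters (round-robin, an assumption for this sketch)."""
    clusters: dict[int, list[str]] = {number: [] for number in range(1, cluster_count + 1)}
    for index, site in enumerate(sites):
        clusters[index % cluster_count + 1].append(site)
    return clusters


def impact_of_failure(clusters: dict[int, list[str]], failed_cluster: int) -> Fraction:
    """Share of all sites that become unavailable if one cluster goes down."""
    total = sum(len(sites) for sites in clusters.values())
    return Fraction(len(clusters[failed_cluster]), total)


clusters = assign_clusters([f"site-{i}.example" for i in range(1200)])
impact = impact_of_failure(clusters, failed_cluster=3)
print(impact)
```

With an even spread, losing any single cluster affects exactly one twelfth of the sites, whereas a single undivided infrastructure would expose all of them to the same failure.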

Database servers are treated separately. Although we don’t highlight it as a feature, we allow our customers to share the use of their databases between their hosting solutions when they need to share data. Since the customer isn’t able to choose the cluster of their websites, we have separated the databases from the clusters, in order to make them accessible to the entire datacentre.

So for the last time, we need to update our infrastructure schema…

Architecture with fault domains

Infrastructure management

This entire infrastructure is managed by our information system, using real-time configuration, which forms the link between the OVH Control Panel, the OVH APIs, and the infrastructure itself.

The information system includes an exhaustive representation of the infrastructure, which makes it possible to adapt the delivery of new accounts, manage changes in our offers, and perform technical actions on accounts.

For instance, when you create a new database on your hosting package, the information system takes care of selecting the server on which it will be located, to make sure it is created on the same infrastructure, before notifying you of its availability via email or API.
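
As a simplified illustration of that selection step, the Python sketch below picks a database server from an inventory, keeping only the servers located on the same infrastructure as the hosting package. The inventory structure, the "fewest databases" rule and all server names are assumptions made for this example, not a description of the actual information system.

```python
from dataclasses import dataclass


@dataclass
class DatabaseServer:
    name: str
    datacentre: str      # infrastructure the server belongs to
    database_count: int  # hypothetical usage metric


def select_database_server(hosting_datacentre: str, inventory: list[DatabaseServer]) -> DatabaseServer:
    """Pick a database server on the same infrastructure as the hosting package."""
    candidates = [server for server in inventory if server.datacentre == hosting_datacentre]
    if not candidates:
        raise LookupError(f"no database server available in {hosting_datacentre!r}")
    # Assumption: prefer the server currently holding the fewest databases.
    chosen = min(candidates, key=lambda server: server.database_count)
    chosen.database_count += 1  # record the new database in the inventory
    return chosen


inventory = [
    DatabaseServer("db-a", "paris", 10),
    DatabaseServer("db-b", "paris", 4),
    DatabaseServer("db-c", "gravelines", 1),
]
chosen = select_database_server("paris", inventory)
print(chosen.name, chosen.database_count)
```

Note that `db-c` is never considered for a Paris hosting package, even though it is the least used server overall: staying on the same infrastructure comes first.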

Congratulations… you now know a bit more about our architecture! To find out where your own website is running, you can find the names of your database servers, Filerz and clusters linked to your hosting in the OVH Control Panel.

Technical constraints for migration

This architecture imposes some technical constraints, if the websites hosted on it are to continue working as intended:

  • All the websites in the same cluster share the same IP address
  • Database servers and hosting clusters are uncorrelated
  • In order to migrate a website, you must synchronise its migration with the migration of all its associated elements, i.e. the load balancer, Filerz, and databases
  • The source code of a website can use a database that is not referenced on its web hosting
  • The source code can include references to the infrastructure (absolute links including the Filerz number, the cluster name, the names of the database servers…)
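
To illustrate the last constraint, here is a minimal Python sketch that searches a piece of source code for hard-coded infrastructure names. The identifiers, the path and the sample PHP snippet are entirely hypothetical; in practice, the names to look for would be the Filerz, cluster and database server names shown for your hosting in the OVH Control Panel.

```python
def find_infrastructure_references(source: str, identifiers: list[str]) -> list[tuple[int, str]]:
    """Return (line number, identifier) pairs for each hard-coded reference found."""
    hits: list[tuple[int, str]] = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for identifier in identifiers:
            if identifier in line:
                hits.append((line_number, identifier))
    return hits


# Hypothetical website code containing absolute references to the infrastructure.
sample_source = """<?php
$upload_dir = "/mnt/filer-example-01/site/uploads";
$db_host = "db-example-07";
echo "hello";
"""

# Hypothetical infrastructure names, made up for this example.
identifiers = ["filer-example-01", "cluster-example", "db-example-07"]

hits = find_infrastructure_references(sample_source, identifiers)
print(hits)
```

Any such reference would stop working if the underlying Filerz, cluster or database server were renamed during a migration, which is why this constraint matters even though we do not control the source code.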

You now know all the operational and technical constraints related to the datacentre migration project. In the next article, we will discuss the different migration scenarios we considered, and the one we eventually chose.

See you soon!