Last spring, I built a wood oven in my garden. I’ve wanted to have one for years, and I finally decided to make it. To use it, I make a big fire inside for two hours, remove all the embers, and then it’s ready for cooking. The oven accumulates the heat during the fire and …
Our colleagues in the K8S team launched the OVH Managed Kubernetes solution last week, in which they manage the Kubernetes master components and spawn your nodes on top of our Public Cloud solution. I will not describe the details of how it works here, but there are already many blog posts about it (here and here, to get you …
At the OVH Observability (formerly Metrics) team, we collect, process and analyse most of OVH’s monitoring data. It represents about 500M unique metrics, pushing data points at a steady rate of 5M per second. This data can be classified in two ways: host or application monitoring. Host monitoring is mostly based on hardware counters (CPU, …
At the Metrics team we have been working on time series for several years. From our experience the data analytics capabilities of a Time Series Database (TSDB) platform is a key factor to create value from your metrics. And these analytics capabilities are mostly defined by the query languages they support.
TSL stands for Time Series Language. In a few words, TSL is an abstracted way, under the form of an HTTP proxy, to generate queries for different TSDB backends. Currently it supports Warp 10’s WarpScript and Prometheus’ PromQL query languages but we aim to extend the support to other major TSDB.
To better understand why we created TSL, we are reviewing some of the TSDB query languages supported on OVH Metrics Data Platform. When implementing them, we learnt the good, the bad and the ugly of each one. At the end, we decided to build TSL to simplify the querying on our platform, before open-sourcing it to use it on any TSDB solution.
Why did we decide to invest some of our Time in such a proxy? Let me tell you the story of the OVH metrics protocol!
OVH relies extensively on metrics to effectively monitor its entire stack. Whenever they are low-level or business centric, they allow teams to gain insight into how our services are operating on a daily basis. The need to store millions of datapoints per second has produced the need to create a dedicated team to build a operate a product to handle that load: Metrics Data Platform. By relying on Apache Hbase, Apache Kafka and Warp 10, we succeeded in creating a fully distributed platform that is handling all our metrics… and yours!
After building the platform to deal with all those metrics, our next challenge was to build one of the most needed feature for Metrics: Alerting.