OVHcloud recently adopted a new name to emphasize its focus on the cloud: empowering you to run your workloads easily, without worrying too much about the underlying hardware. So why talk about FPGAs?
An FPGA is a hardware accelerator: a reconfigurable chip that can behave like custom silicon designed for a specific application. We use FPGAs as custom network devices in our attack mitigation system. But FPGA development is quite different from software development: it requires specialized proprietary software and long development cycles.
In this article, I would like to focus on the way we integrate FPGA development into the agile software development workflow used by all the other developers at OVHcloud.
Why use FPGAs?
FPGAs are extremely generic chips; they can be used to build circuits for a very wide range of applications:
- signal processing
- machine learning (classification)
Their main advantage is that you are not constrained by the architecture of a CPU: you can build custom hardware tailored to your application. This usually brings better performance, lower power consumption, and lower latency.
For networking applications, the advantages are:
- a direct connection to 100GbE links: with no network interface card and no PCIe link, packets are received directly on the chip
- access to extremely low-latency memory with very fast random accesses (QDR SRAM: each bank allows about 250 million reads and writes per second)
- the ability to build custom packet-processing pipelines that make maximum use of the chip's resources.
This allows us to handle 300 million packets per second and 400 Gb/s on a single FPGA board, with a power consumption under 70 W.
To learn more about FPGAs, the ebook FPGAs For Dummies is a good resource.
A traditional FPGA development workflow
Languages used to develop on FPGAs have a strong specificity: contrary to standard sequential languages, everything happens in parallel, to model the behavior of the millions of transistors working simultaneously on a chip. Two main languages are used: VHDL and SystemVerilog. We use SystemVerilog. Here is an example SystemVerilog module:
// Simple example module: a counter
// Will clear if clear is 1 during one clock cycle.
// Will increment at each clock cycle when enable is 1.
`timescale 1ns / 1ps

module counter #(
    // Number of bits of counter result
    parameter WIDTH = 5
) (
    input clk,
    // Control
    input enable,
    input clear,
    // Result
    output reg [WIDTH-1:0] count = '0
);

    always_ff @(posedge clk) begin
        if (clear) begin
            count <= '0;
        end else if (enable) begin
            count <= count + 1;
        end
    end

endmodule
Modules can be combined by connecting their inputs and outputs to create complex systems.
Testing on the simulator
A very important tool when developing on FPGAs is the simulator: testing code directly on an actual FPGA is complex and slow. To speed things up, simulators run the code on a standard computer, without any specific hardware. They are used both for unit tests, testing each module separately, and for functional tests, simulating the whole design while controlling its inputs and checking its outputs. Here is the result on the counter module:
The waveform shows the value of each signal at each clock cycle. Of course, the simulator can also run headless, and the testbench can be modified to return a pass/fail result.
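To run headless, a testbench needs expected values to compare the design against. A common approach is a golden-reference model of the module written in a scripting language. Here is a minimal Python sketch of such a model for the counter above (purely illustrative, not our actual testbench code):

```python
class CounterModel:
    """Golden-reference model of the SystemVerilog counter module."""

    def __init__(self, width=5):
        self.width = width
        self.count = 0

    def clock(self, enable, clear):
        """Advance the model by one clock cycle, mirroring the always_ff block."""
        if clear:
            self.count = 0
        elif enable:
            # Wrap around exactly like a WIDTH-bit register would.
            self.count = (self.count + 1) % (1 << self.width)
        return self.count

# The expected values can then be checked against the simulator's wave:
model = CounterModel(width=5)
trace = [model.clock(enable=1, clear=0) for _ in range(3)]  # [1, 2, 3]
```

A functional test then fails as soon as the simulated signal diverges from the model's prediction.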
A basic simulator is provided by Xilinx, an FPGA manufacturer. More advanced simulators are provided by Mentor or Synopsys. These simulators are commercial and require expensive licenses.
Building the binary
Once all tests pass, it is time to produce a binary that can be used to configure the FPGA. The biggest FPGA vendors, Intel and Xilinx, both provide proprietary tools for this process. The first phase, synthesis, transforms the source code into a circuit. The second phase, place and route, is a very complex optimization problem: fitting the circuit onto the resources provided by the FPGA while respecting the timing constraints, so that the circuit can run at the desired frequency. It can last multiple hours, even up to a day for very complex designs. The process might fail if the design is over-constrained, so it is common to launch multiple runs with different seeds to increase the chances of getting a working binary at the end.
Our current FPGA development workflow
Our current development process is very close to a traditional one, but FPGA development is usually much slower than software development. At OVHcloud, we are able to develop and ship a small feature in one day. We achieve this by leveraging the workflow used by our software developers and by using our cloud infrastructure. Here is the overall workflow:
The whole workflow is controlled by CDS, our open-source continuous delivery system. All tests, as well as the compilation jobs, run on Public Cloud, except the tests on board, which are run in our lab.
Using our Public Cloud
The setup of all machines is done with Ansible. A few important roles install the key components:
- the simulator
- the Xilinx compiler, Vivado
- the Intel compiler, Quartus
- the license server for the simulator and the compilers
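A playbook applying those roles could look like the following sketch (the host groups and role names are illustrative, not our actual repository layout):

```yaml
# Illustrative playbook: host groups and role names are hypothetical.
- hosts: license_server
  roles:
    - license_server

- hosts: dev_boxes
  roles:
    - simulator   # installs the simulator
    - vivado      # installs the Xilinx compiler
    - quartus     # installs the Intel compiler
```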
The license server and the development boxes are long-running Public Cloud instances. The license server is the smallest instance possible; the development box is an instance with a fast CPU and a lot of RAM. The simulator and the compilers are installed on the development box. Access to the license server is managed with OpenStack security groups.
The instances used for simulated tests and for compilation are started on demand through the OpenStack API. This is very important because it allows multiple test sets to run in parallel for different developers. It also matters for compilation: we compile our designs for multiple targets (Stratix V FPGAs for 10G and Ultrascale+ FPGAs for 100G), so we need to run multiple compilation jobs in parallel. In addition, we run jobs with multiple seeds in parallel to improve our chances of getting a correct binary. As build jobs can last 12 hours with our designs, it is essential to start enough of them in parallel to be sure we get at least one working binary.
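The seed-sweeping logic can be sketched as follows. The `compile_design` function below is a hypothetical stand-in for a vendor place-and-route run started on a dedicated cloud instance; here it just simulates runs that fail for some seeds:

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def compile_design(seed):
    """Stand-in for a place-and-route job: returns the path of the
    bitstream on success, or None if timing closure failed for this seed."""
    rng = random.Random(seed)  # deterministic, thread-safe per-seed RNG
    return f"bitstream_seed{seed}.bit" if rng.random() < 0.5 else None

def build_with_seeds(seeds):
    """Launch one compilation per seed in parallel and return the first
    successful binary, or None if every seed fails timing."""
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=max(1, len(seeds))) as pool:
        futures = [pool.submit(compile_design, s) for s in seeds]
        for future in as_completed(futures):
            result = future.result()
            if result is not None:
                return result
    return None

binary = build_with_seeds(range(8))
```

In the real workflow each job would be a multi-hour vendor run on its own instance, but the control logic is the same: fan out over seeds, keep the first design that closes timing.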
Running the tests
Functional tests are very important because they validate each feature our designs provide. The tests are developed in Python, using scapy to send traffic and analyze the results. They can be run with a simulated design or with the real design on the actual FPGA boards. CDS is able to run tests automatically on the real boards by booking lab servers and connecting to them through SSH. The same process is used for performance tests.
The result of this infrastructure is that developers can push a new feature on a branch of our Git repository and get full unit and functional test results within 30 minutes. If everything is OK, they can trigger the compilation and have the result tested on a board the next day. Then they just have to tag the new package version to make a new release available. After that, the team managing production can deploy the new version using Ansible.
We have automated our process as much as possible, and we use the Public Cloud infrastructure to accelerate the workflow. But we are still using a fairly traditional FPGA development flow. Many different approaches exist to go further, and as we want to push the FPGA development process as close to software development as we can, we have looked into several of them.
A very common approach is High-Level Synthesis (HLS): developing modules in a high-level language instead of SystemVerilog. With Vivado HLS, it is possible to develop in C++; it is also possible to use OpenCL, which we have tested on Intel boards. The principle of HLS is to extract the algorithm from the high-level code and then automatically build the best pipelined architecture on the FPGA. But we do packet processing, and our algorithms are extremely simple. The complexity of our code lies in the architecture itself, needed to sustain very high data rates. So we were not able to use HLS efficiently: the code we got was actually more complex than the same function in SystemVerilog.
SystemVerilog is extremely low-level and does not allow high levels of abstraction (at least not if you want the code to be usable by both the Intel and Xilinx compilers). What we really need to simplify development is higher levels of abstraction; we do not need a complex compiler to try to guess the best architecture for us. That is why a PhD student is currently collaborating with us on an open-source project: Chisel.
Chisel is a hardware design language based on Scala. Its main advantage is that it makes the whole level of abstraction offered by Scala available for describing hardware. It is also fully open source, which is highly unusual in the hardware development world. For testing, it relies on Verilator, an open-source simulator. This means we could get rid of proprietary simulators and have a fully open-source toolchain, right up to the compilation.
There are currently no open-source tools for the place and route phase, at least for the most recent Xilinx and Intel FPGAs. But Chisel can generate Verilog that the proprietary compilers accept.
We plan to have our first modules designed in Chisel in production soon. This should give us code that is more reusable and easier to write, and help us gradually get rid of proprietary tools.
A paradigm change
The open-source community is extremely important for bringing FPGA development closer and closer to software development. A sign of improvement is the progressive arrival of low-end FPGAs in FabLabs and hobby electronics projects. We hope Xilinx and Intel will follow, and that they will one day embrace open source for their compilers, which could make them more efficient and interoperable. FPGAs are accelerators that offer incredible flexibility and can be powerful alternatives to CPUs and GPUs, but to democratize their use in cloud environments, the open-source community has to get much stronger.