OVH and the Binder team partnered to support the growth of the BinderHub ecosystem around the world.
With approximately 100,000 weekly users of the mybinder.org public deployment and 3,000 unique Git repositories hosting Binder badges, the need for more resources and computing time had become clear.
Today, we are thrilled to announce that OVH is now part of the worldwide federation of BinderHubs powering mybinder.org. All traffic to mybinder.org is now split between two BinderHubs: one run by the Binder team, and another run on OVH infrastructure.
So for those who don’t already know mybinder.org, here’s a summary.
What is Jupyter?
Jupyter is an awesome open-source project that allows users to create, visualise and edit interactive notebooks. It supports many popular programming languages, including Scala, as well as several presentation standards.
Example of a local Jupyter Notebook reading a notebook inside the OVH GitHub repository prescience client.
The main use case is the ability to share your work with tons of people, who can try, use and edit the work directly from their web browser.
Many researchers and professors are now able to work remotely on the same projects, without any infrastructure or environment issues. It’s a major improvement for communities.
Here, for example, is a notebook (GitHub project) that lets you use Machine Learning, from dataset ingestion to classification:
Example of a Machine Learning Jupyter Notebook
What is JupyterHub?
JupyterHub is an even more awesome open-source project that brings multi-user support to Jupyter notebooks. With several pluggable authentication mechanisms (e.g. PAM, OAuth), it allows Jupyter notebooks to be spawned on the fly from a centralised infrastructure. Users can then easily share their notebooks and access rights with each other. That makes JupyterHub perfect for companies, classrooms and research labs.
More information about JupyterHub can be found here.
What is BinderHub?
BinderHub is the cherry on the cake: it allows users to turn any Git repository (hosted on GitHub, for example) into a collection of interactive Jupyter notebooks with only one click.
Landing page of the binder project
The Binder instance deployed by OVH can be accessed here.
- Just choose a publicly accessible Git repository (better if it already contains some Jupyter notebooks).
- Copy the URL of a chosen repository into the correct binder field.
- Click the launch button.
- If it is the first time that Binder sees the repository you provide, you will see compilation logs appear. Your repository is being analysed and prepared for the start of a related Jupyter notebook.
- Once the compilation is complete you will be automatically redirected to your dedicated instance.
- You can then start interacting and hacking inside the notebook.
- On the initial binder page you will see a link to share your repository with others.
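The sharing link from the last step follows a predictable pattern. As a minimal sketch, assuming the `/v2/gh/<org>/<repo>/<ref>` URL layout that mybinder.org uses for GitHub repositories, such a link could be built as follows. The function name and the `some-org` / `some-repo` placeholders are hypothetical and only serve the illustration.

```python
from urllib.parse import quote


def binder_launch_url(org: str, repo: str, ref: str = "HEAD",
                      host: str = "https://mybinder.org") -> str:
    """Build a Binder launch link for a public GitHub repository.

    Assumes the /v2/gh/<org>/<repo>/<ref> layout; `ref` may be a branch,
    a tag or a commit hash. Each path segment is percent-encoded so that
    a branch name containing "/" stays a single segment.
    """
    segments = [quote(part, safe="") for part in (org, repo, ref)]
    return f"{host.rstrip('/')}/v2/gh/" + "/".join(segments)


if __name__ == "__main__":
    # Placeholder repository, pointed at the OVH-hosted BinderHub.
    print(binder_launch_url("some-org", "some-repo", "master",
                            host="https://ovh.mybinder.org"))
```

Passing a different `host` is all it takes to target another member of the federation with the same repository and reference.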
How does it work?
Tools used by BinderHub
BinderHub connects several services together to provide on-the-fly creation and registry of Docker images. It uses the following tools:
- A cloud provider such as OVH.
- Kubernetes to manage resources on the cloud
- Helm to configure and control Kubernetes.
- Docker to use containers that standardise computing environments.
- A BinderHub UI that users can access to specify Git repos they want built.
- BinderHub to generate Docker images using the URL of a Git repository.
- A Docker registry that hosts container images.
- JupyterHub to deploy temporary containers for users.
What happens when a user clicks a Binder link?
After a user clicks a Binder link, the following chain of events happens:
- BinderHub resolves the link to the repository.
- BinderHub determines whether a Docker image already exists for the repository at the latest reference (git commit hash, branch, or tag).
- If the image doesn’t exist, BinderHub creates a build pod that uses repo2docker to:
  - Fetch the repository associated with the link.
  - Build a Docker container image containing the environment specified in configuration files in the repository.
  - Push that image to a Docker registry, and send the registry information to the BinderHub for future reference.
- BinderHub sends the Docker image registry to JupyterHub.
- JupyterHub creates a Kubernetes pod for the user that serves the built Docker image for the repository.
- JupyterHub monitors the user’s pod for activity, and destroys it after a short period of inactivity.
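The chain of events above can be sketched in a few lines. This is a simplified, in-memory illustration and not BinderHub's actual code: `FakeRegistry`, `image_name`, `launch` and `idle_pods` are hypothetical names, the image naming scheme is an assumption, and the `resolve_ref`, `build` and `spawn` callables stand in for the Git provider, repo2docker and JupyterHub respectively.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class FakeRegistry:
    """In-memory stand-in for a Docker registry (hypothetical helper)."""
    images: Set[str] = field(default_factory=set)

    def has(self, image: str) -> bool:
        return image in self.images

    def push(self, image: str) -> None:
        self.images.add(image)


def image_name(repo_url: str, commit: str) -> str:
    """Derive an image name from the repository URL and resolved commit.

    The naming scheme is an assumption made for this sketch.
    """
    org, repo = repo_url.rstrip("/").split("/")[-2:]
    return f"{org}-{repo}:{commit}".lower()


def launch(repo_url: str, ref: str,
           resolve_ref: Callable[[str, str], str],
           build: Callable[[str, str, str], None],
           registry: FakeRegistry,
           spawn: Callable[[str], str]) -> dict:
    """Resolve the reference, build the image only if missing, then spawn."""
    commit = resolve_ref(repo_url, ref)      # pin the link to a commit
    image = image_name(repo_url, commit)
    built = False
    if not registry.has(image):              # no image yet for this commit
        build(repo_url, commit, image)       # stand-in for the build pod
        registry.push(image)
        built = True
    pod = spawn(image)                       # stand-in for JupyterHub
    return {"image": image, "built": built, "pod": pod}


def idle_pods(last_activity: Dict[str, float], now: float,
              timeout: float) -> List[str]:
    """Return the pods whose last activity is older than `timeout` seconds."""
    return sorted(p for p, seen in last_activity.items() if now - seen > timeout)
```

Because the image name depends on the resolved commit, a second launch of the same commit skips the build step entirely, while a new commit on the same branch triggers a fresh build.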
A diagram of the BinderHub architecture
How did we deploy it?
Powered by OVH Kubernetes
One great thing about the Binder project is that it is completely cloud agnostic: you just need a Kubernetes cluster to deploy on.
Kubernetes is one of the best choices to make when it comes to scalability on a micro-services architecture stack. The managed Kubernetes solution is powered by OVH’s Public Cloud instances. With OVH Load Balancers and integrated additional disks, you can host all types of workloads, with total reversibility.
To this end, we used 2 services in the OVH Public Cloud:
- A Kubernetes cluster, currently running on 6 nodes of C2-15 VM instances (this will grow in the future)
- A Docker Registry
We also ordered a specific domain name so that our Binder stack could be publicly accessible from anywhere.
Installation of Helm on our new cluster
Once the automatic installation of our Kubernetes cluster was complete, we downloaded the administration YAML file, which allowed us to manage our cluster and run kubectl commands on it.
kubectl is the official and most popular tool used to administer Kubernetes clusters. More information about how to install it can be found here.
The automatic deployment of the full Binder stack is already prepared in the form of a Helm chart. helm is a package manager for Kubernetes, and it needs a client part (helm) and a server part (tiller) to work.
All information about installing tiller can be found here.
Configuration of our Helm deployment
With tiller installed on our cluster, everything was ready to automate the deployment of Binder on our OVH infrastructure.
The configuration of the helm deployment is pretty straightforward, and all the steps have been described by the Binder team here.
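To give a rough idea of what such a configuration contains, here is a minimal sketch that assembles the non-secret values and prints them as JSON, which is also valid YAML. The key names (`config.BinderHub.use_registry`, `image_prefix`, `hub_url`, `registry.url`) are an assumption based on the BinderHub chart documentation and have changed between chart versions, so they should be checked against the version being deployed. The function name and every URL below are hypothetical placeholders, not the values OVH used.

```python
import json


def binderhub_values(image_prefix: str, hub_url: str, registry_url: str) -> dict:
    """Assemble non-secret Helm values for a BinderHub deployment.

    Key names are assumed from the BinderHub chart documentation and may
    differ between chart versions. Registry credentials are deliberately
    left out: they belong in a separate, private values file.
    """
    return {
        "config": {
            "BinderHub": {
                "use_registry": True,          # push built images to a registry
                "image_prefix": image_prefix,  # prepended to every image name
                "hub_url": hub_url,            # public address of JupyterHub
            }
        },
        "registry": {"url": registry_url},
    }


if __name__ == "__main__":
    values = binderhub_values(
        image_prefix="registry.example.org/binder/img-",
        hub_url="https://hub.binder.example.org",
        registry_url="https://registry.example.org",
    )
    print(json.dumps(values, indent=2))
```

Keeping secrets in a separate file is what makes it possible to publish the rest of the configuration openly.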
Integration into the BinderHub CI/CD process
The Binder team already had a Travis workflow for automating their test and deployment processes. Everything is transparent and they expose all their configurations (except secrets) on their GitHub project. We just had to integrate with their current workflow and push our specific configuration to their repository.
We then waited for the next run of their Travis workflow, and it worked.
From this moment onward, the OVH stack for Binder was running and accessible by anyone from anywhere at this address: https://ovh.mybinder.org/.
What comes next?
OVH will continue engaging with the open-source data community, and keep building a strong relationship with the Jupyter foundation and, more generally, the Python community.
This first collaborative experience with such a data-driven open-source organisation helped us to establish the best team organisation and management to ensure that both OVH and the community achieve their goals in the best way possible.
Working with open source is very different from working in industry, as it requires a different mindset: very human-centric, where everyone has different objectives, priorities, timelines and points of view that should all be considered.
We are grateful to the Binder, Jupyter, and QuantStack teams for their help, the OVH K8s team for the OVH Managed Kubernetes and OVH Managed Private Registry, and the OVH MLS team for the support. You rock, people!