Data Processing

Why are you still managing your data processing clusters?

Cluster computing shares a computation load among a group of computers, which achieves a higher level of performance and scalability. Apache Spark is an open-source, distributed cluster-computing framework that is much faster than its predecessor, Hadoop MapReduce, thanks to features like in-memory processing and lazy evaluation. Apache Spark …

Improving the quality of data with Apache Spark

As expert data consultants and heavy Apache Spark users, we felt honoured to become early adopters of OVHcloud Data Processing. As a first use case to test this offering, we chose our quality assessment process. As a data consultancy company based in Paris, we build complete and innovative data strategies for our large corporate and public …

Do you need to process your data? Try the new OVHcloud Data Processing cloud service!

OVHcloud Data Processing (ODP) is one of OVHcloud's data services. It lets you submit a processing job without caring about the cluster behind it. You just specify the resources you want to use for your job, and the service handles the cluster creation for you and destroys the cluster as soon as your job is finished. In other words, you don’t have to think about clusters any more. Decide how many resources you need to process your data in the most efficient way for you, and let OVHcloud Data Processing do the rest.