Machine learning: from idea to reality

Since you are currently reading a blog post from a tech company, I’ll bet you’ve already heard about artificial intelligence and machine learning dozens of times this month.

And this is perfectly understandable! Health, advertising, gaming, insurance, banking, e-commerce… you name it. Behind the buzz, the reality is here, and we can say for sure that any sector can now use machine learning efficiently.

But how does it work? Is it complicated early on? What kind of resources does it require?

As with many companies, you may already be planning to use machine learning in 2020, and are wondering what challenges you will face as you get started. Well, let’s find out together!

But… what’s hidden behind machine learning?

Artificial intelligence – and its subcategory, machine learning – are fast-growing sciences, where we are only just scratching the surface.

Much has yet to be discovered, and so the knowledge and tools are evolving at a rapid pace, which can lead to conflicting arguments.

“It’s a no-no for statistics…”

“It’s not for us… We are a small company…” 

I’ve heard remarks like that quite often. Damn, were they actually right? What’s actually going on behind the scenes?

Let’s draw a parallel with an everyday situation… All human beings learn naturally. We are born with cognitive functions that help us with that every day. After hearing his parents, a 2- to 3-year-old child will be able to detect a positive or negative sentence. It’s based on multiple parameters, including the voice intonation, the look, the words themselves, the context… The same happens a bit later with words. After reading sentences in a book, a 5- to 6-year-old child should be able to detect positivity and negativity.

Imagine that you want to imitate this sentiment analysis with a computer. A good way to start would be to take a large collection of words with corresponding positivity/negativity scores (we can call that a “dataset”).

You can then start to detect some patterns. The word “cool”, for example, will often occur when the sentence is tagged as positive.
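To make the idea of "detecting patterns" concrete, here is a minimal sketch in plain Python. The dataset is a hypothetical toy example invented for illustration, and the function names are my own. It simply counts how often each word shows up in positive and negative sentences, then reports the share of occurrences that are positive.

```python
from collections import Counter

# Hypothetical toy dataset, invented for illustration: (sentence, label) pairs.
DATASET = [
    ("this phone is cool", "positive"),
    ("cool design and great battery", "positive"),
    ("what a great day", "positive"),
    ("this is a terrible phone", "negative"),
    ("terrible battery and bad screen", "negative"),
    ("not cool at all", "negative"),
]


def word_counts_by_label(dataset):
    """Count how often each word appears under each label."""
    counts = {}
    for sentence, label in dataset:
        counts.setdefault(label, Counter()).update(sentence.lower().split())
    return counts


def positive_share(word, counts):
    """Share of a word's occurrences found in positive sentences (None if unseen)."""
    positive = counts["positive"][word]
    negative = counts["negative"][word]
    total = positive + negative
    return positive / total if total else None


counts = word_counts_by_label(DATASET)
print("cool:", positive_share("cool", counts))
print("terrible:", positive_share("terrible", counts))
```

Note that "cool" also appears in "not cool at all", which is exactly why word counts alone stop being enough once you move to full sentences.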

Now, what if instead of words, you want to detect patterns in full sentences, taking into account emojis, language etc., and do it on very large datasets? You can achieve that without machine learning, but it might be hard.

Machine learning will use the power of a lot of computers and some specific software to achieve this, following this classic workflow:

The first step is to get clean and useful data, called “training data” here. For our previous example, it consists of words or sentences classified with a sentiment score. Getting relevant and actionable data is one of the hardest parts of the process.

Once you have your data, you can use machine learning software, such as the Python libraries pandas and scikit-learn, or an AI studio with a visual interface, to “learn”. This step consumes a lot of computing power, as it tries various statistical approaches to find the best patterns.

Once the best-matching pattern has been found, the software generates what we call a “machine learning model”.

Once you have it, this model will be able to make predictions. If you push new data into this model (“input” in the schema), a prediction will be made, giving you some results. Afterwards, to stay up to date, you may have to periodically re-train your model with more accurate data, new algorithms, etc.
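The workflow above (training data, learning, model, prediction) can be sketched in a few lines. This is a deliberately tiny illustration using only the Python standard library: it implements a basic naive Bayes classifier, which is one simple statistical approach among many and not necessarily what a real platform would pick. The training sentences are hypothetical, and in practice you would more likely rely on a library such as scikit-learn.

```python
import math
from collections import Counter

# Hypothetical training data, invented for illustration.
TRAINING_DATA = [
    ("cool phone great battery", "positive"),
    ("great screen love it", "positive"),
    ("really cool and fast", "positive"),
    ("terrible battery bad screen", "negative"),
    ("slow and bad", "negative"),
    ("hate this terrible phone", "negative"),
]


def train(examples):
    """'Learn' step: count words per label. The returned dict is the model."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return {
        "word_counts": word_counts,
        "label_counts": label_counts,
        "vocabulary": vocabulary,
    }


def predict(model, text):
    """Prediction step: return (label, probability) for new input text."""
    # Words never seen during training carry no information, so skip them.
    words = [w for w in text.lower().split() if w in model["vocabulary"]]
    total_examples = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    log_scores = {}
    for label, n_examples in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_examples / total_examples)
        for word in words:
            # Laplace smoothing: a word unseen for this label must not zero the score.
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        log_scores[label] = score
    # Turn log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


model = train(TRAINING_DATA)
label, probability = predict(model, "cool battery")
print(label, round(probability, 2))
```

Re-training, in this sketch, just means calling `train` again with a larger or cleaner list of examples and replacing the old model.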

This kind of process was rocket science a few years ago, but today the computing power required is far more accessible, thanks to cloud providers and advancements in data science tools, so it can be implemented at a very low cost.

“But… since my country’s language is the same for all companies, maybe I can find a pre-built machine learning model that I can use for sentiment analysis?”

That’s the spirit! For basic projects, where you don’t need any customisation, you can find plenty of out-of-the-box solutions. No data science skills are required, as these models are created by third-parties. If you are able to use an API, then you will be able to use these. And if you want to try some fun models for free, you can explore OVH Labs’ AI Marketplace (https://market-place.ai.ovh.net/).

To sum up: no matter the size of your company, machine learning can fit your needs and bring you some real benefits. It’s based on statistical tools, with slightly different concepts but the same goals. Today, thanks to open-source contributors, you can find a lot of out-of-the-box machine learning software tools, but when you need more accuracy, it will require some data science skills.

So… How can machine learning help your company on a daily basis?

Even without knowing it, you benefit from machine learning on a large scale, every day. You have an email account with spam filtering? Machine learning. You watched Netflix and got some good recommendations? Machine learning. You used Waze to avoid traffic jams this morning? Machine learning.

Large companies are already using it daily, and the growth in small- and mid-sized companies is spectacular.

First of all, there is a huge difference between using software that is already equipped for machine learning, and developing a machine-learning project of your own.

Here are some easy real-world examples that you could deploy in a few steps with out-of-the-box solutions (no data science required!):

  • Analyse the sentiment in your brand’s social networks, such as Twitter and Facebook (“This week, 67% of people talked about us positively.”)
  • Detect nudity in pictures uploaded to forums, blogs, communities…
  • Detect hateful text in comments, product reviews…
  • Find objects in images and tag them (useful for classifying products and for SEO)

Using it is as simple as this code sample for detecting objects in images (Python, Go, Java… even PHP can use it):

curl -X POST "https://api-market-place.ai.ovh.net/image-recognition/detect" -H "accept: application/json" -H "X-OVH-Api-Key: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" -H "Content-Type: application/json" -d '{"url":"https://mywebsite.com/images/input.jpg", "top_k": 2}'
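For readers who prefer Python, the same request could look roughly like the sketch below, using only the standard library. The endpoint, headers and JSON fields are copied from the curl sample. The function names are my own, and since the response schema is not shown in this post, the sketch simply returns whatever JSON the service sends back.

```python
import json
import urllib.request

# Endpoint taken from the curl sample above.
DETECT_URL = "https://api-market-place.ai.ovh.net/image-recognition/detect"


def build_detect_request(image_url, api_key, top_k=2):
    """Build (but do not send) the POST request from the curl sample."""
    payload = json.dumps({"url": image_url, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        DETECT_URL,
        data=payload,
        method="POST",
        headers={
            "accept": "application/json",
            "X-OVH-Api-Key": api_key,
            "Content-Type": "application/json",
        },
    )


def detect_objects(image_url, api_key, top_k=2):
    """Send the request and return the decoded JSON response as-is."""
    request = build_detect_request(image_url, api_key, top_k)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Building the request needs no network access; replace the placeholder key
# with your own before calling detect_objects().
request = build_detect_request(
    "https://mywebsite.com/images/input.jpg",
    "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
)
print(request.get_method(), request.full_url)
```

Separating request building from sending keeps the network call in one place, which makes the code easier to test.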

The results will be provided with a probability score:

More advanced use cases – such as forecasting your stocks or revenues, detecting and fighting fraud, or predicting outages and planned maintenance – will require additional expertise.

Each company will have multiple factors to study. You can try ready-to-use services, but their accuracy will not necessarily meet your expectations.

If this sounds familiar, you’ll be very interested in the next chapter…

The tortuous journey towards accurate results

Once you have jumped over the mental gap of machine learning complexity, you will have to meet and defeat other monsters.

Here, instead of using out-of-the-box solutions, we’ll be doing it ourselves…

Creating a machine learning model: the theory

And… the reality for a data scientist

We meet companies and customers from various fields, and each time we discuss machine learning with them, there are no surprises. It’s always the same pain points, which we can sum up as follows:

  1. Ideas and project selection and internal follow-up (“Are you sure that this ML project will bring value?”)
  2. Data problems: not enough data, bias in data, sensitive data…
  3. Lack of data science expertise
  4. Model accuracy (i.e. the results you get are not exploitable)
  5. The move from prototype to production: deployment, upscaling, versioning, …
  6. Budget

Step #1. Ideas/project selection and internal follow-up. This may seem quite easy at first, but it’s still critical. Data scientists do not necessarily have marketing, healthcare, or financial knowledge, so use cases need to come from the end users, not your IT department.

Also, these requesters need to define their needs correctly, and then follow the projects precisely. Always ask them to explain what they want to be able to achieve once they have your predictions.

For large companies, a project manager on the end-user side will be essential.

Step #2. Data problems. This is often solved by correctly defining what we are looking for (i.e. step #1), then exporting the data and verifying it manually, wherever possible. You might check your sales and stock, for example, to be sure you are not omitting data. If you need to go further with data cleaning or handle larger volumes, this can be done manually (given time), or with specific tools: OVHcloud AutoML, a data science studio such as Dataiku (which is available in a free version), or Apache Spark for data processing.

But more than the tools, your data comprehension is the foundation here.
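As an illustration of the kind of manual verification described in step #2, here is a small sketch that audits an exported CSV for incomplete rows, duplicates and label balance. The column names (`text`, `sentiment`) and the sample export are hypothetical. Your own export will look different, and the checks that matter depend on your data.

```python
import csv
import io
from collections import Counter

# Hypothetical export, invented for illustration.
SAMPLE_CSV = """text,sentiment
cool phone,positive
great battery,positive
,negative
terrible screen,negative
cool phone,positive
slow and bad,
"""


def audit_rows(rows, required=("text", "sentiment")):
    """Report incomplete rows, exact duplicates and the label distribution."""
    rows = list(rows)
    # Row indexes (0-based, header excluded) where a required field is empty.
    incomplete = [
        index
        for index, row in enumerate(rows)
        if any(not (row.get(column) or "").strip() for column in required)
    ]
    # Rows that repeat an earlier row exactly on the required columns.
    seen = Counter(tuple(row.get(column) for column in required) for row in rows)
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    # How many examples carry each label (rows with an empty label are skipped).
    labels = Counter(
        row["sentiment"] for row in rows if (row.get("sentiment") or "").strip()
    )
    return {
        "rows": len(rows),
        "incomplete": incomplete,
        "duplicates": duplicates,
        "labels": dict(labels),
    }


report = audit_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(report)
```

A heavily unbalanced label count or a high share of incomplete rows is usually a sign to go back to the data owners before training anything.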

Steps #3 and #4. Lack of data science expertise and model accuracy. These issues can initially be solved through collaboration with your partners. Many consulting firms now specialise in this field, for example, and it’s not just for large companies. And don’t forget, you can also use out-of-the-box solutions for the initial tests, such as those available through the OVHcloud AI Marketplace, as this will not require any data science skills.

Another option to explore is training some of your employees. As we previously mentioned, software tools and libraries are now accessible to the masses, and since machine learning attracts a lot of interest today, you might find some volunteers easily. Online training courses are available, such as datacamp.com, openclassrooms.com, or video training from Udemy, Coursera or Pluralsight. Just bear in mind that these will require statistics, algebra, and communication skills, not just programming.

Step #5. The move from prototype to production can now be achieved in just two minutes with OVHcloud! We just released a new tool called OVHcloud Serving Engine, which allows you to deploy models on our Public Cloud, and is available for free during the early-access phase.

We’ve also deployed pre-trained models for French and English sentiment analysis. Don’t hesitate to try them out in our Public Cloud Control Panel!

Step #6. Budget. Honestly, you can start doing machine learning on your own laptop. After that, you can use cloud resources for training, but getting some initial results will not cost you a lot. Once you have proved that results are achievable with a low budget, having a bigger one should be seen as an investment, not a cost. You can always do it progressively, as the cloud inherently works that way: you only pay for what you consume.

Conclusion

I hope that after reading this blog post, you are more confident about exploring machine learning this year! OVHcloud first started exploring these topics five years ago, and has since acquired a lot of knowledge in this field. Today, we are better able to create the tools and services of our dreams, and then provide them to you. Following the release of our AI Studio and AI Marketplace, we are accompanying you from R&D to production, with the OVHcloud Serving Engine.

I encourage you to read our next blog post about it! Stay tuned…