OVH NEWS | THE LATEST ON IT INNOVATIONS AND TRENDS














03/04/2017

Report written by Hugo Bonnaffé


SYSTRAN – Translation Revolutionized Through Neural Networks.


Created in 1968 by Dr. Peter Toma, a Hungarian linguist, SYSTRAN has contributed significantly to the history of translation solutions, from the mainframe platforms supplied to the US Air Force, the European Union and NASA to the first Internet portals (Babelfish, Google, Yahoo!), which all used SYSTRAN technology. The company, which now flies the Korean flag, does most of its R&D in Paris. That is where a new and unprecedented translation engine was created: it works with neural networks and runs on GPU servers, which provide previously unseen processing capacity. Interview.





Who is SYSTRAN?


In 2014, SYSTRAN was acquired by CSLi, a South Korean company that changed its name to SYSTRAN International. The company has its headquarters in Seoul with two regional offices, one in Paris and another in San Diego. The Paris office is also home to the company’s R&D. The SYSTRAN group now has 200 employees across these 3 sites, with about 100 research and development engineers and language specialists.
SYSTRAN's automatic translation solutions enable companies to improve their multilingual communication and productivity in many areas.
They are most commonly used for internal collaboration, the management and interpretation of Big Data, eDiscovery (the search for evidence in electronically stored documents as part of a legal proceeding), content management, customer support and e-commerce. With over 140 language pairs available, SYSTRAN is the reference translation technology for multinational companies, Defense and Security organizations and translation agencies. SYSTRAN translation software can be quickly and easily customized to a specific domain, that is, to the specific terminology used in engineering, legal, manufacturing, IT, etc.







In concrete terms, what are your solutions made up of?


SYSTRAN’s solutions equip professionals with efficient and secure multilingual communication tools, customized to their needs and work environment.
Our flagship product, SYSTRAN Enterprise Server (SES), provides access, anywhere and anytime, to the power of our translation engines via a web interface, which also features platform administration capabilities. The platform can be hosted by us in the Cloud (particularly at OVH), or located within the customer’s IT environment. In both cases, security, data integrity, and respect for intellectual property and personal data are top priorities. Many of our large-account customers turn to SYSTRAN to stop their own employees from handing confidential data over to free online services, with no guarantee as to how the data would be used.
In some instances, the customer’s needs go beyond our off-the-shelf offering: this is why we have an API, which enables the creation of elaborate solutions that take advantage of SYSTRAN technology (e.g. integration into a CRM tool).
Our API is also available in SaaS mode, which suits customers whose translation volumes vary over time and/or are unpredictable.
In some instances, the integration work has already been done, by ourselves or by the corresponding partner: in that case, add-ons are ready for use (e.g. the add-on for kCura’s e-Discovery solution).
On the other hand, some customers wish to develop a completely new product around SYSTRAN’s technology: for those, our Software Development Kits (SDKs) are the right choice.
Lastly, a desktop version is also available for individual users and SOHOs.
Our Professional Services are designed to support our customers in the adoption and integration of our products. They include training, change management, installation assistance, and the customization of the work environment, terminology, language pairs, etc.
In conclusion, our portfolio of products and services aims to be as broad and complete as possible, so as to support our customers on the path to success in their international development. As an example, a typical SYSTRAN project can be deployed on all continents, with 40+ language pairs and 80,000+ users.



Who are your customers?


SYSTRAN’s market is by nature global and multi-sector. Our clientele therefore comprises both SMEs and large accounts, private and public. Here are a few examples: Adobe, PSA, Ford, Claas, Boehringer Ingelheim, Lombard Odier, Société Générale, Petit Futé, Symantec, Hewlett Packard Enterprise, Cisco, PwC, Xerox Litigation Services…



On the market for translation solutions, the existing offers and the technologies they use are quite similar. What makes SYSTRAN stand out from the competition, free or not?


Our strengths are both on the Products and Services sides.
From a Products point of view, we invest heavily in R&D, so as to always keep a leading edge. As a result, we were the first to release a commercial product embedding Neural Machine Translation technology, branded Pure Neural™ Machine Translation (PNMT™).
We also put emphasis on quality control, with a dedicated team.
Our ability to customize tools, with new language pairs but also specialized terminology, makes our products far more compelling than all-purpose solutions, which are not aligned with the customer’s industry.
Besides, security is one of our obsessions: the servers hosting SES are either located in secured data centers, or used offline. From that perspective, partners like OVH are a perfect pick to meet challenging security requirements.
Lastly, our ability to support our customers with Professional Services, including customization, sets us apart from players offering only off-the-shelf products.







Let’s talk about this Neural Machine Translation engine: how does it work?


PNMT™ is, from a technological point of view, totally different from the previous generations of Machine Translation technology. Because it is based on Deep Neural Networks, it has no explicit representation of language knowledge; this is a huge difference from the Rule-Based approach, which relies on a rules database (dictionary entries, for instance), and from the Statistical Machine Translation approach, which is based on a database of sentence fragments. As in a human brain, the language knowledge is coded in the connections between the artificial neurons, and these connections are automatically learnt and adjusted during the training phase of the system (much like the phase in which a human learns a language). As a matter of fact, the PNMT™ engine acquires from various data sources a lot of knowledge that none of the previous technologies was able to capture automatically, such as semantics, stylistics, gender, positive/negative words…
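To make the idea of "connections learnt and adjusted during training" more concrete, here is a deliberately tiny Python sketch. It is not SYSTRAN's engine and has nothing to do with translation: it assumes a single artificial neuron with one connection weight and one bias, trained by plain gradient descent on three made-up number pairs. The function name `train_neuron` and the data are hypothetical, chosen only for illustration.

```python
# Toy illustration only: one artificial neuron, NOT a translation model.
# The neuron computes w * x + b; training nudges w and b after each example.

def train_neuron(samples, epochs=500, learning_rate=0.1):
    """Adjust the weight and bias by gradient descent on the squared error."""
    w, b = 0.0, 0.0  # the "connections" start with no knowledge
    for _ in range(epochs):
        for x, target in samples:
            error = (w * x + b) - target
            # Move each parameter a small step against its error gradient.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Made-up training data following the hidden rule target = 2 * x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_neuron(samples)
print(f"learnt weight = {w:.3f}, learnt bias = {b:.3f}")
```

Nobody writes the rule "multiply by 2 and add 1" into the program; it ends up encoded in the values of `w` and `b`. A neural translation engine applies the same principle at a vastly larger scale, with many more connections and text rather than numbers as training data.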



What new possibilities does your Neural Machine Translation engine bring?


For the first time, we have a technology that can understand sentences in their context and translate them with a higher quality than most non-native speakers. We see it as a technology that will become a necessary language assistant for anybody involved in travel or foreign language communication (emails, chat, scientific paper writing…). We are also doing research into related applications such as language learning assistants and multilingual chatbots…
In the end, the subject will no longer be translating but rather communicating in multiple languages without losing the nuances and specificities that define the wealth of a language.
You may test the quality of generic neural translation yourself by using our demo server, available at: https://demo-pnmt.systran.net/



When talking about machine learning, they often say that the quality of the data applied to the algorithms during the learning process is as important as the quality of the algorithms themselves. You’ve designed certain text corpora to be fed into your translation engine neural networks: where do they come from?


The corpora we use come primarily from free and open sources, originating either from institutions (e.g. the United Nations, the European Union, the European Central Bank, the Parliament of Canada, or Patent Offices) or from communities (e.g. OpenSubtitles), to name a few. We also build our own corpora for specific domains and may also buy them from specialized agencies. Then of course, our customers have the possibility to use their own private translation memories to specialize their translations.



What’s the infrastructure you’ve set up at OVH to host this Neural Machine Translation engine?


Our infrastructure comprises 2 front-ends and several compute nodes (translation nodes). The front-ends use the EG-64-S infrastructure server (64 GB of RAM, E5-1650v2 CPU, SoftRAID 2x 4 TB disks), while the compute nodes comprise a mix of CPU servers, HG Infrastructure (2x Intel Xeon E5-2640v3, 16c/32t, 2.6 GHz/3.4 GHz, 256 GB of RAM, 2x 4 TB HDD), and GPU servers, GPU-4X-1080 (128 GB of RAM, 2x E5-2630v3, 240 GB SSD, 4x GTX 1080).



What would be the added value of GPU servers for your industry, as opposed to using CPU-equipped servers?


The GPU servers enable faster translations. Translation on a GPU is currently 3 times faster than on CPU cores. The limiting factors with GPU technology are the GPU RAM (on average, each model uses 2 GB of GPU RAM) and the computing power (8.9 TFLOPS). GPU servers also enable the creation of translation models. Those models are much more compact in size, and produce better quality translations.
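As a rough illustration of the memory constraint mentioned above, the following Python sketch estimates how many translation models could sit in memory on one GPU server. The 2 GB per model, the 4 GPUs per server and the 3x speed-up are taken from the interview; the 8 GB of memory per GTX 1080 card is the card's published specification, not a figure given by SYSTRAN, and the function names are hypothetical. Runtime overhead is ignored, so real capacity would be somewhat lower.

```python
# Back-of-the-envelope capacity estimate; a sketch, not SYSTRAN's sizing method.

GPU_RAM_GB = 8        # assumption: memory of one GTX 1080 card (vendor spec)
MODEL_RAM_GB = 2      # from the interview: average GPU RAM used per model
GPUS_PER_SERVER = 4   # from the interview: a GPU-4X-1080 server has 4 cards
GPU_SPEEDUP = 3       # from the interview: GPU is about 3x faster than CPU


def models_per_gpu(gpu_ram_gb, model_ram_gb):
    """Upper bound on models resident on one GPU, ignoring overhead."""
    return int(gpu_ram_gb // model_ram_gb)


def models_per_server(gpu_ram_gb, model_ram_gb, gpus):
    """Upper bound on models resident on one multi-GPU server."""
    return models_per_gpu(gpu_ram_gb, model_ram_gb) * gpus


def gpu_seconds(cpu_seconds, speedup=GPU_SPEEDUP):
    """Estimated GPU time for a job that takes cpu_seconds on CPU cores."""
    return cpu_seconds / speedup


print(models_per_gpu(GPU_RAM_GB, MODEL_RAM_GB))                      # 4
print(models_per_server(GPU_RAM_GB, MODEL_RAM_GB, GPUS_PER_SERVER))  # 16
print(gpu_seconds(30))                                               # 10.0
```

Under these assumptions, memory rather than raw computing power caps the number of language pairs a single card can serve at once, which is consistent with GPU RAM being named as a limiting factor.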



How do you imagine the future for your industry? Is an oral conversation being instantly and simultaneously translated still science fiction?


We aim to bring to market specialized solutions that are based on our expertise in the field of language processing. This goes beyond the “simple” ability to produce automatic and generic translation features.
For business in general, it means that a language tool will be part of all processes of any global company and that French, English and Chinese-speaking people will be able to truly and seamlessly communicate together, each of them in their own language. Today, existing solutions are applied at the end of the process – e.g. when you receive a foreign email, an RFP from China, etc., you realize that you need it to be translated, but these solutions are not integrated at the very core of the process: while the document is being produced, or while you’re having a phone conference call, or when you need to comment on a document in a foreign language, etc.
Artificial Intelligence and the algorithms that we’ve industrialized open the way to unlimited possibilities: soon, we will be able to use the same neural networks to facilitate the teaching of foreign languages and to produce content directly in several languages. It will also be possible to speak in one’s mother tongue and be understood by a foreign party, thanks to a connected earpiece. Far from science fiction, this scenario is very close to becoming a reality in the short term. There is more to come with Artificial Intelligence that will surprise us all and enrich our professional experience. This is just the beginning.