Rest of World introduces machine translation for its stories
Our stories are now available in Spanish and Portuguese using Google’s Machine Translation technology.
Why we did it and what our goals were
You might have noticed there’s a shiny, new “translate” button on our articles that allows you to quickly switch to the article in Spanish or Portuguese using Google Translate.
At Rest of World, our aims are to reach as many people as possible with our journalism, and make our site and stories accessible to everyone regardless of their geography, circumstances, or language. Our current readers largely speak English, but our website survey earlier this year revealed that respondents also read a mix of Spanish (14%), Portuguese (5%), Hindi (4.5%) and Mandarin (4.8%) among other languages.
We’re a small team with limited resources, so translating every article, or covering specific regions in their native languages, is not possible for us. To work around this constraint, we decided to use available machine translation technology to meet the language needs of our readers. We recognize that nothing can beat human-translated stories, so we will continue to use and promote content translated by real humans wherever possible.
Before building our translation feature, we did some research and testing to understand which translation technology offered the best service and quality translations. We have a small product team, and zero data scientists, so we knew we needed to use a third-party translation service to support our goals.
Our research led us to two leading translation services: Amazon Translate and Google Translate, both of which offered similar benefits in terms of cost and functionality. Both services use neural machine translation, meaning that the algorithm assesses the text as a whole to produce the translation, rather than interpreting the content word-by-word or line-by-line. These neural networks are constantly being trained and iterated upon using vast amounts of text-based data and input from human translators.
Having selected these two services, we ran a blind test with reporters and editors on the team who are native speakers of Spanish and Hindi to assess which of the two would meet our standards. The unanimous winner was Google Translate across both languages.
Spanish in particular performed very well. The content was generally understood, though occasionally some words were translated too literally and some context was lost.
Hindi didn’t perform quite so well, and highlighted some obvious issues when translating directly from English. Although the content was generally understood, some common words used in modern tech writing simply do not exist in Hindi, and the algorithm tried to fill the blanks with complex words and literal translations, making some sentences incomprehensible. Additionally, clauses and gendered pronouns were frequently mixed up, meaning the context of the story was often lost or difficult to understand.
Based on the outcome of our test, we opted to build a service with the Google Translate API, and have initially rolled out Spanish and Portuguese machine translations on our site on a trial basis. We haven’t widely tested Portuguese with users, but we’ve assumed that, because of the similarities between the two Romance languages, Portuguese should work about as well as Spanish. We’ll be live testing this on site and gathering feedback, so if you’re a Portuguese speaker you can let us know if that assumption is correct!
How it works and how we built it
We use WordPress to manage and publish stories on our site. It provides the means to extend functionality and integrate with a service like Google Translate. The first hurdle was to provide a translated variation of almost any story on our site based on the original URL. We do this by passing the language code as part of the story URL. If the language code matches one we support, we kick off the translation process.
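A minimal sketch of this routing step, written in Python for illustration rather than taken from the WordPress implementation. It assumes the language code appears as the last segment of the story URL; the article does not say where the code sits, and the names SUPPORTED_LANGUAGES and split_language_from_url are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumption: the languages currently enabled for machine translation.
SUPPORTED_LANGUAGES = {"es", "pt"}


def split_language_from_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (original_story_path, language_code or None).

    Assumes the language code, when present, is the final path segment,
    e.g. /2022/some-story/es/ (a hypothetical URL shape).
    """
    path = urlparse(url).path
    segments = [segment for segment in path.split("/") if segment]
    if segments and segments[-1].lower() in SUPPORTED_LANGUAGES:
        language = segments[-1].lower()
        rest = segments[:-1]
        original_path = "/" + "/".join(rest) + "/" if rest else "/"
        return original_path, language
    # No supported code found: treat the whole path as the original story.
    return (path if path.endswith("/") else path + "/"), None
```

With this shape, an unsupported code is simply treated as part of the story path, so the original English page is served (or a normal not-found response is returned) and no translation is attempted.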
When a translated version of an article is requested, we check a temporary cache to see if it has been translated before. If not, we send the article content to Google’s translation service, save the response temporarily for future requests, and return the translated text. This helps reduce repetitive requests for the same story, and limits the resources we consume and the overall cost. The service leaves HTML markup untouched, so style information and other elements are preserved. There are exceptions where not all supporting text is translated, and we expect to solve those edge cases over time.
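The check-the-cache-first flow described here can be sketched roughly as follows. This is an illustrative Python version, not the site's code: the in-memory dictionary, the one-day lifetime, and the function names are all assumptions, and the call to the translation service is passed in as a plain function so the sketch does not depend on any particular API client.

```python
import time
from typing import Callable, Dict, Tuple

# Assumption: the article does not state how long a translation is kept.
CACHE_TTL_SECONDS = 24 * 60 * 60

# (story_id, language) -> (expiry_timestamp, translated_html)
_cache: Dict[Tuple[int, str], Tuple[float, str]] = {}


def get_translation(
    story_id: int,
    language: str,
    html: str,
    translate: Callable[[str, str], str],
    now: Callable[[], float] = time.time,
) -> str:
    """Return translated HTML, calling `translate` only on a cache miss."""
    key = (story_id, language)
    cached = _cache.get(key)
    if cached is not None and cached[0] > now():
        return cached[1]  # still fresh: no request to the service
    # Miss or expired entry: translate once and remember the result.
    translated = translate(html, language)
    _cache[key] = (now() + CACHE_TTL_SECONDS, translated)
    return translated
```

In practice `translate` would wrap a request to the translation service. Keying the cache on both story and language means a Spanish request never returns a cached Portuguese page.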
The majority of our English-speaking readers will see a Translate button with a list of supported languages. However, if the preferred language of your browser matches one of those languages, the button will default to a direct link for that language.
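One way to decide whether the button should link straight to a language is to read the browser's Accept-Language header. The sketch below is written under that assumption: the article does not say whether the check happens on the server or in the browser, and the function name is hypothetical. It picks the browser's top-ranked language and returns it only if it is supported.

```python
from typing import Optional

# Assumption: the languages currently enabled for machine translation.
SUPPORTED_LANGUAGES = {"es", "pt"}


def preferred_supported_language(accept_language: str) -> Optional[str]:
    """Return the browser's top-ranked language if supported, else None."""
    ranked = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue
        quality = 1.0  # a tag without an explicit q value has weight 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by highest quality first, then by order of appearance.
        ranked.append((-quality, position, tag))
    if not ranked:
        return None
    top_tag = min(ranked)[2]
    primary = top_tag.split("-")[0]  # "pt-br" -> "pt"
    return primary if primary in SUPPORTED_LANGUAGES else None
```

A result of None corresponds to showing the general Translate button with the full list; a language code corresponds to the direct link.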
For now, translation is possible on all articles published from October 1, 2022 onwards. Over time we will move this date progressively back so as to cover all our published stories.
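The date rule amounts to a single comparison. In this small Python sketch the constant and function names are hypothetical; only the October 1, 2022 date comes from the text above.

```python
from datetime import date

# The earliest publication date for which translation is offered.
TRANSLATION_CUTOFF = date(2022, 10, 1)


def is_translatable(published: date, cutoff: date = TRANSLATION_CUTOFF) -> bool:
    """True if a story published on `published` can be translated."""
    return published >= cutoff
```

Moving the cutoff back over time then only means changing one value, for example passing an earlier `cutoff`.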
It’s worth noting that there are other considerations. Search engines do not like automatically generated story pages, so you must take care to flag them as such. We also needed to create mechanisms to gather feedback, and to allow our editors to exempt stories where translations raise issues.
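The article does not say which mechanisms were used for either point, so the following Python sketch shows just one common option for each, under stated assumptions: a robots meta tag asking crawlers not to index the machine-translated variant, and a per-story flag that an editor can set to switch translation off. The Story class, its translation_exempt field, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    slug: str
    # Hypothetical flag an editor sets when a translation raises issues.
    translation_exempt: bool = False


def translation_offered(story: Story) -> bool:
    """Translation is offered unless an editor has exempted the story."""
    return not story.translation_exempt


def robots_meta_tag(language: Optional[str]) -> Optional[str]:
    """Return a robots meta tag for translated pages, None for the original.

    `language` is None when the original English page is being served.
    """
    if language is None:
        return None
    # Ask crawlers not to index this variant but still follow its links.
    return '<meta name="robots" content="noindex, follow">'
```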
As we have engineered it, adding support for a new language is as simple as whitelisting a new language code. For example, supporting Irish (Gaeilge) only requires adding ‘ga’ to the support list. For now we are in no hurry to do so: we want to improve our existing efforts and better understand our audience’s needs.
The content is being directly translated from English with no editorial intervention, so we anticipate the translations won’t be quite right some of the time. However, Google Translate as a service is constantly being improved and iterated on, and we hope to extend the feature to other languages going forward.
We’d love to hear your feedback and how you feel about the service. How have you found the translations? What language do you think we should support next? You can email us at firstname.lastname@example.org with your thoughts.