Another important factor in media localization: language-specific unit conversions
Have you seen the movie ‘Black Panther: Wakanda Forever’?
A sequel to 2018's Black Panther, it is an American superhero film based on Marvel Comics and centered on the character Shuri. The film has drawn wide attention as a tribute to the late actor Chadwick Boseman, who played the lead character in the first Black Panther film.
Let me brief you on what the movie is about.
T'Challa, King of Wakanda, has died of a terminal illness after his little sister, Shuri, tried and failed to save him by recreating the "Heart Shaped Herb," a fictional herb that gives anyone with Wakandan royal blood supernatural abilities. A year later, Wakanda faces international pressure to share its Vibranium and comes into conflict with Namor, who opposes it. With the help of Okoye and a CIA agent named Everett K. Ross, Shuri finds and collaborates with an MIT student named Riri Williams (Ironheart), who has developed a Vibranium detector. After a battle between the Wakandan forces and Namor’s forces, Shuri becomes the new Black Panther, and the movie ends with her making peace with Namor.
Halfway through the movie, when Shuri and Ironheart are evading U.S. intelligence, there is a line where Shuri asks Ironheart how high the surveillance drones are. And Ironheart replies:
9,000 meters and 30,000 feet describe essentially the same height (30,000 feet is about 9,144 meters), but you can see that different languages use different units. This is the result of applying "localization*" for countries that use their own language-specific units. If the Korean subtitle said 30,000 feet, most viewers would not be able to picture the actual height. Localizing your media requires properly converting the units that each language uses, which is an important factor in helping viewers understand your content.
- Localization: Going beyond ‘word-for-word’ or ‘sentence-for-sentence’ translation to translate appropriately for each country's specific culture and language.
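The 30,000 feet and 9,000 meters pair from the subtitle example can be reproduced with a short sketch. It assumes one plausible rule, rounding the converted value to the same number of significant figures as the source number, so that a round figure in one language stays a round figure in the other. The function names are hypothetical and this is not a description of how XL8's engine works.

```python
import math

FEET_TO_METERS = 0.3048  # exact by definition of the international foot


def significant_figures(n: int) -> int:
    """Count significant figures of an integer, ignoring trailing zeros."""
    digits = str(abs(n)).rstrip("0")
    return max(len(digits), 1)


def round_to_sig(value: float, sig: int) -> int:
    """Round a positive value to `sig` significant figures (whole numbers only)."""
    if value == 0:
        return 0
    magnitude = math.floor(math.log10(abs(value)))
    # Never round to finer than whole units in this sketch.
    factor = 10 ** max(magnitude - sig + 1, 0)
    return round(value / factor) * factor


def feet_to_meters_rounded(feet: int) -> int:
    """Convert feet to meters, keeping the source's level of roundness (assumed rule)."""
    meters = feet * FEET_TO_METERS
    return round_to_sig(meters, significant_figures(feet))


print(feet_to_meters_rounded(30000))  # 9000
```

The exact conversion gives about 9,144 meters; the rounding step is what turns it into the figure a subtitle would naturally show.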
It is not easy to convert units for different countries and languages. As you can see from the example above, in addition to using different units, there are also differences in number representation, spacing, and how sentences are worded. If you look at the above subtitles closely, you'll see that in Korean, the number and unit are followed by a verb. French doesn’t use “,” as a thousands separator, but uses a space instead. Japanese is written without spaces between words. And in English, numbers are written in Arabic numerals, the ten symbols most commonly used to write decimal numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
As you can see, there are many factors to consider when creating an efficient localized translation. In this blog post, we will go through how XL8 provides “localized” translations that understand specific units for each language.
Training the Machine Translation engine to learn language and country-specific units
To increase the accuracy of our machine translation, XL8 has added language-specific units to its translation engine. We initially trained the engine on the 32 languages* most commonly used by our customers, and we plan to add language-specific units for more languages in the future. When adding units, we took the characteristics of each country into account. In particular, we focused on implementing more accurate language-specific representations. Units are most often expressed in a "number + unit" format, and we reflected this format to enable accurate Machine Translation. In addition, by studying actual data, we are able to understand the way locals express themselves and reflect that directly in our translations.
- Arabic, Brazilian-Portuguese, Burmese, Chinese (Simplified), Chinese (Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, LATAM Spanish, Malaysian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, and Vietnamese.
Reflecting country-specific characteristics
In Korea, there is a unit called ‘Geun’ (“catty”), which is used across East Asia. We have added the corresponding unit "斤" for Simplified and Traditional Chinese, "cân" for Vietnamese, and "kati" for Malay. Although it is the same unit, its weight varies from country to country, with mainland China using 500g, Taiwan using 600g, and Vietnam using 605g. Volume units differ in the same way. In the UK, the "Imperial Pint" is 20 imperial fluid ounces, while the US pint is 16 US fluid ounces. Taking into account the difference between the imperial and US fluid ounce, the Imperial Pint is about 20% larger than the US pint. So even though the unit has the same name, we reflected the appropriate value for each country in our translation engine, allowing viewers to experience a more "localized" translation.
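A minimal sketch of what "same unit, different value per country" can look like as data. The catty weights are the figures quoted above; the pint volumes are the standard definitions in milliliters. The region codes, table names, and function names are assumptions made for illustration, not XL8's actual data model.

```python
# Grams per catty, using the per-country figures quoted in the text.
CATTY_GRAMS = {
    "zh-CN": 500,  # mainland China
    "zh-TW": 600,  # Taiwan
    "vi": 605,     # Vietnam
}

# Milliliters per pint (standard definitions, rounded to three decimals).
PINT_ML = {
    "en-GB": 568.261,  # imperial pint, 20 imperial fluid ounces
    "en-US": 473.176,  # US liquid pint, 16 US fluid ounces
}


def catty_to_kg(amount: float, region: str) -> float:
    """Convert catties to kilograms using the region's own definition."""
    return amount * CATTY_GRAMS[region] / 1000


def pint_to_ml(amount: float, region: str) -> float:
    """Convert pints to milliliters using the region's own definition."""
    return amount * PINT_ML[region]


# The same "2 catties" is a different weight depending on the region.
for region in CATTY_GRAMS:
    print(region, catty_to_kg(2, region), "kg")

# Ratio of the imperial pint to the US pint, roughly 1.2.
print(round(PINT_ML["en-GB"] / PINT_ML["en-US"], 2))
```

Keying the table by region rather than by language matters here, since Simplified and Traditional Chinese share the character "斤" but not the weight.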
Reflecting language-specific characteristics
Going back to the subtitles for ‘Black Panther: Wakanda Forever’, you can see that language-specific units come in many forms. If you don't take those linguistic idiosyncrasies into account, you end up unable to render units correctly in fusional languages, or even fail to recognize a word at all, depending on the spacing. So we categorized each language as agglutinative, fusional, or isolating, and treated each category separately to reflect its characteristics. For agglutinative languages, for example, we've trained our engine to recognize when a unit is followed by an affix* to produce a "localized" translation. For fusional languages, we made sure units are recognized correctly even when their form changes. For isolating languages, we made sure that units could be translated correctly in sentences without spaces. And for other languages, we trained the engine to understand the different forms units take based on each language's characteristics.
- Affix: a morpheme that is attached to a word stem to form a new word or modify its meaning
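To make the affix problem concrete, here is a small rule-based sketch for Korean, where a particle such as "에" attaches directly to the unit with no space. A trained translation engine does not work this way; the regular expression only illustrates why "number + unit" cannot be found by splitting on spaces. The unit list and the function name are hypothetical.

```python
import re

# Hypothetical, tiny unit list for illustration only.
KOREAN_UNITS = ["미터", "킬로미터", "피트"]

# Try longer unit names first so a shorter unit never wins over a longer one
# that starts with the same characters.
_unit_alternatives = "|".join(
    re.escape(u) for u in sorted(KOREAN_UNITS, key=len, reverse=True)
)

# number (with optional "," grouping and decimals), optional space, unit,
# then whatever is glued onto the unit (for example a particle).
QUANTITY = re.compile(
    r"(\d[\d,]*(?:\.\d+)?)\s*(" + _unit_alternatives + r")(\S*)"
)


def find_quantity(sentence: str):
    """Return (number, unit, attached_suffix) for the first quantity, or None."""
    match = QUANTITY.search(sentence)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


print(find_quantity("드론은 9,000미터에 있어요"))
```

In the example sentence the unit "미터" (meter) is immediately followed by the particle "에", which is returned separately from the unit itself.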
Reflecting language-specific number notation
Languages differ not only in their characters, but also in how they write numbers. For example, the number "1200400.1" is written in Korean as "1,200,400.1", using "," as the thousands separator and "." as the decimal separator. French does this differently. It uses a space as the thousands separator and "," as the decimal separator, so the same number is displayed as "1 200 400,1". Some languages do not use a thousands separator at all, or use their own unique character for it. At XL8, we created a number format for each specific language to reflect these characteristics, and tested it with sentences from each language to ensure that numbers are represented with that language's own conventions.
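A per-language number format like the one described above can be sketched as a small table of separators plus a grouping function. Only the Korean and French conventions from the text are included; the class and function names are assumptions, and the plain space for French follows the example in this post (real French typography often uses a no-break space instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberFormat:
    thousands: str  # separator placed between groups of three digits
    decimal: str    # separator placed before the fractional part


# Hypothetical format table covering the two languages discussed in the text.
FORMATS = {
    "ko": NumberFormat(thousands=",", decimal="."),
    "fr": NumberFormat(thousands=" ", decimal=","),
}


def format_number(text: str, lang: str) -> str:
    """Format a plain decimal string such as '1200400.1' for a language."""
    fmt = FORMATS[lang]
    integer, _, fraction = text.partition(".")
    sign = "-" if integer.startswith("-") else ""
    digits = integer.lstrip("-")

    # Collect groups of three digits from the right.
    groups = []
    while digits:
        groups.append(digits[-3:])
        digits = digits[:-3]
    grouped = fmt.thousands.join(reversed(groups))

    return sign + grouped + (fmt.decimal + fraction if fraction else "")


print(format_number("1200400.1", "ko"))  # 1,200,400.1
print(format_number("1200400.1", "fr"))  # 1 200 400,1
```

Taking the number as a string avoids floating-point rounding changing the digits before they are regrouped.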
Reflecting real data
Simply translating a unit word for word does not make a good Machine Translation. A good Machine Translation should reflect the words that are actually spoken by the “locals”. For example, translating “1 degree Celsius” literally into Korean sounds unnatural, because Korea uses Celsius by default. So in Korean, it is usually not displayed as “1 degree Celsius”, but as just “1 degree”. We look for these kinds of unit use cases in real-world data to provide "localized" translations.
To provide the perfect localization output, the units “learned” by our engine are double-checked through the conversion process
When training our engines, the units for each language go through the normal first round of checks before they are included in the output. Measurement units, however, also go through something called the “conversion” process, so we have added a secondary check to make sure this works properly. The secondary check involves the following process. First, it checks that the numbers and units are entered correctly. Units are categorized by volume, weight, length, and so on, so it then makes sure that the units in the two languages you want to convert between are in the same category. After this, it checks that the conversion accurately reflects the unit values, and that there are no errors along the way. By following this process, we can ensure that units are expressed properly before they are used in the actual translation process.
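The three steps of the secondary check (well-formed number and unit, matching category, correct converted value) can be sketched as follows. The unit table, the English abbreviations, and the function names are assumptions for illustration; the post does not describe the actual implementation.

```python
import re

# Hypothetical unit table: unit -> (category, factor to the category's base unit).
UNITS = {
    "ft": ("length", 0.3048),
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
    "lb": ("weight", 0.45359237),
    "kg": ("weight", 1.0),
}

QUANTITY = re.compile(r"^\s*(-?\d[\d,]*(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_quantity(text: str) -> tuple[float, str]:
    """Step 1: check that the number and unit are entered correctly."""
    match = QUANTITY.match(text)
    if match is None:
        raise ValueError(f"not a 'number + unit' expression: {text!r}")
    number, unit = match.groups()
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number.replace(",", "")), unit


def convert(text: str, target_unit: str) -> float:
    """Steps 2 and 3: check the categories match, then convert the value."""
    value, source_unit = parse_quantity(text)
    if target_unit not in UNITS:
        raise ValueError(f"unknown unit: {target_unit!r}")
    source_category, source_factor = UNITS[source_unit]
    target_category, target_factor = UNITS[target_unit]
    if source_category != target_category:
        raise ValueError(
            f"cannot convert {source_category} ({source_unit}) "
            f"to {target_category} ({target_unit})"
        )
    return value * source_factor / target_factor


print(convert("30,000 ft", "m"))  # about 9144 meters
```

Calling `convert("30,000 ft", "kg")` raises a `ValueError`, which is the kind of category mismatch the second step is meant to catch.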
XL8's localization strategy for providing the most accurate media translations
XL8 is the only company in the world that provides Machine Translation specialized for colloquial media entertainment content. We have received many questions about how our translation engine can translate content in the most localized way possible. We've blogged about a number of features, including context awareness, age and gender estimation, and glossary utilization. In this post, we have talked about converting units in a way that is optimized for each specific language.
Translation isn't just about swapping languages. A deep understanding of a country’s culture is a must in order to fully convey the creator’s intentions, which will help the viewers immerse themselves in your content. XL8 has worked diligently to ensure that our translations are language- and country-specific.
Stay tuned, as we will continue to deliver “localized” translations that will allow your viewers to feel the original emotion of your content!
Written by Dino Jun, Intern of Research team at XL8
Edited by Rosa Lee, Marketing Specialist at XL8