What is Transliteration? How is it different from Translation?
You are probably already familiar with Translation. Simply put, it is the process of converting text from one language to another while keeping its meaning intact.
For example, the translation of the Hindi sentence “क्या आपने खाना खा लिया?” to English would be “Have you eaten?”
But NeuralSpace also offers Transliteration. What is that, then? Transliteration is the conversion of text from one alphabet to another without changing the pronunciation. The characters of the source alphabet are replaced with similar-sounding characters of the target alphabet.
For example, the transliteration of the same Hindi sentence “क्या आपने खाना खा लिया?” is “Kya aapne khana kha liya?”. This sentence in Latin letters has no meaning to an English speaker, but when it is read out loud, a Hindi speaker understands it perfectly. Why? Because it sounds the same as if it were written in the Hindi alphabet.
Why do we even need Transliteration?
The need for creating content in local languages is more relevant than ever. While Internet and technology penetration is increasing year over year, QWERTY or English/Latin keyboards are the default in many geographies. This makes typing challenging in languages that don't use the English/Latin alphabet, e.g., Arabic, Hindi, Punjabi and Sinhala. With Transliteration, you can create content in such languages using English/Latin characters.
The need for AI-powered Transliteration models
When a word is transliterated from one alphabet into another, the result can change depending on the context in which the word is used. For example, the Hinglish sentence “Kal vo park jaa rahe the” should be transliterated to Hindi as “कल वो पार्क जा रहे थे”, whereas the English sentence “the bag is red” should be transliterated as “द बैग इस रेड”. Notice how the word “the” becomes “थे” in the first sentence and “द” in the second, depending on the context in which it is used.
Traditionally, Transliteration models are rule-based. They map each character in the source alphabet to a character in the target alphabet. Because the mapping is fixed, they cannot adapt the output to the context in which a word is used.
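To make that limitation concrete, here is a minimal sketch of a rule-based transliterator in Python. The rule table is a toy we made up for this post (it is not a complete Hinglish-to-Devanagari scheme and not how any NeuralSpace model works), and the function name `rule_based_transliterate` is purely illustrative. It scans the text left to right and always applies the longest matching rule.

```python
# Toy rule table: Latin chunks mapped to Devanagari characters.
# Illustrative assumption only; a real scheme needs far more rules.
RULES = {
    "th": "थ", "aa": "ा", "a": "", "e": "े", "o": "ो",
    "k": "क", "l": "ल", "v": "व", "j": "ज", "r": "र", "h": "ह",
}
MAX_CHUNK = max(len(key) for key in RULES)


def rule_based_transliterate(text: str) -> str:
    """Greedy longest-match transliteration using a fixed rule table."""
    text = text.lower()
    out = []
    i = 0
    while i < len(text):
        # Try the longest chunk first, then shorter ones.
        for size in range(MAX_CHUNK, 0, -1):
            chunk = text[i:i + size]
            if chunk in RULES:
                out.append(RULES[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# "the" becomes "थे" in both sentences, because the rules never
# look at the surrounding words.
for sentence in ("kal vo jaa rahe the", "the bag is red"):
    print(rule_based_transliterate(sentence))
```

The first sentence happens to come out right, but the second one also starts with “थे” where “द” was wanted. No amount of extra rules fixes this, since the same three letters need two different outputs.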
AI-powered Transliteration models, such as those available at NeuralSpace, can transliterate while capturing the context in which a word is used. They can also be adapted to custom requirements by fine-tuning an existing Transliteration model.
Integration with Google Maps, Spotify, and other APIs
If you have tried integrating the Google Maps or Spotify APIs with your chatbots in languages such as Tamil, Arabic, Chinese or Greek, you know they rarely produce acceptable results. With Transliteration, you can extract entities like names, addresses and songs in your local language and convert them into the English/Latin alphabet to fully utilize APIs from Google Maps, Spotify and other international organisations.
Creating NLU Data for Chatbots
A common practice is to create a dataset in English and translate it to another language using Google Translate or similar APIs. While this can work well for simple FAQ chatbots, contextual chatbots require well-structured, meaningful data in the local languages. Transliteration-powered typing tools can help accelerate the dataset creation process for chatbots in languages that don’t use the English/Latin alphabet.
Low-Cost Content Accessibility in Multiple Languages
Internationalization/Translation can be extremely expensive when your databases are in English and dynamic, ever-expanding with content like songs, albums, or addresses. Transliteration can be a handy tool to convert such content on the fly to other languages that don’t use the English/Latin alphabet. The transliterated content can be indexed using popular frameworks like Elasticsearch or similar tools and instantly made accessible in multiple languages. This can potentially save thousands of dollars on manual translation efforts.
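As a rough sketch of what this could look like in Python, the helper below adds one transliterated field per target language to each record before it goes to a search index. Everything here is an assumption for illustration: the function `build_search_documents`, the `title_<lang>` field naming, and the `transliterate(text, lang)` callable, which stands in for whatever Transliteration service you use. The stub in the demo only tags the text so the example runs without any external service.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: (text, target_language_code) -> transliterated text.
Transliterator = Callable[[str, str], str]


def build_search_documents(
    titles: Iterable[str],
    target_languages: Iterable[str],
    transliterate: Transliterator,
) -> List[Dict[str, str]]:
    """Return one document per title with a transliterated field per language."""
    languages = list(target_languages)
    documents = []
    for title in titles:
        doc = {"title": title}
        for lang in languages:
            # Field naming (title_<lang>) is our own convention, not a standard.
            doc[f"title_{lang}"] = transliterate(title, lang)
        documents.append(doc)
    return documents


# Stub transliterator for demonstration; replace with a real service call.
def fake_transliterate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


for doc in build_search_documents(["Main Street"], ["hi", "ar"], fake_transliterate):
    print(doc)
```

Documents in this shape could then be sent to Elasticsearch or a similar tool, so that a search typed in any of the target alphabets can match the same record.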
Something as simple as typing can be extremely challenging in languages spoken in the Middle East, Africa, India and South-East Asia. With governments and financial institutions mandating inclusion, and content creators being more used to the Latin-letter keypad, a tool like Transliteration can make it much easier to create content in local languages.
Sign up on the NeuralSpace Platform and try out Transliteration on the UI!
Join the NeuralSpace Slack Community to connect with us, receive updates, and discuss topics in NLP for low-resource languages with fellow developers and researchers.
Check out our Documentation to read more about the NeuralSpace Platform and its different services.
Happy NLP and cheers!