Humans and machines, such a beautiful pairing. (Image via Sdtimes.com)
A monetized mobile app that crowdsources the digitization of Arabic text recently placed first at NYU Abu Dhabi’s International Hackathon for Social Good in the Arab World.
Named Arabic Snippets, the app is an example of a technology designed to improve Arabic computational linguistics - an underdeveloped field that’s brimming with potential commercial opportunities.
Arabic Snippets helps digitize Arabic-language books and documents by streamlining the process of turning scanned text or images into searchable digital content - a basic digitization technique. To do so, the app draws on basic elements of natural language processing (NLP), a branch of artificial intelligence that often relies on machine learning.
NLP tools enable computers to independently understand and communicate through human languages like Arabic. Because computers were designed to process unambiguous programming languages, they have trouble deciphering human languages, which are rife with slang and dialects.
How NLP works
Enabling computers to understand and process Arabic creates numerous commercial opportunities, as they can perform complicated - and traditionally expensive - tasks faster than their human overlords. Areas where NLP can drive innovation include translation, big data analysis, newsgathering, e-health, social media analysis, voice recognition, and digitization.
Globally, tech giants like Facebook, Google, and Microsoft are investing in various forms of NLP.
The winning team that developed the Arabic Snippets mobile application at Hackathon 2016. (Image via NYUAD)
Arabic Snippets uses NLP by independently recognizing and clustering annotated Arabic letters, words, and phrases and reorganizing them into full sentences and texts without human supervision, meaning scanned texts don’t need to be manually digitized word-by-word each and every time.
“We used a simple technique in the hackathon to do this and got 98.6 percent accuracy,” said Nizar Habash, an NYUAD associate professor of computer science who mentored the Arabic Snippets team. “We think we can do better with more NLP resources.”
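The article does not describe the team's actual technique, so the following is only a minimal sketch of one common way crowdsourced digitization can work: several contributors transcribe the same scanned word image (a "snippet"), the most frequent transcription wins, and the winners are stitched back together in page order. The function name, the tuple layout, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_transcriptions(annotations):
    """Rebuild a line of text from crowd transcriptions.

    `annotations` is an iterable of (snippet_id, position, text) tuples,
    a hypothetical format in which `position` is the snippet's order on the page.
    """
    votes = defaultdict(Counter)   # snippet_id -> tally of proposed texts
    positions = {}                 # snippet_id -> order on the page
    for snippet_id, position, text in annotations:
        votes[snippet_id][text.strip()] += 1
        positions[snippet_id] = position

    # Keep the most frequent transcription for each snippet, in page order.
    ordered_ids = sorted(votes, key=lambda sid: positions[sid])
    return " ".join(votes[sid].most_common(1)[0][0] for sid in ordered_ids)


# Illustrative data: three volunteers transcribe snippet "s1", two transcribe "s2".
sample = [
    ("s1", 0, "اللغة"),
    ("s1", 0, "اللغة"),
    ("s1", 0, "اللفة"),   # one volunteer makes a typo
    ("s2", 1, "العربية"),
    ("s2", 1, "العربية"),
]
result = aggregate_transcriptions(sample)
print(result)
```

Majority voting is only the simplest possible aggregation rule. A production system would presumably also weigh contributor reliability and handle ties, which this sketch does not.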
But advanced NLP resources are an issue - especially for complex languages like Arabic.
“Even in English NLP is not very accurate yet,” said Mohanad Fors, the editor and cofounder of TechBel3arabi, an Arabic-language tech news platform that has experimented with NLP via machine translation.
“I believe today the nearest technology that is good is Skype’s instant translation. It is about 60 percent accurate,” said Fors, referencing Skype Translator's instant Arabic translation feature, released earlier this year.
Arabic is spoken by nearly 300 million people worldwide, but has received comparatively little attention in modern computational linguistics, despite a growing demand.
Some of the functions of Crowd Analyzer's services. (Image via Crowd Analyzer)
Arabic is among the top five languages used on Google Translate out of the 32 it currently supports, according to Google. But even Google needs ordinary, slow humans to help build its Arabic translation capabilities. Through its Translate Community, Google relies on crowdsourcing manual annotations and translations from volunteer contributors as a way to improve the quality of the platform. Like digitizing Arabic texts, this is a tedious process that Arabic NLP can help improve, and the time is ripe for innovations in the field to occur.
“Arabic content is exponentially growing on the web,” said Bahaa Galal, the cofounder of Crowd Analyzer, a UAE-based social media monitoring platform that utilizes NLP. “Arabic speaking people will be more likely to use Arabic language in their technology related activities. So building a company around Arabic NLP techniques will be more needed than before.”
“There has been a lot of work, but more is still needed,” Habash told Wamda. He has also written a book on the topic. “Arabic Snippets, if largely used, will contribute by creating large data sets that can be used for building better systems.”
Apps like Arabic Snippets are just scratching the surface.
The Arabic NLP landscape
Founded in 2013, Crowd Analyzer helps businesses monitor, search, and analyze what's happening on social media.
“Crowd Analyzer as a social media monitoring platform has been built around Arabic NLP,” said Galal. “We have studied the Arabic language and its dynamics so we can create a true form of artificial intelligence that can understand Arabic language as humans do.”
The platform automatically analyzes English and Arabic text for relevance, dialect, and sentiment. In practice, Galal said, it can determine whether a person posting on social media was happy, sad, or neutral about the topic they were posting about.
Crowd Analyzer then uses the data gathered to produce personalized insights for its customers.
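To make the idea of sentiment classification concrete, here is a toy sketch, not Crowd Analyzer's method (which the article does not describe). It assumes a tiny hand-written word list and simply counts positive and negative words in a post. The word lists and the function name are hypothetical, and real systems typically rely on trained models that also cope with dialects, negation, and punctuation.

```python
# Hypothetical mini-lexicons mixing Arabic and English words.
POSITIVE = {"سعيد", "رائع", "ممتاز", "happy", "great"}
NEGATIVE = {"حزين", "سيء", "sad", "bad"}


def classify_sentiment(post):
    """Label a post as 'happy', 'sad', or 'neutral' by counting lexicon hits."""
    tokens = post.lower().split()  # naive whitespace tokenization
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "happy"
    if score < 0:
        return "sad"
    return "neutral"


for post in ["The service was great", "bad bad experience", "Just arrived"]:
    print(post, "->", classify_sentiment(post))
```

A word-counting approach like this breaks down quickly on slang and dialectal spelling, which is part of why dialect detection sits alongside sentiment in platforms of this kind.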
The Sakhr software timeline. (Image via Sakhr)
The realm of machine translation is seeing activity as well. One example is Curras, a translation portal created by Palestine’s Birzeit University and developed to help build IT applications for automatic translation, search and retrieval, spell-checking, and speech recognition.
Another longtime player in the field is Kuwait’s Sakhr, which since 1982 has produced software for Arabic machine translation, speech recognition, speech synthesis, and optical character recognition.
Elsewhere, the US-based e-health startup X2AI used Arabic NLP to create a chatbot designed to engage in personalized text message conversations with Syrian refugees to help them cope with psychological and emotional trauma. X2AI partnered with Field Innovation Team, an NGO delivering tech-enabled humanitarian assistance, to distribute the chatbot to refugees and aid workers in MENA.
And then there’s the aforementioned Skype Translator, which uses an Arabic NLP tool to translate voice chats and instant messages in real time between Arabic and other languages. Currently, Skype Translator only supports Modern Standard Arabic.
Plenty of challenges ahead
Despite its many uses, Arabic NLP technology still needs time and patience to develop.
Tech Bel3arabi, a tech news platform, regularly Arabizes technical terms and pairs them with explanations rather than relying on literal translations. “We tried to use natural language processing,” said cofounder Mohanad Fors. “In some languages [it] can give up to 70 percent accuracy, but we were not as lucky.”
Their attempts to use NLP to translate full articles resulted in nearly unintelligible texts. “I didn't understand 10 percent of the article,” said Fors. “We keep exploring, but now we depend 100 percent on manual translation with many revisions.”