Text Recognition with ML Kit
ML Kit gives developers the ability to add text recognition to their apps. When building an HMS-powered app, you have two options: the text recognition API can run on-device or in the cloud. The on-device service recognizes Simplified Chinese, Japanese, Korean, and Latin-based languages (including English, Spanish, Portuguese, Italian, German, French, and Russian), as well as special characters. The in-cloud API is more robust and recognizes a wider variety of languages, including Simplified Chinese, English, Spanish, Portuguese, Italian, German, French, Russian, Japanese, Korean, Polish, Finnish, Norwegian, Swedish, Danish, Turkish, Thai, Arabic, Hindi, and Indonesian.
The text recognition service is able to recognize text in both static images and dynamic camera streams with a host of APIs, which you can call synchronously or asynchronously to build your text recognition-enabled apps.
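To give a sense of what an asynchronous call looks like, here is a minimal sketch of on-device text recognition using the class names from the ML Kit documentation (MLAnalyzerFactory, MLLocalTextSetting, MLFrame, MLText); exact names and packages may vary between SDK versions, so treat this as an illustration rather than a drop-in snippet.

```java
import android.graphics.Bitmap;
import com.huawei.hms.mlsdk.MLAnalyzerFactory;
import com.huawei.hms.mlsdk.common.MLFrame;
import com.huawei.hms.mlsdk.text.MLLocalTextSetting;
import com.huawei.hms.mlsdk.text.MLTextAnalyzer;

public class OnDeviceOcrSketch {

    // Recognizes text in a bitmap with the on-device analyzer (asynchronous call).
    public void recognizeText(Bitmap bitmap) {
        // Configure the on-device analyzer; "en" is just an example language hint.
        MLLocalTextSetting setting = new MLLocalTextSetting.Factory()
                .setOCRMode(MLLocalTextSetting.OCR_DETECT_MODE)
                .setLanguage("en")
                .create();
        MLTextAnalyzer analyzer = MLAnalyzerFactory.getInstance().getLocalTextAnalyzer(setting);

        // Wrap the image and run recognition asynchronously.
        MLFrame frame = MLFrame.fromBitmap(bitmap);
        analyzer.asyncAnalyseFrame(frame)
                .addOnSuccessListener(mlText -> {
                    // Full recognized string; block/line structure is available via mlText.getBlocks().
                    String recognized = mlText.getStringValue();
                })
                .addOnFailureListener(e -> {
                    // Handle recognition failure (e.g. log it or show a message).
                });
    }
}
```

The same analyzer can also be attached to a camera stream for real-time recognition, which is how the demo APK described below behaves.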
Using the ML Kit demo APK, you can see this technology in action. The app quickly and accurately recognizes any text your camera is pointed at; large text blocks are converted into actual text input for your phone in under a second. Translation features are also impressively fast, reading your words back to you in another language of your choice. The demo shows the extent to which the kit can be used and how much easier it makes developing these features.
How Developers are Implementing Text Recognition
There are many different ways that developers are taking advantage of ML Kit's text recognition. The ability to point your phone at some text and save it to your device opens many possibilities for great app ideas. You can use text recognition to quickly save the information from a business card, translate text, create documents, and much more. Wherever you can avoid requiring users to type text manually, doing so makes your app easier and quicker to use.
Whether a developer uses the on-device API or the in-cloud API depends on the needs of their app. The on-device API lets you process images from the camera stream in real time: a user can point their camera at some text, and the phone uses ML Kit to recognize that text as it appears. The in-cloud API is better for high-accuracy text recognition from images and documents, but it cannot perform real-time recognition from a camera stream.
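For comparison with the on-device call shown earlier, here is a minimal sketch of how the in-cloud analyzer might be configured. Class names (MLRemoteTextSetting, getRemoteTextAnalyzer) and constants follow the ML Kit documentation; the cloud service additionally requires your app's API key and AppGallery Connect configuration, which are omitted here.

```java
import java.util.Arrays;
import com.huawei.hms.mlsdk.MLAnalyzerFactory;
import com.huawei.hms.mlsdk.text.MLRemoteTextSetting;
import com.huawei.hms.mlsdk.text.MLTextAnalyzer;

public class CloudOcrSketch {

    // Builds an in-cloud analyzer tuned for dense document text in Chinese and English.
    public MLTextAnalyzer buildCloudAnalyzer() {
        MLRemoteTextSetting setting = new MLRemoteTextSetting.Factory()
                .setTextDensityScene(MLRemoteTextSetting.OCR_COMPACT_SCENE) // dense text, e.g. documents
                .setLanguageList(Arrays.asList("zh", "en"))                 // example language list
                .setBorderType(MLRemoteTextSetting.ARC)                     // return curved text borders as arcs
                .create();
        // The returned analyzer is used like the on-device one: asyncAnalyseFrame(frame).
        return MLAnalyzerFactory.getInstance().getRemoteTextAnalyzer(setting);
    }
}
```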
Developer Resources
Huawei provides plenty of documentation and guides to help you get started with ML Kit's text recognition. You can get started with this guide here.
For all of the functions of ML Kit, refer to their service portal here.
For an overview of their APIs, browse the comprehensive resource library here.
You can also look at different ways that ML Kit can be implemented, by seeing a collection of sample codes here.
Related
HMS ML Kit - Text to Speech
With HMS ML Kit, Huawei ensures that developers have a simple way to implement text-to-speech features in their apps. Text to speech turns text content into natural spoken audio. The service uses a deep neural network (DNN) synthesis mode, can be quickly integrated through the on-device SDK to generate audio data in real time, and supports downloading offline models. In the current version, two standard male voices and six standard female voices are available, and the service is available to developers globally.
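As a rough sense of the integration effort, here is a minimal TTS sketch using the MLTtsEngine and MLTtsConfig classes from the ML Kit documentation. The language and speaker constants are examples, and a MLTtsCallback would normally be registered to track audio data and playback events (omitted here for brevity); verify the names against the SDK version you use.

```java
import com.huawei.hms.mlsdk.tts.MLTtsConfig;
import com.huawei.hms.mlsdk.tts.MLTtsConstants;
import com.huawei.hms.mlsdk.tts.MLTtsEngine;

public class TtsSketch {

    // Speaks a short string with a standard English female voice.
    public void speak(String text) {
        // Example configuration; speed and volume use their neutral values.
        MLTtsConfig config = new MLTtsConfig()
                .setLanguage(MLTtsConstants.TTS_EN_US)
                .setPerson(MLTtsConstants.TTS_SPEAKER_FEMALE_EN)
                .setSpeed(1.0f)
                .setVolume(1.0f);
        MLTtsEngine engine = new MLTtsEngine(config);

        // QUEUE_APPEND adds the text to the playback queue; audio is played through the
        // built-in player by default. Register a MLTtsCallback via engine.setTtsCallback(...)
        // to receive audio fragments and playback events.
        engine.speak(text, MLTtsEngine.QUEUE_APPEND);
    }
}
```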
Using the ML Kit demo APK, you can see this technology in action.
How Developers are Implementing Text to Speech
Some of the more common areas in which TTS is used are broadcasting, news, voice navigation, and audio reading. Developers can use this feature to let their users convert large amounts of text into speech output. HMS TTS also works seamlessly with navigation data, which enables developers to create powerful navigation apps: ML Kit can synthesize voice segments into the navigation voice, making navigation more personalized.
TTS is currently only available to users with a Huawei device. One limitation of the service is a 500-character cap on the amount of text that can be read. Currently, TTS in French, Spanish, German, and Italian is deployed only in Asia, Africa, Latin America, and Europe. TTS depends on cloud APIs, so during commissioning and usage, ensure that the device can access the Internet. The default specifications of the real-time output audio are PCM mono, 16-bit depth, and a 16 kHz sampling rate.
Developer Resources
Huawei provides plenty of documentation and guides to help you get started with ML Kit's Text to Speech. You can get started with this guide here.
For all of the functions of ML Kit, refer to their service portal here.
For an overview of their APIs, browse the comprehensive resource library here.
You can also look at different ways that ML Kit can be implemented, by seeing a collection of sample codes here.
Dear Translate on HMS vs Google Translate on GMS
Being able to translate language can come in very handy when you're traveling to different countries. Smartphones have the ability to translate speech, text, or photos in real-time. There are many different apps that offer these features but we are going to focus on Dear Translate and Google Translate.
While Google Translate is one of the most popular solutions for this, it's not available on newer Huawei phones that don't support GMS. Dear Translate is an HMS alternative that is available for free from the Huawei AppGallery.
Google Translate on GMS
Google Translate offers instant translation from many different types of inputs. It works with text input, camera access, or audio input. Pointing your camera at a sign or literature in another language lets Google Translate convert the text into your chosen language. To use the audio input, you can simply speak your sentence into your phone and have it read out in the target language.
When using Google Translate to communicate in real time, the best feature to use is "Conversation". It lets you choose two languages and provides an interface designed to be used by both parties: each person presses their microphone icon when they speak, and their speech is translated for the other person.
For offline translation, you can download any of 59 languages so the app can be used where the internet is not available.
Google Translate Features:
Text translation: Translate between 103 languages by typing
Tap to Translate: Copy text in any app and tap the Google Translate icon to translate (all languages)
Offline: Translate with no internet connection (59 languages)
Instant camera translation: Translate text in images instantly by just pointing your camera (88 languages)
Photos: Take or import photos for higher quality translations (50 languages)
Conversations: Translate bilingual conversations on the fly (43 languages)
Handwriting: Draw text characters instead of typing (95 languages)
Phrasebook: Star and save translated words and phrases for future reference (all languages)
Cross-device syncing: Log in to sync your phrasebook between the app and desktop
Dear Translate on HMS:
If you're looking for an HMS alternative to Google Translate, Dear Translate might be your best option. This free app is available on AppGallery and is currently ad-free. It supports many similar features to Google Translate like text translate, camera-based AR translate, voice translate, and more.
The app is missing a feature comparable to Google Translate's Conversation mode. That was the only thing I found myself missing in Dear Translate, and you can manage without it. Overall, Dear Translate supports more languages than Google Translate for the features it does offer.
Dear Translate Features:
Supports translation in 107 languages, meeting your translation needs when studying, working, or traveling abroad.
Languages: supports 107 languages, such as English, Chinese, Japanese, Korean, French, Russian, and Spanish, covering 186 countries.
Text translation: translates typed text in 107 languages.
AR translation: translates as you scan with the camera, with no need to take a photo.
Simultaneous translation: uses streaming speech recognition to translate while you are speaking.
Photo translation: uses camera-based OCR word capture to translate what you shoot.
Emotion translation: playful, emotion-aware translation that makes results more interesting.
Offline translation: a free dictionary translation app supporting offline translation, for when you are traveling abroad and a network connection is unavailable.
Dear Translate is a great free app that is a nice solution for anyone who needs to translate languages using their Huawei phone. The app really highlights the abilities of HMS Core.
Our lives are now packed with advanced devices, such as mobile gadgets, wearables, smart home appliances, telematics devices, and more.
Of all the features that make them advanced, the major one is the ability to understand user speech. Speaking into a device and telling it to do something is naturally easier and more satisfying than using input devices (like a keyboard and mouse) for the same purpose.
To help devices understand human speech, HMS Core ML Kit introduced the automatic speech recognition (ASR) service, to create a smoother human-machine interaction experience.
Service Introduction
ASR can recognize speech (no longer than 60 seconds) and simultaneously convert it into text, using industry-leading deep learning technologies. Thanks to regularly updated algorithms and data, the service currently delivers a recognition accuracy of over 95%. The supported languages are: Mandarin Chinese (including Chinese-English bilingual speech), English, French, German, Spanish, Italian, Arabic, Russian, Thai, Malay, Filipino, and Turkish.
Demo
Use Cases
ASR covers many fields spanning life and work. It enhances voice search for products, movies, TV series, and music, as well as navigation services. When a user searches for a product in a shopping app by speech, the service recognizes the product name or feature in the speech and outputs it as text for the search.
Similarly, when a user uses a music app, this service recognizes the song name or singer input by voice as text to search for the song.
On top of these, ASR can even contribute to driving safety. While driving, when users should not be handling their phone to search for a place, ASR lets them say where they want to go and converts the speech into text for the navigation app, which can then offer the search results.
Features
Real-time result output
Available options: with and without speech pickup UI
Endpoint detection: Start and end points of speech can be accurately located.
Silence detection: No voice packet is sent for silent parts.
Intelligent conversion of number formats: For example, when the speech is "year two thousand twenty-two", the text output by ASR will be "2022".
How to Integrate ML Kit?
For guidance on ML Kit integration, please refer to the official documentation. You are also welcome to visit the HUAWEI Developers website, where you can find other resources for reference.
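As a quick orientation (not a substitute for the official guide), the sketch below starts recognition with ML Kit's built-in speech pickup UI and reads the recognized text back in onActivityResult. The constant and class names follow the documented MLAsrCaptureActivity API; the language code is an example, and you should check the names against your SDK version.

```java
import android.app.Activity;
import android.content.Intent;
import com.huawei.hms.mlsdk.asr.MLAsrCaptureActivity;
import com.huawei.hms.mlsdk.asr.MLAsrCaptureConstants;

public class AsrSketch extends Activity {

    private static final int ASR_REQUEST_CODE = 100;

    // Launches the built-in speech pickup UI for English recognition.
    public void startAsr() {
        Intent intent = new Intent(this, MLAsrCaptureActivity.class)
                .putExtra(MLAsrCaptureConstants.LANGUAGE, "en-US")
                // FEATURE_WORDFLUX shows the text on the pickup UI as it is recognized.
                .putExtra(MLAsrCaptureConstants.FEATURE, MLAsrCaptureConstants.FEATURE_WORDFLUX);
        startActivityForResult(intent, ASR_REQUEST_CODE);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == ASR_REQUEST_CODE
                && resultCode == MLAsrCaptureConstants.ASR_SUCCESS
                && data != null && data.getExtras() != null) {
            // The recognized text is returned in the result bundle.
            String text = data.getExtras().getString(MLAsrCaptureConstants.ASR_RESULT);
        }
    }
}
```

A UI-less mode (MLAsrRecognizer with an MLAsrListener) is also available for apps that provide their own speech pickup interface.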
The translation service from HMS Core ML Kit supports multiple languages and is ideal for a range of scenarios, especially when combined with other services.
The translation service is perfect for those who travel overseas. Combined with the text to speech (TTS) service, it can power an app that helps users communicate with speakers of other languages, for example when taking a taxi or ordering food. And when translation works with text recognition, the two services help users understand menus or road signs from just a photo.
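To show what this looks like in code, here is a minimal sketch of a cloud translation call using the MLTranslatorFactory API as documented; the language codes are examples, package paths should be verified against your SDK version, and a real app would also stop/release the translator when done.

```java
import com.huawei.hms.mlsdk.translate.MLTranslatorFactory;
import com.huawei.hms.mlsdk.translate.cloud.MLRemoteTranslateSetting;
import com.huawei.hms.mlsdk.translate.cloud.MLRemoteTranslator;

public class TranslateSketch {

    // Translates a Chinese string into English using the cloud translator.
    public void translate(String sourceText) {
        MLRemoteTranslateSetting setting = new MLRemoteTranslateSetting.Factory()
                .setSourceLangCode("zh")   // source language (example)
                .setTargetLangCode("en")   // target language (example)
                .create();
        MLRemoteTranslator translator = MLTranslatorFactory.getInstance().getRemoteTranslator(setting);

        translator.asyncTranslate(sourceText)
                .addOnSuccessListener(translated -> {
                    // "translated" holds the English result; it could, for instance, be
                    // handed to the TTS service to be read aloud, as described above.
                })
                .addOnFailureListener(e -> {
                    // Handle translation failure (network errors, unsupported language, etc.).
                });
    }
}
```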
Translation Delivers Better Performance with a New Direct MT System
Most machine translation (MT) systems are pivot-based: they first translate the source language into a third language (the pivot language, usually English) and then translate from that third language into the target language.
This process, however, compromises translation accuracy and is less efficient because it consumes more compute resources. Apps need a translation service that is both more efficient and more accurate when handling idiomatic language.
To meet such requirements, HMS Core ML Kit has strengthened its translation service by introducing a direct MT system in its new version, which supports translation between Chinese and Japanese, Chinese and German, Chinese and French, and Chinese and Russian.
Compared with MT systems that use English as the pivot language, the direct MT system has a number of advantages. For example, it can concurrently process 10 translation tasks of 100 characters each with an average processing time of about 160 milliseconds, a 100% improvement in speed. The translation quality is also remarkable: when translating culture-loaded expressions in Chinese, the system ensures the translation follows the idiom of the target language while remaining accurate and smooth.
As an entry to the shared task "Triangular MT: Using English to improve Russian-to-Chinese machine translation" at the Sixth Conference on Machine Translation (WMT21), the direct MT system now adopted by ML Kit won first place by a clear margin.
Technical Advantages of the Direct MT System
The direct MT system leverages Huawei's pioneering research in machine translation. Russian-English and English-Chinese corpora are used for knowledge distillation, which, combined with an explicit curriculum learning (CL) strategy, yields high-quality Russian-Chinese translation models even when only a small amount of Russian-Chinese corpora exists, or none at all. In this way, the system avoids the low-resource and cold-start issues that usually baffle pivot-based MT systems.
Direct MT
Technology 1: Multi-Lingual Encoder-Decoder Enhancement
This technology overcomes the cold start issue. Take Russian-Chinese translation as an example. It imports English-Chinese corpora into a multi-lingual model and performs knowledge distillation on the corpora, to allow the decoder to better process the target language (in this example, Chinese). It also imports Russian-English corpora into the model, to help the encoder better process the source language (in this example, Russian).
Technology 2: Explicit CL for Denoising
Sourced from HW-TSC's Participation in the WMT 2021 Triangular MT Shared Task
Explicit CL is used to train the direct MT system. Based on the volume of noisy data in the corpora, the training process is divided into three phases that adopt an incremental learning method.
In the first phase, all the corpora (including the noisy data) are used to train the system, to quickly increase its convergence rate. In the second phase, the corpora are denoised using a parallel text alignment tool, and incremental training is performed on the system. In the final phase, incremental training is performed again with the denoised corpora output in the second phase, until the system converges.
Technology 3: FTST for Data Augmentation
FTST stands for Forward Translation and Sampling Backward Translation. It uses the sampling method in its backward model for data augmentation, and the beam search method in its forward models for data balancing. In the comparison experiment, FTST delivered the best result.
Sourced from HW-TSC's Participation in the WMT 2021 Triangular MT Shared Task
In addition to the mentioned languages, the translation service of ML Kit will support direct translation between Chinese and 11 languages (Korean, Portuguese, Spanish, Turkish, Thai, Arabic, Malay, Italian, Polish, Dutch, and Vietnamese) by the end of 2022. This will open up a new level of instant translation for users around the world.
The translation service can be used together with many other services from ML Kit. Check them out and see how they can help you develop an AI-powered app.
Optical character recognition (OCR) technology efficiently recognizes and extracts text in images of receipts, business cards, documents, and more, freeing us from the hassle of manually entering and checking text. This tech helps mobile apps cut the cost of information input and boost their usability.
So far, OCR has been applied to numerous fields, including the following:
In transportation scenarios, OCR is used to recognize license plate numbers for easy parking management, smart transportation, policing, and more.
In lifestyle apps, OCR helps extract information from images of licenses, documents, and cards — such as bank cards, passports, and business licenses — as well as road signs.
The technology also works for receipts, which is ideal for banks and tax institutions that need to keep records of them.
It doesn't stop there: books, reports, CVs, and contracts can all be saved digitally with the help of OCR.
How HMS Core ML Kit's OCR Service Works
HMS Core ML Kit released its OCR service, text recognition, on January 15, 2020. The service features a rich set of APIs and can accurately recognize text that is tilted, typeset horizontally or vertically, or curved. Not only that, it can even precisely present how text is divided among paragraphs.
Text recognition offers both cloud-side and device-side services; the device-side option provides privacy protection when recognizing specific cards, licenses, and receipts. The device-side service can recognize text in images or camera streams in real time on the device, and also supports sparse text in images. It supports 10 languages: Simplified Chinese, Japanese, Korean, English, Spanish, Portuguese, Italian, German, French, and Russian.
The cloud-side service, by contrast, delivers higher accuracy and supports dense text in images of documents and sparse text in other types of images. This service supports 19 languages: Simplified Chinese, English, Spanish, Portuguese, Italian, German, French, Russian, Japanese, Korean, Polish, Finnish, Norwegian, Swedish, Danish, Turkish, Thai, Arabic, and Hindi. The recognition accuracy for some of the languages is industry-leading.
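To illustrate the paragraph and line structure mentioned above, the sketch below walks the blocks and lines of a recognition result. The method names (getBlocks, getContents, getStringValue) follow the documented MLText class and should be verified against the SDK version you use.

```java
import com.huawei.hms.mlsdk.text.MLText;

public class TextStructureSketch {

    // Prints each recognized block and its lines, preserving the layout structure.
    public void dumpStructure(MLText mlText) {
        for (MLText.Block block : mlText.getBlocks()) {
            System.out.println("Block: " + block.getStringValue());
            for (MLText.TextLine line : block.getContents()) {
                // Each line also exposes its bounding box, e.g. via line.getBorder().
                System.out.println("  Line: " + line.getStringValue());
            }
        }
    }
}
```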
The OCR service was further improved in ML Kit, providing a lighter device-side model and higher accuracy. The following is a demo screenshot for this service.
How Text Recognition Has Been Improved
Lighter device-side model, delivering better recognition performance for all supported languages
The device-side model has been downsized by 42% without compromising on KPIs, and the memory the service consumes at runtime has decreased from 19.4 MB to around 11.1 MB.
As a result, the service runs more smoothly. Cloud-side accuracy for recognizing Chinese has also increased, from 87.62% to 92.95%, which is above the industry average.
Technology Specifications
OCR is a process in which an electronic device examines characters printed on paper, detecting dark and light areas to determine each character's shape, and then translates those shapes into computer text using a character recognition method. In short, OCR is a technology (designed for printed characters) that converts the text in an image into a black-and-white dot matrix file and uses recognition software to turn it into editable text.
In many cases, text in images is curved, so the text recognition algorithm team redesigned the service's model to support not only horizontal text, but also text that is tilted or curved. With this capability, the service delivers higher accuracy and usability in transportation scenarios and beyond.
Compared with the cloud-side service, the device-side service is more suitable when the text to be recognized concerns privacy. Its performance, however, can be affected by factors such as device compute power and power consumption. With these in mind, the team designed the model framework and adopted techniques like quantization and pruning, reducing the model size to ensure a good user experience without compromising recognition accuracy.
Performance After Update
The updated text recognition service performs even better. The cloud-side service delivers an accuracy 7% higher than its competitor's, with a latency that is 55% of its competitor's.
As for the device-side service, it has a superior average accuracy and model size. In fact, the recognition accuracy for some minor languages is up to 95%.
Future Updates
Most OCR solutions currently support only printed characters. The ML Kit text recognition team is working to equip the service with handwriting recognition, so that future versions will be able to recognize both printed characters and handwriting.
The number of supported languages will grow to include languages such as Romanian, Malay, Filipino, and more.
The service will be able to analyze the layout so that it can adjust PDF typesetting. By supporting more and more types of content, ML Kit remains committed to honing its AI edge.
In this way, the kit, together with other HMS Core services, will try to meet the tailored needs of apps in different fields.
References
HMS Core ML Kit home page
HMS Core ML Kit Development Guide