Introduction:
HMS ML Kit provides a wide range of leading machine learning capabilities that are easy to use, helping you develop various AI apps. This article introduces each ML Kit service in detail for developers.
Text-related Services
1. Text Recognition extracts text from images of receipts, business cards, and documents. This service is widely used in office, education, transit, and other apps.
2. Document Recognition recognizes text with paragraph formats in document images. It can extract text from document images to convert paper documents into electronic copies, greatly improving input efficiency and reducing labor costs.
3. Bank Card Recognition quickly recognizes information such as the bank card number, and covers mainstream cards such as China UnionPay, American Express, MasterCard, Visa, and JCB. It is widely used in finance and payment scenarios that require card binding, enabling quick input of bank card information.
4. General Card Recognition provides a universal development framework based on text recognition technology. It allows you to customize the post-processing logic to extract the required information from any fixed-format card, such as the Exit-Entry Permit for Traveling to and from Hong Kong and Macao, the Hong Kong identity card, and the Mainland Travel Permit for Hong Kong and Macao Residents.
Language/Voice-related Services
1. Translation detects the language of a text and translates it into other languages. Currently, this service can translate text online between 21 languages and offline between 17 languages (see the translation sketch after this list).
2. Language Detection supports both online and offline modes. Currently, 52 languages can be detected on the cloud and 51 languages on the device.
3. Automatic Speech Recognition (ASR) converts speech (no longer than 60 seconds) into text in real time. Currently, Mandarin Chinese (including bilingual Chinese-English speech), English, French, German, Spanish, and Italian are supported.
4. Text to Speech (TTS) converts text information into audio output. Real-time audio data can be produced by the on-device API (offline models can be downloaded), with a rich choice of timbres plus volume and speed options for more natural-sounding speech.
5. Audio File Transcription converts an audio file into text, outputs punctuation, and generates text information with timestamps. Currently, the service supports Chinese and English.
6. Video Course Creator automatically creates video courses from courseware and commentaries, reducing video creation costs and improving efficiency.
7. Real-Time Transcription enables your app to convert long speech (no longer than 5 hours) into text in real time. The generated text contains punctuation marks and timestamps.
8. Sound Detection detects sound events in online (real-time recording) mode. The detected sound events can trigger subsequent actions in your app.
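As referenced in item 1, here is a minimal sketch of calling the online translator, based on the ML Kit translation sample code (the target language code and the lambda-style listeners are illustrative):
// Create an online translator that translates into Chinese ("zh" is illustrative).
MLRemoteTranslateSetting setting = new MLRemoteTranslateSetting.Factory()
        .setTargetLangCode("zh")
        .create();
MLRemoteTranslator translator = MLTranslatorFactory.getInstance().getRemoteTranslator(setting);
// Translation runs asynchronously; results arrive via the success listener.
translator.asyncTranslate("Hello, world!").addOnSuccessListener(translated -> {
    // Use the translated text here.
}).addOnFailureListener(e -> {
    // Handle the translation failure here.
});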
Image-related Services
1. Image Classification classifies elements in images into intuitive categories, such as people, objects, environments, activities, or artwork, to define image themes and application scenarios.
2. Object Detection and Tracking detects and tracks multiple objects in an image, so they can be located and classified in real time. This is useful for examining and recognizing images.
3. Landmark Recognition identifies the names and the latitude and longitude of landmarks in an image. You can use this information to create individualized experiences for users.
4. Image Segmentation differentiates elements in an image. For example, you can use this service to create photo editing apps that replace certain parts of a photo, such as the background.
5. Product Visual Search searches a pre-established product image library for products that are the same as, or similar to, the product in a photo taken by the user, and returns the IDs of those products and related information.
6. Image Super-Resolution provides 1x and 3x super-resolution capabilities. 1x super-resolution removes compression noise, and 3x super-resolution not only effectively suppresses compression noise but also enlarges the image threefold.
7. Document Skew Correction automatically identifies the location of a document in an image and corrects the perspective so that the image appears to have been shot facing the document.
8. Text Image Super-Resolution zooms in on an image that contains text and significantly improves the definition of the text.
9. Scene Detection classifies the scene content of images and adds annotation information, such as outdoor scenery, indoor places, and buildings, to help understand the image content.
Face/Body-related Services
1. Face Detection detects the shapes and features of your user's face, including their facial expression, age, gender, and accessories. You can use the service to develop apps that dynamically beautify users' faces during video calls.
2. Skeleton Detection detects and locates key points of the human body, such as the top of the head, neck, shoulders, elbows, wrists, hips, knees, and ankles. For example, when taking a photo, the user can be prompted to strike a pose similar to a preset one.
3. Liveness Detection detects whether a user in a service scenario is a real person. It is useful in scenarios such as identity verification and financial risk control.
4. Hand Keypoint Detection detects 21 hand keypoints (including fingertips, knuckles, and wrists) and returns their positions. Currently, static image detection and real-time video stream detection are supported.
Conclusion
Beyond ML Kit, HMS also provides Awareness Kit, which gives your app the ability to obtain contextual information including users' current time, location, behavior, audio device status, ambient light, weather, and nearby beacons; Scan Kit, which scans and parses all major 1D and 2D barcodes and generates QR codes, helping you quickly build barcode scanning into your apps; and Nearby Service, which allows apps to easily discover nearby devices and set up communication with them using technologies such as Bluetooth and Wi-Fi, through the Nearby Connection and Nearby Message APIs.
Related
This article is originally from HUAWEI Developer Forum.
Forum link: https://forums.developer.huawei.com/forumPortal/en/home
1 About This Document
Check out the machine learning service business introduction on the Huawei Developer website (https://developer.huawei.com/consumer/en/doc/development/HMS-Guides/ml-introduction-4)
As you can see, Huawei HMS divides machine learning services into four major categories: text-related, language/voice-related, image-related, and face/body-related services. The text-related services include text recognition, document recognition, bank card recognition, and general card recognition. What are the differences and associations between these sub-services? I will try to explain.
2 Application Scenario Differences
Text service SDKs are classified into device APIs and cloud APIs. Device APIs process and analyze data only on the device, using computing resources such as the device's CPU and GPU. Cloud APIs send data to the cloud and use server resources there for processing and analysis. All of the services provide device-side APIs except document recognition, which involves a large amount of computation and is therefore processed on the cloud. To simplify the analysis, this document describes only the device-side APIs.
2.1 Scenario Comparison
2.1.1 Text recognition: like a versatile all-rounder. It can handle anything: as long as the content is text, it can be recognized.
Text OCR application scenarios
Text OCR does not provide a UI. The UI is implemented by developers.
2.1.2 Bank card recognition: more like a specialist, excelling in one subject only.
A default alignment box is provided for bank cards. Users can quickly extract the bank card number by simply aligning the card with the box.
Bank card identification
2.1.3 General card recognition: between the two above, with solid attainments in a specific field. It can extract text from all kinds of cards, and an alignment box is provided to prompt users to align the card to be recognized.
2.2 How to Choose
Choose bank card recognition for bank cards, general card recognition for other types of cards, and text recognition for all other scenarios.
3 Service Integration Differences
Compilation Dependency Differences
To facilitate understanding, let's first explain the following concepts:
Basic SDK: the APIs provided for developers. All capabilities are exposed through the basic SDK.
Plug-in: the alignment box mentioned in the scenario comparison above. It provides an interface that verifies the input quality of image frames and, if the requirements are not met, prompts the user to reposition the card.
Model package: the core of the HMS ML Kit services. It contains the interpreter model files generated by training on a large number of samples on a machine learning platform.
Compilation Dependency Summary
As the compilation dependencies show, all services need to integrate the corresponding basic SDK and model package. In addition, bank card recognition and general card recognition have corresponding plug-ins, which provide the alignment boxes mentioned above. In terms of models, bank card recognition uses a dedicated model package, while general card recognition and text recognition use a general model package.
Development Differences
First, let's see how to integrate the services. The detailed steps are not described here. You can view the development steps of the corresponding services on Huawei Developers.
Text recognition
Create an analyzer.
// 'setting' was not defined in the original snippet; a minimal local-text setting is shown here for completeness.
MLLocalTextSetting setting = new MLLocalTextSetting.Factory().setLanguage("en").create();
MLTextAnalyzer analyzer = MLAnalyzerFactory.getInstance().getLocalTextAnalyzer(setting);
Create a frame object and pass in the image bitmap.
MLFrame frame = MLFrame.fromBitmap(bitmap);
Send the frame object to the analyzer for recognition and handle the result.
Task<MLText> task = analyzer.asyncAnalyseFrame(frame);
task.addOnSuccessListener(new OnSuccessListener<MLText>() {
    @Override
    public void onSuccess(MLText text) {
        // Recognition succeeded.
    }
}).addOnFailureListener(new OnFailureListener() {
    @Override
    public void onFailure(Exception e) {
        // Recognition failed.
    }
});
Bank Card recognition
Start the UI to recognize the bank card.
private void startCaptureActivity(MLBcrCapture.Callback callback) { ... }
Override the callback function to process the recognition result.
private MLBcrCapture.Callback callback = new MLBcrCapture.Callback() {
    @Override
    public void onSuccess(MLBcrCaptureResult bankCardResult) {
        // Recognition succeeded.
    }
};
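For reference, a minimal sketch of what the startCaptureActivity body typically looks like, based on the ML Kit sample code (the ORIENTATION_AUTO option shown is an assumption and may differ across SDK versions):
private void startCaptureActivity(MLBcrCapture.Callback callback) {
    // Let the SDK detect the card orientation automatically.
    MLBcrCaptureConfig config = new MLBcrCaptureConfig.Factory()
            .setOrientation(MLBcrCaptureConfig.ORIENTATION_AUTO)
            .create();
    MLBcrCapture bcrCapture = MLBcrCaptureFactory.getInstance().getBcrCapture(config);
    // Launch the capture UI; the result is delivered through the callback.
    bcrCapture.captureFrame(this, callback);
}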
General Card recognition
Start the UI to recognize the general card.
private void startCaptureActivity(Object object, MLGcrCapture.Callback callback)
Override the callback function to process the recognition result.
private MLGcrCapture.Callback callback = new MLGcrCapture.Callback() {
    @Override
    public int onResult(MLGcrCaptureResult cardResult) {
        // Process the successful recognition result here.
        // Returning CAPTURE_STOP ends the recognition and exits the capture UI.
        return MLGcrCaptureResult.CAPTURE_STOP;
    }
};
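Likewise, a sketch of starting the general card capture, assuming the MLGcrCaptureConfig and MLGcrCaptureUIConfig classes from the sample code (the tip text and options are illustrative, and method names may vary by SDK version):
private void startCaptureActivity(Object object, MLGcrCapture.Callback callback) {
    MLGcrCaptureConfig cardConfig = new MLGcrCaptureConfig.Factory().create();
    // The UI config customizes the alignment box and on-screen tip.
    MLGcrCaptureUIConfig uiConfig = new MLGcrCaptureUIConfig.Factory()
            .setTipText("Align the card with the box")
            .setOrientation(MLGcrCaptureUIConfig.ORIENTATION_AUTO)
            .create();
    MLGcrCapture ocrManager = MLGcrCaptureFactory.getInstance().getGcrCapture(cardConfig, uiConfig);
    // Start camera-stream recognition; the attached object is passed back to the callback.
    ocrManager.capturePreview(this, object, callback);
}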
Development Summary
According to the preceding comparison, the processing logic is similar, except that no GUI is provided for text recognition: the images to be recognized are passed to the SDK, and the recognition result is obtained through a callback function. The core difference is the structured data each service returns.
Bank card recognition returns the processed content directly: you can obtain the bank card number through the interface without caring how it was extracted. Text recognition and general card recognition, however, return the full recognition result, containing text elements such as blocks, lines, and words. To obtain the information you need, you must extract it from the full result yourself, for example, by using a regular expression to match a run of consecutive digits for a card number, or by matching the content that follows a recognized keyword.
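As an illustration, a minimal Java sketch of such post-processing (the 15-to-19-digit pattern is an assumption; adjust it to the card format you target):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CardNumberExtractor {
    // Matches 15 to 19 consecutive digits, optionally separated by single spaces,
    // which covers most printed card number layouts.
    private static final Pattern CARD_NUMBER = Pattern.compile("\\d(?: ?\\d){14,18}");

    public static String extract(String recognizedText) {
        Matcher m = CARD_NUMBER.matcher(recognizedText);
        return m.find() ? m.group().replace(" ", "") : null;
    }
}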
4 Technical Difference Analysis
Based on the preceding difference analysis, we can see that the text-related services differ in scenarios and service integration, but also have some associations. For example, text recognition and general card recognition use the same general machine learning model. The following explains these differences from a technical perspective.
As described in the compilation dependency analysis, each text service needs to integrate the basic SDK and a model package, and some services also integrate a plug-in that generates the alignment box. So what is the model package? If you are familiar with machine learning, you know it usually involves collecting training samples, extracting features, modeling the data, and making predictions. The model is essentially a "mapping function" learned from the training samples through feature extraction and related steps. In HUAWEI HMS ML Kit, this mapping function alone is not enough: it needs a runtime to execute it, called the interpreter framework. In addition, some algorithms need to perform pre-processing and post-processing on the image, for example, converting an image frame into a corresponding feature vector. For simplicity, all of the above is collectively referred to as the model file. To run on a mobile phone, these model files are further optimized, for example, to increase their execution speed on the device and to reduce their size.
Differences and association analysis
Now, let's look at the differences and relationships between text services. To facilitate understanding, the following figure shows the differences and relationships between text services.
Text recognition
The model is trained on a general text data set. Its advantages are a wide application range and high flexibility: as long as the content is text, it can be recognized.
General card recognition
General card recognition uses the same data set as text recognition, so there is no difference between the model files; a general card plug-in is added, however. Its main job is to ensure that the user centers the card in the camera view and to detect reflective or blurred frames; if the requirements are not met, the user is prompted to readjust, which improves the recognition accuracy for cards.
Bank Card OCR
The bank card recognition service is trained on a dedicated bank card data set. The characters on a bank card differ greatly from common print and are often embossed, so a general model struggles to reach high accuracy. Training on dedicated data sets of bank cards and ID cards improves the recognition accuracy for both. In addition, targeted pre-processing is performed for bank cards: image quality and tilt angle are detected dynamically in real time, and an alignment box restricts the location of the card. If the image is blurred or reflective, or the card is not aligned with the box, the user is prompted to re-align it.
Notice:
Based on HUAWEI ML Kit, we will share a series of hands-on articles later. Stay tuned to our forum.
If you have any questions, you can look for answers on the HUAWEI Developer Forum.
Demo and sample code: https://github.com/HMS-MLKit/HUAWEI-HMS-MLKit-Sample
About This Document
Zxing is a popular third-party open-source SDK. However, it has a notable defect: it implements only basic QR code scanning and does not support more complex scanning environments such as strong light, bending, and deformation. The mainstream practice is therefore to optimize the source code based on Zxing, but the results are still not ideal, and many developers spend a lot of time on such optimization.
The Huawei Scan Kit service provides convenient barcode and QR code scanning, parsing, and generation capabilities, helping developers quickly build scanning functions into their apps. Thanks to Huawei's long-term accumulation in the computer vision field, Scan Kit can detect and automatically zoom in on long-distance or small codes, and is optimized for common complex scanning scenarios (such as reflection, dim light, smudges, blur, and cylindrical surfaces), improving the scanning success rate and user experience.
Now, let's compare the capabilities of Zxing and Huawei HMS Scan Kit from the following aspects:
Long-distance code scanning
Scanning QR codes in complex scenarios
Scanning barcodes at any angle
Multi-code recognition
Integration difficulty
SDK package size
Cross-platform support
Comparison of long-distance code scanning
Whether long-distance scanning succeeds depends on the QR code specification (the more information a code carries, the harder it is to recognize) and the distance between the camera and the code. Because Zxing lacks automatic zoom-in optimization, it struggles to recognize a code that occupies less than 1/5 of the screen. Scan Kit has a pre-detection function that automatically zooms in on a distant QR code, even one that cannot be identified by the naked eye.
Comparison Conclusion: Scan Kit Wins
Comparison of Scanning Codes in Complex Scenarios
Complex scanning scenarios include reflection, dim light, smudges, blur, and cylindrical surfaces, and Zxing's recognition performance in them is poor.
These scenarios are common in daily life: reflection, dim light, and smudges often occur outdoors; when a QR code is attached to a product, it may be curved or even wrapped around edges and corners; and scanning while walking introduces motion blur. The following figures show the test comparison in these scenarios.
Comparison Conclusion: Scan Kit Wins
Comparison of Scanning Codes at Any Angle
Zxing essentially supports only forward scanning: when the code's deflection is within about 10 degrees, it can still achieve high recognition accuracy, but beyond 10 degrees its accuracy drops sharply. Scan Kit handles this easily: it is unaffected by the deflection angle, and its recognition accuracy does not decrease.
Comparison Conclusion: Scan Kit Wins
Multi-Code Recognition Comparison
Multi-code recognition identifies multiple codes at a time in scenarios such as express delivery and supermarket checkout, improving processing efficiency. In multi-code mode, Scan Kit can recognize five codes on the screen at the same time and return the types and values of all of them at once.
Comparison Conclusion: Scan Kit Wins
SDK Package Size Comparison
The Zxing package is about 500 KB, which is a satisfactory size. Scan Kit has two modes: Lite and Pro. The Lite package is about 700 KB, and the Pro package is about 3.3 MB.
The two modes differ only slightly on non-Huawei phones, so if you are not sensitive to package size there, choose the Pro version. I have also tested the Lite version on non-Huawei phones; its results are slightly lower than the Pro version's.
Conclusion: Zxing has advantages.
Platform Support Comparison
Zxing and Scan Kit support both iOS and Android platforms.
Conclusion: The score is even.
Comparison of Integration Modes
Integrating Zxing itself is relatively simple: the SDK can be pulled in with only a few lines of code. In actual product development, however, you also have to build the scanning interface and auxiliary functions, and Zxing provides no quick integration path for those, even though integration guides have long been available online. In summary, Zxing's integration has two gaps: first, no default interface is provided; second, you must implement functions such as automatic zoom and the flashlight yourself.
Scan Kit provides multiple access modes, including single-code access, multi-code access, and customized access. The differences between these integration modes are as follows:
Single-code access provides a default camera layout, with functions such as automatic zoom and the flashlight preset, so developers do not need to implement them. The integration takes about five lines of code, which is especially suitable for scenarios that require quickly adding or replacing a scanning function.
Customized access lets you design the layout yourself: only the basic scanning and decoding functions and a blank layout are provided, so you can match your app's style. However, you need to implement functions such as automatic zoom and the flashlight on your own; the corresponding technical documents are available on the HUAWEI Developers official website. Compared with single-code access, this mode is more complicated.
The integration mode is as follows:
Zxing integration process
1. Create a project and import the Zxing module.
2. Add permissions and apply for them dynamically (see the permission-request sketch after this list).
3. Override the onActivityResult method.
4. Invoke the decoding function.
5. Build the UI and ensure that it is displayed correctly.
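As referenced in step 2, a minimal sketch of the runtime permission request that both SDKs need (standard Android API, assuming AndroidX ContextCompat/ActivityCompat; the request code is an arbitrary constant):
// In your Activity: request camera and storage permissions at runtime (Android 6.0+).
private static final int REQUEST_SCAN_PERMISSIONS = 1001;

private void requestScanPermissions() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(this,
                new String[]{Manifest.permission.CAMERA,
                        Manifest.permission.READ_EXTERNAL_STORAGE},
                REQUEST_SCAN_PERMISSIONS);
    }
}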
Scan Kit integration process
The default view mode provides two functions: camera QR code scanning and image-based QR code scanning. In this mode, developers do not need to develop the UI for QR code scanning.
The process is almost the same as that of Zxing.
1. Create a project and import the Scan Kit module.
2. Add permissions and dynamically apply for permissions.
3. Override the onActivityResult method.
4. Invoke the decoding function.
The following uses the Default View Mode as an example to describe the integration procedure.
1. Create a project and add online dependency in the app/build.gradle file.
implementation 'com.huawei.hms:scan:{version}'
2. Declare the required permissions and features in the AndroidManifest.xml file of the calling module.
<!--Camera permission-->
<uses-permission android:name="android.permission.CAMERA" />
<!--Reading the file permission-->
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<!--Features-->
<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />
3. Create QR code scanning options based on the site requirements.
HmsScanAnalyzerOptions options = new HmsScanAnalyzerOptions.Creator().setHmsScanTypes(HmsScan.QRCODE_SCAN_TYPE, HmsScan.DATAMATRIX_SCAN_TYPE).create();
4. Invoke the static method startScan of ScanUtil to start the Default View QR code scanning page.
ScanUtil.startScan(this, REQUEST_CODE_SCAN_ONE, options);
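To complete the flow, here is a short sketch of receiving the result, based on the Scan Kit sample code (ScanUtil.RESULT and HmsScan.getOriginalValue() come from the SDK; error handling is kept minimal):
@Override
protected void onActivityResult(int requestCode, int resultCode, Intent data) {
    super.onActivityResult(requestCode, resultCode, data);
    if (resultCode != RESULT_OK || data == null) {
        return;
    }
    if (requestCode == REQUEST_CODE_SCAN_ONE) {
        // The scanning result is returned as an HmsScan object.
        HmsScan hmsScan = data.getParcelableExtra(ScanUtil.RESULT);
        if (hmsScan != null) {
            String value = hmsScan.getOriginalValue(); // Decoded barcode content.
        }
    }
}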
The comparison shows that Scan Kit and Zxing have the same dependency and permission application methods. However, Scan Kit provides a default UI (with a built-in flashlight, automatic zoom, and image-based scanning), whereas with Zxing you must implement the UI yourself and then build these functions manually.
Comparison Conclusion: Scan Kit Wins
Technical Analysis
Why does Scan Kit outperform Zxing? The following analyzes both from the perspective of their technical implementation principles.
Zxing Technology Analysis
Zxing uses a traditional recognition algorithm: it detects codes by analyzing them from a fixed orientation, and the algorithm tolerates only a small amount of deformation. For example, a QR code skewed by less than about 10 degrees still fits the expected pixel pattern, but if the code is deformed too much or viewed at too large an angle, its position cannot be detected. Zxing's detection process is split into two serial paths: one-dimensional code detection and two-dimensional code detection.
For one-dimensional codes, Zxing uses a progressive (line-by-line) scanning mechanism for feature recognition. Because a 1D code's feature is alternating black and white bars, a black-and-white sequence with roughly equal spacing is treated as a potential code, whose length is determined by finding the start and end bits. The sequence is then sent serially to the different 1D decoding modules, which takes a long time; when serial decoding fails, reporting the failure also takes long. Moreover, once the 1D code is wrinkled, rotated, or deformed, progressive scanning cannot find a sequence that meets the requirements, so the code cannot be detected under complex conditions.
For two-dimensional codes, Zxing uses different detection algorithms for different code types. For the most common QR code, which has three position detection patterns, Zxing again uses interlaced scanning to look for the patterns' features. Once it finds features with a black-and-white ratio of 1:1:3:1:1, it uses the center of each position detection pattern as a reference point for an affine transformation, and sends the corrected image to the QR decoding module. Because the QR code's positioning points support rotation correction, rotation is handled well. However, Zxing cannot handle cases where a positioning point is partially blocked, deformed, contaminated, or reflective. Detecting the position detection patterns is the most critical step: once one pattern fails to be detected, the whole code cannot be detected.
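To make the 1:1:3:1:1 idea concrete, here is an illustrative Java sketch (not Zxing's actual code) that checks whether five consecutive black/white run lengths match the finder-pattern ratio within a tolerance:
// Checks whether five run lengths (black, white, black, white, black)
// approximate the 1:1:3:1:1 finder-pattern ratio of a QR code.
public static boolean looksLikeFinderPattern(int[] runs) {
    if (runs == null || runs.length != 5) {
        return false;
    }
    int total = 0;
    for (int r : runs) {
        if (r <= 0) return false;
        total += r;
    }
    // One "module" is total / 7, since 1 + 1 + 3 + 1 + 1 = 7.
    float module = total / 7.0f;
    float tolerance = module / 2.0f;
    return Math.abs(runs[0] - module) < tolerance
        && Math.abs(runs[1] - module) < tolerance
        && Math.abs(runs[2] - 3 * module) < 3 * tolerance
        && Math.abs(runs[3] - module) < tolerance
        && Math.abs(runs[4] - module) < tolerance;
}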
Technical Analysis of Huawei HMS Scan Kit
Scan Kit uses a deep learning algorithm, which is spatially invariant. By training detectors for the corresponding code types, it can quickly find all the codes it needs.
Actual process:
Both the barcode detection module and the angle prediction module use deep learning models.
Barcode detection: Zxing's serial process of detecting 1D and 2D codes separately is no longer needed. A trained detector directly obtains the code type and its position, so each barcode is sent to the correct decoding module after a single detection pass, with no separate serial decoding attempts. Because decoding involves a series of high-overhead operations such as line-by-line scanning, and information cannot be shared between different code types, this greatly reduces end-to-end latency and avoids a lot of repeated, unnecessary computation.
Angle prediction: the detector also returns the code's three-dimensional angle, which is used for a perspective transformation. In practice, the core of barcode detection is obtaining the boundary points accurately: simply binarizing the image and sending it to the decoding module still yields poor decoding results. Angle prediction is therefore the most important step for recognizing barcodes in complex scenarios.
To sum up, deep learning turns Zxing's serial detection-and-decoding process into a parallel one and additionally returns the barcode's three-dimensional angle, so that after the affine transformation an aligned, front-facing barcode is obtained. This greatly improves the detection success rate while greatly reducing latency.
More information
Demos, sample codes, and development documents are available on the Huawei developer official website.
Demo and sample code:
https://developer.huawei.com/consumer/en/doc/development/HMS-Examples/scan-sample-code4
Development guide:
https://developer.huawei.com/consumer/en/doc/development/HMS-Guides/scan-introduction-4
API reference:
https://developer.huawei.com/consumer/en/doc/development/HMS-References/scan-apioverview
To Be Supplemented
Based on HUAWEI ML Kit, we will share a series of hands-on articles later. Stay tuned.
For more information like this, visit the HUAWEI Developer Forum.
Original link: https://forums.developer.huawei.com/forumPortal/en/topicview?tid=0201253487604250240&fid=0101187876626530001
Article Introduction
In this article, we will show how to integrate Huawei ML Kit (Face Detection) and the powerful AI engine MindSpore Lite in an Android application to detect in real time whether users are wearing masks. Due to COVID-19, face masks are mandatory in many parts of the world. Considering this fact, the use case includes an option to remind users with audio commands.
Huawei ML Kit (Face Detection)
Huawei's Face Detection service (offered by ML Kit) detects 2D and 3D face contours. The 2D capability detects features of a user's face, including facial expression, age, gender, and accessories. The 3D capability obtains information such as face keypoint coordinates, the 3D projection matrix, and the face angle. The service supports static image detection, camera stream detection, and cross-frame face tracking, and can detect multiple faces at a time.
The following are the important features supported by the Face Detection service:
MindSpore Lite
MindSpore Lite is an ultra-fast, intelligent, and simplified AI engine that enables intelligent applications in all scenarios, provides end-to-end solutions for users, and helps users enable AI capabilities. The following are some common scenarios for MindSpore:
For this article, we implemented image classification. The camera stream yields frames; we process each frame with ML Kit (Face Detection) to find faces, then run our trained MindSpore Lite model on each face to classify it as With Mask or Without Mask.
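A condensed sketch of that pipeline, assuming ML Kit's face analyzer API and a hypothetical classifyWithMindSpore() helper that wraps the MindSpore Lite inference call:
// Detect faces on a camera frame, crop each face, and classify it.
MLFaceAnalyzer analyzer = MLAnalyzerFactory.getInstance().getFaceAnalyzer();
MLFrame frame = MLFrame.fromBitmap(frameBitmap);

analyzer.asyncAnalyseFrame(frame).addOnSuccessListener(faces -> {
    for (MLFace face : faces) {
        Rect box = face.getBorder(); // Bounding box of the detected face.
        int left = Math.max(box.left, 0);
        int top = Math.max(box.top, 0);
        int width = Math.min(box.width(), frameBitmap.getWidth() - left);
        int height = Math.min(box.height(), frameBitmap.getHeight() - top);
        Bitmap faceCrop = Bitmap.createBitmap(frameBitmap, left, top, width, height);
        // classifyWithMindSpore() is a placeholder for the MindSpore Lite model call.
        String label = classifyWithMindSpore(faceCrop); // "WithMask" or "WithoutMask"
    }
});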
Pre-Requisites
Before getting started, we need to train our model and generate the .ms file. For that, I used the HMS Toolkit plugin for Android Studio. If you are migrating from TensorFlow, you can convert your model from .tflite to .ms using the same plugin.
The dataset used for this article is from Kaggle (the link is provided in the references). It provides 5,000 images for each case, along with testing and validation images to evaluate the model after training.
Step 1: Importing the images
To start the training, select HMS > Coding Assistance > AI > AI Create > Image Classification. Import both folders (WithMask and WithoutMask) in the Train Data section. Select the output folder and training parameters based on your requirements. You can read more about this in the official documentation (the link is provided in the references).
Step 2: Creating the Model
When you are ready, click the Create Model button. Training takes some time depending on your machine; you can check the progress of training and validation throughout the process.
Once the process is completed, you will see the summary of the training and validation.
Step 3: Testing the Model
It is always recommended to test your model before using it practically. We used the provided test images in the dataset to complete the testing manually. Following were the test results for our dataset:
After testing, add the generated .ms file along with labels.txt to the assets folder of your project. You can also generate a demo project from the HMS Toolkit plugin.
Development
Since this is an on-device capability, we don't need to integrate HMS Core or import agconnect-services.json into our project. The following are the major development steps for this article:
Read full article.
7.2: Final Results
Conclusion
Building smart solutions with AI capabilities is much easier with HUAWEI Mobile Services (HMS) ML Kit and the AI engine MindSpore Lite. Considering different situations, use cases can be developed for all industries, including but not limited to transportation, manufacturing, agriculture, and construction.
Having said that, we used the ML Kit Face Detection service and the MindSpore AI engine to develop the face mask detection feature. The on-device open capabilities of HMS gave us highly efficient and optimized results: individual or multiple users without masks can be detected from afar in real time. This is applicable in public places, offices, malls, or at any entrance.
Tips & Tricks
Make sure to add all the required permissions, such as WRITE_EXTERNAL_STORAGE, READ_EXTERNAL_STORAGE, CAMERA, ACCESS_NETWORK_STATE, and ACCESS_WIFI_STATE.
Make sure to add aaptOptions to the app-level build.gradle file after adding the .ms and labels.txt files to the assets folder, for example aaptOptions { noCompress "ms" } so the model file is not compressed. If you miss this, you might get a "Load model failed" error.
Always use animation libraries like Lottie to enhance UI/UX in your application. We also used OwlBottomSheet for the help bottom sheet.
The performance of the model is directly proportional to the number of training inputs: the more inputs, the higher the accuracy and the better the results. In this article, we used 5,000 images for each case; you can add as many as possible to improve accuracy.
MindSpore Lite provides its output as a callback. Make sure to design your use case with this in mind.
If you have a TensorFlow Lite model file (.tflite), you can convert it to .ms using the HMS Toolkit plugin.
The HMS Toolkit plugin is very powerful. It supports converting models for MindSpore Lite and HiAI: MindSpore Lite supports TensorFlow Lite and Caffe, while HiAI supports TensorFlow, Caffe, CoreML, PaddlePaddle, ONNX, MXNet, and Keras.
If you want to use TensorFlow with HMS ML Kit, you can implement that too. I have created another demo where the processing engine is dynamic; you can check the link in the references section.
References
HUAWEI ML Kit (Face Detection) Official Documentation:
https://developer.huawei.com/consum...-Guides-V5/face-detection-0000001050038170-V5
HUAWEI HMS Toolkit AI Create Official Documentation:
https://developer.huawei.com/consumer/en/doc/development/Tools-Guides/ai-create-0000001055252424
HUAWEI Model Integration Official Documentation:
https://developer.huawei.com/consum...ols-Guides/model-integration-0000001054933838
MindSpore Lite Documentation:
Using MindSpore on Mobile and IoT — MindSpore Lite r1.1 documentation
MindSpore Lite Code Repo:
MindSpore/mindspore
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
gitee.com
Kaggle Dataset Link:
Face Mask Detection ~12K Images Dataset
12K Images divided in training testing and validation directories.
www.kaggle.com
Lottie Android Documentation:
Lottie Docs
Lottie is a library for Android, iOS, Web, and Windows that parses Adobe After Effects animations exported as json with Bodymovin and renders them natively on mobile and on the web
airbnb.io
Tensorflow as a processor with HMS ML Kit:
https://github.com/yasirtahir/Huawe...icodelabs/fragments/mlkit/facemask/tensorflow
Github Code Link:
https://github.com/yasirtahir/DetectFaceMask
Read full article.
Facial recognition technology is being rapidly adopted in fields such as finance and healthcare, which has in turn raised issues involving cybersecurity and information leakage, along with growing user expectations for improved app stability and security.
HMS Core ML Kit strives to help professionals from various industries work more efficiently, while also helping them detect and handle potential risks in advance. To this end, ML Kit has been working on improving its liveness detection capability. Using a training set with abundant samples, this capability has obtained an improved defense feature against presentation attacks, a higher pass rate when the recognized face is of a real person, and an SDK with heightened security. Recently, the algorithm of this capability has become the first on-device, RGB image-based liveness detection algorithm that has passed the comprehensive security assessments of China Financial Certification Authority (CFCA).
CFCA is a national authority of security authentication and a critical national infrastructure of financial information security, which is approved by the People's Bank of China (PBOC) and State Information Security Administration. After passing the algorithm assessment and software security assessment of CFCA, ML Kit's liveness detection has obtained the enhanced level certification of facial recognition in financial payment, a level that is established by the PBOC.
The trial regulations governing the secure implementation of facial recognition technology in offline payment were published by the PBOC in January 2019. These regulations impose higher requirements on the performance indicators of liveness detection, as described in the table below. To obtain the enhanced level certification, a liveness detection algorithm must have a false acceptance rate (FAR) of less than 0.1% and a false rejection rate (FRR) of less than 1%.
Level    | Defense Against Presentation Attacks
Basic    | When LDAFAR is 1%, LPFRR is less than or equal to 1%.
Enhanced | When LDAFAR is 0.1%, LPFRR is less than or equal to 1%.
Requirements on the performance indicators of a liveness detection algorithm
The liveness detection capability enables an app to verify whether a face belongs to a real person. Specifically, the capability asks the user to perform actions such as blinking, staring at the camera, opening their mouth, turning their head to the left or right, and nodding. It then uses technologies such as facial keypoint recognition and face tracking to compare consecutive frames and determine in real time whether the user is a real person. This effectively defends against common attack types such as printed photos, video replay, face masks, and image recapture, helping distinguish fraud and protect users.
Liveness detection from ML Kit also delivers a user-friendly interactive experience: during detection, the capability provides prompts (for example, that the lighting is too dark, the face is blurred, a mask or pair of sunglasses is blocking the view, or the face is too close to or too far from the camera) to help users complete face detection smoothly.
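For developers, the capability is compact to integrate; here is a hedged sketch based on the ML Kit sample code (class and method names reflect the SDK at the time of writing and may change):
// Launch the liveness detection UI and receive the verdict in a callback.
MLLivenessCapture capture = MLLivenessCapture.getInstance();
capture.startDetect(activity, new MLLivenessCapture.Callback() {
    @Override
    public void onSuccess(MLLivenessCaptureResult result) {
        // result.isLive() indicates whether a real person was detected.
    }

    @Override
    public void onFailure(int errorCode) {
        // Handle detection failure (for example, the camera is unavailable).
    }
});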
To strictly comply with the mentioned regulations, CFCA has come up with an extensive assessment system. The assessments that liveness detection has passed cover many items, including but not limited to data and communication security, interaction security, code and component security, software runtime security, and service function security.
Face samples used for assessing the capability are highly diverse, originating from a range of source types, such as images, videos, masks, head phantoms, and real people. The samples also account for factors such as the collection device type, sample texture, lighting, facial expression, and skin tone. The assessments cover more than 4,000 scenarios, echoing real ones in different fields, for example, remote registration for a financial service, hotel check-in, facial recognition-based access control, identity authentication on an e-commerce platform, live streaming on a social media platform, and online examinations.
In over 50,000 tests, ML Kit's liveness detection demonstrated its certified defense against different attack types, such as a person wearing a face mask, a face picture with keypoint parts (like the eyes and mouth) hollowed out, frames containing a face extracted from an HD video, a silicone facial mask, a 3D head phantom, and adversarial examples. The capability accurately recognizes and quickly intercepts all these presentation attacks, whether in 2D or 3D form.
Successfully passing the CFCA assessments is proof that the capability meets the standards of a national authority and of its compliance with security regulations.
The capability has so far been widely adopted by Huawei's internal core services and by the services of its external customers in various fields (account security, identity verification, financial risk control, and more), where liveness detection ensures user experience and information security in an all-round way.
Moving forward, ML Kit will remain committed to exploring cutting-edge AI technologies that improve liveness detection's security, pass rate, and usability, and to helping developers efficiently create tailored facial recognition apps.
Get more information at:
Home page of HMS Core ML Kit
Development Guide of HMS Core ML Kit