1. Introduction to Virtual Human
Virtual Human is a service that draws on cutting-edge AI technologies, including computer vision, emotion generation, voice cloning, and semantic understanding. It has a wide range of applications, spanning news broadcasting, financial customer service, and virtual gaming.
Application scenarios:
2. ML Kit's Virtual Human Service
ML Kit's Virtual Human service is backed by core Huawei AI technologies, such as image processing, text to speech, voice cloning, and semantic understanding, and provides innovative, cost-effective authoring modes for education, news, and multimedia production enterprises. The Virtual Human service has a number of key advantages over similar services, including the following:
Ultra-HD 4K cinematic effects
Supports large-screen displays, with the details and textures of the entire body rendered in uniform definition.
Generates images that blend seamlessly with the real background, achieving traceless fusion at HD resolution.
Generates detailed lip features, distinct lipstick reflection, and lifelike textures.
Produces clear and visible teeth, and true-to-life textures.
Hyper-real synthesis effects
True restoration of teeth (no painting involved), lips, and even lipstick reflections.
True restoration of facial features such as illumination, contrasts, shadows, and dimples.
Seamless connections between the generated texture for the mouth and the real texture.
Intricate animation effects that outperform those for 3D live anchors.
Comparison with services provided by other enterprises:
3. ML Kit's Virtual Human Video Display
As shown below, Virtual Human generates ultra-HD video effects, delivers clearer enunciation, and exercises better control over key details, such as lip features, lipstick reflections, pronunciation, and illumination.
4. Integrating ML Kit's Virtual Human Service
4.1 Integration Process
4.1.1 Submitting the Text of the Video for Generation
Call the customized API for converting text into the virtual human video, and pass the required configurations (specified by the config parameter) and text (specified by the data parameter) to the backend for processing. First, check the length of the passed text: the maximum length is 1,000 characters for Chinese text and 3,000 characters for English text. Then perform a non-null check on the passed configurations, and submit the text and configurations so that the text can be converted into audio.
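The length check described above can be performed on the client before the request is sent. The following is a minimal Java sketch of such a check, assuming the limits stated above (1,000 characters for Chinese text, 3,000 for English text); the class name and the heuristic used to detect Chinese text are illustrative assumptions, not part of the Virtual Human SDK.
// Hypothetical pre-submission check, based on the limits described above:
// at most 1,000 characters for Chinese text and 3,000 for English text.
public final class TextLengthValidator {

    private static final int MAX_CHINESE_CHARS = 1000;
    private static final int MAX_ENGLISH_CHARS = 3000;

    // Returns true if the text may be submitted for conversion.
    public static boolean isSubmittable(String text) {
        if (text == null || text.isEmpty()) {
            return false; // mirrors the non-null check mentioned above
        }
        int limit = containsChinese(text) ? MAX_CHINESE_CHARS : MAX_ENGLISH_CHARS;
        return text.length() <= limit;
    }

    // Simple heuristic: treat text containing CJK ideographs as Chinese.
    private static boolean containsChinese(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }
}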
4.1.2 Using the Data Provided by an Asynchronous Scheduled Task
Based on the data provided by the asynchronous scheduled task, call the text-to-speech algorithm to generate the video, and then synthesize it with the previously obtained audio.
4.1.3 Checking Whether the Text Has Been Successfully Converted
Call the API for querying the results of converting text into the virtual human video, to check whether the text has been successfully converted. If the execution is complete, the video link will be returned.
4.1.4 Accessing the Videos via the Link
Access the generated video through the link returned by the API for querying the results of converting text into the virtual human video.
4.2 Main APIs for Integration
4.2.1 Customized API for Converting Text into the Virtual Human Video
URL: http://10.33.219.58:8888/v1/vup/text2vedio/submit
Request parameters
Main functions:
This is the entry point of the customized API for converting text into the virtual human video. The API is asynchronous; currently, Virtual Human completes the conversion in an offline mode, which takes time. The conversion results can be queried via the API for querying the results of converting text into the virtual human video. If the submitted text has already been synthesized, the video is returned and can be played directly.
Main logic:
Convert the text passed by the frontend into audio, based on the submitted text and configurations. Then, executing asynchronously in multiple threads, generate a video that meets the pronunciation requirements using the text-to-speech algorithm, and combine the video with the audio to produce the virtual human video. If the submitted text has already been synthesized, the existing video is returned and can be played directly.
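To illustrate the submission flow, here is a minimal Java sketch that posts the text and configurations to the submit endpoint, using a plain Java 11 HttpClient for brevity (any HTTP client works). The config and data parameter names come from section 4.1.1, but the fields inside config (anchor, speed, volume, background, and so on) are assumptions for illustration, since the actual request parameters are not listed here.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VirtualHumanSubmitDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical request body: "config" and "data" are the parameters named in
        // section 4.1.1; the fields inside "config" are illustrative assumptions only.
        String body = "{"
                + "\"config\": {\"anchor\": \"chinese_female\", \"speed\": 1.0, \"volume\": 1.0,"
                + " \"background\": \"green_screen\", \"subtitle\": \"bilingual\", \"layout\": \"middle\"},"
                + "\"data\": \"Hello, this is a virtual human broadcast test.\""
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://10.33.219.58:8888/v1/vup/text2vedio/submit"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The API is asynchronous: the response is expected to carry an ID for the
        // submitted text, which is later used with the query API, not the finished video.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Submit response: " + response.body());
    }
}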
4.2.2 API for Querying the Results of Converting Text into the Virtual Human Video
URL: http://10.33.219.58:8888/v1/vup/text2vedio/query
Request parameters
Main functions:
Query the conversion status in batches, based on the submitted text IDs.
Main logic:
Query the synthesis status of the videos through textIds (the list of IDs of the synthesized text passed by the frontend), save the obtained status results to a set as the output parameter, and insert the parameter into the response. If the requested text has already been synthesized, the video is returned and can be played directly.
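The query flow could then look like the following sketch, which polls the query endpoint with the text IDs returned by the submit call until the video link is available. The textIds field name follows the description above; response parsing is left as a placeholder because the response schema is not shown here, and a plain Java 11 HttpClient is again used for brevity.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VirtualHumanQueryDemo {
    public static void main(String[] args) throws Exception {
        // "textIds" is the ID list described above; the ID itself is a placeholder.
        String body = "{\"textIds\": [\"your-text-id\"]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://10.33.219.58:8888/v1/vup/text2vedio/query"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpClient client = HttpClient.newHttpClient();
        // Poll a few times with a delay, since synthesis is asynchronous and takes time.
        for (int attempt = 0; attempt < 10; attempt++) {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Query response: " + response.body());
            // If the response indicates that synthesis is complete, the video link it
            // carries can be played directly; parsing is omitted because the response
            // schema is not documented here.
            Thread.sleep(5000);
        }
    }
}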
4.2.3 API for Taking the Virtual Human Video Offline in Batches
URL: http://10.33.219.58:8888/v1/vup/text2vedio/offline
Request parameters
Main functions:
Take videos offline in batches, based on the submitted text IDs.
Main logic:
Through textIds (the ID array of the synthesized text transmitted by the frontend), change the status of the videos corresponding to the IDs in the array to offline, and then delete the videos. A video that has been taken offline can no longer be played.
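The offline request can be assumed to follow the same shape as the query request, with textIds identifying the videos to take offline; the sketch below is illustrative only.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class VirtualHumanOfflineDemo {
    public static void main(String[] args) throws Exception {
        // "textIds" is the ID array described above; once a video is taken offline,
        // it is deleted and can no longer be played.
        String body = "{\"textIds\": [\"your-text-id\"]}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://10.33.219.58:8888/v1/vup/text2vedio/offline"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Offline response: " + response.body());
    }
}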
4.3 Main Functions of ML Kit's Virtual Human
ML Kit's Virtual Human service has a myriad of powerful functions.
1. Dual language support: Virtual Human currently supports Chinese and English, so text in either language can be used as the input for audio synthesis.
2. Multiple virtual anchors: The service supports up to four virtual anchors: one Chinese female voice, one English female voice, and two English male voices.
3. Picture-in-picture video: Picture-in-picture playback, in essence small-window video playback, is supported as well. When a video is played in picture-in-picture mode, the video window moves along with the rest of the screen. Users can view the text while the video is playing, and can drag the video window to any location on the screen for easier reading.
4. Adjustable speech speed, volume, and tone: The speech speed, volume, and tone can be adjusted at will, to meet a wide range of user needs.
5. Multi-background settings: The service allows you to choose from diverse backgrounds for virtual anchors. There are currently three built-in backgrounds provided: transparent, green-screen, and technological. You can also upload an image to apply a customized background.
6. Subtitles: Virtual Human is capable of automatically generating Chinese, English, and bilingual subtitles.
7. Multi-layout settings: You can change the position of the virtual anchors on the screen (left, right, or middle of the screen) by setting parameters. You can also determine the size of the virtual anchors and choose to place either their upper body or entire body in view. In addition, you are free to set a channel logo, its position on the screen, as well as the video to be played. This ensures that the picture-in-picture effect achieves a bona fide news broadcast experience.
Picture-in-picture effect:
5. Final Thoughts
As a developer, after using ML Kit's Virtual Human service to generate a video, I was shocked at its capabilities, especially the picture-in-picture capability, which helped me generate real news broadcast effects. It has got me wondering whether virtual humans will soon replace real anchors.
To learn more, please visit the official website:
Reference
Official website of Huawei Developers
Development Guide
HMS Core official community on Reddit
Demo and sample code
Discussions on Stack Overflow
ML Kit:
Added the face verification service, which compares captured faces with existing face records to generate a similarity value, and then determines whether the faces belong to the same person based on the value. This service helps safeguard your financial services and reduce security risks.
Added the capability of recognizing Vietnamese ID cards.
Reduced hand keypoint detection delay and added gesture recognition capabilities, with support for 14 gestures. Gesture recognition is widely used in smart household, interactive entertainment, and online education apps.
Added on-device translation support for 8 new languages, including Hungarian, Dutch, and Persian.
Added support for automatic speech recognition (ASR), text to speech (TTS), and real time transcription services in Chinese and English on all mobile phones, and French, Spanish, German, and Italian on Huawei and Honor phones.
Other updates: Optimized image segmentation, document skew correction, sound detection, and the custom model feature; added audio file transcription support for uploading multiple long audio files on devices.
Learn More
Nearby Service:
Added Windows to the list of platforms that Nearby Connection supports, allowing you to receive and transmit data between Android and Windows devices. For example, you can connect a phone as a touch panel to a computer, or use a phone to make a payment after placing an order on the computer.
Added iOS and MacOS to the list of systems that Nearby Message supports, allowing you to receive beacon messages on iOS and MacOS devices. For example, users can receive marketing messages of a store with beacons deployed, after entering the store.
Learn More
Health Kit:
Added the details and statistical data type for medium- and high-intensity activities.
Learn More
Scene Kit:
Added fine-grained graphics APIs, including those of classes for resources, scenes, nodes, and components, helping you realize more accurate and flexible scene rendering.
Shadow features: Added support for real-time dynamic shadows and the function of enabling or disabling shadow casting and receiving for a single model.
Animation features: Added support for skeletal animation and morphing, playback controller, and playback in forward, reverse, or loop mode.
Added support for asynchronous loading of assets and loading of glTF files from external storage.
Learn More
Computer Graphics Kit:
Added multithreaded rendering capability to significantly increase frame rates in scenes with high CPU usage.
Learn More
Made necessary updates to other kits. Learn More
New Resources
Analytics Kit:
Sample Code: Added the Kotlin sample code to hms-analytics-demo-android and the Swift sample code to hms-analytics-demo-ios.
Learn More
Dynamic Tag Manager:
Sample Code: Added the iOS sample code to hms-dtm-demo-ios.
Learn More
Identity Kit:
Sample Code: Added the Kotlin sample code to hms-identity-demo.
Learn More
Location Kit:
Sample Code: Updated methods for checking whether GNSS is supported and whether the GNSS switch is turned on in hms-location-demo-android-studio; optimized the permission verification process to improve user experience.
Learn More
Map Kit:
Sample Code: Added guide for adding dependencies on two fallbacks to hms-mapkit-demo, so that Map Kit can be used on non-Huawei Android phones and in other scenarios where HMS Core (APK) is not available.
Learn More
Site Kit:
Sample Code: Added the strictBounds attribute for NearbySearchRequest in hms-sitekit-demo, which indicates whether to strictly restrict place search in the bounds specified by location and radius, and added the attribute for QuerySuggestionRequest and SearchFilter for showing whether to strictly restrict place search in the bounds specified by Bounds.
Learn More
New Features
Analytics Kit
Added the ability to save churned users as an audience in the retention analysis function. This enables multi-dimensional examination of churned users, helping you take targeted measures to win back such users.
Changed Audience analysis to Audience insight, which has two submenus: User grouping and User profiling. User grouping allows for segmenting users into different audiences according to different dimensions, and user profiling provides audience features like profiles and attributes to facilitate in-depth user analysis.
Added the Page access in each time segment report to Page analysis. The report compares the numbers of access times and users in different time segments. This vital information gives you insight into your users' product usage preferences and thus helps you seize operations opportunities.
Learn more>>
3D Modeling Kit
Debuted the auto rigging capability. Auto rigging can load a preset motion to a 3D model of a biped humanoid, by using the skeleton points on the model. In this way, the capability automatically rigs and animates such a biped humanoid model, lowering the threshold of 3D animation creation and making 3D models appear more interesting.
Added the AR-based real-time guide mode. This mode accurately locates an object, provides real-time image collection guide, and detects key frames. Offering a series of steps for modeling, the mode delivers a fresh, interactive modeling experience.
Learn more>>
Video Editor Kit
Offered the auto-smile capability in the fundamental capability SDK. This capability detects faces in the input image and then lightens up the faces with a smile (closed-mouth or open-mouth).
Supplemented the fundamental capability SDK with the object segmentation capability. This AI algorithm-dependent capability separates the selected object from a video, to facilitate operations like background removal and replacement.
Learn more>>
ML Kit
Released the interactive biometric verification service. It captures faces in real time and determines whether a face is of a real person or a face attack (like a face recapture image, face recapture video, or a face mask), by checking whether the specified actions are detected on the face. This service delivers a high level of security, making it ideal in face recognition-based payment scenarios.
Improved the on-device translation service by supporting 12 more languages, including Croatian, Macedonian, and Urdu. Note that the following languages are not yet supported by on-device language detection: Maltese, Bosnian, Icelandic, and Georgian.
Learn more>>
Audio Editor Kit
Added the on-cloud REST APIs for the AI dubbing capability, which makes the capability accessible on more types of devices.
Added the asynchronous API for the audio source separation capability. On top of this, a query API was added to track an audio source separation task via its taskId. This resolves the issue where, because an audio source separation task can take a long time to complete, a user could not find their previous task after exiting and re-opening the app.
Enriched on-device audio source separation with the following newly supported sound types: accompaniment, bass sound, stringed instrument sound, brass stringed instrument sound, drum sound, accompaniment with the backing vocal voice, and lead vocalist voice.
Learn more>>
Health Kit
Added two activity record data types: apnea training and apnea testing in diving, and supported the free diving record data type on the cloud-side service, giving access to the records of more activity types.
Added the sampling data type of the maximum oxygen uptake to the device-side service. Each data record indicates the maximum oxygen uptake in a period. This sampling data type can be used as an indicator of aerobic capacity.
Added the open atomic sampling statistical data type of location to the cloud-side service. This type of data records the GPS location of a user at a certain time point, which is ideal for recording data of an outdoor sport like mountain hiking and running.
Opened the activity record segment statistical data type on the cloud-side service. Activity records now can be collected by time segment, to better satisfy requirements on analysis of activity records.
Added the subscription of scenario-based events and supported the subscription of total step goal events. These fresh features help users set their running/walking goals and receive push messages notifying them of their goals.
Learn more>>
Video Kit
Released the HDR Vivid SDK that provides video processing features like opto-electronic transfer function (OETF), tone mapping, and HDR2SDR. This SDK helps you immerse your users with high-definition videos that get rid of overexposure and have clear details even in dark parts of video frames.
Added the capability for killing the WisePlayer process. This capability frees resources occupied by WisePlayer after the video playback ends, to prevent WisePlayer from occupying resources for too long.
Added the capability to obtain the list of video source thumbnails that cover each frame of the video source — frame by frame, time point by time point — when a user slowly drags the video progress bar, to improve video watching experience.
Added the capability to accurately play video via dragging on the progress bar. This capability can locate the time point on the progress bar, to avoid the inaccurate location issue caused by using the key frame for playback location.
Learn more>>
Scene Kit
Added the 3D fluid simulation component. This component allows you to set the boundaries and volume of fluid (VOF), to create interactive liquid sloshing.
Introduced the dynamic diffuse global illumination (DDGI) plugin. This plugin can create diffuse global illumination in real time when the object position or light source in the scene changes. In this way, the plugin delivers a more natural-looking rendering effect.
Learn more>>
New Resources
Map Kit
For the hms-mapkit-demo sample code: Added the MapsInitializer.initialize API that is used to initialize the Map SDK before it can be used.
Added the public layer (precipitation map) in the enhanced SDK.
Go to GitHub>>
Site Kit
For the hms-sitekit-demo sample code: Updated the Gson version to 2.9.0 and optimized the internal memory.
Go to GitHub >>
Game Service
For the hms-game-demo sample code: Added the configuration of removing the dependency installation boost of HMS Core (APK), and supported HUAWEI Vision.
Go to GitHub >>
Made necessary updates to other kits. Learn more >>
What do you usually do if you like a particular cartoon character? Buy a figurine of it?
That's what most people would do. Unfortunately, however, it is just for decoration. Therefore, I tried to create a way of sending these figurines back to the virtual world. In short, I created a virtual but movable 3D model of a figurine.
This is done with auto rigging, a new capability of HMS Core 3D Modeling Kit. It can animate a biped humanoid model that can even interact with users.
Check out what I've created using the capability.
What a cutie.
The auto rigging capability is ideal for many types of apps when used together with other capabilities. Take those from HMS Core as an example:
Audio-visual editing capabilities from Audio Editor Kit and Video Editor Kit. We can use auto rigging to animate 3D models of popular stuffed toys that can be livened up with proper dances, voice-overs, and nursery rhymes, to create educational videos for kids. With the adorable models, such videos can play a better role in attracting kids and thus imbuing them with knowledge.
The motion creation capability. This capability, coming from 3D Engine, is loaded with features like real-time skeletal animation, facial expression animation, full body inverse kinematic (FBIK), blending of animation state machines, and more. These features help create smooth 3D animations. Combining models animated by auto rigging and the mentioned features, as well as numerous other 3D Engine features such as HD rendering, visual special effects, and intelligent navigation, is helpful for creating fully functioning games.
AR capabilities from AR Engine, including motion tracking, environment tracking, and human body and face tracking. They allow a model animated by auto rigging to appear in the camera display of a mobile device, so that users can interact with the model. These capabilities are ideal for a mobile game to implement model customization and interaction. This makes games more interactive and fun, which is illustrated perfectly in the image below.
As mentioned earlier, the auto rigging capability supports only the biped humanoid object. However, I think we can try to add two legs to an object (for example, a candlestick) for auto rigging to animate, to recreate the Be Our Guest scene from Beauty and the Beast.
How It Works
After a static model of a biped humanoid is input, auto rigging uses AI algorithms for limb rigging and automatically generates the skeleton and skin weights for the model, to finish the skeleton rigging process. Then, the capability changes the orientation and position of the model skeleton so that the model can perform a range of actions such as walking, jumping, and dancing.
Advantages
Delivering a wholly automated rigging process
Rigging can be done either manually or automatically. Most highly accurate rigging solutions that are available on the market require the input model to be in a standard position and seven or eight key skeletal points to be added manually.
Auto rigging from 3D Modeling Kit does not have any of these requirements, yet it is able to accurately rig a model.
Utilizing massive data for high-level algorithm accuracy and generalization
Accurate auto rigging depends on hundreds of thousands of 3D model rigging data records that are used to train the Huawei-developed algorithms behind the capability. Thanks to these fine-tuned data records, auto rigging delivers ideal algorithm accuracy and generalization. It can even implement rigging for an object model created from photos taken with a standard mobile phone camera.
Input Model Specifications
The capability's official document lists the following suggestions for an input model that is to be used for auto rigging.
Source: a biped humanoid object (like a figurine or plush toy) that is not holding anything.
Appearance: The limbs and trunk of the object model are not separate, do not overlap, and do not feature any large accessories. The object model should stand on two legs, without its arms overlapping.
Posture: The object model should face forward along the z-axis and be upward along the y-axis. In other words, the model should stand upright, with its front facing forward. None of the model's joints should twist beyond 15 degrees, while there is no requirement on symmetry.
Mesh: The model meshes can be triangle or quadrilateral. The number of mesh vertices should not exceed 80,000. No large part of meshes is missing on the model.
Others: The limbs-to-trunk ratio of the object model complies with that of most toys. The limbs and trunk cannot be too thin or short, which means that the ratio of the arm width to the trunk width and the ratio of the leg width to the trunk width should be no less than 8% of the length of the object's longest edge.
Driven by AI, the auto rigging capability lowers the threshold of 3D modeling and animation creation, opening them up to amateur users.
While learning about this capability, I also came across three other fantastic capabilities of the 3D Modeling Kit. Wanna know what they are? Check them out here. Let me know in the comments section how your auto rigging has come along.
Augmented reality (AR) bridges real and virtual worlds, by integrating digital content into real-world environments. It allows people to interact with virtual objects as if they are real. Examples include product displays in shopping apps, interior design layouts in home design apps, accessible learning materials, real-time navigation, and immersive AR games. AR technology makes digital services and experiences more accessible than ever.
This has enormous implications in daily life. For instance, when shooting short videos or selfies, users can switch between different special effects or control the shutter button with specific gestures, which spares them from having to touch the screen. When browsing clothes or accessories on an e-commerce website, users can use AR to "wear" the items virtually and determine which clothing articles fit them, or which accessories match which outfits. All of these services depend on precise hand gesture recognition, which HMS Core AR Engine provides via its hand skeleton tracking capability. If you are considering developing an app providing AR features, you would be remiss not to check out this capability, as it can streamline your app development process substantially.
The hand skeleton tracking capability works by detecting and tracking the positions and postures of up to 21 hand skeleton joints, and generating true-to-life hand skeleton models with attributes like fingertip endpoints and palm orientation, as well as the hand skeleton itself. Please note that when there is more than one hand in an image, the service will only send back results and coordinates from the hand in which it has the highest degree of confidence. Currently, this service is only supported on certain Huawei phone models that are capable of obtaining image depth information.
AR Engine detects the hand skeleton in a precise manner, allowing your app to superimpose virtual objects on the hand with a high degree of accuracy, including on the fingertips or palm. You can also perform a greater number of precise operations on virtual hands, to enrich your AR app with fun new experiences and interactions.
Hand skeleton diagram
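To make this more concrete, here is a minimal sketch of how an Android app might read hand skeleton data with AR Engine. It uses the commonly documented AR Engine classes (ARSession, ARHandTrackingConfig, ARHand), but the camera and OpenGL setup, permission handling, and availability checks are omitted, so treat it as an outline rather than a complete integration.
import android.content.Context;
import com.huawei.hiar.ARConfigBase;
import com.huawei.hiar.ARFrame;
import com.huawei.hiar.ARHand;
import com.huawei.hiar.ARHandTrackingConfig;
import com.huawei.hiar.ARSession;

public class HandTrackingSketch {

    private ARSession session;

    // Create the session and enable hand tracking (simplified; camera/GL setup,
    // permission handling, and AR Engine availability checks are omitted).
    public void start(Context context) {
        try {
            session = new ARSession(context);
            ARHandTrackingConfig config = new ARHandTrackingConfig(session);
            config.setCameraLensFacing(ARConfigBase.CameraLensFacing.FRONT);
            session.configure(config);
            session.resume();
        } catch (Exception e) {
            // AR Engine not available or camera unavailable; handle accordingly.
        }
    }

    // Called once per rendered frame.
    public void onDrawFrame() {
        try {
            ARFrame frame = session.update();
            for (ARHand hand : frame.getUpdatedTrackables(ARHand.class)) {
                // Skeleton joint coordinates of the tracked hand (x, y, z triplets).
                float[] skeleton = hand.getHandskeletonArray();
                // Recognized gesture type for this hand.
                int gesture = hand.getGestureType();
                // Drive rendering or interaction logic with the joints and gesture here.
            }
        } catch (Exception e) {
            // Frame not ready; skip this frame.
        }
    }
}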
Simple Sign Language Translation
The hand skeleton tracking capability can also be used to translate simple gestures in sign languages. By detecting key hand skeleton joints, it predicts how the hand posture will change, and maps movements like finger bending to a set of predefined gestures, based on a set of algorithms. For example, holding up the hand in a fist with the index finger sticking out is mapped to the gesture number one (1). This means that the kit can help equip your app with sign language recognition and translation features.
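As a sketch of the mapping idea described above, a recognized gesture type could be translated into a label as follows. The mapping values below are purely illustrative; the actual gesture type constants returned by AR Engine and the supported gesture set should be taken from the official documentation.
public final class GestureLabeler {

    // Illustrative mapping only: the real gesture type values returned by
    // AR Engine and the supported gesture set are defined in the official docs.
    public static String label(int gestureType) {
        switch (gestureType) {
            case 0:
                return "fist";
            case 1:
                return "number one (index finger extended)";
            case 5:
                return "open palm";
            default:
                return "unknown gesture";
        }
    }
}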
Building a Contactless Operation Interface
In science fiction movies, it is quite common to see a character controlling a computer panel with air gestures. With the skeleton tracking capability in AR Engine, this mind-bending technology is no longer out of reach.
With the phone's camera tracking the user's hand in real time, key skeleton joints like the fingertips are identified with a high degree of precision, which allows the user to interact with virtual objects with specific simple gestures. For example, pressing down on a virtual button can trigger an action, pressing and holding a virtual object can display the menu options, spreading two fingers apart on a small object can show its details, or a virtual object can be resized and placed into a virtual pocket.
Such contactless gesture-based controls have been widely used in fields as diverse as medical equipment and vehicle head units.
Interactive Short Videos & Live Streaming
The hand skeleton tracking capability in AR Engine can help with adding gesture-based special effects to short videos or live streams. For example, when the user is shooting a short video, or starting a live stream, the capability will enable your app to identify their gestures, such as a V-sign, thumbs up, or finger heart, and then apply the corresponding special effects or stickers to the short video or live stream. This makes the interactions more engaging and immersive, and makes your app more appealing to users than competitor apps.
Hand skeleton tracking is also ideal in contexts like animation, course material presentation, medical training and imaging, and smart home controls.
The rapid development of AR technologies has made human-computer interactions based on gestures a hot topic throughout the industry. Implementing natural and human-friendly gesture recognition solutions is key to making these interactions more engaging. Hand skeleton tracking is the foundation for gesture recognition. By integrating AR Engine, you will be able to use this tracking capability to develop AR apps that provide users with more interesting and effortless features. Apps that offer such outstanding AR features will undoubtedly provide an enhanced user experience that helps them stand out from the myriad of competitor apps.
Conclusion
Augmented reality is one of the most exciting new technological developments in the past few years, and a proven method for presenting a variety of digital content, including text, graphics, and videos, in a visually immersive manner. Now an increasing number of apps are opting to provide AR-based features of their own, in order to provide an interactive and easy-to-use experience in fields as diverse as medical training, interior design and modeling, real-time navigation, virtual classrooms, health care, and entertainment. Hand gesture recognition is at the core of this trend.
If you are currently developing an app, the right development kit, which offers all the preset capabilities that you need for your app, is key to reducing the development workload and building the features that you want. It can also let you focus on optimizing the app's feature design and the user experience. AR Engine offers an effective and easy-to-use hand gesture tracking capability for AR apps. By integrating this kit, your app will be able to identify user hand gestures in real time in a highly precise manner, implement responsive user-device interactions based on these detected hand gestures, and therefore provide users with a highly immersive and engaging AR experience.
Portrait Retouching Importance
Mobile phone camera technology is evolving, with features such as wide-angle lenses and optical image stabilization, to name but a few. Thanks to this, video recording and mobile image editing apps are emerging one after another, utilizing technology to foster greater creativity.
Among these apps, live-streaming apps are growing with great momentum, thanks to an explosive number of streamers and viewers.
One function that a live-streaming app needs is portrait retouching. The reason is that though mobile phone camera parameters are already staggering, portraits captured by the camera can still be distorted for different reasons. For example, in a dim environment, a streamer's skin tone might appear dark, while factors such as the width of the camera lens and the shooting angle can make them look wide in videos. Issues like these can affect how viewers feel about a live video and how streamers feel about themselves, signaling the need for a portrait retouching function to address them.
I've developed a live-streaming demo app with such a function. Before developing it, I identified two challenges in building this function for a live-streaming app.
First, the function must be able to process video images in real time. A long delay between image input and output compromises interaction between a streamer and their viewers.
Second, the function requires a high level of face detection accuracy, to prevent the processed portrait from being deformed, or retouching from being applied to unintended areas.
To solve these challenges, I tested several available portrait retouching solutions and settled on the beauty capability from HMS Core Video Editor Kit. Let's see how the capability works to understand how it manages to address the challenges.
How the Capability Addresses the Challenges
This capability adopts a CPU+NPU+GPU heterogeneous parallel framework, which allows it to process video images in real time. The capability's algorithm runs faster while requiring less power.
Specifically speaking, the beauty capability delivers a processing frequency of over 50 fps in a device-to-device manner. For a video that contains multiple faces, the capability can simultaneously process a maximum of two faces, whose areas are the biggest in the video. This takes as little as 10 milliseconds to complete.
The capability uses 855 dense facial landmarks so that it can accurately recognize a face, allowing the capability to adapt its effects to a face that moves too fast or at a big angle during live streaming.
To ensure an excellent retouching effect, the beauty capability adopts detailed face segmentation and neutral gray for softening skin. As a result, the final effect looks very authentic.
Not only that, the capability is equipped with multiple, configurable retouching parameters. This feature, I think, is considerate and makes the capability deliver an even better user experience — considering that it is impossible to have a portrait retouching panacea that can satisfy all users. Developers like me can provide these parameters (including those for skin softening, skin tone adjustment, face contour adjustment, eye size adjustment, and eye brightness adjustment) directly to users, rather than struggle to design the parameters by ourselves. This offers more time for fine-tuning portraits in video images.
Knowing these features of the capability, I believed that it could help me create a portrait retouching function for my demo app. So let's move on to see how I developed my app.
Demo Development
Preparations
Make sure the development environment is ready.
Configure app information in AppGallery Connect, including registering as a developer on the platform, creating an app, generating a signing certificate fingerprint, configuring the fingerprint, and enabling the kit.
Integrate the HMS Core SDK.
Configure obfuscation scripts.
Declare necessary permissions.
Capability Integration
1. Set up the app authentication information. Two methods are available, using an API key or access token respectively:
API key: Call the setApiKey method to set the key, which only needs to be done once during app initialization.
Code:
HVEAIApplication.getInstance().setApiKey("your ApiKey");
The API key is obtained from AppGallery Connect; it is generated during app registration on the platform.
It's worth noting that you should not hardcode the key in the app code, or store the key in the app's configuration file. The right way to handle this is to store it in the cloud and obtain it when the app is running.
Access token: Call the setAccessToken method to set the token. This is done only once during app initialization.
Code:
HVEAIApplication.getInstance().setAccessToken("your access token");
The access token is generated by the app itself. Specifically, call the
https://oauth-login.cloud.huawei.com/oauth2/v3/token
API to obtain an app-level access token.
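Below is a minimal sketch of that token request, assuming the standard OAuth 2.0 client credentials flow used by Huawei's token API; replace the client ID and secret placeholders with your app's values, and note that in production the token is usually requested from your own server so that the secret does not ship inside the app.
Code:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AccessTokenFetcher {

    // Sketch only: requests an app-level access token via the client credentials flow.
    // Parsing the "access_token" field from the JSON response is left to your
    // preferred JSON library.
    public static String fetchTokenResponse(String clientId, String clientSecret) throws Exception {
        String body = "grant_type=client_credentials"
                + "&client_id=" + URLEncoder.encode(clientId, "UTF-8")
                + "&client_secret=" + URLEncoder.encode(clientSecret, "UTF-8");

        URL url = new URL("https://oauth-login.cloud.huawei.com/oauth2/v3/token");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }

        StringBuilder response = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                response.append(line);
            }
        }
        return response.toString(); // JSON string containing the access_token field
    }
}
The access token parsed from this response is then passed to setAccessToken, as shown in the snippet above.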
2. Integrate the beauty capability.
Code:
// Create an HVEAIBeauty instance.
HVEAIBeauty hveaiBeauty = new HVEAIBeauty();
// Initialize the engine of the capability.
hveaiBeauty.initEngine(new HVEAIInitialCallback() {
@Override
public void onProgress(int progress) {
// Callback when the initialization progress is received.
}
@Override
public void onSuccess() {
// Callback when engine initialization is successful.
}
@Override
public void onError(int errorCode, String errorMessage) {
// Callback when engine initialization failed.
}
});
// Initialize the runtime environment of the capability in OpenGL. The method is called in the rendering thread of OpenGL.
hveaiBeauty.prepare();
// Set textureWidth (width) and textureHeight (height) of the texture to which the capability is applied. This method is called in the rendering thread of OpenGL after initialization or texture change.
// resize is a parameter, indicating the width and height. The parameter value must be greater than 0.
hveaiBeauty.resize(textureWidth, textureHeight);
// Configure the parameters for skin softening, skin tone adjustment, face contour adjustment, eye size adjustment, and eye brightness adjustment. The value of each parameter ranges from 0 to 1.
HVEAIBeautyOptions options = new HVEAIBeautyOptions.Builder().setBigEye(1)
.setBlurDegree(1)
.setBrightEye(1)
.setThinFace(1)
.setWhiteDegree(1)
.build();
// Update the parameters, after engine initialization or parameter change.
hveaiBeauty.updateOptions(options);
// Apply the capability, by calling the method in the rendering thread of OpenGL for each frame. inputTextureId: ID of the input texture; outputTextureId: ID of the output texture.
// The ID of the input texture should correspond to a face that faces upward.
int outputTextureId = hveaiBeauty.process(inputTextureId);
// Release the engine.
hveaiBeauty.releaseEngine();
The development process ends here, so now we can check out how my demo works:
Not to brag, but I do think the retouching result is ideal and natural: With all the effects added, the processed portrait does not appear deformed.
I've got my desired solution for creating a portrait retouching function. I believe this solution can also play an important role in an image editing app or any app that requires portrait retouching. I'm quite curious as to how you will use it. Now I'm off to find a solution that can "retouch" music instead of photos for a music player app, which can, for example, add more width to a song — Wish me luck!
Conclusion
The live-streaming app market is expanding rapidly, receiving various requirements from streamers and viewers. One of the most desired functions is portrait retouching, which is used to address distorted portraits and an unfavorable video watching experience.
Compared with other kinds of apps, a live-streaming app has two distinct requirements for the portrait retouching function, which are real-time processing of video images and accurate face detection. The beauty capability from HMS Core Video Editor Kit addresses them effectively, by using technologies such as the CPU+NPU+GPU heterogeneous parallel framework and 855 dense facial landmarks. The capability also offers several customizable parameters to enable different users to retouch their portraits as needed. On top of these, the capability can be easily integrated, helping develop an app requiring the portrait retouching feature.