Text to speech (TTS) is highly sought after by audio/video editors, thanks to its ability to automatically turn text into natural-sounding speech as a low-cost alternative to human dubbing. It works on all kinds of video, long or short.
I recently stumbled upon the AI dubbing capability of HMS Core Audio Editor Kit, which does just that. It can turn input text into speech with just a tap, and comes loaded with a selection of smooth, natural-sounding male and female timbres.
This makes it ideal for apps involving e-books, audio content creation, and audio/video editing. The following describes how I integrated this capability.
Making Preparations
Complete all necessary preparations by following the official guide.
Configuring the Project
1. Set the app authentication information
The information can be set via an API key or access token (recommended).
Use setAccessToken to set an access token during app initialization.
Java:
HAEApplication.getInstance().setAccessToken("your access token");
Or, use setApiKey to set an API key during app initialization. The API key needs to be set only once.
Java:
HAEApplication.getInstance().setApiKey("your ApiKey");
2. Initialize the runtime environment
Initialize HuaweiAudioEditor, and create a timeline and necessary lanes.
Java:
// Create a HuaweiAudioEditor instance.
HuaweiAudioEditor mEditor = HuaweiAudioEditor.create(mContext);
// Initialize the runtime environment of HuaweiAudioEditor.
mEditor.initEnvironment();
// Create a timeline.
HAETimeLine mTimeLine = mEditor.getTimeLine();
// Create a lane.
HAEAudioLane audioLane = mTimeLine.appendAudioLane();
Import audio.
Java:
// Add an audio asset to the end of the lane.
HAEAudioAsset audioAsset = audioLane.appendAudioAsset("/sdcard/download/test.mp3", mTimeLine.getCurrentTime());
3. Integrate AI dubbing.
Call HAEAiDubbingEngine to implement AI dubbing.
Java:
// Configure the AI dubbing engine.
HAEAiDubbingConfig haeAiDubbingConfig = new HAEAiDubbingConfig()
// Set the volume.
.setVolume(volumeVal)
// Set the speech speed.
.setSpeed(speedVal)
// Set the speaker.
.setType(defaultSpeakerType);
// Create a callback for an AI dubbing task.
HAEAiDubbingCallback callback = new HAEAiDubbingCallback() {
@Override
public void onError(String taskId, HAEAiDubbingError err) {
// Callback when an error occurs.
}
@Override
public void onWarn(String taskId, HAEAiDubbingWarn warn) {}
@Override
public void onRangeStart(String taskId, int start, int end) {}
@Override
public void onAudioAvailable(String taskId, HAEAiDubbingAudioInfo haeAiDubbingAudioFragment, int i, Pair<Integer, Integer> pair, Bundle bundle) {
// Receive the synthesized audio data here and save it to a file as needed.
}
@Override
public void onEvent(String taskId, int eventID, Bundle bundle) {
// Synthesis is complete.
if (eventID == HAEAiDubbingConstants.EVENT_SYNTHESIS_COMPLETE) {
// The AI dubbing task is complete, that is, the synthesized audio data has been fully processed.
}
}
@Override
public void onSpeakerUpdate(List<HAEAiDubbingSpeaker> speakerList, List<String> lanList,
List<String> lanDescList) { }
};
// AI dubbing engine.
HAEAiDubbingEngine mHAEAiDubbingEngine = new HAEAiDubbingEngine(haeAiDubbingConfig);
// Set the listener for the playback process of an AI dubbing task.
mHAEAiDubbingEngine.setAiDubbingCallback(callback);
// Convert text to speech and play the speech. In the method, text indicates the text to be converted to speech, and mode indicates the mode for playing the converted audio.
String taskId = mHAEAiDubbingEngine.speak(text, mode);
// Pause playback.
mHAEAiDubbingEngine.pause();
// Resume playback.
mHAEAiDubbingEngine.resume();
// Stop AI dubbing.
mHAEAiDubbingEngine.stop();
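The onAudioAvailable callback above is where the synthesized audio fragments arrive, but the sample does not show how to persist them. Below is a minimal sketch of one way to append each fragment to a PCM file; it assumes that HAEAiDubbingAudioInfo exposes the raw bytes through a getAudioData() method, so verify the exact accessor against the kit's API reference before using it.
Java:
// Hypothetical helper for saving AI dubbing output; adjust to the actual
// HAEAiDubbingAudioInfo API of your SDK version.
private void saveAudioFragment(HAEAiDubbingAudioInfo fragment, File outFile) {
    // Assumption: getAudioData() returns the PCM bytes of this fragment.
    byte[] pcm = fragment.getAudioData();
    if (pcm == null || pcm.length == 0) {
        return;
    }
    // Append the fragment; fragments are delivered in order, so appending rebuilds the full stream.
    try (FileOutputStream fos = new FileOutputStream(outFile, true)) {
        fos.write(pcm);
    } catch (IOException e) {
        Log.e("AiDubbing", "Failed to save audio fragment", e);
    }
}
Such a helper could be called from onAudioAvailable, using one output file per taskId.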
Result
In the demo below, I successfully implemented the AI dubbing function in my app. Now, I can convert text into emotionally expressive speech, using both default and custom timbres.
To learn more, please visit:
>> Audio Editor Kit official website
>> Audio Editor Kit Development Guide
>> Reddit to join developer discussions
>> GitHub to download the sample code
>> Stack Overflow to solve integration problems
Follow our official account for the latest HMS Core-related news and updates.
Overview
When I try to perform voice commands on my devices, they often fail to recognize what I am trying to say because of my poor pronunciation. For example, sometimes I can't distinguish between syllables, or make the "ch" and "sh" sounds, which has led to some frustrating experiences. I've always envied people who can enunciate well and recite tongue twisters with ease, and have dreamed of the day when that could be me. By chance, I came across the game Tongue Twister, which integrates HUAWEI ML Kit's ASR service, and it has changed my life for the better. Let's take a look at how the game works.
Application Scenarios
There are five levels in Tongue Twister, and as you'd expect, each level contains a tongue twister. The key for passing each level is ML Kit's ASR service. By integrating the service, the game is able to recognize the player's voice with a high degree of accuracy. Players are thus able to pass each level when they demonstrate clear enunciation. The service has proven itself to be highly useful in certain fields, enhancing recognition capabilities for product, movie, and music searches, as well as navigation services.
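To give a sense of how the recognized text can decide whether a level is passed, here is a minimal sketch that compares the ASR output with the target tongue twister. The normalization rules and the similarity threshold are my own assumptions for illustration, not the game's actual logic.
Code:
// Hypothetical level check: the player passes if the recognized text is close enough to the target.
private static final double PASS_THRESHOLD = 0.9; // assumed threshold

private boolean isLevelPassed(String recognizedText, String targetTwister) {
    String spoken = normalize(recognizedText);
    String target = normalize(targetTwister);
    if (target.isEmpty()) {
        return false;
    }
    // Similarity based on edit distance: 1.0 means an exact match.
    int distance = editDistance(spoken, target);
    double similarity = 1.0 - (double) distance / Math.max(spoken.length(), target.length());
    return similarity >= PASS_THRESHOLD;
}

private String normalize(String text) {
    // Ignore case, punctuation, and spaces so that only pronunciation-relevant characters are compared.
    return text == null ? "" : text.toLowerCase().replaceAll("[^a-z0-9\\u4e00-\\u9fa5]", "");
}

private int editDistance(String a, String b) {
    int[][] dp = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) {
        for (int j = 0; j <= b.length(); j++) {
            if (i == 0) {
                dp[i][j] = j;
            } else if (j == 0) {
                dp[i][j] = i;
            } else {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                dp[i][j] = Math.min(Math.min(dp[i - 1][j] + 1, dp[i][j - 1] + 1), dp[i - 1][j - 1] + cost);
            }
        }
    }
    return dp[a.length()][b.length()];
}
A check like this could be run in the onResults callback shown in the development procedure below.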
Now, let's look at what the game looks like in practice.
Piqued your interest? With the ASR service, why not create a tongue twister game of your own? Here's how...
Development Procedures
1. For details about how to set the authentication information for your app, please refer to Notes on Using Cloud Authentication Information.
2. Call an API to create a speech recognizer.
Code:
MLAsrRecognizer mSpeechRecognizer = MLAsrRecognizer.createAsrRecognizer(context);
3. Create a speech recognition result listener callback.
Code:
/**
* Use the callback to implement the MLAsrListener API and methods in the API.
*/
protected class SpeechRecognitionListener implements MLAsrListener {
@Override
public void onStartListening() {
// The recorder starts to receive speech.
}
@Override
public void onStartingOfSpeech() {
// The user starts to speak, that is, the speech recognizer detects that the user starts to speak.
}
@Override
public void onVoiceDataReceived(byte[] data, float energy, Bundle bundle) {
// Return the original PCM stream and audio power to the user.
}
@Override
public void onRecognizingResults(Bundle partialResults) {
// Receive the recognized text from MLAsrRecognizer.
}
@Override
public void onResults(Bundle results) {
// Text data of ASR.
}
@Override
public void onError(int error, String errorMessage) {
// Called when an error occurs. Without this callback, the app will not respond when the network is disconnected.
}
@Override
public void onState(int state, Bundle params) {
// Notify the app of the recognizer status change.
}
}
4. Bind the new result listener callback to the speech recognizer.
Code:
mSpeechRecognizer.setAsrListener(new SpeechRecognitionListener());
5. Set the recognition parameters and initiate speech recognition.
Code:
// Set parameters and start the audio device.
Intent mSpeechRecognizerIntent = new Intent(MLAsrConstants.ACTION_HMS_ASR_SPEECH);
mSpeechRecognizerIntent
// Set the language to be recognized. If this parameter is not set, English is recognized by default.
// Examples: "zh-CN": Chinese; "en-US": English; "fr-FR": French; "es-ES": Spanish; "de-DE": German; "it-IT": Italian.
.putExtra(MLAsrConstants.LANGUAGE, language)
// Set to return the recognition result along with the speech. If you ignore the setting, this mode is used by default. Options are as follows:
// MLAsrConstants.FEATURE_WORDFLUX: Recognizes and returns texts through onRecognizingResults.
// MLAsrConstants.FEATURE_ALLINONE: After the recognition is complete, texts are returned through onResults.
.putExtra(MLAsrConstants.FEATURE, MLAsrConstants.FEATURE_WORDFLUX);
mSpeechRecognizer.startRecognizing(mSpeechRecognizerIntent);
6. Release resources when the recognition ends.
Code:
if (mSpeechRecognizer != null) {
mSpeechRecognizer.destroy();
mSpeechRecognizer = null;
}
Maven repository address
Code:
buildscript {
repositories {
maven { url 'https://developer.huawei.com/repo/' }
}
}
allprojects {
repositories {
maven { url 'https://developer.huawei.com/repo/' }
}
}
SDK import
Code:
dependencies {
// Automatic speech recognition Long voice SDK.
implementation 'com.huawei.hms:ml-computer-voice-realtimetranscription:2.0.3.300'
// Automatic speech recognition SDK.
implementation 'com.huawei.hms:ml-computer-voice-asr:2.0.3.300'
// Automatic speech recognition plugin.
implementation 'com.huawei.hms:ml-computer-voice-asr-plugin:2.0.3.300'
}
Manifest files
Code:
<manifest
...
<meta-data
android:name="com.huawei.hms.ml.DEPENDENCY"
android:value="ocr" />
...
</manifest>
Permission
Code:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Dynamic permission application
Code:
private void requestAudioPermission() {
    final String[] permissions = new String[]{Manifest.permission.RECORD_AUDIO};
    if (!ActivityCompat.shouldShowRequestPermissionRationale(this, Manifest.permission.RECORD_AUDIO)) {
        ActivityCompat.requestPermissions(this, permissions, TongueTwisterActivity.AUDIO_CODE);
        return;
    }
}
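The snippet above only requests the permission; handling the user's response is not shown. Here is a minimal sketch of the standard Android callback, assuming AUDIO_CODE is the request code used above and that startListening() is a hypothetical method of your own that kicks off recognition:
Code:
@Override
public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions,
        @NonNull int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    if (requestCode == TongueTwisterActivity.AUDIO_CODE) {
        if (grantResults.length > 0 && grantResults[0] == PackageManager.PERMISSION_GRANTED) {
            // Permission granted: it is now safe to start the recorder and speech recognition.
            startListening(); // hypothetical method that calls startRecognizing()
        } else {
            // Permission denied: explain why recording is needed or disable the feature.
        }
    }
}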
Summary
In addition to games, ML Kit's ASR service is also useful in other scenarios, such as shopping apps. The service can recognize a spoken product name or feature and convert it into text to search for the product. For music apps, the service can likewise recognize song and artist names. For navigation, the driver will naturally prefer to speak a destination rather than type it, and have it converted into text using ASR, for a safer driving experience.
Learn More
For more information, please visit HUAWEI Developers.
For detailed instructions, please visit Development Guide.
You can join the HMS Core developer discussion by going to Reddit.
You can download the demo and sample code on GitHub.
To solve integration problems, please go to Stack Overflow.
"John, have you seen my glasses?"
Our old friend John, a programmer at Huawei, has a grandpa who, despite his old age, is an avid reader. Leaning back, struggling to make out what was written on the newspaper through his glasses, yet unable to take his eyes off the text: this was how his grandpa used to read, John explained.
Reading this way was harmful to his grandpa's vision, and it occurred to John that the ears could take over the role of "reading" from the eyes. He soon developed a text-reading app built on this idea, which recognizes and then reads out text from a picture. Thanks to this app, John's grandpa can now "read" from the comfort of his rocking chair, without having to strain his eyes.
How to Implement
1. The user takes a picture of a text passage. The app automatically identifies the location of the text within the picture and corrects the shooting angle so that the image directly faces the text.
2. The app recognizes and extracts the text from the picture.
3. The app converts the recognized text into audio output by leveraging text-to-speech technology.
These functions are easy to implement by relying on three services in HUAWEI ML Kit: document skew correction, text recognition, and text to speech (TTS).
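The development process below focuses on the TTS part, so here is a rough sketch of how the text recognition step could be handled with ML Kit's on-device text analyzer. The calls reflect the ML Kit text recognition SDK as I understand it; double-check the class and method names against the official API reference.
Java:
// Recognize text from a bitmap (ideally one that has already been skew-corrected).
private void recognizeText(Bitmap bitmap) {
    // Create an on-device text analyzer.
    MLTextAnalyzer analyzer = MLAnalyzerFactory.getInstance().getLocalTextAnalyzer();
    MLFrame frame = MLFrame.fromBitmap(bitmap);
    analyzer.asyncAnalyseFrame(frame)
            .addOnSuccessListener(mlText -> {
                // Pass the recognized text to the TTS engine created in the steps below.
                String recognized = mlText.getStringValue();
                // mlTtsEngine.speak(recognized, MLTtsEngine.QUEUE_APPEND);
            })
            .addOnFailureListener(e -> {
                // Handle the recognition failure, for example by showing a toast.
            });
}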
Preparations
1. Configure the Huawei Maven repository address.
2. Add the build dependencies for the HMS Core SDK.
Code:
dependencies {
// Import the base SDK.
implementation 'com.huawei.hms:ml-computer-voice-tts:2.1.0.300'
// Import the bee voice package.
implementation 'com.huawei.hms:ml-computer-voice-tts-model-bee:2.1.0.300'
// Import the eagle voice package.
implementation 'com.huawei.hms:ml-computer-voice-tts-model-eagle:2.1.0.300'
// Import a PDF file analyzer.
implementation 'com.itextpdf:itextg:5.5.10'
}
Tap PREVIOUS or NEXT to turn to the previous or next page. Tap speak to start reading; tap it again to pause reading.
Development process
1. Create a TTS engine by using the custom configuration class MLTtsConfig. Here, on-device TTS is used as an example.
Java:
private void initTts() {
// Set authentication information for your app to download the model package from the server of Huawei.
MLApplication.getInstance().setApiKey(AGConnectServicesConfig.
fromContext(getApplicationContext()).getString("client/api_key"));
// Create a TTS engine by using MLTtsConfig.
mlTtsConfigs = new MLTtsConfig()
// Set the language of the text to be converted into speech to English.
.setLanguage(MLTtsConstants.TTS_EN_US)
// Set the speaker with the English male voice (eagle).
.setPerson(MLTtsConstants.TTS_SPEAKER_OFFLINE_EN_US_MALE_EAGLE)
// Set the speech speed whose range is (0, 5.0]. 1.0 indicates a normal speed.
.setSpeed(.8f)
// Set the volume whose range is (0, 2). 1.0 indicates a normal volume.
.setVolume(1.0f)
// Set the TTS mode to on-device.
.setSynthesizeMode(MLTtsConstants.TTS_OFFLINE_MODE);
mlTtsEngine = new MLTtsEngine(mlTtsConfigs);
// Update the configuration when the engine is running.
mlTtsEngine.updateConfig(mlTtsConfigs);
// Pass the TTS callback function to the TTS engine to perform TTS.
mlTtsEngine.setTtsCallback(callback);
// Create an on-device TTS model manager.
manager = MLLocalModelManager.getInstance();
isPlay = false;
}
2. Create a TTS callback function for processing the TTS result.
Java:
MLTtsCallback callback = new MLTtsCallback() {
@Override
public void onError(String taskId, MLTtsError err) {
// Processing logic for TTS failure.
}
@Override
public void onWarn(String taskId, MLTtsWarn warn) {
// Alarm handling without affecting service logic.
}
@Override
// Return the mapping between the currently played segment and text. start: start position of the audio segment in the input text; end (excluded): end position of the audio segment in the input text.
public void onRangeStart(String taskId, int start, int end) {
// Process the mapping between the currently played segment and text.
}
@Override
// taskId: ID of a TTS task corresponding to the audio.
// audioFragment: audio data.
// offset: offset of the audio segment to be transmitted in the queue. One TTS task corresponds to a TTS queue.
// range: text area where the audio segment to be transmitted is located; range.first (included): start position; range.second (excluded): end position.
public void onAudioAvailable(String taskId, MLTtsAudioFragment audioFragment, int offset,
Pair<Integer, Integer> range, Bundle bundle) {
// Audio stream callback API, which is used to return the synthesized audio data to the app.
}
@Override
public void onEvent(String taskId, int eventId, Bundle bundle) {
// Callback method of a TTS event. eventId indicates the event name.
boolean isInterrupted;
switch (eventId) {
case MLTtsConstants.EVENT_PLAY_START:
// Called when playback starts.
break;
case MLTtsConstants.EVENT_PLAY_STOP:
// Called when playback stops.
isInterrupted = bundle.getBoolean(MLTtsConstants.EVENT_PLAY_STOP_INTERRUPTED);
break;
case MLTtsConstants.EVENT_PLAY_RESUME:
// Called when playback resumes.
break;
case MLTtsConstants.EVENT_PLAY_PAUSE:
// Called when playback pauses.
break;
// Pay attention to the following callback events when you focus on only the synthesized audio data but do not use the internal player for playback.
case MLTtsConstants.EVENT_SYNTHESIS_START:
// Called when TTS starts.
break;
case MLTtsConstants.EVENT_SYNTHESIS_END:
// Called when TTS ends.
break;
case MLTtsConstants.EVENT_SYNTHESIS_COMPLETE:
// TTS is complete. All synthesized audio streams are passed to the app.
isInterrupted = bundle.getBoolean(MLTtsConstants.EVENT_SYNTHESIS_INTERRUPTED);
break;
default:
break;
}
}
};
3. Extract text from a PDF file.
Java:
private String loadText(String path) {
String result = "";
try {
PdfReader reader = new PdfReader(path);
result = result.concat(PdfTextExtractor.getTextFromPage(reader,
mCurrentPage.getIndex() + 1).trim() + System.lineSeparator());
reader.close();
} catch (IOException e) {
showToast(e.getMessage());
}
// Obtain the position of the header.
int header = result.indexOf(System.lineSeparator());
// Obtain the position of the footer.
int footer = result.lastIndexOf(System.lineSeparator());
if (footer != 0){
// Do not display the text in the header and footer.
return result.substring(header, footer - 5);
}else {
return result;
}
}
4. Perform TTS in on-device mode.
Java:
// Create an MLTtsLocalModel instance to set the speaker so that the language model corresponding to the speaker can be downloaded through the model manager.
MLTtsLocalModel model = new MLTtsLocalModel.Factory(MLTtsConstants.TTS_SPEAKER_OFFLINE_EN_US_MALE_EAGLE).create();
manager.isModelExist(model).addOnSuccessListener(new OnSuccessListener<Boolean>() {
@Override
public void onSuccess(Boolean aBoolean) {
// If the model is not downloaded, call the download API. Otherwise, call the TTS API of the on-device engine.
if (aBoolean) {
String source = loadText(mPdfPath);
// Call the speak API to perform TTS. source indicates the text to be synthesized.
mlTtsEngine.speak(source, MLTtsEngine.QUEUE_APPEND);
if (isPlay){
// Pause playback.
mlTtsEngine.pause();
tv_speak.setText("speak");
}else {
// Resume playback.
mlTtsEngine.resume();
tv_speak.setText("pause");
}
isPlay = !isPlay;
} else {
// Call the API for downloading the on-device TTS model.
downloadModel(MLTtsConstants.TTS_SPEAKER_OFFLINE_EN_US_MALE_EAGLE);
showToast("The offline model has not been downloaded!");
}
}
}).addOnFailureListener(new OnFailureListener() {
@Override
public void onFailure(Exception e) {
showToast(e.getMessage());
}
});
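Step 4 above calls a downloadModel method that is not defined anywhere in the article. Below is a sketch of how it might be implemented with the on-device model manager, based on the ML Kit model download API as I recall it; verify the exact signatures (MLModelDownloadStrategy, MLModelDownloadListener, MLLocalModelManager.downloadModel) against the current SDK documentation.
Java:
// Possible implementation of the downloadModel helper referenced above.
private void downloadModel(String speakerName) {
    MLTtsLocalModel model = new MLTtsLocalModel.Factory(speakerName).create();
    MLModelDownloadStrategy strategy = new MLModelDownloadStrategy.Factory().create();
    MLModelDownloadListener downloadListener = new MLModelDownloadListener() {
        @Override
        public void onProcess(long alreadyDownLength, long totalLength) {
            // Report the download progress, for example in a progress bar.
        }
    };
    manager.downloadModel(model, strategy, downloadListener)
            .addOnSuccessListener(aVoid -> {
                // The model package is downloaded; TTS can now run in on-device mode.
                showToast("The offline model has been downloaded.");
            })
            .addOnFailureListener(e -> showToast(e.getMessage()));
}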
5. Release resources when the current UI is destroyed.
Java:
@Override
protected void onDestroy() {
super.onDestroy();
try {
if (mParcelFileDescriptor != null) {
mParcelFileDescriptor.close();
}
if (mCurrentPage != null) {
mCurrentPage.close();
}
if (mPdfRenderer != null) {
mPdfRenderer.close();
}
if (mlTtsEngine != null){
mlTtsEngine.shutdown();
}
} catch (IOException e) {
e.printStackTrace();
}
}
Other Applicable Scenarios
TTS can be used across a broad range of scenarios. For example, you could integrate it into an education app to read bedtime stories to children, or integrate it into a navigation app, which could read out instructions aloud.
For more details, you can go to:
Reddit to join our developer discussion
GitHub to download demos and sample codes
Stack Overflow to solve any integration problems
Original Source
Well explained. Will it support all languages?
Static biometric verification is a feature of HMS Core ML Kit, which captures faces in real time and can determine whether a face belongs to a real person or not, without prompting the user to move their head or face. In this way, the service helps deliver a convenient user experience that wins positive feedback.
Technical Principles
Static biometric verification requires an RGB camera and is able to differentiate between a real person's face and a spoof attack (such as a printed photo, a screenshot of a face, or a face mask) by analyzing details in the captured image, such as the moiré pattern or reflection on a paper photo. The service supports data from a wide array of scenarios, including different lighting conditions, face accessories, genders, hairstyles, and mask materials. It also analyzes a face's surroundings to detect suspicious environments.
The static biometric verification model adopts a lightweight convolutional module. Through reparameterization, its linear computation is converted into a single convolutional module or a fully connected layer in the inference phase (for example, parallel linear branches can be merged into one equivalent convolution once training is finished). The model is deployed with the MindSpore Lite inference framework, which crops unused operators, shrinking the package size and making the model more convenient to integrate.
Application Scenarios
Liveness detection is usually used before face verification. For example, when a user uses facial recognition to unlock their phone, liveness detection first determines whether the captured face is real. If it is, face verification then checks whether the face matches the one recorded in the system. The two technologies complement one another to protect a user's device from unauthorized access.
So it's safe to say that static biometric verification provides solid protection for our apps, and I'm here to illustrate how it can be integrated.
Integration Procedure
Preparations
The detailed preparations are all provided in the official document for the service.
Two modes are available to call the service:
- Default view mode: the liveness detection process is handled by ML Kit, and the detection UI is provided by ML Kit. It determines whether a face is real or not.
- Customized view mode: the liveness detection process is handled by ML Kit, while the detection UI is customized by you. It determines whether a face is real or not.
Default View Mode
1. Create a callback to obtain the static biometric verification result
Java:
private MLLivenessCapture.Callback callback = new MLLivenessCapture.Callback() {
@Override
public void onSuccess(MLLivenessCaptureResult result) {
// Callback when verification is successful. The result indicates whether the face is of a real person.
}
@Override
public void onFailure(int errorCode) {
// Callback when verification fails. For example, the camera is abnormal (CAMERA_ERROR). Add the processing logic to deal with the failure.
}
};
2. Create a static biometric verification instance and start verification
Java:
MLLivenessCapture capture = MLLivenessCapture.getInstance();
capture.startDetect(activity, callback);
Customized View Mode
1. Create an MLLivenessDetectView instance and load it to the activity layout
Java:
/**
* i. Bind the camera preview screen to the remote view and set the liveness detection area.
* In the camera preview stream, static biometric verification determines whether a face is in the middle of the image. To improve the pass rate, you are advised to place the face frame in the middle of the screen and set the liveness detection area to be slightly larger than the face frame.
* ii. Set whether to detect the mask.
* iii. Set the result callback.
* iv. Load MLLivenessDetectView to the activity.
*/
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_liveness_custom_detection);
mPreviewContainer = findViewById(R.id.surface_layout);
// Obtain MLLivenessDetectView.
mlLivenessDetectView = new MLLivenessDetectView.Builder()
.setContext(this)
// Set whether to detect the mask.
.setOptions(MLLivenessDetectView.DETECT_MASK)
// Set the rectangle of the face frame relative to MLLivenessDetectView.
.setFaceRect(new Rect(0, 0, 0, 200))
// Set the result callback.
.setDetectCallback(new OnMLLivenessDetectCallback() {
@Override
public void onCompleted(MLLivenessCaptureResult result) {
// Callback when verification is complete.
}
@Override
public void onError(int error) {
// Callback when an error occurs during verification.
}
@Override
public void onInfo(int infoCode, Bundle bundle) {
// Callback when the verification prompt message is received. This message can be displayed on the UI.
// if(infoCode==MLLivenessDetectInfo.NO_FACE_WAS_DETECTED){
// No face is detected.
// }
// ...
}
@Override
public void onStateChange(int state, Bundle bundle) {
// Callback when the verification status changes.
// if(state==MLLivenessDetectStates.START_DETECT_FACE){
// Start face detection.
// }
// ...
}
}).build();
mPreviewContainer.addView(mlLivenessDetectView);
mlLivenessDetectView.onCreate(savedInstanceState);
}
2. Set a lifecycle listener for MLLivenessDetectView
Java:
@Override
protected void onDestroy() {
super.onDestroy();
mlLivenessDetectView.onDestroy();
}
@Override
protected void onPause() {
super.onPause();
mlLivenessDetectView.onPause();
}
@Override
protected void onResume() {
super.onResume();
mlLivenessDetectView.onResume();
}
@Override
protected void onStart() {
super.onStart();
mlLivenessDetectView.onStart();
}
@Override
protected void onStop() {
super.onStop();
mlLivenessDetectView.onStop();
}
To learn more, please visit:
>> HUAWEI Developers official website
>> Development Guide
>> Reddit to join developer discussions
>> GitHub to download the sample code
>> Stack Overflow to solve integration problems
Follow our official account for the latest HMS Core-related news and updates.
Efficient records management is more relevant now than ever. In our digital age, a huge volume of information, including audio and video, must be handled in a limited time, which makes a real-time transcription function essential in many scenarios.
In audio or video conferencing, this function records meeting minutes that I can refer to later, which is more convenient than writing them all down myself. I've seen my kids struggling to take notes during their online courses, so I know this process can be much easier with the help of transcription: it frees them from writing down everything the teacher says, allowing them to focus on the lecture itself and easily review the content later. Live captions likewise provide viewers with real-time subtitles, for a better viewing experience.
As a coder, I am a believer in "actions speak louder than words". That's why I developed a real-time transcription function with the help of the real-time transcription capability from ML Kit, as shown below.
Demo
This function transcribes up to 5 hours of speech into Chinese, English, a mix of Chinese and English, or French in real time. In addition, the output text is punctuated and contains timestamps.
The function has some requirements: support for French depends on the phone model, whereas Chinese and English are available on all phone models. The function also requires an Internet connection.
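Since the capability needs a network, a simple connectivity guard before starting recognition can save users from a silent failure. This check is my own defensive addition, not part of the kit; note that getActiveNetworkInfo() is deprecated on newer Android versions, where NetworkCapabilities can be used instead.
Code:
// Simple connectivity check before calling startRecognizing().
private boolean isNetworkAvailable(Context context) {
    ConnectivityManager cm = (ConnectivityManager) context.getSystemService(Context.CONNECTIVITY_SERVICE);
    if (cm == null) {
        return false;
    }
    NetworkInfo info = cm.getActiveNetworkInfo();
    return info != null && info.isConnected();
}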
Okay, let's move on to the point of this article: How I developed this real-time transcription function.
Development Procedure
1. Make necessary preparations. This is described in detail in the References section.
2. Create and then configure a speech recognizer.
Code:
MLSpeechRealTimeTranscriptionConfig config = new MLSpeechRealTimeTranscriptionConfig.Factory()
// Set the language, which can be Chinese, English, both Chinese and English, or French.
.setLanguage(MLSpeechRealTimeTranscriptionConstants.LAN_ZH_CN)
// Punctuate the text recognized from the speech.
.enablePunctuation(true)
// Set the sentence offset.
.enableSentenceTimeOffset(true)
// Set the word offset.
.enableWordTimeOffset(true)
.create();
MLSpeechRealTimeTranscription mSpeechRecognizer = MLSpeechRealTimeTranscription.getInstance();
3. Create a callback for the speech recognition result listener.
Code:
// Use the callback to implement the MLSpeechRealTimeTranscriptionListener API and methods in the API.
protected class SpeechRecognitionListener implements MLSpeechRealTimeTranscriptionListener {
@Override
public void onStartListening() {
// The recorder starts to receive speech.
}
@Override
public void onStartingOfSpeech() {
// The speech recognizer detects the user speaking.
}
@Override
public void onVoiceDataReceived(byte[] data, float energy, Bundle bundle) {
// Return the original PCM stream and audio power to the user. The API does not run in the main thread, and the return result is processed in a sub-thread.
}
@Override
public void onRecognizingResults(Bundle partialResults) {
// Receive recognized text from MLSpeechRealTimeTranscription.
}
@Override
public void onError(int error, String errorMessage) {
// Callback when an error occurs during recognition.
}
@Override
public void onState(int state,Bundle params) {
// Notify the app of the recognizer status change.
}
}
4. Bind the speech recognizer.
Code:
mSpeechRecognizer.setRealTimeTranscriptionListener(new SpeechRecognitionListener());
5. Call startRecognizing to begin speech recognition.
Code:
mSpeechRecognizer.startRecognizing(config);
6. Stop recognition and release resources occupied by the recognizer when the recognition is complete.
Code:
if (mSpeechRecognizer!= null) {
mSpeechRecognizer.destroy();
}
References
Audio Transcription: What It Is, What It Is Not, and Why It's in High Demand
Configuring Necessary Information During Preparation
Adding a Plug-In and the Maven Repository Address, and Configuring the Building Dependencies
Background
Videos are memories, so why not spend more time making them look better? Many mobile apps on the market offer only basic editing functions, such as applying filters and adding stickers. That is not enough for those who want to create dynamic videos in which a moving person stays in focus. Traditionally, this requires adding a keyframe and manually adjusting the video image, which can scare off many amateur video editors.
I am one of those people and I've been looking for an easier way of implementing this kind of feature. Fortunately for me, I stumbled across the track person capability from HMS Core Video Editor Kit, which automatically generates a video that centers on a moving person, as the images below show.
Before using the capability
After using the capability
Thanks to the capability, I can now confidently create a video with the person tracking effect.
Let's see how the function is developed.
Development Process
Preparations
Configure the app information in AppGallery Connect.
Project Configuration
1. Set the authentication information for the app via an access token or API key.
Use the setAccessToken method to set an access token during app initialization. This needs setting only once.
Code:
MediaApplication.getInstance().setAccessToken("your access token");
Or, use setApiKey to set an API key during app initialization. The API key needs to be set only once.
Code:
MediaApplication.getInstance().setApiKey("your ApiKey");
2. Set a unique License ID.
Code:
MediaApplication.getInstance().setLicenseId("License ID");
3. Initialize the runtime environment for HuaweiVideoEditor.
When creating a video editing project, first create a HuaweiVideoEditor object and initialize its runtime environment. Release this object when exiting a video editing project.
(1) Create a HuaweiVideoEditor object.
Code:
HuaweiVideoEditor editor = HuaweiVideoEditor.create(getApplicationContext());
(2) Specify the preview area position.
The area renders video images. This process is implemented via SurfaceView creation in the SDK. The preview area position must be specified before the area is created.
Code:
<LinearLayout
android:id="@+id/video_content_layout"
android:layout_width="0dp"
android:layout_height="0dp"
android:background="@color/video_edit_main_bg_color"
android:gravity="center"
android:orientation="vertical" />
// Specify the preview area position.
LinearLayout mSdkPreviewContainer = view.findViewById(R.id.video_content_layout);
// Configure the preview area layout.
editor.setDisplay(mSdkPreviewContainer);
(3) Initialize the runtime environment. LicenseException will be thrown if license verification fails.
Creating the HuaweiVideoEditor object does not occupy any system resources. You need to choose when to initialize its runtime environment, at which point the necessary threads and timers are created in the SDK.
Code:
try {
editor.initEnvironment();
} catch (LicenseException error) {
SmartLog.e(TAG, "initEnvironment failed: " + error.getErrorMsg());
finish();
return;
}
4. Add a video or an image.
Create a video lane. Add a video or an image to the lane using the file path.
Code:
// Obtain the HVETimeLine object.
HVETimeLine timeline = editor.getTimeLine();
// Create a video lane.
HVEVideoLane videoLane = timeline.appendVideoLane();
// Add a video to the end of the lane.
HVEVideoAsset videoAsset = videoLane.appendVideoAsset("test.mp4");
// Add an image to the end of the video lane.
HVEImageAsset imageAsset = videoLane.appendImageAsset("test.jpg");
Function Building
Code:
// Initialize the capability engine.
visibleAsset.initHumanTrackingEngine(new HVEAIInitialCallback() {
@Override
public void onProgress(int progress) {
// Initialization progress.
}
@Override
public void onSuccess() {
// The initialization is successful.
}
@Override
public void onError(int errorCode, String errorMessage) {
// The initialization failed.
}
});
// Track a person using the coordinates. Coordinates of two vertices that define the rectangle containing the person are returned.
List<Float> rects = visibleAsset.selectHumanTrackingPerson(bitmap, position2D);
// Enable the effect of person tracking.
visibleAsset.addHumanTrackingEffect(new HVEAIProcessCallback() {
@Override
public void onProgress(int progress) {
// Handling progress.
}
@Override
public void onSuccess() {
// Handling successful.
}
@Override
public void onError(int errorCode, String errorMessage) {
// Handling failed.
}
});
// Interrupt the effect.
visibleAsset.interruptHumanTracking();
// Remove the effect.
visibleAsset.removeHumanTrackingEffect();
References
The Importance of Visual Effects
Track Person