How to Convert Audio File into Text With Machine Learning - Huawei Developers

Introduction
Converting audio into text has a wide range of applications: generating video subtitles, taking meeting minutes, and writing interview transcripts. Machine learning makes doing so easier than ever before, converting audio files into meticulously accurate text, with correct punctuation as well!
Actual Effects
Build and run an app with audio file transcription integrated. Then, select a local audio file and convert it into text.
Development Preparations
For details about configuring the Huawei Maven repository and integrating the audio file transcription SDK, please refer to the Development Guide of ML Kit on HUAWEI Developers.
Declaring Permissions in the AndroidManifest.xml File
Open the AndroidManifest.xml file in the main folder and add the network connection, network status access, and storage read permissions before the <application> element.
Please note that the storage permission also needs to be requested dynamically at runtime; otherwise, a Permission Denied error will be reported. A minimal runtime-request sketch follows the manifest snippet below.
Code:
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
Development Procedure
Creating and Initializing an Audio File Transcription Engine
Override onCreate in MainActivity to create an audio transcription engine.
Code:
private MLRemoteAftEngine mAnalyzer;
// Obtain the engine instance, bind it to the application context, and register the transcription listener.
mAnalyzer = MLRemoteAftEngine.getInstance();
mAnalyzer.init(getApplicationContext());
mAnalyzer.setAftListener(mAsrListener);
Use MLRemoteAftSetting to configure the engine. The service currently supports Mandarin Chinese and English, so the valid values of mLanguage are zh and en.
Code:
MLRemoteAftSetting setting = new MLRemoteAftSetting.Factory()
.setLanguageCode(mLanguage)
.enablePunctuation(true)
.enableWordTimeOffset(true)
.enableSentenceTimeOffset(true)
.create();
enablePunctuation indicates whether to automatically punctuate the converted text. The default value is false.
If this parameter is set to true, the converted text is automatically punctuated; if it is set to false, the text is not punctuated.
enableWordTimeOffset indicates whether to generate the text transcription result of each audio segment with the corresponding offset. The default value is false. You need to set this parameter only when the audio duration is less than 1 minute.
If this parameter is set to true, the offset information is returned along with the text transcription result. This applies to the transcription of short audio files with a duration of 1 minute or shorter. If this parameter is set to false, only the text transcription result of the audio file will be returned.
enableSentenceTimeOffset indicates whether to output the offset of each sentence in the audio file. The default value is false.
If this parameter is set to true, the offset information is returned along with the text transcription result. If this parameter is set to false, only the text transcription result of the audio file will be returned.
Creating a Listener Callback to Process the Transcription Result
Code:
private MLRemoteAftListener mAsrListener = new MLRemoteAftListener()
After the engine initialization is complete, call startTask in the onInitComplete callback of MLRemoteAftListener to start the transcription.
Code:
@Override
public void onInitComplete(String taskId, Object ext) {
Log.i(TAG, "MLRemoteAftListener onInitComplete" + taskId);
mAnalyzer.startTask(taskId);
}
Override onUploadProgress, onEvent, and onResult in MLRemoteAftListener.
Code:
@Override
public void onUploadProgress(String taskId, double progress, Object ext) {
Log.i(TAG, " MLRemoteAftListener onUploadProgress is " + taskId + " " + progress);
}
@Override
public void onEvent(String taskId, int eventId, Object ext) {
Log.e(TAG, "MLAsrCallBack onEvent" + eventId);
if (MLAftEvents.UPLOADED_EVENT == eventId) { // The file is uploaded successfully.
showConvertingDialog();
startQueryResult(); // Obtain the transcription result.
}
}
@Override
public void onResult(String taskId, MLRemoteAftResult result, Object ext) {
Log.i(TAG, "onResult get " + taskId);
if (result != null) {
Log.i(TAG, "onResult isComplete " + result.isComplete());
if (!result.isComplete()) {
return;
}
if (null != mTimerTask) {
mTimerTask.cancel();
}
if (result.getText() != null) {
Log.e(TAG, result.getText());
dismissTransferringDialog();
showCovertResult(result.getText());
}
List<MLRemoteAftResult.Segment> segmentList = result.getSegments();
if (segmentList != null && segmentList.size() != 0) {
for (MLRemoteAftResult.Segment segment : segmentList) {
Log.e(TAG, "MLAsrCallBack segment text is : " + segment.getText() + ", startTime is : " + segment.getStartTime() + ". endTime is : " + segment.getEndTime());
}
}
List<MLRemoteAftResult.Segment> words = result.getWords();
if (words != null && words.size() != 0) {
for (MLRemoteAftResult.Segment word : words) {
Log.e(TAG, "MLAsrCallBack word text is : " + word.getText() + ", startTime is : " + word.getStartTime() + ". endTime is : " + word.getEndTime());
}
}
List<MLRemoteAftResult.Segment> sentences = result.getSentences();
if (sentences != null && sentences.size() != 0) {
for (MLRemoteAftResult.Segment sentence : sentences) {
Log.e(TAG, "MLAsrCallBack sentence text is : " + sentence.getText() + ", startTime is : " + sentence.getStartTime() + ". endTime is : " + sentence.getEndTime());
}
}
}
}
Processing the Transcription Result in Polling Mode
After the audio file is uploaded, call getLongAftResult periodically to obtain the transcription result. In this example, the result is polled every 10 seconds.
Code:
private void startQueryResult() {
Timer mTimer = new Timer();
mTimerTask = new TimerTask() {
@Override
public void run() {
getResult();
}
};
mTimer.schedule(mTimerTask, 5000, 10000); // Process the obtained long speech transcription result every 10s.
}
private void getResult() {
Log.e(TAG, "getResult");
mAnalyzer.setAftListener(mAsrListener);
mAnalyzer.getLongAftResult(mLongTaskId);
}
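Because the timer keeps polling until a complete result arrives, it is also advisable to cancel it when the activity goes away. Below is a minimal sketch, assuming mTimerTask is the activity field used above.
Code:
@Override
protected void onDestroy() {
    super.onDestroy();
    // Stop polling when the UI is destroyed to avoid leaking the timer.
    if (mTimerTask != null) {
        mTimerTask.cancel();
        mTimerTask = null;
    }
}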
References
For more details, you can go to:
ML Kit official website
ML Kit Development Documentation page, to find the documents you need
Reddit to join our developer discussion
GitHub to download ML Kit sample codes
Stack Overflow to solve any integration problems


Related

Product Visual Search – Ultimate Guide

For more information like this, you can visit the HUAWEI Developer Forum.
Introduction
The HMS ML Kit product visual search service searches a pre-established product image library for products that are the same as or similar to the product in an image taken by a customer, and returns the IDs of those products and related information.
Use Case
We will capture the product image using the device camera from our shopping application.
We will show the returned product list in a RecyclerView.
Prerequisites
Java JDK 1.8 or higher is recommended.
Android Studio is recommended.
A Huawei Android device with HMS Core 4.0.0.300 or higher.
Before developing an app, you need to register as a HUAWEI developer. Refer to Register a HUAWEI ID.
Integrate the AppGallery Connect SDK. Refer to AppGallery Connect Service Getting Started.
Implementation
1. Enable ML Kit in Manage APIs. Refer to Service Enabling.
2. Integrate the following dependencies in the app-level build.gradle file.
Code:
// Import the product visual search SDK.
implementation 'com.huawei.hms:ml-computer-vision-cloud:2.0.1.300'
3. Add the AGConnect plugin at the top of the app-level build.gradle file.
Code:
apply plugin: 'com.huawei.agconnect'
4. Add the following permissions in the manifest:
Camera permission android.permission.CAMERA: Obtains real-time images or videos from a camera.
Internet access permission android.permission.INTERNET: Accesses cloud services on the Internet.
Storage write permission android.permission.WRITE_EXTERNAL_STORAGE: Upgrades the algorithm version.
Storage read permission android.permission.READ_EXTERNAL_STORAGE: Reads photos stored on a device.
5. Request the camera permission at runtime.
Code:
private void requestCameraPermission() {
final String[] permissions = new String[] {Manifest.permission.CAMERA};
if (!ActivityCompat.shouldShowRequestPermissionRationale(this, Manifest.permission.CAMERA)) {
ActivityCompat.requestPermissions(this, permissions, this.CAMERA_PERMISSION_CODE);
return;
}
}
6. Add the following code to the Application class.
Code:
public class MyApplication extends Application {
@Override
public void onCreate() {
super.onCreate();
MLApplication.getInstance().setApiKey("API KEY");
}
}
The API key can be obtained either from AppGallery Connect or from the integrated agconnect-services.json file.
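As an alternative to hard-coding the key, it can be read from the agconnect-services.json file. This sketch reuses the AGConnectServicesConfig call shown later in this thread; adapt it to your own setup.
Code:
// Read the API key from agconnect-services.json instead of hard-coding it.
MLApplication.getInstance().setApiKey(
        AGConnectServicesConfig.fromContext(this).getString("client/api_key"));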
7. Create an analyzer for product visual search.
Code:
private void initializeProductVisionSearch() {
MLRemoteProductVisionSearchAnalyzerSetting settings = new MLRemoteProductVisionSearchAnalyzerSetting.Factory()
// Set the maximum number of products that can be returned.
.setLargestNumOfReturns(16)
// Set the product set ID. (Contact [email protected] to obtain the configuration guide.)
// .setProductSetId(productSetId)
// Set the region.
.setRegion(MLRemoteProductVisionSearchAnalyzerSetting.REGION_DR_CHINA)
.create();
analyzer
= MLAnalyzerFactory.getInstance().getRemoteProductVisionSearchAnalyzer(settings);
}
8. Capture an image with the device camera.
Code:
Intent intent = new Intent(MediaStore.ACTION_IMAGE_CAPTURE);
startActivityForResult(intent, REQ_CAMERA_CODE);
9. Once the image has been captured, the onActivityResult() method will be executed.
Code:
@Override
public void onActivityResult(int requestCode, int resultCode, Intent data) {
super.onActivityResult(requestCode, resultCode, data);
Log.d(TAG, "onActivityResult");
if(requestCode == 101) {
if (resultCode == RESULT_OK) {
Bitmap bitmap = (Bitmap) data.getExtras().get("data");
if (bitmap != null) {
// Create an MLFrame object using the bitmap, which is the image data in bitmap format.
MLFrame mlFrame = new MLFrame.Creator().setBitmap(bitmap).create();
mlImageDetection(mlFrame);
}
}
}
}
private void mlImageDetection(MLFrame mlFrame) {
Task<List<MLProductVisionSearch>> task = analyzer.asyncAnalyseFrame(mlFrame);
task.addOnSuccessListener(new OnSuccessListener<List<MLProductVisionSearch>>() {
public void onSuccess(List<MLProductVisionSearch> products) {
// Processing logic for detection success.
displaySuccess(products);
}})
.addOnFailureListener(new OnFailureListener() {
public void onFailure(Exception e) {
// Processing logic for detection failure.
// Recognition failure.
try {
MLException mlException = (MLException)e;
// Obtain the result code. You can process the result code and customize respective messages displayed to users.
int errorCode = mlException.getErrCode();
// Obtain the error information. You can quickly locate the fault based on the result code.
String errorMessage = mlException.getMessage();
} catch (Exception error) {
// Handle the conversion error.
}
}
});
}
private void displaySuccess(List<MLProductVisionSearch> productVisionSearchList) {
List<MLVisionSearchProductImage> productImageList = new ArrayList<>();
String productType = "";
for (MLProductVisionSearch productVisionSearch : productVisionSearchList) {
Log.d(TAG, "type: " + productVisionSearch.getType() );
productType = productVisionSearch.getType();
for (MLVisionSearchProduct product : productVisionSearch.getProductList()) {
productImageList.addAll(product.getImageList());
Log.d(TAG, "custom content: " + product.getCustomContent() );
}
}
StringBuffer buffer = new StringBuffer();
for (MLVisionSearchProductImage productImage : productImageList) {
String str = "ProductID: " + productImage.getProductId() + "\n"
+ "ImageID: " + productImage.getImageId() + "\n"
+ "Possibility: " + productImage.getPossibility();
buffer.append(str);
buffer.append("\n");
}
Log.d(TAG , "display success: " + buffer.toString());
FragmentTransaction transaction = getFragmentManager().beginTransaction();
transaction.replace(R.id.main_fragment_container, new SearchResultFragment(productImageList, productType));
transaction.commit();
}
The onSuccess() callback gives us a list of MLProductVisionSearch objects, from which we can get the product ID and image URL of each product. We can also get the product type using productVisionSearch.getType(), which returns a code that can be mapped to a readable name.
10. We can map the product type code with the following code.
Code:
private String getProductType(String type) {
switch(type) {
case "0":
return "Others";
case "1":
return "Clothing";
case "2":
return "Shoes";
case "3":
return "Bags";
case "4":
return "Digital & Home appliances";
case "5":
return "Household Products";
case "6":
return "Toys";
case "7":
return "Cosmetics";
case "8":
return "Accessories";
case "9":
return "Food";
}
return "Others";
}
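As an illustrative usage (not part of the original sample), the mapped label can be logged or displayed while processing the search result in displaySuccess:
Code:
// Convert the raw type code into a readable label before showing it to the user.
String typeLabel = getProductType(productVisionSearch.getType());
Log.d(TAG, "product type label: " + typeLabel);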
11. Get the product ID and image URL from MLVisionSearchProductImage in the RecyclerView adapter.
Code:
@Override
public void onBindViewHolder(ViewHolder holder, int position) {
final MLVisionSearchProductImage mlProductVisionSearch = productVisionSearchList.get(position);
holder.tvTitle.setText(mlProductVisionSearch.getProductId());
Glide.with(context)
.load(mlProductVisionSearch.getImageId())
.diskCacheStrategy(DiskCacheStrategy.ALL)
.into(holder.imageView);
}
Reference
https://developer.huawei.com/consumer/en/doc/development/HMSCore-Guides-V5/sdk-data-security-0000001050040129-V5

How to Integrate HUAWEI Drive Kit

For more information like this, you can visit the HUAWEI Developer Forum.
Hello everyone, in this article we will create an Android application using the capabilities of HUAWEI Drive Kit that will allow users to manage and edit files in HUAWEI Drive, as well as store photos, drawings, designs, recordings and videos.
Why should we use HUAWEI Drive Kit?
With HUAWEI Drive Kit, you can let your users store data on the cloud quickly and easily. Users can upload, download, synchronize, and view images, videos, and documents whenever and wherever they want.
There are 3 main functions provided by HUAWEI Drive Kit.
User file management: Includes file search, comment, and reply in addition to basic functions.
App data management: Supports app data storage and restoration.
Cross-platform capability: Provides RESTful APIs to support app access from non-Android devices.
Integration Preparations
Before you get started, you must first register as a HUAWEI developer and verify your identity on the HUAWEI Developer website. For more details, please refer to Register a HUAWEI ID.
Software Requirements
Java JDK installation package
Android SDK package
Android Studio 3.X
HMS Core (APK) 3.X or later
Supported Locations
To see all supported locations, please refer to HUAWEI Drive Kit Supported Locations
Integrating HMS Core SDK
To integrate HMS Core SDK in your application and learn creating a new project on AppGallery Connect, follow this great guide: https://medium.com/huawei-developers/android-integrating-your-apps-with-huawei-hms-core-1f1e2a090e98
Keep in mind: Users need to sign in with their HUAWEI ID before they can use the functions provided by HUAWEI Drive Kit. If the user has not signed in with a HUAWEI ID, the HMS Core SDK will prompt the user to sign in first.
Implementation
Code:
implementation 'com.huawei.hms:drive:5.0.0.301'
implementation 'com.huawei.hms:hwid:5.0.1.301'
Now that we’re ready to implement our methods, let’s see how they work.
Signing In with a HUAWEI ID
To implement the sign-in function through the HMS Core SDK, you will need to set the Drive scope for obtaining the permission to access Drive APIs.
Each Drive scope corresponds to a certain type of permissions. You may apply for the permissions as needed. For details about the corresponding APIs, please refer to HUAWEI Account Kit Development Guide.
Code:
private void driveSignIn() {
List<Scope> scopeList = new ArrayList<>();
scopeList.add(new Scope(DriveScopes.SCOPE_DRIVE));
scopeList.add(new Scope(DriveScopes.SCOPE_DRIVE_APPDATA));
scopeList.add(new Scope(DriveScopes.SCOPE_DRIVE_FILE));
scopeList.add(new Scope(DriveScopes.SCOPE_DRIVE_METADATA));
scopeList.add(new Scope(DriveScopes.SCOPE_DRIVE_METADATA_READONLY));
scopeList.add(new Scope(DriveScopes.SCOPE_DRIVE_READONLY));
scopeList.add(HuaweiIdAuthAPIManager.HUAWEIID_BASE_SCOPE);
HuaweiIdAuthParams authParams = new HuaweiIdAuthParamsHelper(HuaweiIdAuthParams.DEFAULT_AUTH_REQUEST_PARAM)
.setAccessToken()
.setIdToken()
.setScopeList(scopeList)
.createParams();
HuaweiIdAuthService authService = HuaweiIdAuthManager.getService(this, authParams);
startActivityForResult(authService.getSignInIntent(), REQUEST_SIGN_IN_LOGIN);
}
@Override
protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
super.onActivityResult(requestCode, resultCode, data);
if (requestCode == REQUEST_SIGN_IN_LOGIN) {
Task<AuthHuaweiId> authHuaweiIdTask = HuaweiIdAuthManager.parseAuthResultFromIntent(data);
if (authHuaweiIdTask.isSuccessful()) {
AuthHuaweiId authHuaweiId = authHuaweiIdTask.getResult();
mAccessToken = authHuaweiId.getAccessToken();
mUnionId = authHuaweiId.getUnionId();
int returnCode = init(mUnionId, mAccessToken, refreshAccessToken);
if (returnCode == DriveCode.SUCCESS) {
Log.d(TAG, "onActivityResult: driveSignIn success");
} else {
Log.d(TAG, "onActivityResult: driveSignIn failed");
}
} else {
Log.d(TAG, "onActivityResult, signIn failed: " + ((ApiException) authHuaweiIdTask.getException()).getStatusCode());
}
}
}
Handling the Permissions
Add the read and write permissions for the phone storage to AndroidManifest.xml
Code:
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission
android:name="android.permission.WRITE_MEDIA_STORAGE"
tools:ignore="ProtectedPermissions" />
Add the permissions to request in MainActivity.java.
Code:
private static String[] PERMISSIONS_STORAGE = {
Manifest.permission.READ_EXTERNAL_STORAGE,
Manifest.permission.WRITE_EXTERNAL_STORAGE,
Manifest.permission.CAMERA
};
Request the permissions in the onCreate method.
Code:
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.M) {
requestPermissions(PERMISSIONS_STORAGE, 1);
}
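Handling the user's response is not shown in the original sample, but a minimal onRequestPermissionsResult sketch could look like this (the request code 1 matches the requestPermissions call above; the logging is illustrative):
Code:
@Override
public void onRequestPermissionsResult(int requestCode, String[] permissions, int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    if (requestCode == 1) {
        // Check whether every requested permission was granted.
        for (int result : grantResults) {
            if (result != PackageManager.PERMISSION_GRANTED) {
                Log.d(TAG, "Some permissions were not granted.");
                return;
            }
        }
        Log.d(TAG, "All requested permissions granted.");
    }
}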
Create a Folder and Upload a File
Now that we have handled the necessary permissions, let's upload some files!
Create a folder in the root directory of Drive, and upload the file in the designated directory.
Keep in mind: the code below uses global variables and helper functions; you can download the sample code from the link at the end of the article to see their definitions.
Code:
private void uploadFiles() {
new Thread(new Runnable() {
@Override
public void run() {
try {
if (mAccessToken == null) {
Log.d(TAG, "need to sign in first");
return;
}
if (StringUtils.isNullOrEmpty(et_uploadFileName.getText().toString())) {
Log.d(TAG, "file name is required to upload");
return;
}
String path = getExternalFilesDir(null).getAbsolutePath()
+ "/" + et_uploadFileName.getText();
Log.d(TAG, "run: " + path);
java.io.File fileObj = new java.io.File(path);
if (!fileObj.exists()) {
Log.d(TAG, "file does not exists");
return;
}
Drive drive = buildDrive();
Map<String, String> appProperties = new HashMap<>();
appProperties.put("appProperties", "property");
String dirName = "Uploads";
File file = new File();
file.setFileName(dirName)
.setAppSettings(appProperties)
.setMimeType("application/vnd.huawei-apps.folder");
directoryCreated = drive.files().create(file).execute();
// Upload the file
File fileToUpload = new File()
.setFileName(fileObj.getName())
.setMimeType(mimeType(fileObj))
.setParentFolder(Collections.singletonList(directoryCreated.getId()));
Drive.Files.Create request = drive.files()
.create(fileToUpload, new FileContent(mimeType(fileObj), fileObj));
fileUploaded = request.execute();
Log.d(TAG, "upload success");
} catch (Exception e) {
Log.d(TAG, "upload error " + e.toString());
}
}
}).start();
}
If applicationData is selected, the operation will be performed on the app data folder. The app data folder is invisible to users and is used to store app-specific data.
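As a sketch of that case, the upload only needs a different parent folder in the file metadata. The "applicationData" container name is taken from the query example below; treat the exact value as an assumption and verify it against the Drive Kit documentation.
Code:
// Upload into the hidden app data folder instead of a user-visible directory.
File appDataFile = new File()
        .setFileName(fileObj.getName())
        .setMimeType(mimeType(fileObj))
        .setParentFolder(Collections.singletonList("applicationData"));
fileUploaded = drive.files()
        .create(appDataFile, new FileContent(mimeType(fileObj), fileObj))
        .execute();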
After the upload is done, the users can view the file by going to Files > HUAWEI Drive.
Query File Details
With this method, we can get the details of a file in Drive. Below, we retrieve the ID, file name, and size.
Code:
private void queryFiles() {
new Thread(new Runnable() {
@Override
public void run() {
try {
if (mAccessToken == null) {
Log.d(TAG, "need to sign in first");
return;
}
String containers = "";
String queryFile = "fileName = '" + et_searchFileName.getText()
+ "' and mimeType != 'application/vnd.huawei-apps.folder'";
if (cb_isApplicationData.isChecked()) {
containers = "applicationData";
queryFile = "'applicationData' in parentFolder and ".concat(queryFile);
}
Drive drive = buildDrive();
Drive.Files.List request = drive.files().list();
FileList files;
while (true) {
files = request
.setQueryParam(queryFile)
.setPageSize(10)
.setOrderBy("fileName")
.setFields("category,nextCursor,files(id,fileName,size)")
.setContainers(containers)
.execute();
if (files == null || files.getFiles().size() > 0) {
break;
}
if (!StringUtils.isNullOrEmpty(files.getNextCursor())) {
request.setCursor(files.getNextCursor());
} else {
break;
}
}
String text = "";
if (files != null && files.getFiles().size() > 0) {
fileSearched = files.getFiles().get(0);
text = fileSearched.toString();
} else {
text = "empty";
}
final String finalText = text;
MainActivity.this.runOnUiThread(new Runnable() {
@Override
public void run() {
tv_queryResult.setText(finalText);
}
});
Log.d(TAG, "query success");
} catch (Exception e) {
Log.d(TAG, "query error " + e.toString());
}
}
}).start();
}
Download Files
We can also download files from Drive. Just query the file (notice the fileSearched global variable above) and download with Drive.Files.Get request.
This is not the end. For full content, you can visit https://forums.developer.huawei.com/forumPortal/en/topicview?tid=0202357120930420217&fid=0101187876626530001
Is there any restriction on the type of files that can be uploaded, and are any security measures, such as encryption of my data, taken while uploading?
Nice and useful article
Does Huawei Drive Kit work in India?

Integrating Text to speech (TTS) in B4A Platform

Introduction
This article explains how to integrate text to speech (TTS), which converts text information into audio output. Rich timbres are provided, and the volume and speed can be adjusted, so natural voices can be produced. This service uses the deep neural network (DNN) synthesis mode and can be quickly integrated through the on-device SDK to generate audio data in real time. In the current version, two standard male voices and six standard female voices are available.
TTS is available only on Huawei phones.
The text in a single request can contain a maximum of 500 characters and is encoded using UTF-8.
TTS in French, Spanish, German, and Italian is deployed only in Asia, Africa, Latin America, and Europe.
TTS depends on on-cloud APIs. During commissioning and usage, ensure that the device can access the Internet.
Default specifications of the real-time output audio data are as follows: PCM mono, 16-bit depth, and 16 kHz audio sampling rate.
Follow all the steps mentioned in Basic Setup to start HMS integration on the B4A platform.
Refer to the previous article.
Enable ML Kit in AppGallery Connect.
Tips & Tricks:
If the voice is not audible, make sure that ML Kit is enabled in AppGallery Connect.
Creating Wrapper Class
1. Downloading the AAR Packages and JSON File
Sign in to HUAWEI Developer and download the AAR packages required.
The AAR packages related to the HMS TTS kit are listed below:
Agconnect-core:
https://developer.huawei.com/repo/com/huawei/agconnect/agconnect-core/1.0.0.300/agconnect-core-1.0.0.300.aar
HMSSDKBase:
https://developer.huawei.com/repo/com/huawei/hms/base/4.0.1.300/base-4.0.1.300.aar
Ml-computer-agc-inner:
https://developer.huawei.com/repo/com/huawei/hms/ml-computer-agc-inner/2.0.1.300/ml-computer-agc-inner-2.0.1.300.aar
Ml-computer-commonutils-inner:
https://developer.huawei.com/repo/com/huawei/hms/ml-computer-commonutils-inner/2.0.1.300/ml-computer-commonutils-inner-2.0.1.300.aar
Ml-computer-ha-inner:
https://developer.huawei.com/repo/com/huawei/hms/ml-computer-ha-inner/2.0.0.300/ml-computer-ha-inner-2.0.0.300.aar
Tasks:
https://developer.huawei.com/repo/com/huawei/hmf/tasks/1.4.1.300/tasks-1.4.1.300.aar
Network-common:
https://developer.huawei.com/repo/com/huawei/hms/network-common/4.0.2.301/network-common-4.0.2.301.aar
Network-grs:
https://developer.huawei.com/repo/com/huawei/hms/network-grs/4.0.2.301/network-grs-4.0.2.301.aar
Ml-computer-grs-inner:
https://developer.huawei.com/repo/com/huawei/hms/ml-computer-grs-inner/2.0.0.300/ml-computer-grs-inner-2.0.0.300.aar
Ml-computer-net:
https://developer.huawei.com/repo/com/huawei/hms/ml-computer-net/1.0.4.300/ml-computer-net-1.0.4.300.aar
Okhttp:
https://jar-download.com/artifacts/com.squareup.okhttp3/okhttp/3.12.0/source-code
Okio:
https://mvnrepository.com/artifact/com.squareup.okio/okio/1.15.0
Ml-computer-voice-tts:
https://developer.huawei.com/repo/com/huawei/hms/ml-computer-voice-tts/2.0.0.401/ml-computer-voice-tts-2.0.0.401.aar
2. Open the AAR packages with an archive tool and rename the classes.jar and AndroidManifest.xml files, then save the JAR files inside the libs folder. (It is recommended that the two files be renamed consistently with the AAR package names.)
3. Copy the required permissions into the <manifest> section in the B4A IDE.
4. Copy all configurations into the <application> section.
5. Change ${applicationId} to $PACKAGE$.
6. Download the configuration file (agconnect-services.json) from AppGallery Connect and add the JSON file to the assets folder of the AAR file as shown below.
B4A will automatically incorporate files in the assets folder of an AAR package to the assets folder of your main project.
Encapsulating Java Files Using SLC
1. Create a Library folder as the parent, and then create bin, libs, and src as subfolders in the project directory.
2. Develop the Java project under the following path: Library folder > src > b4x > tts > demo.
3. Add the following import statements to each Java file.
Code:
import anywheresoftware.b4a.BA;
import anywheresoftware.b4a.BA.*;
import anywheresoftware.b4a.IOnActivityResult;
4. Add necessary annotations to the Java files
Code:
@DependsOn(values={
"base-4.0.1.300.aar",
"agconnect-core-1.0.1.300.aar",
"tasks-1.3.1.302.aar",
"network-common-4.0.2.301.aar",
"network-grs-4.0.2.301.aar",
"ml-computer-net-2.0.1.300.aar",
"ml-computer-grs-inner-2.0.1.300.aar",
"okhttp-3.12.0.jar",
"okio-1.15.0.jar",
"ml-computer-voice-tts-2.0.1.300.aar",
"ml-computer-agc-inner-2.0.1.300.aar",
"ml-computer-commonutils-inner-2.0.1.300.aar",
"ml-computer-ha-inner-2.0.1.300.aar"
})
5. Modify the context.
B4A does not recognize the Context class. Therefore, when parsing a class that contains the @Version(1.0f) annotation, it will report an error if a method of the class references Context. In this case, you need to change Context to the B4A object BA.
Code:
@ShortName("ttsWork")
@Events(values = {"TTSStop (token As String)"})
public class ttsWork {
public static String eventName = "listener";
public static final String TAG = "tts_Kit";
public static void init(Context context) {
AGConnectServicesConfig.fromContext(context).overlayWith(new LazyInputStream(context) {
@Override
public InputStream get(Context context) {
try {
return context.getAssets().open("agconnect-services.json");
} catch (IOException e) {
e.printStackTrace();
BA.Log(e.toString());
}
return null;
}
});
}
private static IOnActivityResult ion;
public static void ListenFortts(final Context context, String text) {
MLApplication.getInstance().setApiKey("API_KEY");
MLTtsConfig mlConfigs = new MLTtsConfig().setLanguage(MLTtsConstants.TTS_EN_US).setPerson(MLTtsConstants.TTS_SPEAKER_FEMALE_EN).setSpeed(1.0f).setVolume(1.0f);
MLTtsEngine mlTtsEngine = new MLTtsEngine(mlConfigs);
Log.d(" text from ui ", text);
mlTtsEngine.updateConfig(mlConfigs);
mlTtsEngine.setTtsCallback(callback);
String sourceText = "Welcome";
String id = mlTtsEngine.speak(text, MLTtsEngine.QUEUE_APPEND);
}
static MLTtsCallback callback = new MLTtsCallback() {
@Override
public void onError(String taskId, MLTtsError err) {
}
@Override
public void onWarn(String taskId, MLTtsWarn warn) {
}
@Override
public void onRangeStart(String taskId, int start, int end) {
}
@Override
public void onAudioAvailable(String taskId, MLTtsAudioFragment audioFragment, int offset, Pair<Integer, Integer> range, Bundle bundle)
{
}
@Override
public void onEvent(String taskId, int eventId, Bundle bundle) {
// Log.v(" stop service ","text1111");
switch (eventId) {
case MLTtsConstants.EVENT_PLAY_START:
Log.v(" stop service ", "textplaystart");
ba.raiseEventFromUI(this, eventName + "_TTSText", "text");
Log.v(" stop service ", "text11");
break;
case MLTtsConstants.EVENT_PLAY_STOP:
boolean isInterrupted = bundle.getBoolean(MLTtsConstants.EVENT_PLAY_STOP_INTERRUPTED);
Log.v(" stop service ", "textplaystop");
ba.raiseEventFromUI(this, eventName + "_TTSStop", "Service Stopped");
Log.v(" stop service ", "text11");
break;
case MLTtsConstants.EVENT_PLAY_RESUME:
break;
case MLTtsConstants.EVENT_PLAY_PAUSE:
break;
case MLTtsConstants.EVENT_SYNTHESIS_START:
break;
case MLTtsConstants.EVENT_SYNTHESIS_END:
break;
case MLTtsConstants.EVENT_SYNTHESIS_COMPLETE:
boolean isInterrupted1 = bundle.getBoolean(MLTtsConstants.EVENT_SYNTHESIS_INTERRUPTED);
break;
default:
break;
}
}
};
}
Can we customize languages?

How a Programmer Used 300 Lines of Code to Help His Grandma Shop Online with Voice Input

"John, why the writing pad is missing again?"
John, programmer at Huawei, has a grandma who loves novelty, and lately she's been obsessed with online shopping. Familiarizing herself with major shopping apps and their functions proved to be a piece of cake, and she had thought that her online shopping experience would be effortless — unfortunately, however, she was hindered by product searching.
John's grandma tended to use handwriting input. When using it, she would often make mistakes, like switching to another input method she found unfamiliar, or tapping on undesired characters or signs.
It's not just shopping apps: most mobile apps feature interface designs that are oriented to younger users, so it's no wonder that elderly users often struggle to figure out how to use them.
John patiently helped his grandma search for products with handwriting input several times. But then, he decided to use his skills as a veteran coder to give his grandma the best possible online shopping experience. More specifically, instead of helping her adjust to the available input method, he was determined to create an input method that would conform to her usage habits.
Since his grandma tended to err during manual input, John developed an input method that converts speech into text. Grandma was enthusiastic about the new method, because it is remarkably easy to use. All she has to do is to tap on the recording button and say the product's name. The input method then recognizes what she has said, and converts her speech into text.
Actual Effects
Real-time speech recognition and speech to text are ideal for a broad range of apps, including:
Game apps (online): Real-time speech recognition comes to users' aid when they team up with others. It frees up users' hands for controlling the action, sparing them from having to type to communicate with their partners. It can also free users from any potential embarrassment related to voice chatting during gaming.
Work apps: Speech to text can play a vital role during long conferences, where typing to keep meeting minutes can be tedious and inefficient, with key details being missed. Using speech to text is much more efficient: during a conference, users can use this service to convert audio content into text; after the conference, they can simply retouch the text to make it more logical.
Learning apps: Speech to text can offer users an enhanced learning experience. Without the service, users often have to pause audio materials to take notes, resulting in a fragmented learning process. With speech to text, users can concentrate on listening intently to the material while it is being played, and rely on the service to convert the audio content into text. They can then review the text after finishing the entire course, to ensure that they've mastered the content.
How to Implement
Two services in HUAWEI ML Kit, automatic speech recognition (ASR) and audio file transcription, make it easy to implement the above functions.
ASR can recognize speech of up to 60s, and convert the input speech into text in real time, with recognition accuracy of over 95%. It currently supports Mandarin Chinese (including Chinese-English bilingual speech), English, French, German, Spanish, Italian, and Arabic.
Real-time result output
Available options: with and without speech pickup UI
Endpoint detection: Start and end points can be accurately located.
Silence detection: No voice packet is sent for silent portions.
Intelligent conversion to digital formats: For example, the year 2021 is recognized from voice input.
Audio file transcription can convert an audio file of up to five hours into text with punctuation, and automatically segment the text for greater clarity. In addition, this service can generate text with timestamps, facilitating further function development. In this version, both Chinese and English are supported.
Development Procedures
1. Preparations
(1) Configure the Huawei Maven repository address, and put the agconnect-services.json file under the app directory.
Open the build.gradle file in the root directory of your Android Studio project.
Add the AppGallery Connect plugin and the Maven repository.
Go to allprojects > repositories and configure the Maven repository address for the HMS Core SDK.
Go to buildscript > repositories and configure the Maven repository address for the HMS Core SDK.
If the agconnect-services.json file has been added to the app, go to buildscript > dependencies and add the AppGallery Connect plugin configuration.
Code:
buildscript {
repositories {
google()
jcenter()
maven { url 'https://developer.huawei.com/repo/' }
}
dependencies {
classpath 'com.android.tools.build:gradle:3.5.4'
classpath 'com.huawei.agconnect:agcp:1.4.1.300'
// NOTE: Do not place your app dependencies here; they belong
// in the individual module build.gradle files.
}
}
allprojects {
repositories {
google()
jcenter()
maven { url 'https://developer.huawei.com/repo/' }
}
}
(2) Add the build dependencies for the HMS Core SDK.
Code:
dependencies {
//The audio file transcription SDK.
implementation 'com.huawei.hms:ml-computer-voice-aft:2.2.0.300'
// The ASR SDK.
implementation 'com.huawei.hms:ml-computer-voice-asr:2.2.0.300'
// Plugin of ASR.
implementation 'com.huawei.hms:ml-computer-voice-asr-plugin:2.2.0.300'
...
}
apply plugin: 'com.huawei.agconnect' // AppGallery Connect plugin.
(3) Configure the signing certificate in the build.gradle file under the app directory.
Code:
signingConfigs {
release {
storeFile file("xxx.jks")
keyAlias xxx
keyPassword xxxxxx
storePassword xxxxxx
v1SigningEnabled true
v2SigningEnabled true
}
}
buildTypes {
release {
minifyEnabled false
proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
}
debug {
signingConfig signingConfigs.release
debuggable true
}
}
(4) Add permissions in the AndroidManifest.xml file.
Code:
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<application
android:requestLegacyExternalStorage="true"
...
</application>
2. Integrating the ASR Service
(1) Dynamically apply for the permissions.
Code:
if (ActivityCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO) != PackageManager.PERMISSION_GRANTED) {
requestCameraPermission();
}
private void requestCameraPermission() {
final String[] permissions = new String[]{Manifest.permission.RECORD_AUDIO};
if (!ActivityCompat.shouldShowRequestPermissionRationale(this, Manifest.permission.RECORD_AUDIO)) {
ActivityCompat.requestPermissions(this, permissions, Constants.AUDIO_PERMISSION_CODE);
return;
}
}
(2) Create an Intent to set parameters.
Code:
// Set authentication information for your app.
MLApplication.getInstance().setApiKey(AGConnectServicesConfig.fromContext(this).getString("client/api_key"));
//// Use Intent for recognition parameter settings.
Intent intentPlugin = new Intent(this, MLAsrCaptureActivity.class)
// Set the language that can be recognized to English. If this parameter is not set, English is recognized by default. Example: "zh-CN": Chinese; "en-US": English.
.putExtra(MLAsrCaptureConstants.LANGUAGE, MLAsrConstants.LAN_EN_US)
// Set whether to display the recognition result on the speech pickup UI.
.putExtra(MLAsrCaptureConstants.FEATURE, MLAsrCaptureConstants.FEATURE_WORDFLUX);
startActivityForResult(intentPlugin, 1); // The request code 1 is checked in onActivityResult.
(3) Override the onActivityResult method to process the result returned by ASR.
Code:
@Override
protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
super.onActivityResult(requestCode, resultCode, data);
String text = "";
if (null == data) {
addTagItem("Intent data is null.", true);
}
if (requestCode == 1) {
if (data == null) {
return;
}
Bundle bundle = data.getExtras();
if (bundle == null) {
return;
}
switch (resultCode) {
case MLAsrCaptureConstants.ASR_SUCCESS:
// Obtain the text information recognized from speech.
if (bundle.containsKey(MLAsrCaptureConstants.ASR_RESULT)) {
text = bundle.getString(MLAsrCaptureConstants.ASR_RESULT);
}
if (text == null || "".equals(text)) {
text = "Result is null.";
Log.e(TAG, text);
} else {
// Display the recognition result in the search box.
searchEdit.setText(text);
goSearch(text, true);
}
break;
// MLAsrCaptureConstants.ASR_FAILURE: Recognition fails.
case MLAsrCaptureConstants.ASR_FAILURE:
// Check whether an error code is contained.
if (bundle.containsKey(MLAsrCaptureConstants.ASR_ERROR_CODE)) {
text = text + bundle.getInt(MLAsrCaptureConstants.ASR_ERROR_CODE);
// Troubleshoot based on the error code.
}
// Check whether error information is contained.
if (bundle.containsKey(MLAsrCaptureConstants.ASR_ERROR_MESSAGE)) {
String errorMsg = bundle.getString(MLAsrCaptureConstants.ASR_ERROR_MESSAGE);
// Troubleshoot based on the error information.
if (errorMsg != null && !"".equals(errorMsg)) {
text = "[" + text + "]" + errorMsg;
}
}
// Check whether a sub-error code is contained.
if (bundle.containsKey(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE)) {
int subErrorCode = bundle.getInt(MLAsrCaptureConstants.ASR_SUB_ERROR_CODE);
// Troubleshoot based on the sub-error code.
text = "[" + text + "]" + subErrorCode;
}
Log.e(TAG, text);
break;
default:
break;
}
}
}
3. Integrating the Audio File Transcription Service
(1) Dynamically apply for the permissions.
Code:
private static final int REQUEST_EXTERNAL_STORAGE = 1;
private static final String[] PERMISSIONS_STORAGE = {
Manifest.permission.READ_EXTERNAL_STORAGE,
Manifest.permission.WRITE_EXTERNAL_STORAGE };
public static void verifyStoragePermissions(Activity activity) {
// Check if the write permission has been granted.
int permission = ActivityCompat.checkSelfPermission(activity,
Manifest.permission.WRITE_EXTERNAL_STORAGE);
if (permission != PackageManager.PERMISSION_GRANTED) {
// The permission has not been granted. Prompt the user to grant it.
ActivityCompat.requestPermissions(activity, PERMISSIONS_STORAGE,
REQUEST_EXTERNAL_STORAGE);
}
}
(2) Create and initialize an audio transcription engine, and create an audio file transcription configurator.
Code:
// Set the API key.
MLApplication.getInstance().setApiKey(AGConnectServicesConfig.fromContext(getApplication()).getString("client/api_key"));
MLRemoteAftSetting setting = new MLRemoteAftSetting.Factory()
// Set the transcription language code, complying with the BCP 47 standard. Currently, Mandarin Chinese and English are supported.
.setLanguageCode("zh")
// Set whether to automatically add punctuations to the converted text. The default value is false.
.enablePunctuation(true)
// Set whether to generate the text transcription result of each audio segment and the corresponding audio time shift. The default value is false. (This parameter needs to be set only when the audio duration is less than 1 minute.)
.enableWordTimeOffset(true)
// Set whether to output the time shift of a sentence in the audio file. The default value is false.
.enableSentenceTimeOffset(true)
.create();
// Create an audio transcription engine.
MLRemoteAftEngine engine = MLRemoteAftEngine.getInstance();
engine.init(this);
// Pass the listener callback to the audio transcription engine created beforehand.
engine.setAftListener(aftListener);
(3) Create a listener callback to process the audio file transcription result.
Transcription of short audio files with a duration of 1 minute or shorter:
Code:
private MLRemoteAftListener aftListener = new MLRemoteAftListener() {
public void onResult(String taskId, MLRemoteAftResult result, Object ext) {
// Obtain the transcription result notification.
if (result.isComplete()) {
// Process the transcription result.
}
}
@Override
public void onError(String taskId, int errorCode, String message) {
// Callback upon a transcription error.
}
@Override
public void onInitComplete(String taskId, Object ext) {
// Reserved.
}
@Override
public void onUploadProgress(String taskId, double progress, Object ext) {
// Reserved.
}
@Override
public void onEvent(String taskId, int eventId, Object ext) {
// Reserved.
}
};
Transcription of audio files with a duration longer than 1 minute:
Code:
private MLRemoteAftListener asrListener = new MLRemoteAftListener() {
@Override
public void onInitComplete(String taskId, Object ext) {
Log.e(TAG, "MLAsrCallBack onInitComplete");
// The long audio file is initialized and the transcription starts.
start(taskId);
}
@Override
public void onUploadProgress(String taskId, double progress, Object ext) {
Log.e(TAG, " MLAsrCallBack onUploadProgress");
}
@Override
public void onEvent(String taskId, int eventId, Object ext) {
// Used for the long audio file.
Log.e(TAG, "MLAsrCallBack onEvent" + eventId);
if (MLAftEvents.UPLOADED_EVENT == eventId) { // The file is uploaded successfully.
// Obtain the transcription result.
startQueryResult(taskId);
}
}
@Override
public void onResult(String taskId, MLRemoteAftResult result, Object ext) {
Log.e(TAG, "MLAsrCallBack onResult taskId is :" + taskId + " ");
if (result != null) {
Log.e(TAG, "MLAsrCallBack onResult isComplete: " + result.isComplete());
if (result.isComplete()) {
TimerTask timerTask = timerTaskMap.get(taskId);
if (null != timerTask) {
timerTask.cancel();
timerTaskMap.remove(taskId);
}
if (result.getText() != null) {
Log.e(TAG, taskId + " MLAsrCallBack onResult result is : " + result.getText());
tvText.setText(result.getText());
}
List<MLRemoteAftResult.Segment> words = result.getWords();
if (words != null && words.size() != 0) {
for (MLRemoteAftResult.Segment word : words) {
Log.e(TAG, "MLAsrCallBack word text is : " + word.getText() + ", startTime is : " + word.getStartTime() + ". endTime is : " + word.getEndTime());
}
}
List<MLRemoteAftResult.Segment> sentences = result.getSentences();
if (sentences != null && sentences.size() != 0) {
for (MLRemoteAftResult.Segment sentence : sentences) {
Log.e(TAG, "MLAsrCallBack sentence text is : " + sentence.getText() + ", startTime is : " + sentence.getStartTime() + ". endTime is : " + sentence.getEndTime());
}
}
}
}
}
@Override
public void onError(String taskId, int errorCode, String message) {
Log.i(TAG, "MLAsrCallBack onError : " + message + "errorCode, " + errorCode);
switch (errorCode) {
case MLAftErrors.ERR_AUDIO_FILE_NOTSUPPORTED:
break;
}
}
};
// Upload a transcription task.
private void start(String taskId) {
Log.e(TAG, "start");
engine.setAftListener(asrListener);
engine.startTask(taskId);
}
// Obtain the transcription result.
private Map<String, TimerTask> timerTaskMap = new HashMap<>();
private void startQueryResult(final String taskId) {
Timer mTimer = new Timer();
TimerTask mTimerTask = new TimerTask() {
@Override
public void run() {
getResult(taskId);
}
};
// Periodically obtain the long audio file transcription result every 10s.
mTimer.schedule(mTimerTask, 5000, 10000);
// Clear timerTaskMap before destroying the UI.
timerTaskMap.put(taskId, mTimerTask);
}
(4) Obtain an audio file and upload it to the audio transcription engine.
Code:
// Obtain the URI of an audio file.
Uri uri = getFileUri();
// Obtain the audio duration.
Long audioTime = getAudioFileTimeFromUri(uri);
// Check whether the duration is longer than 60s.
if (audioTime < 60000) {
// uri indicates audio resources read from the local storage or recorder. Only local audio files with a duration not longer than 1 minute are supported.
this.taskId = this.engine.shortRecognize(uri, this.setting);
Log.i(TAG, "Short audio transcription.");
} else {
// longRecognize is an API used to convert audio files with a duration from 1 minute to 5 hours.
this.taskId = this.engine.longRecognize(uri, this.setting);
Log.i(TAG, "Long audio transcription.");
}
private Long getAudioFileTimeFromUri(Uri uri) {
Long time = null;
Cursor cursor = this.getContentResolver()
.query(uri, null, null, null, null);
if (cursor != null) {
cursor.moveToFirst();
time = cursor.getLong(cursor.getColumnIndexOrThrow(MediaStore.Video.Media.DURATION));
} else {
MediaPlayer mediaPlayer = new MediaPlayer();
try {
mediaPlayer.setDataSource(String.valueOf(uri));
mediaPlayer.prepare();
} catch (IOException e) {
Log.e(TAG, "Failed to read the file time.");
}
time = Long.valueOf(mediaPlayer.getDuration());
}
return time;
}
For more details, you can go to:
Reddit to join our developer discussion
GitHub to download demos and sample codes
Stack Overflow to solve any integration problems
Original Source

