industry

The experience of using Google Cloud’s Text-to-Speech AI

Just using the Python API of Text-to-Speech AI to transform a PDF file to mp3 audio, as the example:

from google.cloud import texttospeech
from PyPDF2 import PdfReader

client = texttospeech.TextToSpeechClient()

reader = PdfReader("xxx.pdf")

voice = texttospeech.VoiceSelectionParams(
    language_code="cmn-CN", name="cmn-CN-Wavenet-B", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3,
    speaking_rate=0.8,
)

text = ""
index = 1
# try first 10 pages
for page in reader.pages[:10]:
    text += page.extract_text()

print(len(text))
synthesis_input = texttospeech.SynthesisInput(text=text)
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

with open("outout.mp3", "wb") as out:
    out.write(response.audio_content)
    print("Written")

Very simple, right? But it just reported an error:

google.api_core.exceptions.InvalidArgument: 400 Either `input.text` or `input.ssml` is longer than the limit of 5000 bytes. This limit is different from quotas. To fix, reduce the byte length of the characters in this request, or consider using the Long Audio API: https://cloud.google.com/text-to-speech/docs/create-audio-text-long-audio-synthesis.

It seems the request is too long. Let’s use the “Long Audio API”:

from google.cloud import texttospeech
from PyPDF2 import PdfReader

client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

reader = PdfReader("xxx.pdf")

voice = texttospeech.VoiceSelectionParams(
    language_code="cmn-CN", name="cmn-CN-Wavenet-B", ssml_gender=texttospeech.SsmlVoiceGender.MALE
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16,
    speaking_rate=0.8,
)

text = ""
index = 1
for page in reader.pages[:10]:
    text += page.extract_text()

print(len(text))
synthesis_input = texttospeech.SynthesisInput(text=text)
request = texttospeech.SynthesizeLongAudioRequest(
    parent="projects/robin-00000/locations/us",
    input=synthesis_input, voice=voice, audio_config=audio_config,
    output_gcs_uri="gs://robin_tts/xxx.mp3"
)

operation = client.synthesize_long_audio(request=request)
result = operation.result(timeout=300)
print(result)

It couldn’t work still:

google.api_core.exceptions.InvalidArgument: 400 The long audio API does not support the language zh. Supported languages: en, es.

Okay. It doesn’t support the Chinese language. Then, what should I do if I want to translate a Chinese pdf to mp3? Convert them page by page into 500 mp3 files? This is terrible. Even for the short mp3 it generated, it definitely sounds like a machine, not a human.

Google has the state-of-the-art technology of deep learning but some of their products in the cloud are ridiculously hard to use (such as Vertex AI, and this Text-to-Speech).

After some searching (at least Google search is perfect as before), I found this NaturalReader. Surprisingly, it supports the Chinese language and the voice is as well as a real human. The only problem is it is very expensive for individual users.

Google Cloud Summit 2019

Yesterday I joined the Google Cloud Summit 2019 in Sydney.
The meeting place is quite huge. And there are lot of booths from different partners of Google Cloud.

The keynote was quite abstract and a little boring, so I chose to walk around different booths for fun. Here are some useful conversations and information I collected:
[HashiCorp: company for Terraform and Nomad]
Q: How short could Terraform support a new service in cloud provider, such as Lake Formation in AWS?
A: It depends on requirements from users.
Q: Only users who paid for enterprise version of Terraform?
A: No. All users, include who use open source version, will be considered to support a new service in cloud. We published new version of Terraform quarterly, although we can’t make sure every new services in this quarter will be included.
[Confluent: company for Apache Kafka]
Q: Could I use ksql in Kafka as standard SQL?
A: Currently, no. ksql only supports self-defined syntax in Kafka. It looks really like SQL, but it’s actually another language.
Q: Could I use ksql to access a table in MySQL?
A: Yes. You can export a table in MySQL to be a ‘kstream‘. Then you can use ksql to access this ‘kstream’.
[Tableau: you know what I mean…]
Q: What are the new updates for Tableau in recent one year?
A: We published a new function called ‘Ask Data’. You can type in query with natural language, and it will translate them to tableau query, by using state-of-the-art NLP technologies.
After I type in some query like ‘avg price in Manly’, it worked very well. But if I type in query like ‘top 5 price near Chatswood’, Tableau failed to get the right answer.
A: You know, NLP is really hard so we only support a range of anonymous for query words.
[elastic: company for ElasticSearch and Kibana]
Q: What’s the biggest cluster of ElasticSearch in production?
A: Well, a lot of big companies use quit big ElasticSearch clusters, such as Netflix, eBay. But we don’t know which one is the biggest because they won’t tell us every details of their clusters 🙂
[Google Cloud]
Q: Is there a product in Google Cloud that could continually import data from MySQL and export them to Cloud Storage or BigTable?
A: Yes. Cloud Data Fusion will be your best choice.
In the booth of ASUS (it produced a lot of chrome books for Google), I noticed the Dev Board which contain a edge-TPU.

The demo use “mobilenet_ssd v2” as the backbone for object detection. Just as my choice.

Technical Meeting with Nvidia Corporation

Last week I went to Nvidia Corporation of Santa Clara (California) with my colleagues to join a technical meeting about cutting-edge hardware and software of Deep Learning.

The new office building of NVIDIA

At the first day, team leaders from Nvidia introduced their developing plan of new hardware and software. The new hardware are about Tesla V100, NVLink, and HGX (next generation of DGX). And the software is about CUDA-9.2 NCCL-2.0 and TensorRT-3.0
Here are some notes about their introducing:

The next generation of Tesla P4 GPU will have tensor-core, 16GB memory, and H264 decoder (performance as Tesla P100) for better inference performance, especially for image/video processing.
The software support of tensor-core (mainly in Tesla V100 GPU) has been integrated into Tensorflow-1.5 version.
The TensorRT could turn three layers of Deep Learning (Conv layer, Bias layer, Relu layer) to one CBR layer, eliminate concatenation layers, to accelerate inference computing.
The tool ‘nvidia-smi’ could show ‘util’ of GPU. But ‘80%’ utility only means this GPU run task (no matter how many CUDA-cores has been used) for 0.8 seconds in one second period. Therefore it’s not an accurate metrics for real GPU load. NVPROF is the much powerful and accurate tool for profiling of GPU

The TITAN V GPU

At the second day, many teams from Alibaba (my company) ask Nvidia different questions. Here are some questions and answers:

Q: Some Deep Learning Compilers such as XLA (Google) and TVM(from AWS) could compile python code to GPU intermediate representation directly. How will Nvidia work with these application-oriented compilers?
A: The google XLA team will be shut off and move to optimize TPU performance only. Nvidia will still focus on a library such as CUDA/cuDNN/TensorRT and will not build frameworks like Tensorflow or Mxnet.

Q: There are many new types of hardware launched for Deep Learning: Google’s TPU, some ASICs developed by other companies. How will Nvidia keep cost performance over these new competitors?
A: ASICs are not programmable. If models of Deep Learning changes, the ASIC will be in the trash. For example, TPU has Relu/Conv instructions, but if it comes to a new type of activation function, it will not work anymore. Furthermore, customers can only run TPU on Google’s cloud, which means they have to put their data on the cloud, without other choices.

The DGX server

We also visited the Demo Room of Nvidia’s state-of-art hardware for auto-driving and deep learning. It was an effective meeting, and we learn a lot.

The car of auto-driving testing platform

I am standing before the NVIDIA logo

Enable audit log for AWS Redshift

When I was trying to enable the Audit Log for AWS Redshift, I chose to use a exists bucket in S3. But it reports error:

"Cannot read ACLs of bucket redshift-robin. Please ensure that your IAM permissions are set up correctly."
"Service: AmazonRedshift; Status Code: 400; Error Code: InsufficientS3BucketPolicyFault ...."

According to this document, I need to change permission of bucket “redshift-robin”. So I entered the AWS Console of S3, click bucket name of “redshift-robin” in left panel, and saw description of permissions:

Press “Add Bucket Policy”, and in the pop-out-window, press “AWS Policy Generator”. Here came the generator, which is easy to use for creating policy.
Add two policy for “redshift-robin”:

The “902366379725” is the account-id of us-west-2 region (Oregon)

Click “Generate Policy”, and copy the generated JSON to “Bucket Policy Editor”:

Press “Save”. Now, we could enable Audit Log of Redshift for bucket “redshift-robin”:

DCTC 2016 conference

Yesterday I went to attend DCTC(Data Center Technology Conference) 2016 in Beijing. Although it is called “Data Center Technology”, most topics is about storage, because the conference is hold by Memblaze, a famous flash-storage startup company in China.
Xuebing Yin, The CEO of Memblaze, gave the first topic:

2016 is a important year for flash-storage because the revenue of SSD become bigger than Hard-disk for the first time. As we can see, data center will become full silicon (Hard-Disk is the only non-silicon component in servers) in the near future. As the speed of SSD and its interface (from SATA to PCIE) become faster and faster, many old softwares become bottleneck of performance: mysql 5.6 can’t use up the performance of high-speed SSD, but mysql 5.7 could.
Janene Ellefson from NVME organization introduced why we need a standard for high-speed data transfer.

Could use up to 64K queues with 64K command for each queue, NMVE is definitely the most powerful protocol for modern (or future) IO devices.
Xin Wu from GBase introduced the problems they face in using SSD for database

GBase is a serial of database products for OLTP/OLAP and global data storage. The SSD benefited the OLTP application, but for OLAP application the SSD is too expensive because the Hard-disk Array could also provide the same bandwidth. Maybe that’s why AWS released new type of EBS months ago.
Coly Li (Yes, my old friend ^_^) from SUSE Labs showed us the improvements of Linux Soft RAID in recent years. Many years ago, Linux soft RAID was used only for low speed Hard Disks, so the cost of bad software implement is not significant. But recently, the widely use of SSD expose many bottlenecks in soft RAID, and the developers of open source community commit many patches to improve performance. And, many patches came from Shaohua Li, a sophisticate kernel developer.(He worked first for Intel, and then Fusionio, and now Facebook).

In the tea break, I visited the exhibition of Memblaze.

This is a 1U server built by SuperMicro, it use eight NVME SSD by SFF8639 interfaces in the front. Running Mysql at full speed (4600% CPU usage), the wind run out from back is still not hot. Looks the server of SuperMicro is very effective, and cool 🙂

C++/Java developers needed

I worked in Alibaba Group for more than 9 years. Recently I am working in Alimama, a sub-company of Alibaba Group and has been the biggest Advertisement Publishing Company in China. At present, we need C++/Java developers to build new back-end basic services for our new business.
[Job Description]
Role: C++/Java Developer for storage system or high performance computing
Location: Beijing
Your responsibilities:
1. Building and optimizing the distributed key-value storage system
2. Building and optimizing the distributed computing engine of Linear Regression algorithm
3. Building and maintaining the backend service for Advertisement Publishing System
Skins & experience required:
1. Familiar with storage system or hight performance computing system
2. Strong background about Redis/Rocksdb/Hadoop/Glusterfs
3. Very familiar with one of C/C++/Java/Scala language
4. More than 3 years experience about storage system or HPC as a developer
5. Passionate about new Technologies and wanting to continuously push the boundaries
Any one who is interesting in the job above could send email to my email: haodong@alibaba-inc.com

Book notes about “Amazon Redshift Database Developer Guide”

Although be already familiar with Cloud Computing for may years, I haven’t look inside many services provided by Amazon Web Service. Because my company (Alibaba) has it’s own cloud platform: Aliyun, so we are only allowed to use home-made cloud products, such as ECS(like EC2 in AWS), RDS(like RDS in AWS), ODPS(like EMR in AWS).
These days I have read some sections of “Amazon Redshift Database Developer Guide” on my Kindle at my commute time.
Amazon Redshift is built on PostgreSQL, which is not very popular in China but pretty famous in Japan and USA. The book said that primary key and foreign key are only used for informal and constrains are totally not supported. I guess Redshift do distributed the rows of every tables into different servers in the cluster therefore keeping constrains is almost impossible.
Columnar Storage is used in Redshift because it is a perfect solution for OLAP (OnLine Analytical Processing) in which situation users tends to retrieve or load tremendous of records. Column-oriented Storage is also suitable for compression and will conserve colossal disk space.
The interesting thing is the architecture of Amazon Redshift and Greenplum looks very similar: both distribute the rows, both use PostgreSQL as back-end engine. Greenplum has open-sourced recently, which make common users to build private OLAP platform much easier. This lead a new question for me: if users could build a private cloud on their bare-metal servers very easily (by the software of OpenStack, OpenShift, Mesos, Greenplum etc.), is it still necessary to build their services and store their data into public cloud? Or the only value of public cloud will be maintaining and managing large mount of bare-metal servers?

China Linux Storage & Filesystem 2015 workshop (second day)

Zheng Liu from Alibaba lead the topic about ext4. The most important change in EXT-series filesystem this year is: ext3 has gone, people could only use ext3 by mount ext4 with special arguments in latest kernel (actually, in CentOS 7.0). Encrypt feature has complete in ext4.
Robin Dong (Yes, it’s me) from Alibaba give a presentation about cold storage (Slide is here). We develop distributed storage system based on a small open-source software called “sheepdog“, and modified it heavily to improve data recovery performance and make sure it could run in low-end but high-density storage servers.

Discussion in tea break

Yanhai Zhu from Alibaba (We have done so much works on storage) lead a topic about cache in virtual machines environment. Alibaba choose Bcache as code base to develop a new cache software.
Robin: Why Bcache? Why not flashcache?
Yanhai: I started my work on flashcache first, but flashcache is not profit to the product environment. First, flashcache is unfriendly to sequential-write. Second, it use hash data structure to distributed IO requests at beginning, which will split the cache data in multi-tenant environment. Bcache use B-tree instead of hash-table to store data, it’s better for our requirements.
They use radical write-back strategy on cache. It works very well because the cache sequentialize the write IOs and make backend easy to absorb the pressure peak.
The last topic is lead by Zhongjie Wu from Memblaze, a famous startup company in China on flash storage technology. It’s about NVDIMM, the most hot hardware technology in recent years. A NVDIMM is not expensive, it is only a DDR DIMM with a capacitance. Memblaze has develop a new 1U storage server with a NVDIMM and many flash cards. It contain their own developed OS and could use Fabric-Channel/Ethernet to connect to client. The main purpose of NVDIMM is to reduce latency, and they use write-back strategy(Surely).
The big problem they face with NVDIMM is CPU can’t flush data in its L1 cache to NVDIMM when whole server powers down. To solve this problem, Memblaze use write-combining in CPU multi-cores, it hurts the performance a little but avoid the data missing finally.

clsf2015

All the staff in this CLSF 2015

Articles from other attenders:
https://blogs.oracle.com/linuxkernel/entry/china_linux_storage_and_file1

China Linux Storage & Filesystem 2015 workshop (first day)

The first topic is lead by Haomai Wang from XSKY. He first introduce some basic concepts about Ceph and I did catch this opportunity to ask some questions.
Robin(from Alibaba): Dose Ceph cache all meta-data information (may called “cluster map”) on monitor-nodes so client could fetch data by just one jump in network?
Haomai: Yes. One jump, and comes to the OSD.
Robin: If I use Cephfs, is it still one jump?
Haomai: Still one jump. Although we add MDS in Cephfs, the MDS does not store data or meta-data of filesystem but only to store the context of distributed lock.
Ceph also support samba2.0/3.0 now. In linux, it is recommend to use iSCSI to access ceph storage cluster because it will have to update kernel in clients if we use rbd/libceph kernel modules. Ceph use pipeline model in message processing therefore it is good to Hard Disk but not SSD. In the future, developers will use async-framework (such as Seastar) to refactor the ceph.
Robin: If I use three replications in ceph, will the client write three copies concurrently?
Haomai: No. Firstly, the IO will come to the primary OSD, and then the primary OSD will issue two other replicated IOs to other two OSDs, waiting until the two IOs back, and return to client “the IO is success”.
Robin: Ok, now we still have two jumps….Is it difficult to change OSD to write at the same time so we can make the latency of ceph low?
Haomai: That will not be easy. Ceph use primary OSD to make sure the consistent of writing transaction.
The future developing plan for ceph is de-duplication on pool level. Coly Li(from Suse) said that de-duplication is better to be made on business level instead of block level because the duplicated information has be split in block level. But the developers in ceph community looks still want make ceph to be omnipotent.

Discussion about ceph

Jiaju Zhang from Redhat lead the topic about use cases of ceph in enterprises. Ceph has become the most famous open source storage software around the world and also be used in Redhat/Intel/Sandisk(Low-end Storage Array)/Samsung/Suse.
Next topic is about ScyllaDB and Seastar. Asias He from OSv lead this topic. ScyllaDB is a distributed Key/Value store engine which is written in C++14 code and completely compatible to Cassandra. It could also run CQL (Cassandra Query Language). In the graph, ScyllaDB is 40 times more faster than Cassandra. The asynchronous developing framework in ScyllaDB is called Seastar.
Robin: What’s the magic in ScyllaDB?
Asias: We shard requests to every CPU core, and run with no locks/no threads. Data is zero-copy and use bi-direction queue to transfer messages between cores. The test result is base on kernel TCP/IP network stack but we will use our own network stack in our future.
Yanhai Zhu(from Alibaba): I think the test you do is not fair enough: ScyllaDB is designed to be run in multi-cores but Cassandra is not. You guys should run 24 Cassandra instances to compare with ScyllaDB, not just one.
Asias: May be you are right. But ScyllaDB use message queues to transfer messages between CPU cores, so it avoid atomic-operation and lock-operation cost. And, Cassandra is written by Java, which means the performance will be low when the JVM do garbage- collection. ScyllaDB is written completely by c++ so its performance is much steady.
Last topic today is lead by Xu Wang, the CTO of Hyper ( A startup company in china, works on how to run container like VM).
Hyper means “hypervisor” addd “docker image”. Customers could run docker image on Xen/KVM/Virtualbox now.

clsf2015