On March 16th, Baidu held a press conference in Beijing about its new conversational AI product: Ernie Bot. (You can watch it, dubbed, on YouTube.) It was widely anticipated in China, as GPT-4 and ChatGPT have been making waves outside China. The Chinese government has banned access to foreign-made AI products from within China, because, you know, they might say something the party does not like. So a home-made response, with Chinese characteristics, to the AI craze was greatly needed.
Unfortunately, Baidu’s presentation was not live but pre-recorded. Li Yanhong (Robin Li), Baidu’s cofounder, presented Ernie’s capabilities in five usage scenarios: literary creation, commercial copywriting, mathematical reasoning, Chinese comprehension, and multimodal generation, according to a Twitter user. Even the Q&A session was pre-recorded.
On that introduction day, only selected institutions could try Ernie. And Li Yanhong “assured” the public that “Ernie’s internal test results are technically imperfect, and it is being released because there is market demand.” All of this signaled a lack of confidence, and Baidu’s stock plummeted by 6.4% on the 16th.
But then many good reviews came out after the institutions tested the bot. Especially notable were the analysts from Bank of America, Jefferies, and Citibank. So the stock price soared by 15% on the 17th. See Ernie’s amazing roller-coaster performance in this Bloomberg piece: “Baidu Soars After Analysts Give Ernie a Thumbs-Up After Test-Run”.
While I was not part of the selected institutions, it all sounds very legit to me, hehehe.
However, as more and more people got access to Ernie, Ernie’s image generation became a hot trending topic on social media. And what’s trending ends up in this newsletter.
Ernie images are hilarious. Not six-fingers hilarious but ‘WTF’ hilarious!!!
Here are some examples:
Hot and spicy beef offal (夫妻肺片, literally “couple’s lung slices”; see DeepL or Google Translate)
Shredded pork with fish flavour (鱼香肉丝, literally “fish-fragrant shredded pork”)
Beggar’s chicken (叫花鸡, literally “call flower chicken”; 叫花 is Chinese slang for a beggar, and character by character it reads as “call” + “flower”).
Lujiazui, the financial center of Shanghai (陸家嘴, a place name with no literal meaning).
You can see more bizarre images at this link. It is in Chinese.
Many were scratching their heads: how can a Chinese AI bot fail to understand Chinese?
Someone decided to try a different approach. He asked the bot to draw “一可以豆子”, which means nothing in Chinese, but character by character 一 is “one”, 可以 is “can” (as in “is able to”), and 豆子 is “bean”. So Ernie gave him “one can bean”:
Aha! People started to understand: Ernie translates Chinese prompts into English and feeds them to an image generator that understands only English.
That inspired many more examples. Take the English word “spring”, which has several meanings: 1) the season after winter and before summer, 2) an elastic device, and 3) a place where water or oil wells up from an underground source. In Chinese these are all very different words, so if you type 弹簧 (the elastic device) and get back images of 春天 (the season), that also proves Ernie is translating Chinese to English.
e.g. an elastic device is not a season:
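The suspected pipeline can be sketched in a few lines of Python. Everything here is made up for illustration — the toy dictionary, the greedy word-by-word lookup, and the stand-in image generator are assumptions, not Baidu’s actual system — but it shows how sense distinctions collapse once Chinese is flattened into English:

```python
# A minimal sketch of the suspected translate-then-generate pipeline.
# The dictionary and "generator" below are hypothetical stand-ins; a
# real system would use a full MT model and a text-to-image model.

# Toy Chinese-to-English dictionary. Note that two unrelated Chinese
# words collapse onto the same English word "spring".
ZH_TO_EN = {
    "弹簧": "spring",   # the elastic device
    "春天": "spring",   # the season
    "豆子": "bean",
    "可以": "can",      # "can" as in "is able to"
    "一": "one",
}

def translate(prompt_zh: str) -> str:
    """Word-by-word translation; Chinese sense distinctions are lost."""
    words = []
    i = 0
    while i < len(prompt_zh):
        # Greedy longest-match lookup over the toy dictionary.
        for length in (2, 1):
            chunk = prompt_zh[i:i + length]
            if chunk in ZH_TO_EN:
                words.append(ZH_TO_EN[chunk])
                i += length
                break
        else:
            i += 1  # skip characters the toy dictionary cannot translate
    return " ".join(words)

def generate_image(prompt_en: str) -> str:
    """Stand-in for an English-only image model: returns a description."""
    return f"<image of: {prompt_en}>"

# 弹簧 (the device) and 春天 (the season) produce the *same* English
# prompt, so the image model cannot tell which sense was meant.
assert translate("弹簧") == translate("春天") == "spring"

# The nonsense phrase 一可以豆子 becomes "one can bean" — and an
# English-only model happily draws a can of beans.
print(generate_image(translate("一可以豆子")))  # <image of: one can bean>
```

The key point is that the ambiguity is introduced at the translation step, before the image model ever sees the prompt, which is exactly what the 弹簧/春天 mix-up suggests.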
After many internet jokes and images, Baidu finally responded on March 23rd:
We have seen the reactions to the images generated by Ernie.
1) Ernie is an LLM developed solely by Baidu; its image generator comes from our multimodal model ERNIE-ViLG.
2) To train the model, we used data from across the internet, which is the industry standard. You will see the image generation function improve with future iterations, and you will see our research capability.
Ernie is learning as you use it. Please have confidence in our research ability and our product, give us some time, and please don’t spread rumours. I also hope Ernie can bring more happiness to everyone.
So some people took the question to Ernie itself: they asked it whether it used Stable Diffusion, and Ernie said “Yes”. Ernie is a stupid AI, hahaha. (Stable Diffusion is an open-source text-to-image model.)
ERNIE-ViLG 2.0 is online, so you can practice your Chinese too… And there is even a scientific paper about it. The authors do say “the training data contains both Chinese text-image pairs and English text-image pairs translated into Chinese.” No mention of Stable Diffusion, though.