A month ago, a senior engineer at Google was invited to speak to us. When asked what it was like to work at Google, she said:

1 个月前,在某个场合中,一位谷歌的高级工程师被邀请来给我们讲话,当被问到她在谷歌工作是什么样时,她说:

These days, things are changing very fast here.


The industry is rapidly being reshaped by AI. Since ChatGPT, Midjourney, and Stable Diffusion took off, almost every product's current or next update has something to do with AI. So I want to take stock and look ahead at where things are going.

考虑到行业正在迅速地被 AI 所影响着自从 chatGPT,Midjourney,Stable Diffusion 的火爆,基本所有的产品目前或者下一步更新都和 AI 有关。所以我也想梳理和展望一下未来的发展趋势。

Half a year ago, people were still joking about how distorted the hands in AI drawings were. Today, however, we can see images that are comparable to photographs. Perhaps one remaining difference is that AI images are too smooth and perfect.

半年前,人们还在调侃 AI 画的人物手有多扭曲。 然而今天,我们可以看到一些堪比摄影的图像, 也许差异之一是:AI 图像过于流畅和完美。

In terms of images (图像处理方面)

It is now possible to embed text into natural images (here and there), which is perfect for posters or advertisements. This might take a human designer a long time to design and create. We can even make short animations that set the pictures in motion (check here and here). Such an achievement was not hard to foresee: a video is made of pictures, frame by frame. Now that AI can generate pictures, how far are we from videos and animations?

现在我们可以将文本嵌入到自然图像中(此处此处),非常适合海报或广告。 这可能需要人类设计师花费很长的时间来设计和创造。我们甚至可以制作一些简短的动画(查看此处此处)。这样的成就并不难预料到:视频是由一帧一帧的图片组成的。现在 AI 可以生成图片了,我们离视频、动画还有多远?

In terms of speech (语音方面)

Today we can already see AI imitating a person's tone of voice. In computer vision, we can now easily make a person's mouth form various shapes. So, not surprisingly, combining these technologies gives us a tool like this: a person speaks English in a video, and AI converts the content of the speech, the characteristics of the voice, and even the mouth shapes into another language. Of course, if we want to truly understand and integrate into a place and its culture, we will still learn the language, but imagine how much inconvenience this will spare a monolingual person.

今天我们已经可以看到人工智能可以模仿人的语气。在计算机视觉方面,我们现在可以轻松地让一个人的嘴巴做出各种嘴型。那么有了这些技术的结合,不出意外,我们就有了这样一个工具:一个人用英语在视频中说话,然后通过 AI,得到他说话的内容,他声音的特征,甚至他嘴的形状 已经变成说另一种语言了。 当然,如果我们想真正理解并融入一个地方和文化,我们仍然会学习语言,但想象一下这对于只会一种语言的人来说会减少多少不便。

In terms of hardware (硬件方面)

We have seen IT giants moving into spatial computing and the metaverse. Yesterday I saw Meta Glasses, which can help us tell right away how long the barbecue in front of us still needs to cook, whether there was a foul during a game, and what landmark we are looking at. Relatedly, ChatGPT’s latest image input feature can guide us through repairing a bicycle. Think about what this will look like once it is built into glasses.

我们已经看到 IT 巨头进入空间计算和元宇宙的布局。昨天我看到了 Meta Glasses,它可以帮助我们立即判断面前的烤肉要烤多久,运动过程中是否有犯规行为,以及地标信息。 与此相关的是,chatGPT 最新的图像输入功能可以指导我们如何修理自行车。 想象一下如果把它戴在眼镜上会是什么样子。

So we can imagine a future like this (未来展望):

  1. In the future, everyone can make short films and animations, as easily as writing a blog post. You can choose the characters (even yourself) and their voice characteristics. (A 2-hour movie or a TV series may be as hard as writing a book, though hard in a different way.) No wonder Hollywood wants the rights to use actors’ AI likenesses, haha.

未来人人都可以制作短片、动画,就像写博客或帖子一样简单。你可以选择角色(甚至您自己)及其声音特征。(一部 2 小时的电影或电视剧可能和写一本书一样困难,但是这是另一个角度的困难)。难怪好莱坞要获得演员 AI 肖像的使用权,哈哈。

  1. Everyone has a customized Jarvis. It is no longer sci-fi. Built into glasses, it can teach you how to draw on the blank paper in front of you, display the sheet music in front of your eyes and tell you what to play next when you practice the piano, or teach you how to cook step by step.


  1. No customer service. Although I still hate the AI's voice and stupidity when I'm on the phone today, I've been asking ChatGPT a lot of things. If large models are trained on product manuals and the like in the future, we can solve any product usage problem locally through glasses or a phone.

没有客服。虽然我现在打电话时仍然讨厌 AI 的声音和愚蠢,但我已经向 chatGPT 询问了很多事情。如果未来大模型使用产品使用说明等进行训练,那么我们可以通过眼镜或手机在本地解决任何产品使用问题。

  1. Other scenarios. You don’t want to read several pages of a contract word for word, so you take a picture of it and give it to the AI (though trust is an issue), and it helps you find anything hidden that works against you or is unreasonable. Or you are parking on the side of the road and the complicated instructions confuse you, but AI can tell you directly whether you can park and for how long.

其他场景。你不想逐字逐句地阅读几页合同,所以你把它拍下来交给人工智能(尽管存在不信任问题),它可以帮助你找出哪里有对你不利或不合理的地方。或者你在路边停车,复杂的指令让你一头雾水,但AI 可以直接告诉你是否可以停车以及可以停车多长时间。

The current large models are trained on huge amounts of data, and winning by sheer quantity is never the optimal solution. That’s why Sam Altman said:

目前的大型模型都是基于海量数据训练,而以量取胜从来都不是最优方案。这就是为什么 Sam Altman 说,

we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways.


Ending (结语)

I really want to pick up machine learning and deep learning again when I have free time and keep up with current trends, but I always feel unable to keep pace, because both papers and products are developing too fast.

我很想有空的时候重新拾起机器学习和深度学习的知识,跟上当前的潮流,但总觉得力不从心。 因为无论是论文还是产品,它们发展得太快了。

The picture at the beginning is of a steam engine. Now that the steam engine is here, can the industrial revolution be far off? We truly live in an age of wonder.
