Hello everyone, I'm Shi Kan from the Institute of Computing Technology at the Chinese Academy of Sciences, and I'm a 'slash' technology worker, someone who wears more than one hat. I have over a decade of experience in the chip industry and now do chip-related academic research at the Chinese Academy of Sciences. I also run the science and technology channel 'Lao Shi Tan Xin' on Bilibili as an UP host (uploader), and my viewers call me 'Lao Shi'.
Chips: The Cornerstone of Modern Society
When it comes to chips, everyone surely knows their importance.
Whether it's artificial intelligence, life sciences and medicine, autonomous driving, or network communications, almost every technology of modern society you can think of depends on chips, the foundational technology of the information age.
I have worked on chips for a long time, and I find chip development fascinating for two main reasons.
Firstly, the applications of chips are extremely wide-ranging. Once you enter this industry, you probably don't have to worry about unemployment because many industries require chip technology.
The second reason might be more important: chip development is a very difficult endeavor. As chip engineers, we need to constantly learn and enrich ourselves to face and embrace this era full of opportunities and challenges.
So the question arises: what exactly makes chip technology so difficult?
Why Are Chips So Difficult?
As you may know, manufacturing a chip is essentially the evolutionary journey of a grain of sand. Sand is about as abundant as anything on this planet, and what turns low-value sand into high-value chips is nothing but human intelligence.
Starting from sand, we purify it and make wafers. Then, through a series of steps such as photolithography, ion implantation, etching, and packaging, that abundant sand is transformed into a tiny finished chip.
Yet for all those steps, chip manufacturing is only one part of the chip development process; it is not chip development itself.
Another crucial step is chip design: completing the circuit design according to the requirements and making the circuit function properly. We then hand the designed circuit over to a chip manufacturer for fabrication, and ultimately obtain the physical chip.
But this raises another question: how do you ensure that the chip's functionality matches your initial design?
There is an interesting little story here. In 1947, the famous programmer Grace Hopper found that her computer wasn't working. After a careful investigation, she discovered that a moth had flown into one of the computer's relays. She carefully removed the moth with tweezers and taped it onto a piece of paper.
This may be the first 'bug' ever found in the history of computing, that is, a defect.
If that example feels too distant, there are more recent ones. Here's a math problem for everyone: what is the final result of this expression? The problem is actually simple. The numerator and denominator in the second part are the same and cancel out, and the numbers on either side of the minus sign are then also the same, so the result should be 0. On real computers and chips, however, the result is not always 0.
For example, on an early Intel Pentium chip, the result was 256. What happened? An American scientist noticed during his research that his calculations kept coming out wrong. He eventually traced the problem to an undetected design flaw in the chip's floating-point division unit.
Don't underestimate this design flaw; its consequences were actually very serious. In the 1990s, Intel spent $475 million to recall all problematic Pentium chips worldwide.
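The arithmetic in this story is easy to try for yourself. The sketch below assumes the expression is the widely cited Pentium division test case, x - (x / y) * y with x = 4195835 and y = 3145727. The slide's exact operands are not reproduced in this text, so treat those two numbers, and the helper names, as assumptions. On a correct floating-point unit the residual comes out as 0.

```python
from fractions import Fraction

# Assumed operands: the widely cited Pentium division test case.
# The talk's slide is not reproduced in the text, so these are an assumption.
X = 4195835
Y = 3145727


def residual_float(x: int, y: int) -> float:
    """Evaluate x - (x / y) * y using the hardware's floating-point divider."""
    return float(x) - (float(x) / float(y)) * float(y)


def residual_exact(x: int, y: int) -> Fraction:
    """Evaluate the same expression with exact rational arithmetic."""
    return Fraction(x) - Fraction(x, y) * y


def divider_looks_correct(x: int = X, y: int = Y, tol: float = 1e-6) -> bool:
    """Return True if the floating-point result agrees with the exact one."""
    return abs(residual_float(x, y) - float(residual_exact(x, y))) <= tol


if __name__ == "__main__":
    print("exact residual:", residual_exact(X, Y))
    print("float residual:", residual_float(X, Y))
    print("divider looks correct:", divider_looks_correct())
```

One probe like this covers only a single pair of operands, which hints at why such a flaw can slip through: it shows up only for particular inputs.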
So, returning to the earlier question: what exactly makes chip technology so difficult?
In my view, the difficulty of chips lies in the need to succeed on the first try. Making chips is not like software, where you can patch and fix various problems later. In contrast, once a chip completes the evolution journey from sand to chip, you may have already spent tens of thousands, millions, or even hundreds of millions to complete the tape-out and manufacturing, making it very difficult to modify afterwards.
Then, the next question is: how many chip projects today can achieve success on the first try?
The 'Bottleneck' of Chip Verification
According to survey data, only 24% of chip projects succeed on the first try. In other words, roughly three quarters of chip projects need at least one more tape-out because of undetected design flaws, large or small, which costs a great deal of time and money.
Therefore, the key question is: how can we make sure a chip has as few bugs and design flaws as possible, ideally none, before tape-out and manufacturing? This is the direction I have devoted my research to over the past few years.
According to the same survey, chips are becoming increasingly complex, driven especially by AI and other high-tech fields. As a result, verification now takes up a very large share of the chip development cycle: more than half, and as much as 70% of the chip design cycle.
Unfortunately, chip verification is also a hard problem. Here are a few astronomical numbers: the Earth's circumference, the estimated number of stars in the Milky Way, and the length of a light-year.
In chip verification, there is also an astronomical number, which is the number of cycles needed to fully verify a CPU core. What exactly does this astronomical number represent?
If we used the most advanced software simulation technology available today to fully verify a CPU core, it would take at least 15,000 years. The most advanced hardware emulation technology can cut this to about 30 years. But chip development cannot wait 15,000 years, and it cannot wait 30 years either.
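To get a feel for these numbers, here is a back-of-the-envelope sketch. It uses only the two durations quoted above. The simulation speed of 1,000 cycles per second is a purely hypothetical figure chosen for illustration, not a number from this talk, and the function names are made up for the sketch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds

# Durations quoted in the talk.
SOFTWARE_SIM_YEARS = 15_000
HARDWARE_EMU_YEARS = 30

# Hypothetical simulation throughput; NOT a figure from the talk.
ASSUMED_SIM_CYCLES_PER_SECOND = 1_000.0


def implied_speedup(slow_years: float, fast_years: float) -> float:
    """How many times faster the second method must be than the first."""
    return slow_years / fast_years


def implied_cycles(years: float, cycles_per_second: float) -> float:
    """Total cycles executed when running `years` at the given throughput."""
    return years * SECONDS_PER_YEAR * cycles_per_second


def years_to_verify(total_cycles: float, cycles_per_second: float) -> float:
    """Wall-clock years needed to run `total_cycles` at the given throughput."""
    return total_cycles / (cycles_per_second * SECONDS_PER_YEAR)


if __name__ == "__main__":
    speedup = implied_speedup(SOFTWARE_SIM_YEARS, HARDWARE_EMU_YEARS)
    cycles = implied_cycles(SOFTWARE_SIM_YEARS, ASSUMED_SIM_CYCLES_PER_SECOND)
    print(f"emulation vs. simulation speedup: {speedup:.0f}x")
    print(f"cycles implied by the assumed speed: {cycles:.3e}")
    # How much faster than the assumed speed a platform would need to be
    # in order to finish the same workload within one year.
    print(f"speedup needed for one year: {years_to_verify(cycles, ASSUMED_SIM_CYCLES_PER_SECOND):.0f}x")
```

The two quoted durations imply that hardware emulation is about 500 times faster than software simulation, and that even this is far from enough: finishing within a single year would take a further 30-fold improvement on top of it.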
So what is the essence of the problem? Our research over the past few years found that chip verification has an 'impossible triangle': high verification performance, good debugging capability, and low cost. All three are crucial, yet they cannot be satisfied at the same time. Current mainstream methods achieve at most two of the three, and this is the fundamental reason chip verification is so inefficient.
Someone Must Do Something Different
For these reasons, chip verification has not advanced significantly in recent years.
In chip companies, engineers may spend more of their time writing test cases and running regression verification, which is essentially dirty, tiring work. The same goes for academia: very few scholars devote themselves to chip verification, and compared with hot fields like artificial intelligence, research on chip verification is scarce.
So, an academic leader once told me that in the same amount of time, they could publish three or even more papers in the field of artificial intelligence, but in chip verification, they might not even publish one.
Unfortunately, what they said is true.
However, someone must do something different.
Therefore, over the past few years, I have led a team in conducting research related to chip verification and have built an agile verification research system from scratch. The core of this research system is a verification platform called ENCORE, which is based on a special chip—the Field-Programmable Gate Array (FPGA). ENCORE can significantly improve verification efficiency while achieving good verification debuggability.
Building this agile verification research system requires work on several levels. At the algorithmic level, we need to keep improving the efficiency of bug discovery, debugging, and repair. At the platform level, we hope to build an end-to-end agile verification acceleration platform based on programmable logic chips (FPGAs). At the application level, we hope this platform will suit both general-purpose processor verification, such as CPUs and GPUs, and specialized chip verification, such as today's very popular AI accelerators.
In recent years, we have done a lot of cutting-edge exploratory work in this field, including ENCORE and many newer research projects, and we have published the results at many internationally renowned academic conferences.
We are also working on some interesting follow-up projects, but since this work has not been published yet, I won't go through it here.
Letting More People Understand Chips
However, during the research process, I gradually realized that these scientific and academic achievements mainly reach people within our small circle, those who work on chip verification and related fields. So, how can we let more people see our work, understand our research, and even take part in it?
Naturally, I thought of chip science popularization, which I also find very interesting. I have been doing science popularization for four or five years, first in writing and later through videos on Bilibili. It has not only taught me a great deal but also helped me meet many like-minded friends, as well as viewers who like and support me.
However, making chip science videos is not easy, especially in today's era of short videos everywhere. A leading fellow science communicator told me that in the time it takes me to produce one long, hardcore chip video, they could make 10 or more short videos on trending topics and get many times my traffic.
Unfortunately, what they said is also true.
Even so, I think there still need to be people who persist in doing difficult things. I hope to combine chip science popularization and chip verification, two equally difficult but equally interesting things, and use video and text to show everyone what we have done, the papers we have published, and the open-source chip projects our larger team is working on.
Besides chips, I will also share hardcore technologies like artificial intelligence and computers with everyone, as well as share my growth experiences, the books I have read, and the knowledge I have acquired. I know that I am not a genius myself, nor am I a so-called all-around expert or guru. I would rather be a 'guide' for everyone, sharing the path I have walked.
So, returning to the question I wanted to share with everyone today: chip research and chip science popularization, which one is more interesting? For me, both are equally interesting. The reason is simple: they are equally difficult. Both also require me to persist for a very long time.
Many people say we should do things that are difficult and right. But the real problem is: how do you judge whether something is right before you do it? If others see it as a thankless job with no recognition, or as dirty, tiring work, would you still persist?
Therefore, I prefer to do difficult and long-term things, such as academic research in chip verification, or making long hardcore chip science popularization videos. Because if something is difficult and requires long-term persistence, then it is probably right.
That's all I wanted to share with you today. I am Lao Shi, thank you, everyone!
This article comes from the WeChat public account Gezhi Lundaotan. Author: Shi Kan. Original title: 'How Difficult Is Chip Making? A Division Error Costs 475 Million Dollars | Shi Kan'