I sometimes see people describe neural networks as “just another tool in the machine learning toolbox.” They have advantages and disadvantages, work well in certain fields, and can help you win Kaggle competitions. Unfortunately, this view misses the forest for the trees. Neural networks are more than just another classifier. They mark the beginning of a fundamental shift in how we develop software. They are Software 2.0.
We are already familiar with Software 1.0: it is written in programming languages such as Python or C++, and it consists of explicit instructions that the programmer gives to the computer. With each line of code, the programmer pins down a specific point in program space that has the desired behavior. In contrast, Software 2.0 is written in a far more abstract language that is unfriendly to humans, such as the weights of a neural network. No human writes this code directly, because it involves a very large number of weights (often on the order of a million or more), and writing weights by hand is hard (I tried).
Instead, we specify a goal for the behavior of the program (for example, “match the input-output pairs of the examples in the dataset” or “win a game of Go”) and write a rough skeleton of the code (such as a neural network architecture). The skeleton identifies a subset of program space to search, and we then use the computing resources at our disposal to search that subset for a program that works.
For neural networks, we restrict the search to a continuous subset of program space, where searching with backpropagation and stochastic gradient descent (SGD) turns out to be (perhaps surprisingly) quite effective.
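As a minimal sketch of what searching a continuous program space can look like, the example below fits a two-parameter linear model to input-output pairs with plain SGD. The function name `train_linear`, the toy dataset, and the hyperparameters are assumptions made for illustration; real networks have many more weights and compute gradients by backpropagation through many layers.

```python
import random

def train_linear(pairs, steps=2000, lr=0.05, seed=0):
    """Fit y ~ w * x + b to (x, y) pairs with plain SGD on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # the "program": one point in a 2-D continuous space
    for _ in range(steps):
        x, y = rng.choice(pairs)   # pick one training example at random
        err = (w * x + b) - y      # prediction error on that example
        # Gradient of 0.5 * err**2 with respect to w and b.
        w -= lr * err * x
        b -= lr * err
    return w, b

# The dataset specifies the desired behavior (here: y = 2x + 1).
pairs = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = train_linear(pairs)
print(w, b)  # should end up close to 2 and 1
```

Nobody wrote the values of `w` and `b` by hand: the dataset states the goal, the model family is the skeleton, and the optimizer fills in the rest.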
More specifically, in Software 1.0, human-written source code (such as a .cpp file) is compiled into a binary that does useful work. In Software 2.0, the source code usually consists of two parts: 1) a dataset that defines the desired behavior, and 2) a neural network architecture that gives the rough skeleton of the code but leaves many details (the weights) to be filled in. Training the neural network is the process of compiling the dataset into the binary, which is the final trained network. In most practical applications today, neural network architectures and training systems are increasingly standardized into a commodity, so most of the active “software development” takes the form of organizing, growing, adjusting, and cleaning labeled datasets. This fundamentally changes the programming paradigm by which we iterate on software, and it splits the development team into two groups: Software 2.0 programmers (data labelers), who edit and grow the datasets, and a smaller group who maintain the surrounding training infrastructure and the interfaces for analysis, visualization, and labeling.
It turns out that for many problems in the real world, collecting data (more generally, determining the expected behavior) is much easier than writing programs explicitly.
Because of this, and because of the many other benefits of Software 2.0 that I describe below, we are witnessing a large-scale migration of code from Software 1.0 to Software 2.0 across the industry. Software 1.0 is eating the world, and Software 2.0 (AI) is eating Software 1.0.
1. The ongoing transition
Let us look at some concrete examples of this transition. In each of these areas, over the past few years we have given up on solving a complex problem by writing explicit code and have turned to Software 2.0 instead.
Image recognition: Image recognition used to consist of hand-engineered features with only a little machine learning (such as an SVM) added at the end. Later, using larger datasets (such as ImageNet) and searching in the space of convolutional neural network architectures, we found much more powerful visual features. More recently, we no longer even trust ourselves to hand-design the architectures, and we have started to search over those as well.
Speech recognition: Speech recognition used to involve a lot of preprocessing, Gaussian mixture models, and hidden Markov models, but today it consists almost entirely of neural networks. A related, often-repeated quip is attributed to Fred Jelinek in 1985: “Whenever I fire a linguist, the performance of my speech recognition system improves a bit.”
Speech synthesis: Historically, speech synthesis relied on various concatenative (stitching) techniques, but today the state-of-the-art models are large convolutional networks (such as WaveNet) that generate the raw audio signal directly.
Machine translation: Machine translation was usually approached with phrase-based statistical methods, but neural networks are quickly becoming dominant. My favorite architectures are trained in a multilingual setting, where a single model translates from any source language to any target language, and in a weakly supervised (or entirely unsupervised) setting.
Games: Hand-coded Go programs have been developed for a long time, but AlphaGo Zero (a convolutional network that looks at the raw state of the board and is trained through self-play) is now the strongest player of the game. I expect similar results in other areas, such as DOTA 2 and StarCraft.
Databases: More traditional systems outside of AI are also showing early signs of a transition to Software 2.0. For example, “The Case for Learned Index Structures” replaces core components of a data management system with a neural network, outperforming cache-optimized B-trees by up to 70% in speed while saving an order of magnitude in memory.
You may have noticed that many of the examples above come from Google. This is because Google is currently at the forefront of rewriting large parts of its own code as Software 2.0. “A Universal Model” sketches where this might lead: the statistical strength of individual domains, originally scattered across fields, is combined into one consistent understanding of the world.
2. Benefits of Software 2.0
Why should we prefer to port complex programs to Software 2.0? The simple answer is that it works better in practice. However, there are many other reasons to choose this stack. Let us compare Software 2.0 (for example, a convolutional neural network) with Software 1.0 (for example, a production-level C++ code base). Software 2.0 is:
Computationally homogeneous: To a first approximation, a typical neural network is composed of only two operations: matrix multiplication and an activation function (ReLU). The instruction set of traditional software is far more complicated and heterogeneous by comparison. Because only a small core (such as matrix multiplication) has to be implemented in Software 1.0, verifying correctness and performance is much easier.
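To make the two-operation claim concrete, here is a minimal sketch of a forward pass built from nothing but matrix multiplication and ReLU. The helper names (`matmul`, `relu`, `forward`) and the tiny hand-picked weights are assumptions for illustration; bias terms are omitted, and a real framework would run the same operations on optimized kernels.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Apply max(0, x) to every element."""
    return [[max(0.0, x) for x in row] for row in m]

def forward(x, weights):
    """Run x through the layers, with ReLU after every layer except the last."""
    h = x
    for i, w in enumerate(weights):
        h = matmul(h, w)
        if i < len(weights) - 1:
            h = relu(h)
    return h

x = [[1.0, 2.0]]                          # one input row with 2 features
weights = [
    [[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]],  # layer 1: 2 x 3
    [[1.0], [2.0], [3.0]],                # layer 2: 3 x 1
]
out = forward(x, weights)
print(out)  # [[6.0]]
```

Only `matmul` and `relu` would need to be verified and optimized as Software 1.0 code; everything else is data.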
Chip-friendly: Because the instruction set of a neural network is relatively small, it is much easier to implement these networks close to silicon, for example with custom ASICs or neuromorphic chips. The world will change when low-power intelligence becomes pervasive around us. For example, a small, cheap device could ship with a pretrained convolutional network, a speech recognizer, and a WaveNet speech synthesis network built in, ready to be attached to other things.
Constant running time: Every forward pass of a typical neural network takes almost exactly the same amount of computation (FLOPs). There is none of the variability that comes from the different execution paths a sprawling hand-written C++ code base can take. You could have dynamic computation graphs, but the execution flow is normally still tightly constrained. This way we are also almost guaranteed never to end up in an unintended infinite loop.
Constant memory use: Related to the point above, nothing is allocated dynamically, so there is little chance of swapping to disk and no memory leaks to hunt down in the code.
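The constant-cost claim can be checked on a toy network with a minimal sketch like the one below, which counts multiply-adds during a forward pass. The function `dense_forward` and its weights are hypothetical, and the sketch measures arithmetic only, not wall-clock time or memory. The loops depend only on the layer shapes, never on the input values, so the count is the same for every input.

```python
def dense_forward(x, weights):
    """Forward pass over dense layers with ReLU between them.

    Returns (output, multiply_add_count) so the cost can be inspected.
    """
    ops = 0
    h = x
    for i, w in enumerate(weights):
        out = []
        for j in range(len(w[0])):      # one output unit per weight column
            acc = 0.0
            for k in range(len(h)):     # fixed-length dot product
                acc += h[k] * w[k][j]
                ops += 1
            out.append(acc)
        if i < len(weights) - 1:
            out = [max(0.0, v) for v in out]
        h = out
    return h, ops

weights = [
    [[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]],  # 2 x 3
    [[1.0], [2.0], [3.0]],                # 3 x 1
]
_, ops_a = dense_forward([1.0, 2.0], weights)
_, ops_b = dense_forward([-5.0, 0.0], weights)
print(ops_a, ops_b)  # 9 9
```

The sizes of the intermediate buffers are fixed by the same layer shapes, which is the intuition behind the constant-memory point.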
Highly portable: A sequence of matrix multiplications is much easier to run on arbitrary computing environments than a traditional binary or script.
Agile: If you have C++ code and someone asks you to make it twice as fast (at the cost of some performance if needed), adapting the system to the new requirement is far from trivial. In Software 2.0, however, we can remove half of the channels in the network, retrain, and get a model that runs about twice as fast and is slightly less accurate. This is amazing. Conversely, if you get more data and compute, you can immediately make the program work better by enlarging the computation graph and retraining.
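As a rough, hedged illustration of the speed side of this trade-off, the sketch below counts multiply-adds for a dense network with one hidden layer and for the same network with half the hidden units. The layer sizes and the helper `multiply_adds` are assumptions chosen for the example; the accuracy cost of the smaller model is not shown and would have to be measured by retraining.

```python
def multiply_adds(layer_sizes):
    """Multiply-adds for one forward pass through dense layers of these sizes."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

full = multiply_adds([784, 512, 10])  # one hidden layer with 512 units
half = multiply_adds([784, 256, 10])  # same network with half the hidden units
print(full, half)  # 406528 203264
```

With a single hidden layer the cost halves exactly. With several hidden layers, halving every hidden width cuts the hidden-to-hidden layers by about four times, so the actual speedup depends on the architecture.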
Modules can meld into an optimal whole: Ordinary software is usually decomposed into modules that communicate through public functions, APIs, or endpoints. In Software 2.0, however, if two interacting modules were originally trained separately, we can easily backpropagate through the whole system later. Think how wonderful it would be if your browser could automatically redesign the low-level system instructions to load pages faster, or if the computer vision library you import (such as OpenCV) could automatically tune its behavior to your specific data. In Software 2.0, this is the default behavior.
Better than you: Finally, and most importantly, in many valuable verticals the code a neural network produces is simply better than the code you or I could write. For now, this holds at least for anything involving images, video, and speech.
3. Disadvantages of Software 2.0
Software 2.0 also has disadvantages. At the end of the optimization we are left with a large network that works well, but it is very hard to tell why. Across many application areas, we will be left with the choice between a model we understand that is 90% accurate and a model we do not understand that is 99% accurate.
Software 2.0 can fail in unintuitive and embarrassing ways or, worse, “silently fail.” For example, a network can silently adopt biases in its training data, and these are very hard to analyze and examine when the data runs to millions of examples.
Finally, we are still discovering peculiar properties of this stack. For example, the existence of adversarial examples and attacks highlights how unintuitive Software 2.0 can be.
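To give a feel for what an adversarial example is, here is a minimal sketch on a hypothetical linear classifier. The weights, the input, and the step size `eps` are all made up for illustration, and the sign-based step is a simplified stand-in for gradient-based attacks on real networks. Every feature moves by at most `eps`, yet the decision flips.

```python
def score(w, x):
    """Linear classifier score: positive means class A, negative means class B."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return (v > 0) - (v < 0)

def adversarial(w, x, eps):
    """Shift every feature by eps in the direction that opposes the current decision."""
    direction = -1 if score(w, x) > 0 else 1
    return [xi + direction * eps * sign(wi) for wi, xi in zip(w, x)]

w = [0.5, -0.25, 0.75, -1.0]   # hypothetical trained weights
x = [1.0, 1.0, 1.0, 0.5]       # an input classified as A
x_adv = adversarial(w, x, eps=0.25)

print(score(w, x))      # 0.5
print(score(w, x_adv))  # -0.125
```

The perturbation lowers the score by `eps` times the sum of the absolute weights, so in high-dimensional inputs a very small change per feature can add up to a large change in the output.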
4. Software 2.0 programming
Software 1.0 is code we write by hand. Software 2.0 is code produced by optimization against an evaluation criterion (such as “classify the training data correctly”). Any program whose underlying principles are not obvious, but whose performance can be evaluated repeatedly and cheaply, is a good candidate for this transition, because the code found by optimization is much better than the code a human can write.
The lens through which we view trends matters. Once you recognize Software 2.0 as a new and emerging programming paradigm, rather than treating neural networks as just a useful classifier in the machine learning toolbox, the extrapolations become more obvious, and it is clear that there is a lot of work to do.
In particular, we have built up a large amount of tooling that helps programmers write Software 1.0 code, such as powerful IDEs with syntax highlighting, debuggers, profilers, go-to-definition, git integration, and so on. In Software 2.0, programming becomes the accumulation, adjustment, and cleaning of datasets. For example, when a network fails in some hard or rare cases, we do not fix those predictions by writing code, but by adding more labeled examples of those cases.
Who will develop the first Software 2.0 IDE? It should help with every workflow around datasets: accumulating, visualizing, cleaning, labeling, and sourcing data. Perhaps such an IDE would surface images that the network suspects are mislabeled, based on the per-example loss, or assist in labeling by seeding labels with predictions, or suggest useful examples to label based on the uncertainty of the network predictions.
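Two of these IDE features can be sketched in a few lines. The example below is a minimal sketch, assuming the model outputs class probabilities per example; the function names and the toy predictions are hypothetical. It ranks labeled examples by per-example cross-entropy loss to surface suspected mislabels, and ranks examples by prediction entropy to suggest what to label next.

```python
import math

def cross_entropy(probs, label):
    """Loss of one example: negative log-probability of its assigned label."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)

def entropy(probs):
    """Uncertainty of one prediction (0 when the model is fully confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def suspected_mislabels(predictions, labels, top_k):
    """Indices of the top_k labeled examples with the highest loss."""
    losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:top_k]

def most_uncertain(predictions, top_k):
    """Indices of the top_k examples whose predictions have the highest entropy."""
    order = sorted(range(len(predictions)),
                   key=lambda i: entropy(predictions[i]), reverse=True)
    return order[:top_k]

# Hypothetical class probabilities for four examples, with their current labels.
preds = [[0.9, 0.1], [0.2, 0.8], [0.95, 0.05], [0.6, 0.4]]
labels = [0, 1, 1, 0]

suspects = suspected_mislabels(preds, labels, top_k=1)
uncertain = most_uncertain(preds, top_k=1)
print(suspects, uncertain)  # [2] [3]
```

Example 2 is flagged because the model is confident in class 0 while the label says class 1. Example 3 is suggested for labeling because the prediction is closest to a coin flip. A high loss is only a hint, since the model itself may be the one that is wrong.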
Similarly, GitHub is a very successful home for Software 1.0 code. Is there room for a Software 2.0 GitHub? In that case, repositories would be datasets, and commits would consist of additions and edits to the labels.
Traditional package managers and deployment tools, such as pip, conda, and docker, help us install and deploy software more easily. In the Software 2.0 era, how do we effectively deploy, share, import, and run software? What is the conda equivalent for neural networks?
In short, Software 2.0 will become increasingly prevalent in any area where repeated evaluation is possible and cheap, and where the algorithm itself is hard to design explicitly. There are many exciting opportunities when we consider the whole software development ecosystem and how it can adapt to this new programming paradigm. In the long run, the future of this paradigm is bright, because it is increasingly clear that when we someday develop artificial general intelligence, it will certainly be written in Software 2.0.