Yunfeng’s Blog: Single-stepping with breakpoints is an inefficient way to debug

The interactive debugger, with its breakpoints and single-stepping, was a major invention in the history of software development. But I think that, like the graphical interface, it lowers the learning threshold at the cost of efficiency: at its core it is an extremely inefficient way to debug.

In my early years (the ten-plus years of development before 2005), I relied heavily on this kind of debugger and carefully worked through every version from Turbo C to Visual C++. After ten years of use, practice naturally makes perfect, and I believed I could locate any bug efficiently with such tools. But after I moved to cross-platform development in 2005, perhaps because at first I could not find a suitable graphical debugger on Linux, I had time to reflect on my debugging habits. GDB is powerful, but its graphical front ends at the time were not as mature as they are today; the mainstream ones, Insight and DDD, each had their minor problems and were not very pleasant to use. So I began to change how I worked: besides improving the quality of my own code as much as possible, writing code that is concise and obviously correct, I relied on continuous Code Review and deliberately added log output to locate bugs.

Later, my focus gradually shifted from client-side graphics to the server, which exposed another drawback of debuggers that interrupt program execution. In client/server software, if you suspend one side and step through it at the pace of human interaction while the other side keeps running at machine speed, it is very hard to keep the system as a whole behaving normally.

In recent years I have again taken on some Windows development, and I found that, after another decade of training, even when I occasionally pick up an interactive debugger it no longer gives me much of an advantage. Often my finger is mechanically pressing the step button while my mind is no longer on the code on the screen in front of me; often, before execution even reaches the point where the bug is triggered, I have already realized where the mistake is. After this happened many times, it was natural to question my old habits and ask what makes the debugger so inefficient.

When chatting with people about how to locate bugs, I always say, half-jokingly: just open the editor and stare at the code; stare long enough and the bug will highlight itself. It is a joke, but in my opinion no debugging method compares to Code Review, whether on code you wrote yourself or on someone else's project that you joined halfway through. The first priority is to understand the overall structure of the program.

Programs are always composed of small, sequentially executed code segments, stitched together by branches. A straight-line segment is very stable: the input state at its entry determines its output. All we really care about is that input state; most of the time we can skip the process and look directly at the result, because such a segment, no matter how long, has a single execution path. Branches are what make the execution flow process data differently depending on intermediate state. When reasoning about correctness, every branch point has to be considered: which conditions send the code down this branch, and which send it down that one. You could say the number of branches determines the complexity of the code; McCabe's cyclomatic complexity, the most mainstream measure of code complexity today, is roughly this idea.
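
To make that concrete, here is a toy function of my own (not from any real codebase); the decision points are what McCabe's measure counts:

    #include <cstdio>

    // Toy example: cyclomatic complexity = number of decision points + 1.
    // There are three decisions here (the loop condition and two ifs), so the
    // complexity is 4: four independent paths the reader has to keep in mind.
    int clamp_sum(const int *values, int n, int lo, int hi) {
        int sum = 0;
        for (int i = 0; i < n; ++i) {   // decision 1: loop condition
            int v = values[i];
            if (v < lo) v = lo;         // decision 2
            if (v > hi) v = hi;         // decision 3
            sum += v;
        }
        return sum;
    }

    int main() {
        int xs[] = {-5, 2, 42};
        std::printf("%d\n", clamp_sum(xs, 3, 0, 10));  // prints 12 (0 + 2 + 10)
    }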

The overall McCabe complexity of a piece of software is certainly far beyond what a human brain can handle at once. But we can usually divide the software into modules, and a structure of high cohesion and low coupling reduces that complexity. A highly cohesive module can be isolated from the outside world so that we can focus our analysis on its interior. When the piece of code in focus is small enough, the brain can process all of its flows, every branch included, in one pass. When you observe execution with a debugger, by contrast, each run driven by real input data follows exactly one path; to locate a bug you must design an input state that triggers it, which for a local module is not always easy. Analyzing a module in your head is different: when the McCabe complexity is not too high, nearly all execution paths can be processed in parallel. As you scan the code, your brain analyzes all the possible scenarios at once while pruning the less important branches. Of course, like any skill, the speed of analysis, the breadth (complexity) it can cover, and the accuracy of the pruning only grow with repeated training, and relying too much on interactive debugging tools undermines that training. Shaped by the tool, the brain cares more about the current state: where has execution reached, where should the next breakpoint go (to speed up debugging), what are the values of this set of variables right now... and cares much less about how the program would behave with other inputs, because the tool has cut away the paths that did not happen, waiting for you to design another set of inputs to show them to you next time.

Interactive debuggers usually lack the ability to look backward: they reflect the current state rather than record the past. Some of this can be fixed by better tools, some cannot. A common scenario: you set the next breakpoint, the debugger stops, and you find the state is wrong. You can only conclude that the problem lies somewhere between the previous breakpoint and the current position, but when you want to go back and see what exactly happened, or what some intermediate state was, the tool is powerless. If you deduce the program's execution in your head instead, everything is a static map; going backward is no different from going forward, only a matter of which point on the time axis you focus on. That is why a well-trained programmer can often see where a bug is at a glance, while a master of the debugger needs two or three runs to find it.

Running a program correctly in your head certainly takes training, and it is much harder than learning to use a debugger, but it is worth it. I do not know whether others have had a similar experience: when I took part in informatics competitions in middle school, the exams were not all hands-on programming problems. The preliminary round in particular was usually a written paper, with questions that gave you a program and asked you to work out its output by hand. Thanks to that, I had to do this kind of training when I first learned to program. In junior high, my time on a real machine was counted in hours, and most of my time still went to ordinary schoolwork. To write the game programs I wanted to play, I could only write code secretly in a notebook during class, and after finishing I would simulate it in my head between classes to look for bugs. If I could fix them before getting to the computer, I could make better use of my limited machine time each day. These experiences convinced me that reading code is not actually that boring, and that it is a way to improve efficiency.

Using Code Review as your primary means of locating bugs also encourages you to write less complex (and therefore less error-prone) programs, because you know where the limit of complexity your brain can currently process at once lies. On reducing branches: I once watched an interview with Linus in which he talked about code taste, using a small example of linked-list handling. The head of a linked list usually differs from the nodes in the middle: every other node is referenced by the next pointer of the node before it, while the head is the exception, referenced from a different data structure. In Linus's bad example, the code tests whether it is dealing with the head; in the good example, the code works through a pointer to a pointer, which for the head simply refers to that other data structure's field, so the extra special case for the head disappears and all nodes are handled uniformly. In a fragment of only five or six lines, the extra test seems perfectly clear and one more branch seems trivial, but Linus stresses that this is a matter of taste. To my mind, it is really about turning the reduction of code complexity into an instinct when writing code.
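
The fragment below is my own reconstruction from memory of the kind of code he was contrasting, not his actual slide; the names and the surrounding main are mine:

    #include <cstdio>

    struct node {
        int value;
        node *next;
    };

    // Bad taste (per Linus): removing the head is a special case, one extra branch.
    node *remove_bad(node *head, node *victim) {
        node *prev = nullptr;
        for (node *cur = head; cur; prev = cur, cur = cur->next) {
            if (cur != victim) continue;
            if (prev == nullptr)          // special case: victim is the head
                head = cur->next;
            else
                prev->next = cur->next;
            break;
        }
        return head;
    }

    // Good taste: walk a pointer-to-pointer. The slot that references the
    // victim (whether the caller's head pointer or some node's next field)
    // is updated by the same line of code, so the head needs no special case.
    void remove_good(node **head, node *victim) {
        node **indirect = head;
        while (*indirect && *indirect != victim)
            indirect = &(*indirect)->next;
        if (*indirect)
            *indirect = victim->next;
    }

    int main() {
        node c = {3, nullptr}, b = {2, &c}, a = {1, &b};
        node *head = &a;
        remove_good(&head, &a);            // removing the head: no extra branch
        for (node *n = head; n; n = n->next)
            std::printf("%d\n", n->value); // prints 2 then 3
    }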

When someone else's project lands in your lap, you have no control over its code quality. But long-term Code Review training helps you carve the software into modules quickly. Usually you draw on your knowledge of the relevant domain, and on the typical design patterns of similar software, to guess in advance how the modules are probably divided. This step requires understanding the domain and should not get bogged down in implementation details. Opening a debugger as soon as you start, to step through the program's overall flow, is a method I do not recommend: the field of view is too narrow, and you spend a great deal of time observing only a small part. There is actually no need to obsess over top-down versus bottom-up. You can first skim the file structure of the source tree to make a guess at the module division, then pick a module at random, find the parts related to it, and follow the clues from there. For projects that need building, this mapping of the program's structure can even be done while you wait for the first compile, instead of waiting for the build to finish so you can step through it. You do not even need to download the code locally: GitHub's friendly web interface reads comfortably in a browser, and it can all be done on an iPad in bed.

One reason I dislike C++ is that a fragment of C++ code read in isolation rarely has a single unambiguous meaning. The literal text can correspond to several different actual operations, so there is not enough certainty. Function overloading and operator overloading are invisible in local code. Even a plain variable name is hard to classify as a local variable or a class member without reading the surrounding context and the header files at the same time (the two have very different scopes of influence, and the brain prunes them completely differently during analysis). I have looked at a variable and taken it for an input value, only to discover at the end that it was also an output, and on checking the function declaration found it was in fact a reference. With templates and generics it is worse still: even the data types are uncertain, and from the local code alone you cannot tell what the associated operations do once the template is instantiated. Reading a C++ project therefore requires constant cross-referencing between pieces of code, which puts far too much load on the brain.
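
A contrived illustration of what I mean, entirely my own and not from any real project: at the call site nothing hints that the argument may be rewritten:

    #include <cctype>
    #include <iostream>
    #include <string>

    // Imagine this lives in a header far from the call site: the non-const
    // reference is the only clue that `name` will be modified.
    void normalize(std::string &name) {
        for (char &c : name)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }

    void greet(const std::string &who) {
        std::cout << "hello, " << who << "\n";
    }

    int main() {
        std::string user = "Alice";
        normalize(user);   // reads like a pure query at the call site,
                           // but silently rewrites `user` in place
        greet(user);       // prints "hello, alice", not "hello, Alice"
    }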

So, is Code Review in your head enough by itself? If my ability could improve without limit, I think it would be. With accumulated experience, the complexity of code I can read and reason about directly is noticeably higher now than it was years ago. But there are always moments when it is beyond reach, and at that point the best course is to add log output as a secondary means.

Think about what we actually want to know when we use an interactive debugger. It is nothing more than the program's execution path, whether it really reached this point, and, when it did, what state the variables were in and whether anything looked abnormal. Log output does exactly the same job. A line of logging on the critical path records the execution path; by printing the important variables in the log you can query the program's state at that moment. How to output logs effectively is of course another skill that takes training. And do not worry too much about the performance cost of logging: a fluctuation of around 20% in the final software's performance is insignificant compared with its maintainability.
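
As a minimal sketch of the style I have in mind (the LOG macro and the handler are hypothetical, not from any particular codebase): one line per critical-path event, carrying the important variables:

    #include <cstdio>
    #include <ctime>

    // Minimal log line: timestamp, source location, then a printf-style
    // message that carries the important variables.
    #define LOG(...)                                                      \
        (std::fprintf(stderr, "%lld %s:%d: ",                             \
                      static_cast<long long>(std::time(nullptr)),         \
                      __FILE__, __LINE__),                                \
         std::fprintf(stderr, __VA_ARGS__),                               \
         std::fprintf(stderr, "\n"))

    // Hypothetical handler: one log line on entry, one on each exit path.
    int handle_request(int session, int payload_size) {
        LOG("handle_request enter session=%d size=%d", session, payload_size);
        if (payload_size <= 0) {
            LOG("handle_request reject session=%d bad size=%d", session, payload_size);
            return -1;
        }
        // ... real work would go here ...
        LOG("handle_request done session=%d", session);
        return 0;
    }

    int main() {
        handle_request(7, 128);
        handle_request(8, 0);
    }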

Compared with an externally attached debugger, logs are good at looking back and querying the past. As an aid to Code Review, what the brain really needs is a correction to its judgment: confirmation of whether the program actually travels along the route simulated in your head, and whether its internal state stays consistent and normal. Unlike a debugger, logging does not interrupt execution, which matters even more for software made of several programs running in parallel, such as client/server systems.

In fact, preserving state is an important skill even when using an interactive debugger. I suspect many people, like me, sometimes add temporary global variables while debugging and write intermediate state into them, to be inspected now and then from the debugger. Such temporary stash variables are really serving as a kind of log as well.
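
A tiny, hypothetical sketch of such a stash (the globals and the stand-in work are invented for illustration):

    #include <cstdio>

    // Temporary debugging globals: they record the most recent intermediate
    // state so it can be inspected from the debugger (or printed) after the
    // interesting moment has already passed.
    static int g_last_session = -1;
    static int g_last_error   = 0;

    int process(int session) {
        g_last_session = session;          // breadcrumb: which call got here last
        int err = (session % 2) ? 0 : -1;  // stand-in for the real work
        g_last_error = err;                // breadcrumb: how that call ended
        return err;
    }

    int main() {
        process(41);
        process(42);
        std::printf("last session=%d err=%d\n", g_last_session, g_last_error);
    }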

The advantage of text logs is that text-processing tools can extract further information from them: grep, awk, vim, Python, and Lua are all good tools for analyzing logs. If the log is huge and lives on a remote machine, you will be hard pressed to find a faster or more efficient approach. In many cases, the cost of repeatedly re-running a buggy program far exceeds that of running it once with detailed logging and then analyzing the log.

So, is it still important to learn to use an interactive debugger? I think it is. Occasionally it works wonders, especially when a program crashes: you can attach to the process and observe the state at the moment of the crash, and most operating systems can also dump the process state at crash time for later analysis. All of this requires a debugger. But deducing, from that static snapshot and the faint traces left behind, what happened before the crash still demands a solid understanding of the code itself. Since I do not use it that often, I find command-line gdb sufficient. The command-line version is more flexible and applies more widely, for example when analyzing corrupted stack frames or writing scripts to examine complex data structures, and its clumsier interaction and steeper learning curve are an acceptable price.
