Design of TCP/IP protocol stack based on single chip ARM

With the rapid development of computer network technology and electronic information technology, Internet use has become increasingly widespread, and more and more non-PC devices, such as information appliances and smart meters, can be connected to the Internet.

Authors: Liao Rikun, Ji Yuefeng, Huang Xiaoxun

Electronic devices can be connected to the Internet in several ways: run a tailored TCP/IP protocol stack on a 51-series MCU; use an ASIC chip that implements TCP/IP, such as the Internet Modem introduced by Analog Devices; or use the TCP/IP protocol stack that comes with an embedded operating system. Where high network speed is not required, a single-chip microcomputer can implement TCP/IP; where performance requirements are high, either of the latter two schemes can be chosen.

1 The hardware structure of embedded TCP/IP

Figure 1 shows the hardware structure of the embedded TCP/IP system. The CS8900A is a network controller from Cirrus Logic. A frame filter inside the chip automatically discards invalid frames, which reduces the CPU load and makes the CPU's access to the network more efficient. The CS8900A is configured by setting the values of its internal registers, after which it starts working automatically. Because the network interface is an RJ45 connector, an E2023 transmission line transformer must be used to convert the network signals.

A TCP/IP protocol stack usually needs a lot of RAM to store the TCP packets that are waiting to be acknowledged. If a packet is not acknowledged within the specified time, it is resent; once it is acknowledged, it is released. To reduce RAM usage, a TCP packet awaiting acknowledgment need not be stored if the data it carries can be regenerated when the packet has to be resent.

Because the network carries a lot of traffic, reading every frame into memory in full and only then deciding whether to discard it is inefficient. Instead, the decision is made while the data is being read, without first reading the entire frame into memory. The relative address of each field in the frame is defined in the program, so each byte of the frame can be addressed easily. This design is intended to improve access speed.

For storing the frame in the CPU, the PacketRAM variable is defined as the start address of the stored frame. Figure 2 shows how the CPU's memory is divided for TCP/IP, together with the definition and relative position of each byte of the frame in memory.
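
The idea of judging a frame while it is being read can be sketched as follows. This is a minimal illustration, assuming Ethernet II framing; the offset macros and the function frame_wanted are hypothetical names, not taken from the original program. Only the 14-byte header is inspected, so a frame that fails the test never has to be copied into PacketRAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Relative offsets inside an Ethernet II frame (hypothetical names). */
#define ETH_DST_OFFSET   0   /* 6-byte destination MAC address */
#define ETH_SRC_OFFSET   6   /* 6-byte source MAC address */
#define ETH_TYPE_OFFSET  12  /* 2-byte EtherType, big-endian */
#define ETH_HEADER_LEN   14

#define ETHERTYPE_IP   0x0800u
#define ETHERTYPE_ARP  0x0806u

/* Decide from the header alone whether the rest of the frame is worth
 * reading. Returns 1 to keep the frame, 0 to discard it. */
int frame_wanted(const uint8_t *hdr, const uint8_t my_mac[6])
{
    static const uint8_t broadcast[6] = {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF};
    uint16_t type;

    assert(hdr != NULL && my_mac != NULL);

    /* Accept only frames addressed to this station or to everyone. */
    if (memcmp(hdr + ETH_DST_OFFSET, my_mac, 6) != 0 &&
        memcmp(hdr + ETH_DST_OFFSET, broadcast, 6) != 0)
        return 0;

    /* Accept only the protocols this sketch assumes the stack handles. */
    type = (uint16_t)((hdr[ETH_TYPE_OFFSET] << 8) | hdr[ETH_TYPE_OFFSET + 1]);
    return type == ETHERTYPE_IP || type == ETHERTYPE_ARP;
}
```

A receive routine would read ETH_HEADER_LEN bytes from the controller, call frame_wanted, and read the payload only when the result is 1.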

2 Optimal Design of Embedded TCP/IP

TCP/IP is generally implemented in C, or in C mixed with assembly. Using reentrant functions and generic pointers makes the program code larger and slower; when using function pointers, you need to rebuild the call tree manually, or declare the functions called through function pointers as reentrant.

2.1 Embedded TCP/IP input and output process

Like TCP/IP on a PC, embedded TCP/IP adopts a layered protocol structure: application layer, TCP layer, IP layer, and network device interface layer. Figure 3 shows the flow of input and output packets and the functions that need to be called.

On output, the TCP layer first checks the unsend queue. If the queue is not empty, the data packet is inserted into it; if it is empty, the TCP layer checks whether the other party’s window is large enough to receive the data packet, and then fills in the TCP header information. The IP layer selects the network device interface by checking whether the destination IP address ANDed with the interface’s subnet mask equals the interface’s own IP address ANDed with the same mask, and then calls the Output function of that interface to send the packet.
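
The interface selection in the IP layer can be sketched as below. This is an illustration under assumptions: the net_if structure and the function select_interface are hypothetical, and addresses are held in host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface record; addresses in host byte order. */
struct net_if {
    uint32_t ip;       /* address assigned to this interface */
    uint32_t netmask;  /* subnet mask of this interface */
};

/* Return the index of the first interface whose subnet contains dst,
 * or -1 if no directly attached subnet matches. */
int select_interface(const struct net_if *ifs, size_t count, uint32_t dst)
{
    size_t i;

    assert(ifs != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if ((dst & ifs[i].netmask) == (ifs[i].ip & ifs[i].netmask))
            return (int)i;
    }
    return -1;
}
```

A result of -1 means the destination is not on any attached subnet; in a stack that supports a gateway, that case would presumably be handed to the gateway instead.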

On input, the Timer() function calls the Input function of each interface. The IP layer checks the IP version and the IP checksum, decides whether the data packet should be forwarded, and then passes the packet to the corresponding upper-layer handler according to the protocol field of the IP header. The TCP layer verifies the TCP checksum, searches the existing sockets for one that can receive the packet, checks whether the TCP sequence number is the expected one, updates the state of the connection (including releasing the received data packet and advancing the TCP state machine), and finally calls the socket’s callback function recv.

2.2 Program Structure of Embedded TCP/IP

The Timer function calls TCPTimer to handle the retransmission of TCP data packets, and calls the Input function of each interface to receive arriving data packets. The Timer function must be called at short, regular intervals (usually every 20 ms); otherwise functions such as packet reception and TCP timing will stop.

As shown in Figure 4, the main flow of the program is one large loop. The timer interrupt sets the variable bTimerOut to true. While handling application layer work such as sending data packets, the loop repeatedly checks bTimerOut; if it is true, the loop calls Timer() and then sets bTimerOut to false.
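
The flag-polling structure of the main loop can be sketched as follows. bTimerOut and Timer() are the names used in the text; timer_isr, main_loop_once, and the timer_calls counter are hypothetical and exist only for illustration. How the interrupt handler is registered depends on the platform and is not shown.

```c
#include <assert.h>

volatile unsigned char bTimerOut = 0;  /* set to true by the timer interrupt */
unsigned int timer_calls = 0;          /* hypothetical counter, for illustration */

/* Stand-in for Timer(): the real one would call TCPTimer and the
 * Input function of each interface. */
void Timer(void)
{
    timer_calls++;
}

/* Body of the periodic timer interrupt (for example every 20 ms). */
void timer_isr(void)
{
    bTimerOut = 1;
}

/* One pass of the large main loop. */
void main_loop_once(void)
{
    /* ... application layer work, such as sending data packets ... */

    if (bTimerOut) {
        Timer();
        bTimerOut = 0;  /* cleared after Timer(), as described in the text */
    }
}
```

The real program would call main_loop_once inside an endless loop. Keeping the interrupt handler down to a single flag assignment means the protocol code itself never runs in interrupt context.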

When using an embedded operating system, also pay attention to the reentrancy of network device driver functions. Take the NE2K Ethernet card as an example: before a data packet is copied to the network card’s buffer, registers such as the start address must be set. If an interrupt occurs after the registers are set and the driver is re-entered, the register settings are modified, and the copy goes wrong after the interrupt returns.

2.3 Embedded TCP/IP running speed optimization

The main computation in the TCP/IP sending process is concentrated in three steps: the application program copies the data to RAM; the TCP checksum is calculated; and the data packet in RAM is copied to the send buffer of the network device. For each byte of data, the two copies take roughly 12 × 2 = 24 instruction cycles and the TCP checksum takes 16 instruction cycles, 40 instruction cycles in total. With a 12 MHz crystal oscillator, the maximum network transmission speed is about 25 KB/s.
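
The 25 KB/s figure can be checked with a short calculation. The sketch below assumes that one instruction cycle takes 12 oscillator periods, as on a classic 51-series MCU; that divisor is an assumption, not something the text states, and the function name max_throughput is hypothetical.

```c
#include <assert.h>

/* Upper bound on bytes sent per second when the CPU spends
 * cycles_per_byte instruction cycles on every byte. */
unsigned long max_throughput(unsigned long osc_hz,
                             unsigned long clocks_per_cycle,
                             unsigned long cycles_per_byte)
{
    assert(clocks_per_cycle != 0 && cycles_per_byte != 0);
    return osc_hz / clocks_per_cycle / cycles_per_byte;
}
```

Under that assumption, max_throughput(12000000UL, 12, 24 + 16) evaluates to 25000 bytes per second, which matches the figure quoted above. The bound ignores all other protocol processing, so real throughput would be lower.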

To increase speed, use a faster CPU or raise the crystal frequency. Also avoid reentrant functions where possible: they are much slower than ordinary functions, although the program structure sometimes requires them, so speed has to be weighed against structure. The methods chosen here are: use memory-specific pointers; simplify the protocol stack by removing functions that are computationally expensive but of little use (currently the TCP retransmission time is fixed, and there is no congestion window control or IP-layer routing); avoid unnecessary copying of data packets; and optimize the checksum calculation and memory copy functions.
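
As a reference point for the checksum optimization, the standard Internet checksum (the 16-bit one's-complement sum used by IP and TCP) can be written as below. This is a plain portable sketch, not the optimized routine of the original design; the function name inet_checksum is hypothetical. For TCP, the same sum also covers a pseudo-header, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over len bytes, treated as big-endian
 * 16-bit words. Carries are accumulated in a 32-bit sum and folded
 * back once at the end instead of after every addition. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);

    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)                 /* odd trailing byte, padded with zero */
        sum += (uint32_t)(data[0] << 8);

    while (sum >> 16)             /* fold the carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Running the function over a header whose checksum field is already filled in correctly returns 0, which is how a receiver verifies the field.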

3 Embedded Realization of TCP/IP

TCP/IP is generally implemented in software stored in ROM. The device is then connected through network communication technology to a dedicated embedded gateway, which runs the TCP/IP protocol and provides connection and routing functions between TCP/IP and the user’s lightweight network.

3.1 Memory management method and implementation without redundant data packet copying

The memory management of embedded TCP/IP can use a linked list, allocating memory blocks sized to fit the data packets. As shown in Figure 5, the linked list links the memory blocks; the used field indicates whether a memory block is in use, and pStart and pEnd indicate the start and end addresses of the valid data in the data part.

To allocate, search the linked list for an unallocated memory block larger than the required space and carve off the required size. If a sizable part remains after the carve, split it from the original memory block to form a new memory block and insert it into the linked list. To release, set used to false; if the list element pointed to by pNext or pPre is also free, merge it with the released block to prevent memory fragmentation. To pass a data packet between protocol layers, simply pass the start address of its memory block. This memory management method wastes little space, but its computational cost is relatively high.
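
The allocation and release steps can be sketched as follows. The field names used, pStart, pEnd, pNext, and pPre come from the text; the size field, the pool layout, and the function names mem_init, mem_alloc, and mem_free are assumptions made for illustration. The pool is an array of block headers so that every header is correctly aligned, and sizes are rounded up to a multiple of the header size.

```c
#include <assert.h>
#include <stddef.h>

struct MemBlock {
    struct MemBlock *pNext, *pPre;  /* neighbouring blocks, in address order */
    unsigned char *pStart, *pEnd;   /* valid data inside the data part */
    size_t size;                    /* capacity of the data part (assumed field) */
    int used;                       /* nonzero while the block is allocated */
};

#define POOL_UNITS 64
#define UNIT sizeof(struct MemBlock)

static struct MemBlock pool[POOL_UNITS];
struct MemBlock *head = NULL;

void mem_init(void)
{
    head = &pool[0];
    head->pNext = head->pPre = NULL;
    head->size = (POOL_UNITS - 1) * UNIT;
    head->used = 0;
    head->pStart = head->pEnd = (unsigned char *)(head + 1);
}

/* First fit: take the first free block that is large enough and split
 * off the remainder when it can hold a header plus some data. */
struct MemBlock *mem_alloc(size_t bytes)
{
    size_t need = (bytes + UNIT - 1) / UNIT * UNIT;
    struct MemBlock *b, *rest;

    for (b = head; b != NULL; b = b->pNext) {
        if (b->used || b->size < need)
            continue;
        if (b->size >= need + 2 * UNIT) {
            rest = b + 1 + need / UNIT;     /* header of the remainder */
            rest->size = b->size - need - UNIT;
            rest->used = 0;
            rest->pStart = rest->pEnd = (unsigned char *)(rest + 1);
            rest->pPre = b;
            rest->pNext = b->pNext;
            if (b->pNext != NULL)
                b->pNext->pPre = rest;
            b->pNext = rest;
            b->size = need;
        }
        b->used = 1;
        b->pStart = (unsigned char *)(b + 1);
        b->pEnd = b->pStart + bytes;
        return b;
    }
    return NULL;
}

/* Absorb the block after b into b. */
static void merge_with_next(struct MemBlock *b)
{
    struct MemBlock *n = b->pNext;

    b->size += UNIT + n->size;
    b->pNext = n->pNext;
    if (n->pNext != NULL)
        n->pNext->pPre = b;
}

void mem_free(struct MemBlock *b)
{
    assert(b != NULL && b->used);
    b->used = 0;
    b->pStart = b->pEnd = (unsigned char *)(b + 1);
    if (b->pNext != NULL && !b->pNext->used)
        merge_with_next(b);
    if (b->pPre != NULL && !b->pPre->used)
        merge_with_next(b->pPre);
}
```

Because a packet is identified by its block address, a protocol layer can strip or add a header by moving pStart instead of copying the payload.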

3.2 Implementation of reordering, retransmission and window control

Reordering, retransmission, and window control are implemented with queue buffering. Each element of a queue points to a packet, and the maximum queue length is not limited.

For reordering, use the ooSeq queue. If a received TCP packet does not carry the expected sequence number but its sequence number lies within the receive window, the packet cannot be processed immediately but should not be discarded either; it is put into the ooSeq queue. When the expected TCP packet arrives, check whether the ooSeq queue now holds a TCP packet that has become the expected one, and if so, take it out and process it.
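
The decision between processing, queueing, and discarding a packet can be sketched as below. The names classify_segment, rcv_nxt (the expected sequence number), and rcv_wnd (the receive window size) are hypothetical. The subtraction is done in unsigned 32-bit arithmetic so that the comparison still works when sequence numbers wrap around; the sketch looks only at the first sequence number of the packet and ignores its length.

```c
#include <assert.h>
#include <stdint.h>

enum seg_action { SEG_DELIVER, SEG_QUEUE_OOSEQ, SEG_DROP };

/* Classify an arriving TCP packet by its sequence number. */
enum seg_action classify_segment(uint32_t seq, uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    uint32_t offset = seq - rcv_nxt;   /* distance modulo 2^32 */

    if (offset == 0)
        return SEG_DELIVER;            /* exactly the expected packet */
    if (offset < rcv_wnd)
        return SEG_QUEUE_OOSEQ;        /* ahead of expected, inside the window */
    return SEG_DROP;                   /* old duplicate or beyond the window */
}
```

After a SEG_DELIVER result advances the expected sequence number, the ooSeq queue would be searched for a packet whose sequence number now equals the new expected value.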

For retransmission, use the unacked queue. Each TCP packet that needs to be acknowledged is put into the unacked queue after it is sent and is not removed until it has been acknowledged. The TCP retransmission timer applies only to the first TCP packet in the unacked queue. If the timer expires, the packet is resent; if the number of retransmissions exceeds the specified value, an error is reported.

For window control, use the unsend queue. If the other party’s window is too small to receive the whole data packet, only part of the data is sent and the remainder is put into the unsend queue. When the other party sends a TCP packet announcing a new window size, check again whether the queued data can be sent. While the unsend queue is not empty, new data packets to be sent should be inserted into the unsend queue.
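
The split between data sent at once and data placed in the unsend queue can be sketched as below. The function sendable_now and its parameters are hypothetical; peer_window stands for the number of bytes the other party can still accept.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes of a len-byte packet that may be sent immediately.
 * The remaining len - result bytes go into the unsend queue. */
size_t sendable_now(size_t len, size_t peer_window, int unsend_empty)
{
    if (!unsend_empty)
        return 0;   /* earlier data is still waiting, so keep the order */
    return len < peer_window ? len : peer_window;
}
```

When a packet announcing a new window size arrives, the same check would be applied to the data at the head of the unsend queue.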

3.3 Implementation of Piggyback Reply

A piggybacked acknowledgment means that when a TCP packet requiring an acknowledgment arrives from the other party, the acknowledgment is not sent immediately but is held for a short period. If there is data to send during this period, the acknowledgment is piggybacked on it, reducing the number of packets sent.

If there is no data to send to the other party, or the data is not ready, wait for a certain time; if the data becomes ready within this time, the acknowledgment can be piggybacked. With piggybacked acknowledgments it is not necessary to acknowledge every frame: an acknowledgment for one frame also serves as the acknowledgment for all frames preceding it.

4 Summary

Many embedded systems use 8/16-bit low-speed processors. Their limited resources make it difficult to implement a complete TCP/IP protocol when accessing the Internet. To provide the required functions while saving system resources, this paper tailors the protocol module by module and optimizes its design. The resulting TCP/IP protocol suite can be embedded in a single-chip ARM to provide embedded Internet access.

The optimized embedded TCP/IP design supports multiple TCP connections in the form of sockets; supports multiple network devices; supports sending and forwarding data packets through a gateway; responds to ping commands; and provides retransmission and window-based flow control. Practice has shown that this design method is flexible and can implement many complex functions according to user needs.
