How to Rapidly Build and Deploy Adaptable Vision Applications for the Edge With a Production-Ready Platform

The application of artificial intelligence (AI) on edge-based smart cameras has quickly gained acceptance across a growing range of embedded vision applications such as machine vision, security, retail, and robotics. While the rapid emergence of accessible machine learning (ML) algorithms has led to this interest in AI, developers still struggle to meet tight project schedules while delivering high performance at low power for edge-based applications.

Further complicating matters, even newly deployed solutions can rapidly become sub-optimal due to fast-changing application requirements and continued improvements in algorithms. Developers struggle to future-proof their applications with technology that isn’t scalable in this fast-moving environment.

This article introduces a flexible system-on-module (SOM) solution from Xilinx that developers can use to rapidly implement smart camera solutions for edge deployment. It shows how they can more easily adapt those solutions in response to changing needs without compromising key requirements for latency and power.

Accelerating vision application execution

Based on a custom-built Zynq UltraScale+ multiprocessor system-on-chip (MPSoC), Xilinx’s Kria K26 SOM provides a robust embedded processing system comprising a 64-bit quad-core Arm Cortex-A53 application processing unit (APU), a 32-bit dual-core Arm Cortex-R5F real-time processing unit (RPU), and an Arm Mali-400MP2 3D graphics processing unit (GPU). The SOM combines the MPSoC with four gigabytes of 64-bit wide double data rate 4 (DDR4) memory and associated memory controller, as well as multiple non-volatile memory (NVM) devices including 512 megabits (Mbits) of quad serial peripheral interface (QSPI) memory, 16 gigabytes (Gbytes) of embedded Multi-Media Card (eMMC) memory, and 64 kilobits (Kbits) of electrically erasable programmable read-only memory (EEPROM) (Figure 1).

Figure 1: The Xilinx Kria K26 SOM combines the extensive processing capabilities of a custom-built Zynq UltraScale+ MPSoC with a trusted platform module 2.0 (TPM2) and dynamic and non-volatile memory. (Image source: Xilinx)

Xilinx complements its processing and memory assets with an extensive programmable logic system comprising 256K system logic cells, 234K configurable logic block (CLB) flip-flops, 117K CLB look-up tables (LUTs), and a total of 26.6 megabits (Mbits) of memory in various configurations of distributed random-access memory (RAM), block RAM, and UltraRAM blocks. In addition, the programmable logic system includes 1,248 digital signal processing (DSP) slices, four transceivers, and a video codec for H.264 and H.265 capable of supporting up to 32 simultaneous encode/decode streams, up to an aggregate of 3840 x 2160 pixels at 60 frames per second (fps). The SOM’s two 240-pin connectors provide ready access to functional blocks and peripherals through user-configurable input/output (I/O).
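As a quick check on those codec numbers, the aggregate 4K60 budget can be divided among streams of smaller formats. A minimal sketch using only the figures quoted above; the `max_streams` helper is hypothetical, and real stream counts also depend on codec configuration:

```python
# Quick arithmetic on the VCU figures quoted above: the codec's aggregate
# budget is 3840 x 2160 at 60 fps, shared across at most 32 streams.
# Illustrative only -- real stream counts also depend on codec settings.

BUDGET_PX_PER_S = 3840 * 2160 * 60  # ~497.7 Mpixels/s aggregate

def max_streams(width, height, fps, stream_cap=32):
    """How many streams of a given format fit in the pixel-rate budget."""
    return min(stream_cap, BUDGET_PX_PER_S // (width * height * fps))

print(max_streams(1920, 1080, 30))  # eight 1080p30 streams fit the budget
print(max_streams(960, 540, 30))    # smaller streams hit the 32-stream cap
```

For example, the same pixel budget that supports a single 4K60 stream accommodates eight 1080p30 streams, while lower-resolution streams run into the 32-stream ceiling before exhausting the pixel rate.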

This combination of processor cores, memory, and programmable logic provides a unique level of flexibility and performance that overcomes key drawbacks of GPUs used for high-speed execution of ML algorithms. Unlike the fixed dataflow in GPUs, developers can reconfigure the K26 SOM datapath to optimize throughput and reduce latency. Furthermore, the K26 SOM’s architecture is particularly well-suited to the sparse networks at the heart of a growing number of ML applications.

The K26 SOM’s programmability also addresses the memory bottlenecks that both increase power consumption and limit performance in memory-intensive applications such as ML when built on conventional architectures using GPUs, multicore processors, or even advanced SoCs. In applications designed with these conventional devices, external memory typically accounts for about 40% of system power consumption, while the processor cores and internal memory each typically account for about 30%. In contrast, developers can take advantage of the K26 SOM’s internal memory blocks and reconfigurability to implement designs that require little or no external memory access. The result is higher performance and lower power consumption than conventional devices can achieve (Figure 2).

Figure 2: While systems based on embedded CPUs and typical SoCs require multiple, power-consuming memory accesses to run their applications, systems based on the Xilinx Kria use an efficient vision pipeline that can be designed for significantly fewer, if any, DDR accesses. (Image source: Xilinx)
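The power split described above can be turned into a back-of-the-envelope estimate of what eliminating DDR traffic buys. A minimal sketch using the article's rough shares; the 10 W baseline and the `remaining_power` helper are hypothetical, not measured K26 data:

```python
# Back-of-the-envelope power estimate using the rough shares above:
# external memory ~40% of system power in a conventional design,
# processor cores ~30%, internal memory ~30%. Hypothetical figures,
# not measured K26 data.

def remaining_power(total_w, ext_mem_share=0.40, fraction_removed=1.0):
    """System power left after removing a fraction of external-memory traffic."""
    return total_w * (1.0 - ext_mem_share * fraction_removed)

baseline_w = 10.0  # hypothetical 10 W conventional design
print(remaining_power(baseline_w))                        # ~6 W, all DDR removed
print(remaining_power(baseline_w, fraction_removed=0.5))  # ~8 W, half removed
```

Even halving external memory traffic trims a fifth of total system power under these assumptions, which is why keeping the vision pipeline in on-chip memory pays off so quickly at the edge.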

Along with its high performance, low power consumption, and extensive reconfigurability, the K26 SOM helps ensure security in smart camera designs for sensitive applications. Beyond the SOM’s built-in TPM security device, the MPSoC integrates a dedicated configuration security unit (CSU) that supports secure boot, tamper monitoring, secure key storage, and cryptographic hardware acceleration. Together, the CSU, internal on-chip memory (OCM), and secure key storage establish a hardware root of trust for secure boot and a trusted platform for application execution.

The extensive capabilities available with the K26 SOM provide a powerful foundation for implementing demanding edge-based applications. However, each application brings its own requirements for features and functionality associated with an application-specific set of peripherals and other components. To simplify implementation of application-specific solutions, the K26 SOM is designed to be plugged into a carrier card that hosts the additional peripherals. Xilinx demonstrates this approach with its Kria K26-based KV260 Vision AI Starter Kit.

Starter kit simplifies vision application development

Comprising a K26 SOM plugged into a vision-centric carrier board, the Xilinx KV260 Vision AI Starter Kit provides an out-of-the-box platform specifically designed for immediate evaluation and rapid development of smart vision applications. While the K26 SOM provides the required processing capabilities, the starter kit’s carrier board provides power management, including power-on and reset sequencing, as well as interface options and connectors for camera, display, and microSD card (Figure 3).

Figure 3: The Xilinx KV260 Vision AI Starter Kit provides a complete smart vision solution using the K26 SOM plugged into a vision-centric carrier board. (Image source: Xilinx)

Along with its multiple interfaces, the carrier board provides multicamera support through its Raspberry Pi camera interface connector and a pair of image access system (IAS) connectors, one of which links to a dedicated onsemi 13 megapixel (MP) AP1302 image signal processor (ISP) that can handle all image processing functions.

To further speed implementation of vision-based applications, Xilinx supports this pre-defined vision hardware platform with a series of pre-built accelerated vision applications, along with a comprehensive set of software tools and libraries for custom development.

Accelerated applications provide immediate solutions

For immediate evaluation and rapid development of accelerated vision applications, Xilinx and its partners offer pre-built applications that demonstrate several popular use cases, including smart camera face detection using the device’s programmable logic, pedestrian identification and tracking, defect detection, and natural language processing (NLP) using the MPSoC’s processing system. Available on the Xilinx Kria App Store, each application provides a complete solution for its specific use case with accompanying tools and resources. For example, the smart camera face detection application uses the KV260 carrier card’s built-in AR1335 image sensor and AP1302 ISP to acquire images, and the card’s HDMI or DisplayPort (DP) output to render the result. For face detection processing, the application configures the K26 SOM to provide a vision pipeline accelerator and a pre-built machine learning inference engine for face detection, people counting, and other smart camera applications (Figure 4).

Figure 4: Available for download from the Xilinx Kria App Store, pre-built accelerated applications are ready to run immediately on the KV260 starter kit, providing complete solutions for vision use models such as face detection. (Image source: Xilinx)

By providing complete implementations and support, the pre-built accelerated applications in the Xilinx Kria App Store let developers get designs up and running in less than an hour, even without FPGA experience. As they evaluate an application, they can use the provided software stack to modify functionality and explore alternative solutions. For more extensive custom development, Xilinx provides a comprehensive suite of development tools and libraries.

AI development environment and tools speed custom development

For custom development of AI-based applications, Xilinx’s Vitis AI development environment provides optimized tools, libraries, and pre-trained models that can serve as the foundation for more specialized custom models. For the runtime operating environment, Xilinx’s Yocto-based PetaLinux embedded Linux software development kit (SDK) provides the full suite of capabilities needed to build, develop, test, and deploy embedded Linux systems. Developers can also use Ubuntu and its Linux package ecosystem to accelerate development of vision AI platforms.

Designed for both experts and developers without FPGA experience, the Vitis AI environment abstracts away the details of the underlying silicon, enabling developers to focus on building more effective ML models. In fact, the Vitis AI environment is integrated with the open-source Apache Tensor Virtual Machine (TVM) deep learning compiler stack, enabling developers to compile models from different frameworks to a processor, GPU, or accelerator. Using Vitis AI with TVM, developers can enhance their existing designs with accelerated vision capabilities, offloading compute-intensive vision workloads such as deep learning models to the Kria SOM. To help developers further optimize their deep learning models, Xilinx’s AI Optimization tool can prune neural networks to reduce complexity in terms of giga-operations (Gops) and increase frames per second (fps). It can compress over-parameterized models by up to 50x with little impact on accuracy as measured by mean average precision (mAP) (Figure 5).

Figure 5: A Xilinx Research case study showed how a few iterations of pruning using the Xilinx AI Optimization tool can rapidly reduce neural network complexity in terms of number of Gops, while increasing frames per second, all with little impact on accuracy. (Image source: Xilinx)
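The pruning results in the case study come from Xilinx’s proprietary tool, but the core idea, dropping the smallest-magnitude weights so fewer multiply-accumulates remain, can be sketched generically. A minimal NumPy illustration; this is not the AI Optimization tool’s actual algorithm, and the layer shape and sparsity level are arbitrary:

```python
import numpy as np

# Generic magnitude-based pruning sketch -- an illustration of the idea
# behind tools like the Xilinx AI Optimization tool, not its actual
# algorithm.

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))            # a hypothetical dense layer
pruned = prune_by_magnitude(w, sparsity=0.9)

dense_ops = 2 * w.size                       # multiply-accumulates, dense
sparse_ops = 2 * np.count_nonzero(pruned)    # only nonzero weights contribute
print(f"ops reduced {dense_ops / sparse_ops:.1f}x")
```

In practice the pruned network is fine-tuned between iterations to recover accuracy, which is why the case study shows complexity falling over several pruning rounds rather than in one step.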

For implementation of custom vision applications, Xilinx’s open-source Vitis Vision Libraries are optimized for high performance and low resource utilization on Xilinx platforms, providing a familiar interface based on OpenCV. For analytics, the Xilinx Vitis Video Analytics SDK application framework helps developers build more effective vision and video analytics pipelines without requiring deep FPGA knowledge. Based on the widely adopted open-source GStreamer framework, the Video Analytics SDK lets developers quickly create custom acceleration kernels as GStreamer plug-ins for integration into the SDK framework.
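Because the Vitis Vision kernels follow OpenCV semantics, the stages of an accelerated pipeline map onto familiar per-pixel operations. The following NumPy stand-in sketches a two-stage pipeline (color-to-grayscale conversion, then binary threshold) of the kind such kernels accelerate; it is purely illustrative, since the real library ships C++ kernels synthesized into programmable logic:

```python
import numpy as np

# NumPy stand-in for a two-stage vision pipeline (RGB -> grayscale ->
# binary threshold) -- the kind of per-pixel work the Vitis Vision
# Libraries provide as hardware-accelerated, OpenCV-style kernels.
# Illustration only; the real kernels are C++ synthesized into
# programmable logic.

def rgb_to_gray(img):
    """Weighted luma sum (ITU-R BT.601 coefficients, as in OpenCV's cvtColor)."""
    return img @ np.array([0.299, 0.587, 0.114])

def binary_threshold(gray, thresh, maxval=255):
    """Pixels above `thresh` become `maxval`; everything else becomes 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)

frame = np.zeros((4, 4, 3))        # tiny synthetic RGB frame
frame[:2] = [200.0, 200.0, 200.0]  # bright upper half
mask = binary_threshold(rgb_to_gray(frame), 127)
print(mask)  # upper half 255, lower half 0
```

On the K26 SOM, stages like these would run as accelerator kernels in programmable logic, with GStreamer plug-ins from the Video Analytics SDK moving frames between them rather than Python function calls.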

Using these tools, a typical embedded developer can easily assemble custom acceleration pipelines with or without custom acceleration kernels.

Conclusion

Compute-intensive ML algorithms have enabled the use of smart vision technology in multiple applications running at the edge, but developers face multiple challenges in meeting requirements for high performance, low power, and adaptability in edge-based vision systems. Xilinx’s Kria K26 SOM provides the hardware foundation for accelerating advanced algorithms without exceeding stringent power budgets. Using the KV260 Vision AI Starter Kit with pre-built applications, developers can immediately begin evaluating smart vision applications and use a comprehensive development environment to create custom edge device solutions.