Does Machine Learning Open Your Industrial System to Hackers?
Every method of computing invites its own security challenges, and machine learning (ML) is no exception. Fortunately, the vulnerabilities in this segment of artificial intelligence (AI) are fairly predictable. Unfortunately, they are not easy to spot.
The challenges lie in the vast amount of data involved, the fine granularity of that data, and the fact that machine learning both learns and improves as it goes along. Machine learning extracts patterns from data that are imperceptible to humans, which is both an asset and a vulnerability.
Each area of artificial intelligence yields high efficiency, high quality, and often unprecedented innovation. In manufacturing, for example, AI enables problems to be found and corrected quickly, while AI-based security methods protect the processes involved.
Machine learning “learns” through training algorithms and determines the probable outcome of a situation. With deep learning (DL), another subset of AI, algorithms enable software to train itself to perform tasks: multilayered neural networks are exposed to millions of data points, mirroring the human brain’s ability to recognize patterns and to categorize and clarify information.
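To make the idea of training concrete, here is a minimal sketch of a model “learning” from labeled examples: a single perceptron nudging its weights toward the correct answers. The data and learning rate are illustrative assumptions, not tied to any particular industrial system.

```python
# Minimal sketch of how a machine learning model "learns": a single
# perceptron adjusts its weights from labeled examples until its
# predictions match the training data.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn weights for a 2-input perceptron with a bias term."""
    w = [0.0, 0.0, 0.0]  # [bias, w1, w2]
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            # Predict, then nudge weights toward the correct answer.
            pred = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
            err = y - pred
            w[0] += lr * err
            w[1] += lr * err * x1
            w[2] += lr * err * x2
    return w

def predict(w, x1, x2):
    return 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0

# Learn a simple AND-like rule from four labeled examples.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w = train_perceptron(samples, labels)
```

A deep learning system applies the same weight-adjustment idea across many stacked layers and millions of data points rather than one tiny unit.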
Vulnerabilities in machine learning
So, back to the question at hand: does machine learning open your industrial system to hackers? The answer is that nothing is foolproof, especially rapidly evolving technology. That said, there are both well-designed and poorly designed machine learning and deep learning systems, so some are more susceptible to hacking than others.
Gartner predicts that by 2025, machine learning will be part of every security solution. In the meantime, the number of security breaches that need to be handled keeps doubling. Examples of effective efforts include Google’s use of machine learning to block approximately 99% of spam emails, and IBM’s Watson, which is said to have prevented 200 million cyber attacks targeting Wimbledon in 2017. Machine learning algorithms also play a major role in securing cloud-based platforms and in analyzing suspicious activity, including logins and other anomalies.
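As a hedged sketch of the kind of statistical check an ML-backed security tool might apply to login activity, the snippet below flags a login whose hour-of-day deviates strongly from a user’s history. The sample data, z-score threshold, and field choice are illustrative assumptions, not a production policy.

```python
# Flag a login whose hour-of-day is a statistical outlier relative to
# the user's login history (simple z-score anomaly detection).
from statistics import mean, stdev

def is_anomalous_login(history_hours, new_hour, z_threshold=3.0):
    """Return True if new_hour deviates strongly from history_hours."""
    mu = mean(history_hours)
    sigma = stdev(history_hours)
    if sigma == 0:
        return new_hour != mu
    z = abs(new_hour - mu) / sigma
    return z > z_threshold

# A user who normally logs in during business hours...
history = [9, 9, 10, 8, 9, 10, 9, 8, 9, 10]
print(is_anomalous_login(history, 3))   # 3 a.m. login -> True (flagged)
print(is_anomalous_login(history, 9))   # typical login -> False
```

Real systems combine many such signals (location, device, frequency) and learn the thresholds from data rather than hard-coding them.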
The most commonly used attack mode is an adversarial technique that tries to infiltrate a model through malicious input so that the model makes a mistake. When a new input arrives that contains subtle yet maliciously crafted data, the model behaves poorly on that input, even though its overall statistical performance may appear unimpaired. Machine learning models can also be attacked in the following ways:
- Integrity compromises occur when the ML model fails to filter out one or more negative cases and they sneak past the system, which can then be hacked.
- Exploratory attacks probe the model’s predictions by varying input record values to learn how it behaves.
- Causative attacks alter the training data, and thereby the model itself, so that a bad record sneaks through or a good record is blocked.
- Integrity attacks succeed when bad inputs pass through: the attacker can then enter regularly, with the system labeling bad inputs as good ones.
- Availability attacks occur when the model is trained on an attacker’s data and good inputs are filtered out of the system. In this scenario, legitimate records can be removed.
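The adversarial technique described above can be sketched with a toy linear classifier: a tiny, deliberately crafted perturbation pushes an input across the decision boundary. The weights, input values, and step size here are hypothetical, chosen only to make the flip visible.

```python
# Sketch of an adversarial (evasion) attack on a linear classifier:
# a small crafted perturbation flips the model's decision.

def score(w, x):
    """Linear model score; > 0 means class 'good', <= 0 means 'bad'."""
    return sum(wi * xi for wi, xi in zip(w, x))

def adversarial_example(w, x, eps):
    """FGSM-style step: nudge each feature by eps in the direction
    that increases the score, so a 'bad' input scores as 'good'."""
    return [xi + eps * (1 if wi > 0 else -1) for wi, xi in zip(w, x)]

w = [0.5, -1.0, 0.8]          # trained weights (illustrative)
x = [0.2, 0.6, 0.1]           # a malicious input the model catches...
print(score(w, x))            # negative -> classified 'bad'
adv = adversarial_example(w, x, eps=0.3)
print(score(w, adv))          # positive -> misclassified as 'good'
```

Against a deep network the attacker follows the gradient rather than the raw weight signs, but the principle is the same: each feature moves only slightly, yet the classification flips.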
While it’s true that criminals are stepping up attacks on machine learning, it’s not nearly as easy as it might sound. Fortunately, there are very simple places to start protecting your system before adding advanced technologies to beef up security. For example, if your system software is outdated and patches aren’t applied when they become available, it’s easier to launch an attack. Strong credentials and multi-factor authentication are both important; networks should implement security beyond a simple username and password.
Where to start
To aid in the development of AI applications, the following kits are available:
The NVIDIA Jetson Nano Developer Kit from Seeed Technology delivers the performance needed for AI workloads such as deep learning, computer vision, GPU computing, and multimedia processing (Figure 1). It lets users run AI frameworks and models for applications such as image classification, object detection, segmentation, and speech processing. It also offers a simple way to connect a diverse set of sensors to enable a variety of AI applications.
Figure 1: The Jetson Nano is supported by Seeed Technology’s JetPack, which includes a board support package, Linux OS, NVIDIA CUDA, cuDNN, and TensorRT software libraries for AI apps. (Image source: Seeed)
Adafruit recently unveiled the BrainCraft EDGE BADGE embedded evaluation board (Figure 2), bringing machine learning to the edge via small microcontrollers running a miniature version of TensorFlow Lite. The credit-card-sized board is powered by Microchip’s ATSAMD51J19 with 512 Kbytes of flash memory and 192 Kbytes of RAM. The kit includes built-in microphone input for speech recognition and an Arduino library with demos that recognize various word pairs and gestures.
Figure 2: This Supercon badge can also be a name badge programmed with Circuit Python. It shows up as USB drive, with no IDE needed to display a name, QR codes, or other information. (Image source: Adafruit)
Finally, advanced sensors such as the STMicroelectronics LSM6DSOX combine a machine learning core, a finite state machine, and advanced digital functions, providing a boost for the company’s STM32 microcontroller family so it can deliver the performance and accuracy necessary for AI functions.
Trends going forward
Today, there are cloud-based computing models that include machine learning platforms available via cognitive computing, automated machine learning, ML model management, ML model serving, and GPU-based computing. However, the wealth of data necessary for ML and deep learning applications makes these platforms attractive targets, and the headlines are replete with ever-larger cloud hacking incidents.
Companies are smart to be cautious when moving sensitive data involved in AI/ML to the cloud. The security policies needed to truly protect that data, and the means to control the hacking, aren’t necessarily as trustworthy as they will need to be.
The sheer amount of data produced by the IoT is mind-boggling. The data needed to launch AI, automation, machine learning, and similar initiatives, especially when legacy data is in the mix, absolutely must be the right data for the application.
Here’s a short list of the steps a developer should take when implementing AI/ML:
- Know and understand where the gaps are in existing data
- Understand which workflows will be affected by a potential AI project
- Ensure full corporate buy-in to an established and communicated end game, and make sure each stakeholder knows how they participate in that process
- Harness the technology and opportunity rather than set out to cut costs
- Start with data cleansing to detect, correct, and remove corrupt or inaccurate records
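The data-cleansing step above can be sketched as a simple filter that detects and drops corrupt or implausible records. The field names, value ranges, and sample records are illustrative assumptions, not a real industrial schema.

```python
# Minimal data-cleansing sketch: detect and remove records that are
# missing required fields or carry out-of-range readings.

def clean_records(records, required=("sensor_id", "temp_c")):
    """Keep only records with all required fields and a plausible
    temperature; everything else is treated as corrupt."""
    cleaned = []
    for rec in records:
        if any(rec.get(f) is None for f in required):
            continue                      # drop incomplete records
        if not -40.0 <= rec["temp_c"] <= 125.0:
            continue                      # drop implausible readings
        cleaned.append(rec)
    return cleaned

raw = [
    {"sensor_id": "A1", "temp_c": 22.5},   # good record
    {"sensor_id": "A2", "temp_c": None},   # missing reading
    {"sensor_id": "A3", "temp_c": 900.0},  # implausible spike
]
print(clean_records(raw))  # only the A1 record survives
```

In practice the validity rules themselves come from domain knowledge or from profiling the existing data, which is why knowing where the gaps are is listed first.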
AI and machine learning require that the data driving algorithms and decisions be of high quality. Artificial intelligence, machine learning, and deep learning will likely have a major impact on the future of most companies at some point. Already, machine learning algorithms are the primary method for detecting and blocking file-based malware. They are also identifying applications that are unsafe to use and isolating them from production systems. AI is also being used in financial services, health care, and insurance to protect extremely sensitive data.
It’s true that we’re captivated by the concept of AI and machine learning, and it will be an amazing tool when used to its full potential. Make sure you have substantial in-house knowledge, or a cloud or implementation partner, to get you through the hacking minefields.