Securing the Deep Learning Stack
Aug 02, 2016
Aug 02, 2016
This is the first post of Nervana’s “Security Meets Deep Learning” series. Security is one of the biggest concerns for any enterprise, but it’s especially critical for companies deploying deep learning solutions since datasets often contain extremely sensitive information.
Fundamentally, “security” refers to the protection of a system against the many forms of malicious attacks. Common types of attacks include:
Because attacks target vulnerabilities at any level of the software and hardware stack, protection must also be provided at every level.
In this post we cover potential security issues and the methods Nervana uses to secure our deep learning stack — all the way from “electrons-to-applications”. Subsequent posts will discuss how Nervana extends security to deep learning datasets, the security requirements that are specific to cloud-based deep learning, and the ways that we meet those challenges. Lastly, we’ll cover a sampling of case studies that showcase how our architecture protects against a variety of attacks.
“Trust” is a (if not – “the”) cornerstone of computer security. Before something is considered secure, it has to be trusted. The question is, however, “How do you determine trust?” In person-to-person transactions, you can make a determination about someone’s trustworthiness either directly or by outsourcing that determination to a ‘trust verification’ service such as a credit agency. But how do you know to trust that agency?
Computer security faces a similar problem. You can perform a check against a portion of your system to assess its integrity, but how do you verify that the “checker”, itself, is not compromised? For example, if you rely on a tool that scans for unapproved binaries running on your server (by looking at the process table, for example), how do you know the tool hasn’t been hacked? In fact, altering commonly used detection tools to render the attack as invisible is the first thing most attacks do.
The naive answer is to have a different program verify the integrity of the checkers. However, and perhaps obviously, this raises the question of “Who checks the checker checker?” To avoid an infinite series of checkers, a “root” must be established – a mechanism that can be implicitly trusted, and so can be securely relied upon to verify the integrity of the next layer up. This is called the “root-of-trust.”
Establishing a robust, hardened root-of-trust has occupied legions of very smart computer scientist for decades. It requires carefully designed solutions that are deeply integrated into CPU architectures and can be leveraged to create entire systems in which every component can be trusted. This “hardware root-of-trust” is the anchor on which secure systems must be built.
Note that while much progress has been made in recent years, there is still no such thing as a 100% secure hardware root-of-trust. It is important for a security architect to realize that the goal of computer security is not to make your system perfectly secure — which is most likely impossible — but is instead to make it so expensive to hack that it is not worth an attacker’s trouble.
Effective protection against attacks requires an overlapping system of security technologies starting from a hardware “root-of-trust”. That trust then must be extended through every layer of the system’s software and hardware stack. Implementing this level of security requires physical security, secure hardware systems, a hardened software infrastructure, cryptography for data both at rest and in motion, and a robust set of user authorization policies that ensure privacy and isolation. In this section, we explore how the hardware root-of-trust can be established for the servers and how it is extended through the OS and made available to the application layer.
Physical access — and even physical proximity — to a server can be fatal to its security. Once attackers gain physical access to a server, they can leverage any number of hacking techniques, ranging from probing the electrical signals on a server’s motherboard to listening for RF interference from the CPUs performing crypto to extract their secret keys. The only way to mitigate this class of attacks is to place the servers in a highly secure facility. This problem is common to many application types, so most colocation and cloud providers offer physical security as a service.
In this context, “secure hardware” refers to the CPU, any peripherals needed to establish a hardware root-of-trust, and the extension of the trust “upwards” through the rest of the stack via a chain of trust. The Nervana Platform “chain of trust” is shown in Figure 1.
Figure 1. The Nervana Platform Chain of Trust
In general terms, establishing the root-of-trust requires that the CPU be able to boot securely. This process involves loading a small, immutable kernel that calculates a hash over the boot firmware (BIOS) binary and compares it against the expected hash value. The boot firmware, being trusted, can then repeat that process one layer up, calculating a hash over the OS’s boot loader to compare it against a trusted value.
This process is generally called “secure booting” and is supported by most recent CPUs. It also requires a secure place to store the trusted hash values. In Intel’s trusted execution technology ecosystem, this is usually a highly secure hardware peripheral called a Trusted Platform Module.
The accelerators used in deep learning (GPUs and the Nervana Engine) must also be secured — a topic which we’ll address in a subsequent post.
In theory, once a hardware root-of-trust is established, that trust can be extended to every bit of software executing on that platform. This is a nice theory! However, it is completely impractical, as this would require that every piece of software be immutable, cryptographically signed, and checked every time it is executed. And even if this were achieved, many attacks inject code dynamically, so vulnerabilities would still exist. In practice, software systems are too complex and dynamic to be completely secure in this fashion.
Instead, security architectures focus on securing the operating system itself, then carefully restrict users to certain operations. There are a variety of technologies built into Linux that facilitate this, including Mandatory Access Controls (of which SELinux is one implementation), Integrity Measurement Architecture (IMA/EVM), ASLR, Signed Kernel Modules, and many more. Enabling and correctly configuring these features goes a long way towards locking down the OS and preventing the bulk of attacks.
Securing the OS provides a safe, isolated environment to run applications. Unfortunately, while securing the OS is difficult, ensuring that applications are secure is nearly impossible (outside of very restricted environments such as embedded systems). There are too many applications to enforce safe coding practices across all of them, and many development environments are not amenable to safe coding. Instead of trying to secure every application, security approaches focus on limiting the damage that an application can cause to ensure that a compromised application can only affect the “sandbox” in which it is run.
In the case of cloud computing, this is typically achieved by running most applications in some form of a “container” (e.g. virtual machines, LXC, docker, etc) enabled through operating-system-level virtualization with carefully constrained privileges. Everything else, including the application that manages the containers, is then secured using the same techniques used to protect the OS.
Deep learning applications present an additional set of challenges for two reasons. First, they typically require direct access to acceleration hardware. Second, they are typically written at least partially in Python, which is inherently less secure than modern code-safe languages such as Go. We’ll see how containerization can be used to mitigate these challenges in a subsequent post.
In this post, we have outlined the computer security challenges, discussed the hardware root of trust, and explored how this could be leveraged to secure operating systems and applications. In the next post, we will discuss the problem of securing data (both at rest and in motion), encryption and user authentication and authorization.
To learn more about Nervana, please contact us at email@example.com.
Nervana Cloud 1.5.0 contains enormous under-the-hood changes and improvements. We’ve revamped and updated a lot of the core underlying code, separated the various application components into their own microservices, re-written our job launcher, added support for a new container orchestration service, squashed more than 75 bugs, and greatly expanded our testing coverage. The biggest changes…
In our previous Security post, we discussed the Root of Trust, and how it is used to create a secure, trusted environment in which to execute deep learning applications. In this post, we explore the challenges involved in securing data, and how we can build on the aforementioned hardened software environment to meet those challenges.…
Nervana Cloud is a full-stack hosted platform for deep learning that enables businesses to develop and deploy high-accuracy deep learning solutions at a fraction of the cost of building their own infrastructure and data science teams. We recently updated Nervana Cloud’s ncloud command-line interface (CLI) syntax to support subcommands and shortcuts for improved usability and…
Keep tabs on all the latest news with our monthly newsletter.