Nervana Cloud 1.5.0 contains substantial under-the-hood changes and improvements. We’ve revamped much of the core code, separated the application components into their own microservices, rewritten our job launcher, added support for a new container orchestration service, squashed more than 75 bugs, and greatly expanded our test coverage. The changes most visible to the end user are the aeon dataloader and auxiliary file volume support.

Aeon dataloader – The aeon dataloader provides fast, flexible access to training datasets that are too large to load directly into memory. Data is first loaded in chunks called “macrobatches,” which are then split into minibatches to feed the model. A simple interface lets you configure the dataloader for custom datasets and load data from disk with minimal latency, and a manifest file maps your local input paths to their target paths. On the backend, we’ve built an entirely new data service to handle fetching, caching, and serving requests for dataset batches. See the Aeon User Guide for more information.
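
For illustration, here is a minimal sketch of what an aeon-backed loader looks like from neon. The file paths and config values below are hypothetical, and the exact config keys and import path vary across aeon/neon versions, so treat the Aeon User Guide as the authoritative reference.

    # train_manifest.csv -- one line per example, pairing an input path with its target path:
    #   /data/images/img_00001.jpg,/data/labels/img_00001.txt
    #   /data/images/img_00002.jpg,/data/labels/img_00002.txt

    from neon.backends import gen_backend
    from neon.data.aeon_shim import AeonDataLoader

    be = gen_backend(backend='gpu', batch_size=128)

    config = {
        'type': 'image,label',                      # decode images and class labels
        'manifest_filename': 'train_manifest.csv',  # maps inputs to targets
        'minibatch_size': be.bsz,                   # minibatch size fed to the model
        'macrobatch_size': 1024,                    # chunk size fetched from disk/cache
        'image': {'height': 224, 'width': 224},
        'label': {'binary': False},
    }

    train_set = AeonDataLoader(config, be)          # iterate over (input, target) minibatches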

Auxiliary file volume support – We’ve also introduced support for arbitrary file volumes to handle non-aeon-formatted data such as older datasets, vocabulary files, and output data. Volumes are mounted read/write during model training and inference jobs, and you can append new files to existing volumes or download their contents. See Attaching Data for details on how to use this feature.

This release of Nervana Cloud includes a number of additional features and fixes:

  • New ncloud commands and API endpoints for retrieving command history, getting machine information, and revoking access tokens.
  • Automatic command retry and other enhancements that improve stability when uploading large individual files and directories containing many small files. Upload batch sizes are now configurable to better cope with network lag and disruptions, and the number of simultaneously open file descriptors is now capped.
  • Enhancements such as automatic scaling and load balancing to improve streaming inference performance.
  • Revamped administration of users, groups, and tenants to improve the information displayed and to fix removal operations in certain scenarios. Administration is now also available through the web user interface.
  • Nervana Cloud now defaults to neon v1.9.0: all training jobs, interactive Jupyter sessions, and model deployment jobs assume neon v1.9.0 unless you explicitly specify a different version.

Related Blog Posts

Security at Nervana Part 2: Securing Data

In our previous Security post, we discussed the Root of Trust, and how it is used to create a secure, trusted environment in which to execute deep learning applications. In this post, we explore the challenges involved in securing data, and how we can build on the aforementioned hardened software environment to meet those challenges.…

Simplified ncloud syntax and other improvements to Nervana Cloud

Nervana Cloud is a full-stack hosted platform for deep learning that enables businesses to develop and deploy high-accuracy deep learning solutions at a fraction of the cost of building their own infrastructure and data science teams. We recently updated Nervana Cloud’s ncloud command-line interface (CLI) syntax to support subcommands and shortcuts for improved usability and…

Securing the Deep Learning Stack

This is the first post of Nervana’s “Security Meets Deep Learning” series. Security is one of the biggest concerns for any enterprise, but it’s especially critical for companies deploying deep learning solutions since datasets often contain extremely sensitive information. Fundamentally, “security” refers to the protection of a system against the many forms of malicious attacks.…
