Applying Deep Learning to Genomics Analysis
May 23, 2018
Synthetic Genomics, Incorporated (SGI) is a synthetic biology company that aims to bring genomics-driven solutions to market. The company designs and builds biological systems and conducts interdisciplinary research that combines biology and engineering to address global sustainability problems.
SGI asked for Intel’s help to conduct a deep learning proof of concept that would automatically tag protein sequences. Tagging protein sequences with their corresponding protein family labels and other annotations is important for genomic research, because there are thousands of protein families and millions of sequences. SGI and Intel collaborated to design a deep learning software framework flexible enough to design and train various kinds of models, each predicting multiple properties of protein sequences using amino-acid sequences as the only input. The system was further trained on the Intel® Deep Learning Cloud (Intel® DL Cloud) to produce basic protein descriptions for each sequence.
To do this, Intel designed a topology to handle multi-task learning. A single deep network was designed to generate several output types, including classifications, tags, segmentations, and text descriptions. Multiple tasks of each type were trained, and all of these tasks shared the same feature extraction layers and contributed to a single embedding. This embedding has been shown to be a useful representation not only for the tasks it was trained on but also for tasks it had never seen before. The architecture makes it possible to predict a large number of properties in a single pass, requiring far less computation than classical methods, which run an individual model for each prediction and need a complex pipeline to coordinate the execution of those models.
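The shared-layer idea can be illustrated with a small sketch. This is not the SGI/Intel model, whose layers, sizes, and training procedure are not described here. It is a minimal, untrained forward pass in plain Python, and it assumes a one-hot residue encoding, a single shared tanh layer, mean pooling, and three hypothetical heads (family classification, multi-label tags, per-residue segmentation). The class name `MultiTaskSketch` and all dimensions are illustrative, and the text-description head is omitted for brevity.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class MultiTaskSketch:
    """Hypothetical multi-task model: one shared layer, several task heads."""

    def __init__(self, embed_dim=8, n_families=4, n_tags=3,
                 n_segment_classes=2, seed=0):
        rng = random.Random(seed)
        # Shared feature extraction layer used by every task (assumed shape).
        self.shared = init_matrix(embed_dim, len(AMINO_ACIDS), rng)
        # Task-specific heads, all reading the same shared features.
        self.family_head = init_matrix(n_families, embed_dim, rng)
        self.tag_head = init_matrix(n_tags, embed_dim, rng)
        self.segment_head = init_matrix(n_segment_classes, embed_dim, rng)

    def residue_features(self, sequence):
        features = []
        for residue in sequence:
            one_hot = [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
            features.append([math.tanh(h) for h in matvec(self.shared, one_hot)])
        return features

    def forward(self, sequence):
        per_residue = self.residue_features(sequence)
        # Mean-pool residue features into one sequence-level embedding.
        embedding = [sum(col) / len(per_residue) for col in zip(*per_residue)]
        return {
            "embedding": embedding,
            "family": softmax(matvec(self.family_head, embedding)),
            "tags": [sigmoid(s) for s in matvec(self.tag_head, embedding)],
            "segments": [softmax(matvec(self.segment_head, f))
                         for f in per_residue],
        }


model = MultiTaskSketch()
outputs = model.forward("MKTAYIAKQR")  # one pass yields every prediction
```

The shared layer runs once per sequence, and each head adds only a small matrix product on top of it. That is the source of the single-pass saving over running one model per property.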
This effort relied on the Intel DL Cloud and the latest in deep learning techniques to facilitate analysis, classification, and prediction in synthetic biology applications.
As an innovative and collaborative data science effort, the Intel AI Lab team leveraged AI techniques and computing power to bring extra tools and insights to SGI’s protein sequence analysis. It allowed the data scientists at SGI to utilize the large amount of existing data and extract new insights that traditional methods couldn’t provide. In the next phase of the project, more capabilities are being added to the model. The ultimate goal is to bring these innovative methods of protein sequence analysis into production. If your company or organization is interested in working with Intel to solve deep learning challenges, please contact us.
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© Intel Corporation