Susan Diamond is the engineering manager and architect of the Watson Deep Learning as a Service (DLaaS) platform. She led the Watson product engineering team in working with IBM Research to create and productize the platform. DLaaS is used to train the models behind Watson services and is also used by Watson customers. Previously, she was one of the handful of people who started the Watson Developer Cloud, a cloud platform that hosts Watson cognitive applications.
Abstract
Machine learning workloads have traditionally been run in high-performance computing (HPC) environments, where users log in to dedicated machines and use the attached GPUs to run training jobs on huge datasets. Training large neural network models is very resource intensive, and even after exploiting parallelism and accelerators such as GPUs, a single training job can still take days. Consequently, the cost of hardware is a barrier to entry. Even when upfront cost is not a concern, setting up such an HPC environment takes months, from acquiring the hardware to installing and configuring the right firmware and software. Furthermore, scalability is hard to achieve in a rigid, traditional lab environment, which makes such an environment slow to react to the rapid changes in the artificial intelligence industry.
Watson Deep Learning as a Service is a cloud-based deep learning platform that mitigates the long lead time and high upfront investment in hardware. It is designed for on-demand cloud environments and enables robust and scalable sharing of resources among the teams in an organization. Providing a user experience similar to that of a dedicated HPC environment in a multi-tenant cloud comes with its own unique challenges regarding fault tolerance, performance, and security. Watson Deep Learning as a Service tackles these challenges and presents a deep learning stack for cloud environments that is secure, scalable, and fault tolerant. It supports a wide range of deep learning frameworks, including TensorFlow, PyTorch, Caffe, Torch, Theano, and MXNet. These frameworks reduce the effort and skill set required to design, train, and use deep learning models.