Efficient Processing of Deep Neural Networks. Vivienne Sze
8.3 Sparse Dataflow
8.3.1 Exploiting Sparse Weights
8.3.2 Exploiting Sparse Activations
8.3.3 Exploiting Sparse Weights and Activations
8.3.4 Exploiting Sparsity in FC Layers
8.3.5 Summary of Sparse Dataflows
8.4 Summary
9 Designing Efficient DNN Models
9.1 Manual Network Design
9.1.1 Improving Efficiency of CONV Layers
9.1.2 Improving Efficiency of FC Layers
9.1.3 Improving Efficiency of Network Architecture After Training
9.2 Neural Architecture Search
9.2.1 Shrinking the Search Space
9.2.2 Improving the Optimization Algorithm
9.2.3 Accelerating the Performance Evaluation
9.2.4 Example of Neural Architecture Search
9.3 Knowledge Distillation
9.4 Design Considerations for Efficient DNN Models
10 Advanced Technologies
10.1 Processing Near Memory
10.1.1 Embedded High-Density Memories
10.1.2 Stacked Memory (3-D Memory)
10.2 Processing in Memory
10.2.1 Non-Volatile Memories (NVM)
10.2.2 Static Random Access Memories (SRAM)
10.2.3 Dynamic Random Access Memories (DRAM)
10.2.4 Design Challenges
10.3 Processing in Sensor
10.4 Processing in the Optical Domain
Preface
Deep neural networks (DNNs) have become extraordinarily popular; however, they come at the cost of high computational complexity. As a result, there has been tremendous interest in enabling efficient processing of DNNs. The challenge of DNN acceleration is threefold:
• to achieve high performance and efficiency,
• to provide sufficient flexibility to cater to a wide and rapidly changing range of workloads, and
• to integrate well into existing software frameworks.
In order to understand the current state of the art in addressing this challenge, this book aims to provide an overview of DNNs, the various tools for understanding their behavior, and the techniques being explored to efficiently accelerate their computation. It aims to explain foundational concepts and highlight key design considerations when building hardware for processing DNNs, rather than trying to cover all possible design configurations, as this is not feasible given the fast pace of the field (see Figure 1). It is targeted at researchers and practitioners who are familiar with computer architecture and are interested in how to efficiently process DNNs or how to design DNN models that can be efficiently processed. We hope that this book will provide a structured introduction to readers who are new to the field, while also formalizing and organizing key concepts to provide insights that may spark new ideas for those already in the field.
Organization
This book is organized into three modules that each consist of several chapters. The first module aims to provide an overall background to the field of DNNs and insight into the characteristics of DNN workloads.
• Chapter 1 provides background on the context of why DNNs are important, their history, and their applications.
• Chapter 2 gives an overview of the basic components of DNNs and popular DNN models currently in use. It also describes the various resources used for DNN research and development. This includes discussion of the various software frameworks and the public datasets that are used for training and evaluation.
The second module focuses on the design of hardware for processing DNNs. It discusses various architecture design decisions depending on the degree of customization (from general purpose platforms to full custom hardware) and design considerations when mapping the DNN workloads onto these architectures. Both temporal and spatial architectures are considered.
Figure 1: It has been observed that the number of ML publications is growing exponentially, at a faster rate than Moore’s law! (Figure from [1].)
• Chapter 3 describes the key metrics that should be considered when designing or comparing various DNN accelerators.
• Chapter 4 describes how DNN kernels can be processed, with a focus on temporal architectures such as CPUs and GPUs. Such architectures generally employ a cache hierarchy and coarser-grained computational capabilities, e.g., vector instructions, to improve efficiency. Frequently for such architectures, DNN processing can be transformed into a matrix multiplication, which has many optimization opportunities. This chapter also discusses various software and hardware optimizations used to accelerate DNN computations on these platforms without impacting application accuracy.
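As a concrete illustration of the transformation mentioned above (a sketch, not taken from the book itself), a convolution can be lowered to a matrix multiplication via the widely used im2col transformation: each filter-sized patch of the input is unrolled into a column, and the flattened filter is multiplied against the resulting patch matrix. The sketch below uses NumPy and assumes a single-channel input, unit stride, and no padding:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll each kh x kw patch of a 2-D input into a column."""
    H, W = x.shape
    oh, ow = H - kh + 1, W - kw + 1   # output spatial dimensions
    cols = np.empty((kh * kw, oh * ow))
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

# Convolution as GEMM: flatten the filter, multiply by the patch matrix.
x = np.arange(16, dtype=float).reshape(4, 4)  # toy 4x4 input
w = np.ones((3, 3))                           # toy 3x3 filter
out = (w.ravel() @ im2col(x, 3, 3)).reshape(2, 2)
print(out)  # each output element is the sum of one 3x3 patch
```

With multiple filters, the flattened filters stack into the rows of the left-hand matrix, so the whole layer becomes a single GEMM call that can exploit the highly tuned BLAS libraries available on CPUs and GPUs.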
• Chapter 5 describes the design of specialized hardware for DNN processing, with a focus on spatial architectures. It highlights the processing order and resulting data movement in the hardware used to process a DNN and the relationship to a loop nest representation of a DNN. The order of the loops in the loop nest is referred to as the dataflow, and it determines how often each piece of data needs to be moved.
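To make the loop nest idea concrete, here is a minimal sketch (an illustration under simplifying assumptions, not code from the book) of a CONV layer written as a loop nest in Python, for a single input channel, unit stride, and no padding. Permuting these loops changes which operand is reused in the innermost loops, and hence the dataflow:

```python
def conv2d(inp, wgt):
    """Naive CONV-layer loop nest; the loop order determines the dataflow."""
    H, W = len(inp), len(inp[0])   # input dimensions
    R, S = len(wgt), len(wgt[0])   # filter dimensions
    out = [[0.0] * (W - S + 1) for _ in range(H - R + 1)]
    # This ordering keeps each output element stationary while the two
    # inner loops accumulate into it (an "output stationary" flavor).
    for p in range(H - R + 1):          # output row
        for q in range(W - S + 1):      # output column
            for r in range(R):          # filter row
                for s in range(S):      # filter column
                    out[p][q] += inp[p + r][q + s] * wgt[r][s]
    return out
```

Moving the filter loops (r, s) to the outside would instead hold each weight stationary across the output loops, illustrating how reordering the loop nest, without changing the result, changes how often each piece of data must be moved.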