
Please link to this site using


If you are interested in collaborating with us in any aspect of developing this book, please feel free to reach out to us at:


We have included PDFs of some of the chapter drafts for your review (Chapters 1, 2, 3, 4, and 5). Please be warned that the content is at an early stage and may contain errors. Any feedback is appreciated.


The field of Deep Learning has progressed exponentially, and so has the footprint of ML models like BERT, GPT-3, and ResNet. While these models work great, training and deploying such large (and growing) models in production is expensive. You might want to deploy your face-filter model on smartphones to let your users add a puppy filter to their selfies, but it might be too big or too slow. Or you might want to improve the quality of your cloud-based spam detection model without paying for a bigger cloud VM to host a more accurate but larger model. What if you don’t have enough labeled data for your models, or can’t tune them manually? All of this is daunting!

What if you could make your models more efficient: use fewer resources (model size, latency, training time, data, manual involvement) while delivering better quality (accuracy, precision, recall, etc.)? That sounds wonderful! But how?
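As a small taste of the techniques covered later (quantization is introduced in the compression chapter), here is a minimal sketch of symmetric post-training quantization written in plain NumPy. All names, shapes, and values below are illustrative, not taken from the book's code:

```python
import numpy as np

def quantize_symmetric(w):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = np.abs(w).max() / 127.0          # largest magnitude maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 tensor."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in for a layer's weights
q, scale = quantize_symmetric(w)
w_hat = dequantize(q, scale)

# Storage drops 4x (float32 -> int8), while the per-weight
# rounding error stays bounded by scale / 2.
max_err = np.abs(w - w_hat).max()
```

This is only one of the efficiency levers the book discusses; the later chapters cover the practical details (per-channel scales, quantization-aware training, and tooling) that a production deployment would need.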

This book will go through the algorithms and techniques used by researchers and engineers at Google Research, Facebook AI Research (FAIR), and other eminent AI labs to train and deploy their models on devices ranging from large server-side machines to tiny microcontrollers. We present a balance of fundamentals and practical know-how to fully equip you to optimize your model training and deployment workflows, so that your models perform as well as or better than before with a fraction of the resources. We will also present deep dives into popular models, infrastructure, and hardware, along with challenging projects to test your skills.

Table of Contents

The table of contents is as follows.

Part I: Introduction to Efficient Deep Learning

  1. Introduction (PDF)
    • Introduction to Deep Learning
    • Efficient Deep Learning
    • Mental Model of Efficient Deep Learning
    • Summary

Part II: Efficiency Techniques

  1. Introduction to Compression Techniques (PDF)
    • An Overview of Compression
    • Quantization
    • Exercises: Compressing images from the Mars Rover
    • Project: Quantizing a Deep Learning Model
    • Summary
  2. Introduction to Learning Techniques (PDF)
    • Learning Techniques and Efficiency
    • Data Augmentation
      • Project: Increasing the accuracy of an image classification model with Data Augmentation.
      • Project: Increasing the accuracy of a text classification model with Data Augmentation.
    • Distillation
      • Project: Increasing the accuracy of a speech identification model with Distillation.
    • Summary
  3. Efficient Architectures (PDF)
    • Embeddings for Smaller and Faster Models
      • Project: Using pre-trained embeddings to improve the accuracy of an NLP task.
    • Learn Long-Term Dependencies Using Attention
      • Project: News Classification Using RNN and Attention Models
    • Efficient On-Device Convolutions
      • Project: Snapchat-Like Filters for Pets
    • Summary
  4. Advanced Compression Techniques !NEW! (PDF)
    • Model Compression Using Sparsity
      • Exercise: Sparsity improves compression
      • Project: Lightweight model for pet filters application
    • Weight Sharing using Clustering
      • Exercise: Using clustering to compress a 1-D tensor.
      • Exercise: Mars Rover beckons again! Can we do better with clustering?
      • Exercise: Simulating clustering on a dummy dense fully-connected layer
      • Project: Using Clustering to compress a deep learning model
    • Summary
  5. Advanced Learning Techniques
    • Contrastive Learning
    • Unsupervised Pre-Training
      • Project: Learning to classify with 10% labels.
    • Curriculum Learning
  6. Automation
    • Hyper-Parameter Tuning
      • Project: Multi-objective tuning to get a smaller and more accurate model.
    • AutoML
      • Project: Searching over model architectures for boosting model accuracy.
    • Compression Search
      • Project: Layer-wise Sparsity to achieve a Pareto-optimal model.

Part III: Infrastructure

  1. Software Infrastructure
    • PyTorch Ecosystem
    • iOS Ecosystem
    • Cloud Ecosystems
  2. Hardware Infrastructure
    • GPUs
    • Jetson
    • TPU
    • M1 / A4/5?
    • Microcontrollers

Part IV: Applied Deep Dives

  1. Deep-Dives: TensorFlow Platforms
    • Mobile
      • Project: Benchmarking a tiny on-device model with TFLite.
    • Microcontrollers
      • Project: Speech detection on a microcontroller with TFMicro.
    • Web
      • Project: Face recognition on the web with TensorFlow.js.
    • Google Tensor Processing Unit (TPU)
      • Project: Training BERT efficiently with TPUs.
    • Summary
  2. Deep-Dives: Efficient Models
    • BERT
      • Project: Training efficient BERT models.
    • MobileNet
    • EfficientNet architectures
      • Project: Comparing efficient mobile models on mobile devices.
    • Speech Detection
      • Project: Efficient speech detection models.

Projects / Codelabs / Tutorials

The Minimally Qualified Reader

The minimally qualified reader is someone who has a basic understanding of ML and some experience training deep learning models. They can do basic fine-tuning of models by changing common parameters, can make minor changes to model architectures, and can get the modified models to train to a good accuracy. However, they are running into problems productionizing these models, or want to optimize them further. We assume this background because the book does not teach deep learning basics; for those, we refer you to excellent resources like Deep Learning with Python and Dive into Deep Learning. Any reader with this prerequisite knowledge will be able to enjoy the book.

Subscribe for Updates


Report errata and feedback.

We welcome any errata, feedback, or ideas. Please file them as an issue here. Alternatively, write to us at