Distributed Machine Learning Patterns, Video Edition

English | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 6h 21m | 941 MB

Practical patterns for scaling machine learning from your laptop to a distributed cluster.

Distributed machine learning systems allow developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware acceleration. This book reveals best-practice techniques and insider tips for tackling the challenges of scaling machine learning systems.

In Distributed Machine Learning Patterns you will learn how to:

  • Apply distributed systems patterns to build scalable and reliable machine learning projects
  • Build ML pipelines with data ingestion, distributed training, model serving, and more
  • Automate ML tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
  • Make trade-offs between different patterns and approaches
  • Manage and monitor machine learning workloads at scale

Inside Distributed Machine Learning Patterns, you’ll learn to apply established distributed systems patterns to machine learning projects, and you’ll also explore cutting-edge new patterns created specifically for machine learning. Firmly rooted in the real world, this book demonstrates how to apply each pattern using examples based on TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Hands-on projects and clear, practical DevOps techniques show you how to launch, manage, and monitor cloud-native distributed machine learning pipelines.

Deploying a machine learning application on a modern distributed system puts the spotlight on reliability, performance, security, and other operational concerns. In this in-depth guide, Yuan Tang, project lead of Argo and Kubeflow, shares patterns, examples, and hard-won insights on taking an ML model from a single device to a distributed cluster.

Distributed Machine Learning Patterns provides dozens of techniques for designing and deploying distributed machine learning systems. In it, you’ll learn patterns for distributed model training, managing unexpected failures, and dynamic model serving. You’ll appreciate the practical examples that accompany each pattern along with a full-scale project that implements distributed model training and inference with autoscaling on Kubernetes.

What’s inside

  • Data ingestion, distributed training, model serving, and more
  • Automating Kubernetes and TensorFlow with Kubeflow and Argo Workflows
  • Managing and monitoring workloads at scale

Table of Contents

1 Part 1. Basic concepts and background
2 Introduction to distributed machine learning systems
3 Distributed systems
4 Distributed machine learning systems
5 What we will learn in this book
6 Summary
7 Part 2. Patterns of distributed machine learning systems
8 Data ingestion patterns
9 The Fashion-MNIST dataset
10 Batching pattern
11 Sharding pattern: Splitting extremely large datasets among multiple machines
12 Caching pattern
13 Answers to exercises
14 Summary
15 Distributed training patterns
16 Parameter server pattern: Tagging entities in 8 million YouTube videos
17 Collective communication pattern
18 Elasticity and fault-tolerance pattern
19 Answers to exercises
20 Summary
21 Model serving patterns
22 Replicated services pattern: Handling the growing number of serving requests
23 Sharded services pattern
24 The event-driven processing pattern
25 Answers to exercises
26 Summary
27 Workflow patterns
28 Fan-in and fan-out patterns: Composing complex machine learning workflows
29 Synchronous and asynchronous patterns: Accelerating workflows with concurrency
30 Step memoization pattern: Skipping redundant workloads via memoized steps
31 Answers to exercises
32 Summary
33 Operation patterns
34 Scheduling patterns: Assigning resources effectively in a shared cluster
35 Metadata pattern: Handle failures appropriately to minimize the negative effect on users
36 Answers to exercises
37 Summary
38 Part 3. Building a distributed machine learning workflow
39 Project overview and system architecture
40 Data ingestion
41 Model training
42 Model serving
43 End-to-end workflow
44 Answers to exercises
45 Summary
46 Overview of relevant technologies
47 Kubernetes: The distributed container orchestration system
48 Kubeflow: Machine learning workloads on Kubernetes
49 Argo Workflows: Container-native workflow engine
50 Answers to exercises
51 Summary
52 A complete implementation
53 Model training
54 Model serving
55 The end-to-end workflow
56 Summary