PODCAST
PyTorch Developer Podcast
Edward Yang, Team PyTorch
The PyTorch Developer Podcast is a place for the PyTorch dev team to record bite-sized (10-20 min) episodes about all sorts of internal development topics in PyTorch.
Jun 13 2022
Learning rate schedulers
What’s a learning rate? Why might you want to schedule it? How does the LR scheduler API in PyTorch work? What the heck is up with the formula implementation? Why is everything terrible?
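For orientation, here is a minimal sketch (my own example, not from the episode) of how a scheduler pairs with an optimizer; StepLR is just one illustrative choice among the many schedulers the API offers.

```python
import torch

# Pair an optimizer with a scheduler and step both once per epoch,
# so the learning rate decays on a fixed schedule.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(30):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()                 # halves the lr every 10 epochs
    # scheduler.get_last_lr() reports the lr currently set on the optimizer
```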
Jun 6 2022
Weak references
What are they good for? (Caches. Private fields.) C++ side support, how it’s implemented / release resources. Python side support, how it’s implemented. Weak ref tensor hazard due to resurrection. Downsides of weak references in C++. Scott Wolchok’s release resources optimization. Other episodes to listen to first:
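To make the cache use case concrete, here is a tiny Python-side sketch (my example): a weak reference lets you hold onto a tensor without keeping it alive.

```python
import weakref
import torch

# A cache entry that observes a tensor without keeping it alive.
t = torch.randn(3)
cache = {"entry": weakref.ref(t)}

print(cache["entry"]() is t)  # True: the referent is still alive
del t                         # drop the last strong reference
print(cache["entry"]())       # None: the weak reference did not keep the tensor alive
```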
May 30 2022
Strides
Mike Ruberry has an RFC about stride-agnostic operator semantics, so let's talk about strides. What are they? How are they used to implement views and memory format? How do you handle them properly when writing kernels? In what sense are strides overspecified, and therefore not worth slavishly reimplementing in a system like PrimTorch? What does Edward think we should do about them? I also have a blog post that covers strides along with other topics.
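A quick illustration of the basic idea (example mine): strides describe how far to move in storage per index step, and views just reinterpret the same storage with different strides.

```python
import torch

x = torch.arange(12).reshape(3, 4)
print(x.stride())                     # (4, 1): row-major layout

# A view reuses the same storage and only changes the strides.
y = x.t()
print(y.stride())                     # (1, 4)
print(y.data_ptr() == x.data_ptr())   # True: no copy was made
print(y.is_contiguous())              # False: strides no longer match row-major order
```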
May 9 2022
AOTAutograd
AOTAutograd is a cool new feature in functorch for capturing both forward and backward traces of PyTorch operators, letting you run them through a compiler and then drop the compiled kernels back into a normal PyTorch eager program. Today, Horace joins me to tell me how it works, what it's good for, and what our future plans for it are.
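As a rough sketch of the workflow (my example, assuming the functorch.compile.aot_function entry point), you hand AOTAutograd a function plus forward and backward "compilers", which receive FX graphs:

```python
import torch
from functorch.compile import aot_function

# "Compiler" that just prints the captured FX graph and returns it unchanged
# (a GraphModule is itself callable, so this works as a no-op compiler).
def print_compile(fx_graph, example_inputs):
    print(fx_graph.code)
    return fx_graph

def fn(x, y):
    return (x * y).sin().sum()

compiled_fn = aot_function(fn, fw_compiler=print_compile, bw_compiler=print_compile)

x = torch.randn(4, requires_grad=True)
y = torch.randn(4, requires_grad=True)
compiled_fn(x, y).backward()  # exercises both the forward and backward graphs
```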
May 2 2022
Dispatcher questions with Sherlock
Sherlock recently joined the PyTorch team, having previously worked on ONNX Runtime at Microsoft. In this episode, Sherlock asks me some questions about the dispatcher, and I answer them. We talk about the history of the dispatcher, how to override dispatching order, multiple dispatch, how to organize various dispatch keys, and torch function mode. There is also a companion video.
Apr 25 2022
New CI
PyTorch recently moved all of its CI from CircleCI to GitHub Actions. There were a lot of improvements in the process, making my old podcast about CI obsolete! Today, Eli Uriegas joins me to talk about why we moved to GitHub Actions, how the new CI system is put together, and some of the cool features of our new CI.
Apr 17 2022
Python exceptions
C++ has exceptions, Python has exceptions. But they’re not the same thing! How do exceptions work in CPython, how do we translate exceptions from C++ to Python (hint: it’s different for direct bindings versus pybind11), and what do warnings (which we also translate from C++ to Python) have in common with this infrastructure?
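A small demonstration of the end result (example mine): an error raised by a TORCH_CHECK deep in C++ arrives in Python as an ordinary exception.

```python
import torch

# A shape mismatch is detected in C++ code, but it crosses the binding layer
# and surfaces as a normal Python exception that you can catch.
try:
    torch.zeros(2, 3) @ torch.zeros(4, 5)
except RuntimeError as e:
    print("caught translated C++ error:", e)
```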
Apr 11 2022
Torch vs ATen APIs
PyTorch’s torch API is the Python API everyone knows and loves, but there’s also another API, the ATen API, which most of PyTorch’s internal subsystems are built on. How do you tell them apart? What implications do they have for our graph mode IR design? Also, a plug for PrimTorch, a new set of operators, not designed for eager mode, that is supposed to be even lower level than ATen.
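To see the two layers side by side (example mine; overload resolution details glossed over), compare the public torch call with the ATen operator it bottoms out in:

```python
import torch

a, b = torch.randn(3), torch.randn(3)

out_torch = torch.add(a, b)              # the torch-level API everyone uses
out_aten = torch.ops.aten.add(a, b)      # calling the ATen operator directly

print(torch.equal(out_torch, out_aten))  # True: same kernel underneath
```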
Sep 24 2021
All about NVIDIA GPUs
PyTorch is in the business of shipping numerical software that can run fast on your CUDA-enabled NVIDIA GPU, but it turns out there is a lot of heterogeneity in NVIDIA’s physical GPU offering, and when it comes to what is fast and what is slow, the specific GPU you have on hand matters quite a bit. Yet there are literally hundreds of distinct NVIDIA GPU models on the market; how do you make sense of the madness? Today, Natalia Gimelshein joins me to talk about everything that’s going on in the NVIDIA GPU market, and what, as a framework developer, you have to care about to make sense of it all. Further reading: NVIDIA microarchitectures on Wikipedia; a slightly older post about matching SM versions to architectures.
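If you want to know which physical GPU and compute capability you are actually dealing with, PyTorch exposes the basics (example mine):

```python
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)                           # marketing name of the card
    print(torch.cuda.get_device_capability(0))  # (major, minor) SM / compute capability
    print(props.total_memory // 2**20, "MiB")   # on-device memory
```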
Sep 16 2021
Tensor subclasses and Liskov substitution principle
A lot of recent work going on in PyTorch is all about adding new and interesting Tensor subclasses, and this all leads up to the question of what, exactly, is OK to make a tensor subclass. One answer to this question comes from an old principle from Barbara Liskov called the Liskov substitution principle, which informally can be stated as: S is a subtype of T if, anywhere you have T, it can be replaced with S without altering "desirable" properties of the program. In this podcast I'll talk about LSP and how it relates to the design of Tensor subclasses and a hypothetical "abstract Tensor specification", which doesn't really exist but which sort of implicitly exists in the corpus of existing PyTorch programs. Further reading: a cool interview with Barbara Liskov that I quote in the podcast; Balandat talking about linear operators in PyTorch. At the end I talk a little bit about multiple dispatch; an earlier discussion about this topic is in a previous podcast episode.
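A tiny illustration of the substitution idea (example mine; UnitTensor is hypothetical): code written against plain Tensors keeps working when handed a well-behaved subclass.

```python
import torch

class UnitTensor(torch.Tensor):
    """A do-nothing subclass; imagine it carried physical-unit metadata."""
    pass

def normalize(t):
    # written with no knowledge of any subclass
    return (t - t.mean()) / t.std()

x = torch.randn(8).as_subclass(UnitTensor)
y = normalize(x)
print(type(y))  # still UnitTensor: substituting the subclass did not break the program
```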
Sep 10 2021
Half precision
In this episode I talk about the reduced precision floating point formats float16 (aka half precision) and bfloat16. I'll discuss what floating point numbers are, how these two formats vary, and some of the practical considerations that arise when you are working with numeric code in PyTorch that also needs to work in reduced precision. Did you know that we do all CUDA computations in float32, even if the source tensors are stored as float16? Now you know! Further reading: the Wikipedia article on IEEE floating point is pretty great; how bfloat16 works out when doing training; acc_type in PyTorch.
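A quick way to see the trade-off between the two formats (example mine): bfloat16 keeps float32's exponent range but gives up mantissa bits, while float16 does roughly the opposite.

```python
import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    info = torch.finfo(dtype)
    print(dtype, "max:", info.max, "eps:", info.eps)

# float16 overflows where bfloat16 merely rounds:
print(torch.tensor(70000.0, dtype=torch.float16))   # inf (beyond float16's ~65504 max)
print(torch.tensor(70000.0, dtype=torch.bfloat16))  # a finite, coarsely rounded value
```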
Sep 1 2021
DataLoader with multiple workers leaks memory
Today I'm going to talk about a famous issue in PyTorch: DataLoader with num_workers > 0 causes memory leak. This bug is a good opportunity to talk about DataSet/DataLoader design in PyTorch, fork and copy-on-write memory in Linux, and Python reference counting; you have to know about all of these things to understand why this bug occurs, but once you do, it also explains why the workarounds help. Further reading: a nice summary of the full issue; the DataLoader architecture RFC.
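A hedged sketch of the usual workaround (names hypothetical): keep per-item metadata in one flat numpy array instead of a long list of Python objects, so the forked workers don't touch per-element refcounts and dirty copy-on-write pages.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class LabelsDataset(Dataset):
    def __init__(self, n):
        # one flat buffer instead of n separate Python objects
        self.labels = np.arange(n, dtype=np.int64)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # indexing the array does not bump per-element Python refcounts,
        # so forked workers keep sharing the parent's pages copy-on-write
        return torch.tensor(self.labels[idx])
```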
Aug 18 2021
Batching
PyTorch operates on its input data in a batched manner, typically processing a batch of inputs at once (rather than one at a time, as would be the case in typical programming). In this podcast, we talk a little about the implications of batching operations in this way, and then also about how PyTorch's API is structured for batching (hint: poorly) and how NumPy introduced the concept of ufuncs/gufuncs to standardize broadcasting and batching behavior. There is some overlap between this podcast and previous podcasts about TensorIterator and vmap; you may also be interested in those episodes. Further reading: ufuncs and gufuncs; a brief taxonomy of PyTorch operators by shape behavior; the previous episodes on TensorIterator and vmap.
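A small example of the batched style (mine): the same call handles a whole batch at once, and explicitly batched ops like torch.bmm expect a leading batch dimension.

```python
import torch

weights = torch.randn(5, 3)   # one weight matrix
batch = torch.randn(16, 3)    # 16 samples, each of size 3

out = batch @ weights.t()     # (16, 5): every sample processed in one call
print(out.shape)

# torch.bmm is an explicitly batched op: a leading batch dimension is required.
a = torch.randn(16, 2, 3)
b = torch.randn(16, 3, 4)
print(torch.bmm(a, b).shape)  # (16, 2, 4)
```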
Aug 10 2021
Multiple dispatch in __torch_function__
Python is a single dispatch OO language, but there are some operations, such as binary magic methods, which implement a simple form of multiple dispatch. __torch_function__ (through its NumPy predecessor __array_function__) generalizes this mechanism so that invocations of torch.add with different subclasses work properly. This podcast describes how this mechanism works and how it can be used (in an unconventional way) to build composable subclasses à la JAX in functorch. Further reading: this podcast in written form; the dispatch resolution rules in the RFC.
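A minimal sketch of the dispatch behavior (example mine): when torch.add sees a mix of plain Tensors and a subclass, the subclass's __torch_function__ handles the call, regardless of argument position.

```python
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        print("MyTensor handling", getattr(func, "__name__", func))
        return super().__torch_function__(func, types, args, kwargs or {})

plain = torch.ones(2)
mine = torch.ones(2).as_subclass(MyTensor)

out = torch.add(plain, mine)  # the subclass handler runs even as the second argument
print(type(out))              # MyTensor
```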
Aug 3 2021
Multithreading
Writing multithreaded code has always been a pain, and in PyTorch there are buckets and buckets of multithreading related issues you have to be aware of and deal with when writing code that makes use of it. We'll cover how you interface with multithreading in PyTorch, what goes into implementing those interfaces (thread pools!) and also some miscellaneous stuff like TLS, forks and data structure thread safety that is also relevant. Further reading: the TorchScript CPU inference threading documentation; the thread pool and autograd thread pool; the issue for TLS propagation across threads.
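The main Python-visible knobs for those thread pools (example mine) are the intra-op and inter-op thread counts:

```python
import torch

print(torch.get_num_threads())          # size of the intra-op pool (parallelism within one op)
print(torch.get_num_interop_threads())  # size of the inter-op pool (parallelism across ops)

torch.set_num_threads(2)                # shrink the intra-op pool
x = torch.randn(1000, 1000)
(x @ x).sum()                           # this matmul now uses at most 2 threads
```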
Jul 27 2021
Asynchronous versus synchronous execution
CUDA is asynchronous, CPU is synchronous. Making them play well together can be one of the more thorny and easy-to-get-wrong aspects of the PyTorch API. I talk about why non_blocking is difficult to use correctly, a hypothetical "asynchronous CPU" device which would help smooth over some of the API problems, and also why it used to be difficult to implement async CPU (but it's not hard anymore!). At the end, I also briefly talk about how async/sync impedance can also show up in unusual places, namely the CUDA caching allocator. Further reading: the CUDA semantics docs, which discuss non_blocking somewhat; the request for async CPU.
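A hedged sketch of the pattern under discussion (example mine): pinned host memory plus non_blocking=True lets the host-to-device copy overlap with other work, but you must synchronize (or use a blocking copy) before trusting results on the CPU.

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, pin_memory=True)  # pinned memory enables a truly async copy
    y = x.to("cuda", non_blocking=True)           # enqueued on the stream; returns immediately
    z = (y @ y).cpu()                             # .cpu() without non_blocking is a blocking copy
    torch.cuda.synchronize()                      # explicit barrier, needed if you copy back non_blocking
    print(z.shape)
```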
Jul 23 2021
gradcheck
We talk about gradcheck, the property based testing mechanism that we use to verify the correctness of analytic gradient formulas in PyTorch. I'll talk a bit about testing in general, property based testing, and why gradcheck is a particularly useful property based test. There will be some calculus, although I've tried to keep the math mostly to intuitions and pointers on what to read up on elsewhere. Further reading: Gradcheck mechanics, a detailed mathematical explanation of how it works (in particular, it also explains how gradcheck extends to complex numbers); JAX has a pretty good explanation of vjp and jvp; the gradcheck tracking issue.
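A typical gradcheck invocation looks like this (example mine); double precision inputs are used because the finite-difference comparison is too noisy in float32.

```python
import torch
from torch.autograd import gradcheck

def f(x, y):
    return (x * y).sin()

x = torch.randn(3, dtype=torch.double, requires_grad=True)
y = torch.randn(3, dtype=torch.double, requires_grad=True)

print(gradcheck(f, (x, y)))  # True if the analytic and numeric Jacobians agree
```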
Jul 21 2021
torch.use_deterministic_algorithms
torch.use_deterministic_algorithms lets you force PyTorch to use deterministic algorithms. It's very useful for debugging! There are some errors in the recording: the feature is called torch.use_deterministic_algorithms, and there is not actually a capability to warn (this was in an old version of the PR but was taken out); we just error if you hit nondeterministic code. See the docs for details.
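A short demo of the behavior (example mine; kthvalue on CUDA is one of the documented ops without a deterministic implementation):

```python
import torch

torch.use_deterministic_algorithms(True)

if torch.cuda.is_available():
    try:
        torch.randn(10, device="cuda").kthvalue(1)  # no deterministic CUDA implementation
    except RuntimeError as e:
        print("errored instead of silently running nondeterministically:", e)

torch.use_deterministic_algorithms(False)
```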
Jul 20 2021
Reference counting
Reference counting is a common memory management technique in C++, but PyTorch does its reference counting in a slightly idiosyncratic way using intrusive_ptr. We'll talk about why intrusive_ptr exists, the reason why refcount bumps are slow in C++ (but not in Python), what's up with const Tensor& everywhere, why the const is a lie, and how TensorRef lets you create a const Tensor& from a TensorImpl* without needing to bump your reference count. Further reading: why you shouldn't feel bad about passing tensor by reference; the const correctness in PyTorch RFC.
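The episode is mostly about the C++ intrusive_ptr side, but here is a small Python-side illustration (example mine) of the separate PyObject refcount at work:

```python
import sys
import torch

t = torch.randn(3)
print(sys.getrefcount(t))  # note: the count includes the temporary reference made by the call itself

alias = t                  # binding another name bumps the PyObject refcount
print(sys.getrefcount(t))

del alias                  # and dropping it decrements again; the object dies when it reaches zero
print(sys.getrefcount(t))
```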
Jul 13 2021
Memory layout
Memory layout specifies how the logical multi-dimensional tensor maps its elements onto physical linear memory. Some layouts admit more efficient implementations, e.g., NCHW versus NHWC. Memory layout makes use of striding to allow users to conveniently represent their tensors with different physical layouts without having to explicitly tell every operator what to do. Further reading: the tutorial; the memory format RFC; the permutation proposal (not implemented).
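A short example of the NCHW-versus-NHWC point (mine): channels_last keeps the same logical indexing but rearranges the strides so channels are innermost in memory.

```python
import torch

x = torch.randn(2, 3, 4, 5)                          # logical N, C, H, W
print(x.stride())                                    # (60, 20, 5, 1): NCHW physically

y = x.contiguous(memory_format=torch.channels_last)
print(y.shape)                                       # unchanged: (2, 3, 4, 5)
print(y.stride())                                    # (60, 1, 15, 3): NHWC physically
print(torch.equal(x, y))                             # True: same logical values
```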