Commit 54d0972

oxinabox, mzgubic, nickrobinson251, and niklasschmitz authored
Apply suggestions from code review
Co-authored-by: Miha Zgubic <mzgubic@users.noreply.github.com>
Co-authored-by: Nick Robinson <npr251@gmail.com>
Co-authored-by: Niklas Schmitz <niklas.f.schmitz@gmail.com>
1 parent 57813a2 commit 54d0972

1 file changed

Lines changed: 19 additions & 19 deletions

docs/src/converting_zygoterules.md

@@ -1,11 +1,11 @@
11
# Converting ZygoteRules.@adjoint to `rrule`s
22

3-
[ZygoteRules.jl](https://github.com/FluxML/ZygoteRules.jl) is a legacy package similar to ChainRules but supporting [Zygote.jl](https://github.com/FluxML/Zygote.jl) only.
3+
[ZygoteRules.jl](https://github.com/FluxML/ZygoteRules.jl) is a legacy package similar to ChainRulesCore but supporting [Zygote.jl](https://github.com/FluxML/Zygote.jl) only.
44

55
If you have some rules written with ZygoteRules it is a good idea to upgrade them to use ChainRules instead.
66
Zygote will still be able to use them, but so will other AD systems,
77
and you will get access to some more advanced features.
8-
(Which Zygote may or may not then ignore).
8+
Some of these features are currently ignored by Zygote, but could be supported in the future.
99

1010
## Example
1111
Consider the function
@@ -29,20 +29,20 @@ end
2929
### ChainRules
3030
```julia
3131
function rrule(::typeof(f), x, y::Foo, z)
32-
f_pullback(Ω̄) = (NoFields(), 2Ω̄, Tangent{Foo}(;a=Ω̄), ZeroTangent())
32+
f_pullback(Ω̄) = (NoTangent(), 2Ω̄, Tangent{Foo}(;a=Ω̄), ZeroTangent())
3333
return f(x, y, z), f_pullback
3434
end
3535
```
3636

3737
## Write as a `rrule(::typeof(f), ...)`
3838
No magic macro here, `rrule` is the function that it is.
39-
The function it is the rule for is the first argument, or second argument if you need to take a `[RuleConfig`](@ref).
39+
The function it is the rule for is the first argument, or second argument if you need to take a [`RuleConfig`](@ref).
4040

4141
Note that when writing the rule for a constructor you will need to use `::Type{Foo}`, not `typeof(Foo)`.
4242
See docs on [Constructors](@ref).
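
As a minimal sketch of the constructor case: the `Celsius` type below is hypothetical and exists only for illustration. The rule dispatches on `::Type{Celsius}` and reads the incoming cotangent as a structural tangent.

```julia
using ChainRulesCore

# Hypothetical primal type, used only for illustration.
struct Celsius
    degrees::Float64
end

# Dispatch on `::Type{Celsius}`. `typeof(Celsius)` is `DataType`, which is
# shared with other types and so would not single out this constructor.
function ChainRulesCore.rrule(::Type{Celsius}, degrees::Real)
    # The cotangent of the struct is structural, so read the field off it.
    Celsius_pullback(c̄) = (NoTangent(), c̄.degrees)
    return Celsius(degrees), Celsius_pullback
end
```

Calling `rrule(Celsius, 20.0)` then returns the constructed value together with the pullback.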
4343

4444
## Include the derivative with respect to the function object itself
45-
The `ZygoteRules.@adjoint` macro automagically[^1] inserts a extra `nothing` in the return for the function it generates to represent the derivative of output with respect to the function object.
45+
The `ZygoteRules.@adjoint` macro automagically[^1] inserts an extra `nothing` in the return for the function it generates to represent the derivative of output with respect to the function object.
4646
ChainRules as a philosophy avoids magic as much as possible, and thus requires you to return it explicitly.
4747
If it is a plain function (like `typeof(sin)`), then the differential will be [`NoTangent`](@ref).
4848

@@ -60,18 +60,18 @@ There are many senses of zero.
6060
ChainRules represents two of them, as subtypes of [`AbstractZero`](@ref).
6161

6262
[`ZeroTangent`](@ref) for the case that there is no relationship between the primal output and the primal input.
63-
[`NoTangent`](@ref) for the case that conceptually the tangent space don't exist.
63+
[`NoTangent`](@ref) for the case where conceptually the tangent space doesn't exist.
6464
e.g. what is the Tangent to a String or an index: those can't be perturbed.
6565

66-
See [FAQ on difference between `ZeroTangent` and `NoTangent`](@ref faq_abstract_zero).
66+
See [FAQ on the difference between `ZeroTangent` and `NoTangent`](@ref faq_abstract_zero).
6767
At the end of the day it doesn't matter too much if you get them wrong.
6868
`NoTangent` and `ZeroTangent` more or less act identically.
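
A short sketch of what "act identically" looks like in arithmetic. The variable names are illustrative only.

```julia
using ChainRulesCore

# Both kinds of zero are subtypes of `AbstractZero`.
both_are_zeros = ZeroTangent() isa AbstractZero && NoTangent() isa AbstractZero

# Adding either zero to an ordinary tangent leaves that tangent unchanged.
sum_with_zero = ZeroTangent() + 2.5
sum_with_none = NoTangent() + 2.5

# Scaling either zero gives the same kind of zero back.
scaled_zero = ZeroTangent() * 3.0
scaled_none = NoTangent() * 3.0
```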
6969

7070
### `Tuple`s and `NamedTuple`s become `Tangent{T}`s
7171
Zygote uses `Tuple`s and `NamedTuple`s to represent the structural tangents for `Tuple`s and `struct`s respectively.
72-
ChainRules core provides a generic `Tangent{T}`(@ref Tangent) to represent the structural tangent of a primal type `T`.
72+
ChainRules core provides a generic [`Tangent{T}`](@ref Tangent) to represent the structural tangent of a primal type `T`.
7373
It takes positional arguments if representing tangent for a `Tuple`.
74-
Or keyword argument to represent the tangent for a `struct`.
74+
Or keyword arguments to represent the tangent for a `struct` or a `NamedTuple`.
7575
When representing a `struct` you only need to list the nonzero fields -- any not given are implicitly considered to be [`ZeroTangent`](@ref).
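
The three constructor forms can be sketched as follows. The `Bar` struct is hypothetical and stands in for any primal type with two fields.

```julia
using ChainRulesCore

# Hypothetical primal type, used only for illustration.
struct Bar
    a::Float64
    b::Float64
end

# Positional arguments for the tangent of a `Tuple`.
tuple_tangent = Tangent{Tuple{Float64,Float64}}(1.0, 2.0)

# Keyword arguments for the tangent of a `struct`; `b` is left out on purpose.
struct_tangent = Tangent{Bar}(; a=3.0)

# Keyword arguments for the tangent of a `NamedTuple`.
nt_tangent = Tangent{typeof((a=0.0, b=0.0))}(; a=1.0, b=2.0)

# Fields that were not listed read back as `ZeroTangent()`.
missing_field = struct_tangent.b
```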
7676

7777
When we say structural tangent we mean tangent types that are based only on the structure of the primal.
@@ -84,47 +84,47 @@ For more details see the the [design docs on the many tangent types](@ref manyty
8484
Rules that need to call back into the AD system, e.g, for higher order functions like `map(f, xs)`, need to be changed.
8585
In `ZygoteRules` you can use `ZygoteRules.pullback` or `ZygoteRules._pullback`, which will always result in calling into Zygote.
8686
Since ChainRules is AD agnostic, you can't do that.
87-
Instead you use a [`RuleConfig`](@ref) to specify requirements of an AD system e.g `::RuleConfig{>:CanReverseMode}` work for Zygote,
87+
Instead you use a [`RuleConfig`](@ref) to specify requirements of an AD system, e.g. `::RuleConfig{>:HasReverseMode}` works for Zygote,
8888
and then use [`rrule_via_ad`](@ref).
8989

9090
See the [docs on calling back into AD](@ref config) for more details.
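
A minimal sketch of such a rule, assuming a hypothetical higher-order function `apply_twice` that is not part of any package. The rule takes the config as its first argument and asks the calling AD system to differentiate `f` through `rrule_via_ad`, rather than calling Zygote directly.

```julia
using ChainRulesCore

# Hypothetical higher-order function, used only for illustration.
apply_twice(f, x) = f(f(x))

function ChainRulesCore.rrule(
    config::RuleConfig{>:HasReverseMode}, ::typeof(apply_twice), f, x
)
    # Let whichever AD system supplied `config` differentiate each call to `f`.
    y1, inner_pullback = rrule_via_ad(config, f, x)
    y2, outer_pullback = rrule_via_ad(config, f, y1)
    function apply_twice_pullback(ȳ)
        f̄_outer, ȳ1 = outer_pullback(ȳ)
        f̄_inner, x̄ = inner_pullback(ȳ1)
        # `f` is used twice, so its two tangent contributions are summed.
        return NoTangent(), f̄_outer + f̄_inner, x̄
    end
    return y2, apply_twice_pullback
end
```

Because the rule only requires `HasReverseMode`, it should apply to any AD system whose config declares that capability, not just Zygote.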
9191

9292
## Consider adding some thunks
9393

9494
A feature ChainRulesCore offers that ZygoteRules doesn't is support for thunks.
95-
Where work is delayed until it is needed, and avoided if it never is.
95+
Thunks delay work until it is needed, and avoid it if it never is.
9696
See docs on [`@thunk`](@ref), [`Thunk`](@ref), [`InplaceableThunk`](@ref).
9797

98-
You don't have to though.
98+
You don't have to use thunks, though.
9999
It is easy to go overboard with using thunks.
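
As a rough sketch of where a thunk might go: `mul` below is a hypothetical two-argument function, and each input's cotangent is wrapped in `@thunk` so it is only computed if a caller unthunks it.

```julia
using ChainRulesCore

# Hypothetical primal function, used only for illustration.
mul(a, b) = a * b

function ChainRulesCore.rrule(::typeof(mul), a::Real, b::Real)
    # Each cotangent is deferred; nothing is computed until `unthunk` is called.
    mul_pullback(ȳ) = (NoTangent(), @thunk(ȳ * b), @thunk(ȳ * a))
    return a * b, mul_pullback
end
```

For scalar products like this the saving is negligible; thunks tend to pay off when a cotangent is expensive and may go unused.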
100100

101101
## Testing Changes
102102

103-
One of the advantages of using ChainRules is that you can easily and robustly test it with [ChainRulesTestUtils.jl](https://juliadiff.org/ChainRulesTestUtils.jl/stable/).
104-
This both uses finite differencing to test accuracy of derivative, as well as checking the correctness of the API.
103+
One of the advantages of using ChainRules is that you can easily and robustly test your rules with [ChainRulesTestUtils.jl](https://juliadiff.org/ChainRulesTestUtils.jl/stable/).
104+
This uses finite differencing to test the accuracy of the derivative, and also checks the correctness of the API.
105105
It should catch any of the mistakes referred to on this page.
106106

107107
The test for the above example is `test_rrule(f, 2.5, Foo(9.9, 7.2), 31.0)`.
108108
You can see it looks a lot like an example call to `rrule`, just with the prefix `test_` added to the start.
109109

110110
## `@nograd` becomes `@non_differentiable`
111111
Probably more or less with no changes.
112-
[`@non_differentiable`](@ref) also lets you specify a signature.
112+
[`@non_differentiable`](@ref) also lets you specify a signature in case you want to restrict non-differentiability to a certain subset of argument types.
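
A small sketch of the signature form, using a hypothetical helper `pick_size`. Only calls matching the given argument types are marked non-differentiable.

```julia
using ChainRulesCore

# Hypothetical helper, used only for illustration.
pick_size(xs::AbstractVector) = length(xs)

# Restrict non-differentiability to calls with an `AbstractVector` argument.
@non_differentiable pick_size(::AbstractVector)
```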
113113

114114
## No such thing as `literal_getproperty`
115115
That is just `getproperty`, it takes a `Symbol`.
116116
It should constant-fold.
117117
It likely doesn't though as Zygote doesn't play nice with the optimizer.
118118

119119
## Take embedded spaces and types seriously
120-
Traditionally Zygote has taken a very laissez faire attitude towards types and mathematical spaces.
121-
Sometimes treating `Real`s as embedded in the `Complex` plane; some time not.
122-
Sometimes treating sparse and structuredly-sparse matrix as embedded in the space of dense matrixes.
120+
Traditionally Zygote has taken a very laissez-faire attitude towards types and mathematical spaces.
121+
Sometimes treating `Real`s as embedded in the `Complex` plane; sometimes not.
122+
Sometimes treating sparse and structuredly-sparse matrices as embedded in the space of dense matrices.
123123
Writing rules that apply to any `Array{T}` which perhaps are only applicable for `Array{<:Real}` and not so much for `Array{Quaternion}`.
124124
Traditionally ChainRules takes a much more considered approach.
125125

126126
See for example our [docs on how to handle complex numbers](@ref complexfunctions) correctly.
127-
(The outcome of several long long long discussions with a number of expert in our community)
127+
(The outcome of several long long long discussions with a number of experts in our community)
128128

129129
Now, I am not here to tell you what to do in your package, but this is a good time to reconsider how seriously you take these things in the rules you are converting.
130130
