
Commit 2ab00b3

fix typos and wrong wording in new distribution and getting started docs
1 parent 90bba2f commit 2ab00b3

3 files changed

Lines changed: 43 additions & 47 deletions


doxygen/contributor_help_pages/adding_new_distributions.md

Lines changed: 25 additions & 29 deletions
@@ -16,21 +16,21 @@ in the below we'll cover each of the general steps needed to add a new distribut
 
 We will use the normal distribution as an example and adding the lpdf
 function, though note for acceptance into Stan math a function must have
-it's respective `lpdf`, `lcdf`, `cdf`, `lccdf` and `rng` implimented.
+its respective `lpdf`, `lcdf`, `cdf`, `lccdf` and `rng` implemented.
 Though we will only be doing the `lpdf` in the below all of the notes here will apply
 to the other functions.
 
 So for the normal distribution probability density function
 
-$$
+\f[
 \text{Normal}(y|\mu,\sigma)=\frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^2}
-$$
+\f]
 
-to get the log probabiloity density function we log the above to get
+to get the log probability density function we log the above to get
 
-$$
-\text{ln}\left(\text{Normal}(y|\mu,\sigma)\right)=-\frac{1}{2} \left(\frac{y-\mu}{\sigma}\right)^2 - \log(\sigma) - \frac{1}{2}\log(2\pi)
-$$
+\f[
+\ln{\left(\text{Normal}(y|\mu,\sigma)\right)}=-\frac{1}{2} \left(\frac{y-\mu}{\sigma}\right)^2 - \ln{\left(\sigma\right)} - \frac{1}{2}\ln{\left(2\pi\right)}
+\f]
 
 Now we can directly plug this into Stan as a custom lpdf function
 
@@ -40,7 +40,7 @@ real new_normal_lpdf (real y, real mu, real sigma){
 }
 ```
 
-This is nice because now we can test this function against another implimentations
+This is nice because now we can test this function against another implementations
 to verify its correctness. At this point it is a good idea to post something on
 [discourse](https://discourse.mc-stan.org/) or file an
 [issue](https://github.com/stan-dev/math/issues) to let folks know you would like
@@ -50,7 +50,7 @@ to add this distribution.
 ### Writing out the Partials
 
 For an efficient implimentation of the distribution we want to calculate
-each of it's partials with respect to the distributions inputs. This is easy for normal but can be rough for other distributions.
+each of its partials with respect to the distributions inputs. This is easy for normal but can be rough for other distributions.
 In that case then [matrixcalculus.org](http://www.matrixcalculus.org/) or [wolframalpha](https://www.wolframalpha.com/) is your friend.
 There we can plug in the lpdf and get back each of the partials.
 
@@ -61,21 +61,18 @@ function. One other nice thing about the matrixcalculus site is that it can gene
 which is nice for documentation.
 
 
-$$
-\begin{aligned}
-f = \text{ln}\left(\text{Normal}(y|\mu,\sigma)\right) &= -\frac{1}{2} \left(\frac{y-\mu}{\sigma}\right)^2 - \log(\sigma) - \frac{1}{2}\log(2\pi) \cr
+\f{aligned}{
+f = \text{ln}\left(\text{Normal}(y|\mu,\sigma)\right) &= -\frac{1}{2} \left(\frac{y-\mu}{\sigma}\right)^2 - \ln\left(\sigma\right) - \frac{1}{2}\ln{\left(2\pi\right)} \cr
 \frac{\partial f}{\partial y} &= -\frac{y-\mu}{\sigma^{2}} \cr
 \frac{\partial f}{\partial \mu} &= \frac{y-\mu}{\sigma^{2}} \cr
 \frac{\partial f}{\partial \sigma} &= -\frac{1}{\sigma} + \frac{(y-\mu)^{2}}{\sigma^{3}}
-\end{aligned}
-$$
+\f}
 
 It's a little early, but once we get the `lpdf` function working with the above we will want to get out a pen and paper to simplify and find common subexpressions we only need to calculate once.
 For instance in the normal we can compute `y - mu` and `1/sigma`
 
-$$
-\begin{aligned}
-f(y|\mu,\sigma) = \text{ln}\left(\text{Normal}(y|\mu,\sigma)\right) &= -\frac{1}{2} \left(\frac{y-\mu}{\sigma}\right)^2 - \log(\sigma) - \frac{1}{2}\log(2\pi) \cr
+\f{aligned}{
+f(y|\mu,\sigma) = \text{ln}\left(\text{Normal}(y|\mu,\sigma)\right) &= -\frac{1}{2} \left(\frac{y-\mu}{\sigma}\right)^2 - \ln{\left(\sigma\right)} - \frac{1}{2}\ln{\left(2\pi\right)} \cr
 \frac{\partial f}{\partial y} &= -t_3 \cr
 \frac{\partial f}{\partial \mu} &= t_3 \cr
 \frac{\partial f}{\partial \sigma} &= \frac{t_{2}^2}{t_1} \cdot t_0 - t_0 \cr
@@ -84,8 +81,7 @@ t_0 &= \frac{1}{\sigma} \cr
 t_1 &= t_{0}^2 \cr
 t_2 &= y - \mu \cr
 t_3 &= \frac{t_2}{t_1}
-\end{aligned}
-$$
+\f}
 
 
 ### Writing the function
@@ -114,14 +110,14 @@ must work for all of Stan's scalar types
 `double`, `var`, and `fvar<T>` while accepting mixtures of scalars and vectors.
 The return of the function is the joint log probability accumulated over all
 of the inputs which is a scalar of the least upper bound of all the
-parameters scalar types. That's a lot of big words, but in essense means that
+parameters scalar types. That's a lot of big words, but in essence means that
 if one of the inputs is a `var` and the others are double the return type needs to be
 a `var`. If the input signature contained `fvar<var>`, `var`, `double` then the
 return type would be `fvar<var>`. See the [Common pitfalls](@ref common_pitfalls) for an
 explanation of `return_type_t`.
 
 Notice the `bool propto` template parameter, this is used by the function to
-decide whether or not the function needs to propogate constants to the joint
+decide whether or not the function needs to propagate constants to the joint
 log probability we'll be calculating.
 
 #### Preparing the Parameters
@@ -176,7 +172,7 @@ if (!include_summand<propto, T_y, T_loc, T_scale>::value) {
 The `if` statements here are checking if
 
 1. Any of the inputs are length zero
-2. Either the function drops constants `propto=false` or all of the inputs are constant (aka if they are all of type `double`).
+2. Either the function drops constants `propto=true` and all of the inputs are constant (aka if they are all of type `double`).
 
 If either of the two conditions are met then there's no need to calculate the
 rest of the lpdf function and we can return back zero.
@@ -237,7 +233,7 @@ and scalars can both be iterated over in the loop.
 
 For vectors, `scalar_seq_view` simply holds a reference to the vector it's passed
 and calling `scalar_seq_view`'s method `.val(i)` will return element i in the vector
-after calling `value_of()` on the element. The actual element can be acessed
+after calling `value_of()` on the element. The actual element can be accessed
 with `operator[]`. For scalars, `scalar_seq_view`'s `.val(i)` and `operator[]`
 will just return the scalar no matter what index is passed.

@@ -283,16 +279,16 @@ for (size_t n = 0; n < N; n++) {
 += -inv_sigma + inv_sigma * y_minus_mu_over_sigma_squared;
 }
 }
-// return `logp` and handle rules for propogating partials for each autodiff type.
+// return `logp` and handle rules for propagating partials for each autodiff type.
 return ops_partials.build(logp);
 ```
 
 The `logp` is used to accumulate the log probability density function's value,
 where `propto` is used to decide whether or not that value should have constants
 added or dropped.
 
-The odd bits here are mostly the `if`s that are of the form
-`include_summand<propto>::value` and `!is_constant_all<T_loc>::value`. We want to
+The odd bits here are mostly the `if`s that include
+`include_summand<propto>` and `!is_constant_all<T_loc>`. We want to
 only compute the partials and accumulate the constants if those values are not
 constant (`double`), so we have an if statement here, which since the conditional
 is a type trait whose value is known at compile time we won't pay for any of these
@@ -324,7 +320,7 @@ them into operands adjoint.
 The for loop version is nice and simple, but there's a few things for performance
 that we can do better. For instance, in the for loop version we are constantly
 reading and writing to memory from a bunch of different places. We can fix that by
-rewriting the above to use multiple loops, but unless we have seperate loops
+rewriting the above to use multiple loops, but unless we have separate loops
 that turn on and off for when combinations of partials need to be calculated
 then we lose places where we can share calculations between partials.
 
@@ -381,7 +377,7 @@ For instance, when calculating `inv_sigma`, if that expression is used for
 calculating multiple partials then we want to evaluate it once on that line
 and reuse the precomputed operation multiple times in the preceding code. However
 if it's not used multiple times then we just want an expression that will then
-later be evaluated at it's final destination. The same happens for `y_scaled_sq`
+later be evaluated at its final destination. The same happens for `y_scaled_sq`
 and `scaled_diff`.
 
 One odd piece of code here is
@@ -424,7 +420,7 @@ let the compiler see we are doing division by 1 and will remove the operation.
 But that's it, you can see the full `normal_lpdf` function [here](https://github.com/stan-dev/math/blob/develop/stan/math/prim/prob/normal_lpdf.hpp) in Stan that uses the Eigen version.
 
 One other little piece you'll want to do is add a `normal_lpdf` function with the exact same signature
-but without the `propto` paremeter. Unless told otherwise we don't assume users want the proportional constants
+but without the `propto` parameter. Unless told otherwise we don't assume users want the proportional constants
 added so we have a default signature that does not require setting the `propto` parameter.
 
 ```cpp

doxygen/contributor_help_pages/common_pitfalls.md

Lines changed: 2 additions & 2 deletions
@@ -203,7 +203,7 @@ class my_big_type {
 
 We can see in the above that the standard style of a move (the constructor taking an rvalue reference) is to copy the pointer and then null out the original pointer. But in Stan, particularly for reverse mode, we need to keep memory around even if it's only a temporary for when we call the gradient calculations in the reverse pass. And since memory for reverse mode is stored in our arena allocator no copying happens in the first place.
 
-When working with arithmetic types, keep in mind that moving Scalars is often less optimal than simply taking their copy. For instance, Stan's `var` type is a PIMPL implimentation, so it simply holds a pointer of size 8 bytes. A `double` is also 8 bytes which just so happens to fit exactly in a [word](https://en.wikipedia.org/wiki/Word_(computer_architecture)) of most modern CPUs. While a reference to a double is also 8 bytes, unless the function is inlined by the compiler, the computer will have to place the reference into cache, then go fetch the value that is being referenced which now takes up two words instead of one!
+When working with arithmetic types, keep in mind that moving Scalars is often less optimal than simply taking their copy. For instance, Stan's `var` type is a PIMPL implementation, so it simply holds a pointer of size 8 bytes. A `double` is also 8 bytes which just so happens to fit exactly in a [word](https://en.wikipedia.org/wiki/Word_(computer_architecture)) of most modern CPUs. While a reference to a double is also 8 bytes, unless the function is inlined by the compiler, the computer will have to place the reference into cache, then go fetch the value that is being referenced which now takes up two words instead of one!
 
 The general rules to follow for passing values to a function are:
 
@@ -227,7 +227,7 @@ The pointer is cheap to copy around and is safe to copy into lambdas for
 
 As an example, see the implementation of `mdivide_left`
 [here](https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/mdivide_left.hpp)
-where `make_callback_ptr` is used to save the result of an Eigen
+where `make_callback_ptr()` is used to save the result of an Eigen
 Householder QR decomposition for use in the reverse pass.
 
 The implementation is in

doxygen/contributor_help_pages/getting_started.md

Lines changed: 16 additions & 16 deletions
@@ -6,15 +6,15 @@ automatic differentiation (autodiff) library behind the Stan language, and
 the vast majority of the functions exposed at the Stan language level are
 implemented here in C++.
 
-In the course of the Math library's existance, C++ has changed substantially.
+In the course of the Math library's existence, C++ has changed substantially.
 Math was originally written before C++11. It currently targets C++14. In the near
 future it will transition to C++17. With this in mind, there are many different
 ways to write Math functions. This guide tries to document best practices,
 conventions which not all functions in Math follow, but should be followed for
-new code to keep the code from getting unweildy (the old patterns will be
+new code to keep the code from getting unwieldy (the old patterns will be
 updated eventually).
 
-The title contans "Current State" to emphasize that if any information here is
+The title contains "Current State" to emphasize that if any information here is
 out of date or any advice does not work, it should be reported as a bug (the
 [example-models](https://github.com/stan-dev/example-models)).
 
@@ -46,14 +46,14 @@ The Stan Math library is spit into 4 main source folders that hold functions, ty
 
 - prim: General `Scalar`, `Matrix`, and `std::vector<T>` types
 - rev: Specializations for reverse mode automatic differentiation
-- fwd: Specializtions for forward mode automatic differentiation.
+- fwd: Specializations for forward mode automatic differentiation.
 - mix: Sources to allow mixes of forward and reverse mode.
 - opencl: Sources for doing reverse mode automatic differentiation on GPUs.
 
 Within each of those folders you will find any one of the following folders
 
-- core: Base implimentations of custom scalars or backend setup.
-  - Ex: in `prim` this is operators for complex numbers and the setup for threading, `rev`'s core is the scalar and it's base operators for reverse mode and the stack allocator, and `fwd` has the scalar for forward mode autodiff and it's operators.
+- core: Base implementations of custom scalars or backend setup.
+  - Ex: in `prim` this is operators for complex numbers and the setup for threading, `rev`'s core is the scalar and its base operators for reverse mode and the stack allocator, and `fwd` has the scalar for forward mode autodiff and its operators.
 - err: Functions that perform a check and if true throw an error.
 - fun: The math functions exposed to the Stan language.
 - functor: Functions that take in other functions and data as input such as `reduce_sum()`
@@ -97,7 +97,7 @@ This is pretty standard C++ besides (1), (2), and (3) which I'll go over here.
 TL;DR: Before accessing individual coefficients of an Eigen type, use `to_ref()` to make sure it's a type that's safe to access by coefficient.
 
-In the Stan math library we allow functions to accept eigen expressions. This is rather nice as for instance the code
+In the Stan math library we allow functions to accept Eigen expressions. This is rather nice as for instance the code
 
 ```cpp
 Eigen::MatrixXd x = multiply(add(A, multiply(B, C)), add(D, E));
@@ -110,7 +110,7 @@ Eigen::Product<Eigen::Add<Matrix, Eigen::Product<Matrix, Matrix>>, Eigen::Add<Ma
 
 Using lazily evaluated expressions allows Eigen to avoid redundant copies, reads, and writes to our data. However, this comes at a cost.
 
-In (2), when we access the coefficients of `x`, if it's type is similar to the wacky expression above we can get incorrect results as Eigen does not gurantee any safety of results when performing coefficient level access on a expression type that transforms it's inputs. So `to_ref()` looks at it's input type, and if the input type is an Eigen expression that it evaluates the expression so that all the computations are performed and the return object is then safe to access.
+In (2), when we access the coefficients of `x`, if its type is similar to the wacky expression above we can get incorrect results as Eigen does not guarantee any safety of results when performing coefficient level access on a expression type that transforms its inputs. So `to_ref()` looks at its input type, and if the input type is an Eigen expression that it evaluates the expression so that all the computations are performed and the return object is then safe to access.
 
 But! Suppose our input is something like

@@ -128,13 +128,13 @@ If we used `auto` here then if the return type of `to_ref()` was an `Eigen::Matr
 
 #### (2) Using `value_type_t<>` and Friends
 
-The [type trait](@ref type_traits) `value_type_t` is one of several type traits we use in the library to query information about types. `value_type_t` will return the inner type of a container,
-so `value_type_t<Eigen::Matrix<double, -1, -1>>` will return `double`, `std::vector<std::vector<double>>` will return a `std::vector<double>` and `value_type_t<double>` will simply return a double.
-See the module on type traits for a
+The type trait `value_type_t` is one of several type traits we use in the library to query information about types. `value_type_t` will return the inner type of a container,
+so `value_type_t<Eigen::Matrix<double, -1, -1>>` will return `double`, `value_type_t<std::vector<std::vector<double>>>` will return a `std::vector<double>` and `value_type_t<double>` will simply return a double.
+See the module on type traits for a guide on Stan's specific type traits.

 #### (3) Accessing Eigen matrices though `.coeff()` and `.coeffRef()`
 
-This is a small bit, but Eigen performs bounds checks by default when using `[]` or `()` to access elements. However `.coeff()` and `.coeffRef()` do not. Because Stan model's perform bounds checking at a higher level it's safe to remove the bounds checks here.
+This is a small bit, but Eigen performs bounds checks by default when using `[]` or `()` to access elements. However `.coeff()` and `.coeffRef()` do not. Because Stan model's perform bounds checking at a higher level its safe to remove the bounds checks here.
 
 #### Testing
 
@@ -182,7 +182,7 @@ struct DenseStorage {
 }
 ```
 
-It's very common for us to want to access just the `val_` or `adj_` of the `vari`'s inside of the `var`'s and so we have written custom methods `.adj()` and `.val()` using Eigen's plugin system which returns an expression of an `Eigen::Matrix<double, Rows, Cols>` representing the `var`'s values and adjoints, respectivly.
+It's very common for us to want to access just the `val_` or `adj_` of the `vari`'s inside of the `var`'s and so we have written custom methods `.adj()` and `.val()` using Eigen's plugin system which returns an expression of an `Eigen::Matrix<double, Rows, Cols>` representing the `var`'s values and adjoints, respectively.
 
 #### Using type traits to Expose Our New Function
 
@@ -264,7 +264,7 @@ all the outputs and inputs. (reverse pass)
 
 Reverse mode autodiff in Math requires a huge number of temporaries and
 intermediates to be computed and saved for the reverse pass. There are
-so many allocations that the overhead of `malloc()` becomes noticable. To
+so many allocations that the overhead of `malloc()` becomes noticeable. To
 avoid this, Math provides its own memory arena. The assumptions of
 the Math arena are:
 
@@ -312,7 +312,7 @@ auto myfunc(const T& x) {
 
 ## (2) Setting up the Reverse Pass
 
-Once we have stored the data we need for the reverse pass we need to actually write that reverse pass! We need to take our adjoint calculation and put it onto the callback stack so that when the users call `grad()` the adjoints are propogated upwards properly.
+Once we have stored the data we need for the reverse pass we need to actually write that reverse pass! We need to take our adjoint calculation and put it onto the callback stack so that when the users call `grad()` the adjoints are propagated upwards properly.
 
 For this we have a function called `reverse_pass_callback()`. Calling `reverse_pass_callback()` with a functor `f` creates an object on the callback stack that will call `f`.
 
@@ -433,7 +433,7 @@ inline auto sin(const Container& x) {
 }
 ```
 
-In Stan math, a `container` type is an `std::vector` that holds either other `std::vector`s or `Eigen::Matrix` types. So in the first function we use `apply_scalar_unary` to apply `sin()` to either `Scalar`s, `std::vector`s holding scalars, or `Eigen::Matrix` types. The second function which uses `apply_vector_unary()` will apply it's lambda to the container's whose elements are also containers or Eigen matrices.
+In Stan math, a `container` type is an `std::vector` that holds either other `std::vector`s or `Eigen::Matrix` types. So in the first function we use `apply_scalar_unary` to apply `sin()` to either `Scalar`s, `std::vector`s holding scalars, or `Eigen::Matrix` types. The second function which uses `apply_vector_unary()` will apply its lambda to the container's whose elements are also containers or Eigen matrices.
 
 
 ### Binary Functions (and least-upper-bound return type semantics)

0 commit comments
