It's a little early, but once we get the `lpdf` function working with the above we will want to get out a pen and paper to simplify and find common subexpressions we only need to calculate once.
For instance, in the normal distribution we can compute `y - mu` and `1/sigma` once.
// return `logp` and handle rules for propagating partials for each autodiff type.
return ops_partials.build(logp);
```
The `logp` is used to accumulate the log probability density function's value, while `propto` is used to decide whether or not constants should be added to or dropped from that value.
The odd bits here are mostly the `if`s that include `include_summand<propto>` and `!is_constant_all<T_loc>`. We want to only compute the partials and accumulate the constants if those values are not constant (`double`), so we have an if statement here. Since the conditional is a type trait whose value is known at compile time, we won't pay for any of these checks at run time.
…them into the operands' adjoints.
The for loop version is nice and simple, but there are a few things we can do better for performance. For instance, in the for loop version we are constantly reading and writing memory in a bunch of different places. We can fix that by rewriting the above to use multiple loops, but unless we have separate loops that turn on and off depending on which combinations of partials need to be calculated, we lose places where we can share calculations between partials.
For instance, when calculating `inv_sigma`, if that expression is used for calculating multiple partials then we want to evaluate it once on that line and reuse the precomputed operation multiple times in the following code. However, if it's not used multiple times then we just want an expression that will later be evaluated at its final destination. The same happens for `y_scaled_sq` and `scaled_diff`.
One odd piece of code here is …

…let the compiler see we are doing division by 1 and will remove the operation.
But that's it, you can see the full `normal_lpdf` function [here](https://github.com/stan-dev/math/blob/develop/stan/math/prim/prob/normal_lpdf.hpp) in Stan that uses the Eigen version.
One other little piece you'll want to do is add a `normal_lpdf` function with the exact same signature
but without the `propto` parameter. Unless told otherwise we don't assume users want the proportional constants
added so we have a default signature that does not require setting the `propto` parameter.
`doxygen/contributor_help_pages/common_pitfalls.md`
We can see in the above that the standard style of a move (the constructor taking an rvalue reference) is to copy the pointer and then null out the original pointer. But in Stan, particularly for reverse mode, we need to keep memory around even if it's only a temporary for when we call the gradient calculations in the reverse pass. And since memory for reverse mode is stored in our arena allocator no copying happens in the first place.
When working with arithmetic types, keep in mind that moving Scalars is often less optimal than simply taking their copy. For instance, Stan's `var` type is a PIMPL implementation, so it simply holds a pointer of size 8 bytes. A `double` is also 8 bytes which just so happens to fit exactly in a [word](https://en.wikipedia.org/wiki/Word_(computer_architecture)) of most modern CPUs. While a reference to a double is also 8 bytes, unless the function is inlined by the compiler, the computer will have to place the reference into cache, then go fetch the value that is being referenced which now takes up two words instead of one!
The general rules to follow for passing values to a function are:
…The pointer is cheap to copy around and is safe to copy into lambdas…
As an example, see the implementation of `mdivide_left`.
The Stan Math library is split into 4 main source folders that hold functions and types:
- prim: General `Scalar`, `Matrix`, and `std::vector<T>` types
- rev: Specializations for reverse mode automatic differentiation
- fwd: Specializations for forward mode automatic differentiation.
- mix: Sources to allow mixes of forward and reverse mode.
- opencl: Sources for doing reverse mode automatic differentiation on GPUs.
Within each of those folders you will find any one of the following folders:
- core: Base implementations of custom scalars or backend setup.
  - Ex: in `prim` this is operators for complex numbers and the setup for threading, `rev`'s core is the scalar and its base operators for reverse mode and the stack allocator, and `fwd` has the scalar for forward mode autodiff and its operators.
- err: Functions that perform a check and if true throw an error.
- fun: The math functions exposed to the Stan language.
- functor: Functions that take in other functions and data as input such as `reduce_sum()`
This is pretty standard C++ besides (1), (2), and (3) which I'll go over here.
TL;DR: Before accessing individual coefficients of an Eigen type, use `to_ref()` to make sure it's a type that's safe to access by coefficient.
In the Stan math library we allow functions to accept Eigen expressions. This is rather nice as for instance the code
```cpp
Eigen::MatrixXd x = multiply(add(A, multiply(B, C)), add(D, E));
```

Using lazily evaluated expressions allows Eigen to avoid redundant copies, reads, and writes to our data. However, this comes at a cost.
In (2), when we access the coefficients of `x`, if its type is similar to the wacky expression above we can get incorrect results, as Eigen does not guarantee any safety of results when performing coefficient-level access on an expression type that transforms its inputs. So `to_ref()` looks at its input type, and if the input type is an Eigen expression it evaluates the expression so that all the computations are performed and the returned object is then safe to access.
But! Suppose our input is something like
If we used `auto` here then if the return type of `to_ref()` was an `Eigen::Matrix`…
#### (2) Using `value_type_t<>` and Friends
The type trait `value_type_t` is one of several type traits we use in the library to query information about types. `value_type_t` will return the inner type of a container,
so `value_type_t<Eigen::Matrix<double, -1, -1>>` will return `double`, `value_type_t<std::vector<std::vector<double>>>` will return a `std::vector<double>`, and `value_type_t<double>` will simply return `double`.
See the module on type traits for a guide on Stan's specific type traits.
#### (3) Accessing Eigen matrices through `.coeff()` and `.coeffRef()`
This is a small bit, but Eigen performs bounds checks by default when using `[]` or `()` to access elements, while `.coeff()` and `.coeffRef()` do not. Because Stan models perform bounds checking at a higher level, it's safe to remove the bounds checks here.
#### Testing
```cpp
struct DenseStorage {
  // ...
}
```
It's very common for us to want to access just the `val_` or `adj_` of the `vari`s inside the `var`s, so we have written custom methods `.adj()` and `.val()` using Eigen's plugin system which return an expression of an `Eigen::Matrix<double, Rows, Cols>` representing the `var`s' values and adjoints, respectively.
#### Using type traits to Expose Our New Function
…all the outputs and inputs (reverse pass).
Reverse mode autodiff in Math requires a huge number of temporaries and
intermediates to be computed and saved for the reverse pass. There are
so many allocations that the overhead of `malloc()` becomes noticeable. To
avoid this, Math provides its own memory arena. The assumptions of
the Math arena are:
## (2) Setting up the Reverse Pass
Once we have stored the data we need for the reverse pass we need to actually write that reverse pass! We need to take our adjoint calculation and put it onto the callback stack so that when the users call `grad()` the adjoints are propagated upwards properly.
For this we have a function called `reverse_pass_callback()`. Calling `reverse_pass_callback()` with a functor `f` creates an object on the callback stack that will call `f`.
```cpp
inline auto sin(const Container& x) {
  // ...
}
```
In Stan math, a `container` type is an `std::vector` that holds either other `std::vector`s or `Eigen::Matrix` types. So in the first function we use `apply_scalar_unary` to apply `sin()` to either `Scalar`s, `std::vector`s holding scalars, or `Eigen::Matrix` types. The second function, which uses `apply_vector_unary()`, will apply its lambda to containers whose elements are also containers or Eigen matrices.
### Binary Functions (and least-upper-bound return type semantics)