The rotation matrix R is in the special orthogonal group SO(3), which means that it is orthogonal (its columns are unit-length and mutually orthogonal, i.e. orthonormal) and it has determinant +1.
We will use the 6×1 and 4×4 representations interchangeably depending on context.
Note that some other textbooks put the translational part on top and the rotational part below. It does not matter much, but it affects our notation for things like the adjoint action, differentials, etc. below.
Here, we only differentiate with respect to δ∈se(3) around the point δ=0. In other words, we have a function F(T) where T∈SE(3) and we would like to perturb T by a very small δ.
Suppose δ=[ω, τ]T. If F is of dimension n, then the resulting derivative is an n×6 Jacobian.
Let T∈SE(3) and p∈R3; then the following table lists some useful derivatives.
As you will see later, these are basically all the derivatives you need, and all other derivatives can be easily derived from these, often together with using the adjoint.
We seek to reduce the cost as much as possible; this is a nonlinear least squares problem. During the optimisation, we perturb the current estimate by a small perturbation δ(t)∈se(3):
where the scalar weights cGN, cs are chosen such that ∥δ∥≤Δ, the scalar parameter Δ being the radius of the trust region. When Δ is small, the problem is behaving badly and the quadratic approximation that Gauss-Newton relies on is poor; in this case, cGN is zero, and the optimiser takes timid steps along the steepest-descent direction. When Δ is large, the quadratic approximation is good and we take bigger steps in the Gauss-Newton direction.
Heuristics are used to determine when to increase Δ and when to decrease it.
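One way to realise this blend is the classic dogleg step; the following is a minimal sketch (the function name `dogleg_step` and the specific clipping rules are our own, not necessarily the exact scheme intended above):

```python
import numpy as np

def dogleg_step(J, r, trust_radius):
    """One dogleg trust-region step for the least-squares cost 0.5*||r||^2.
    Blends the steepest-descent and Gauss-Newton steps so that the
    returned step delta satisfies ||delta|| <= trust_radius."""
    g = J.T @ r                                   # gradient of the cost
    # Cauchy point: minimiser of the quadratic model along -g
    t = (g @ g) / (g @ (J.T @ (J @ g)))
    d_sd = -t * g
    d_gn = np.linalg.solve(J.T @ J, -g)           # Gauss-Newton step
    if np.linalg.norm(d_gn) <= trust_radius:
        return d_gn                               # pure Gauss-Newton step
    if np.linalg.norm(d_sd) >= trust_radius:
        # timid step: steepest descent clipped to the boundary
        return d_sd * (trust_radius / np.linalg.norm(d_sd))
    # otherwise walk from the Cauchy point toward the Gauss-Newton point
    # until hitting the boundary: ||d_sd + s*(d_gn - d_sd)|| = trust_radius
    diff = d_gn - d_sd
    a = diff @ diff
    b = 2.0 * (d_sd @ diff)
    c = d_sd @ d_sd - trust_radius**2
    s = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return d_sd + s * diff
```

When the radius is large the step coincides with the exact Gauss-Newton solution; when it is small the step degenerates to clipped steepest descent, matching the behaviour described above.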
When uncertainty is present, the data is associated with a covariance matrix Σ, an n×n matrix where n is the size of the residual. The Gauss-Newton update then becomes:
The matrix Σ−1 is also sometimes called the information matrix. In practice, it is useful to factor this matrix, for example by taking the matrix square root. Let W=Σ−1/2, then,
In other words, we just pre-multiply the Jacobian and residual by a weight matrix. This is a form of whitening. In the common case where Σ is a diagonal matrix, we simply divide each row of the Jacobian and the residual by the standard deviation.
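In code, the whitening might be sketched as follows, assuming Σ is symmetric positive definite (the helper `whiten` is illustrative, not from the text):

```python
import numpy as np

def whiten(J, r, Sigma):
    """Pre-multiply the Jacobian and residual by W = Sigma^{-1/2}
    (a symmetric matrix square root via eigendecomposition), so the
    weighted residual has identity covariance."""
    w, Q = np.linalg.eigh(Sigma)                   # Sigma is symmetric
    W = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T
    return W @ J, W @ r
```

For a diagonal Σ this reduces to dividing each row of the Jacobian and the residual by the corresponding standard deviation.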
Another common factorisation is the eigendecomposition of the covariance matrix Σ:
where Q∈SO(n) is the square matrix whose columns are the orthonormal eigenvectors of Σ and Λ is the diagonal matrix whose entries are the eigenvalues. Then,
This is useful, for example, in the case of surfel matches. Recall that the covariance matrix is always symmetric positive semidefinite, allowing for easy eigendecomposition.
However, in many cases, the random variable is not Gaussian. The Gaussian has very thin tails, making outliers incredibly unlikely under the model. In real life, there are often many outliers, which necessitates a heavier-tailed distribution. To model such situations, we need robust loss functions.
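One common robust loss, named here as a concrete example rather than taken from the text, is the Huber loss, typically applied through iteratively reweighted least squares (IRLS). A minimal sketch, with the conventional tuning constant k = 1.345 as an assumption:

```python
import numpy as np

def huber_weight(r, k=1.345):
    """IRLS weight for the Huber loss: residuals below the threshold k keep
    full weight (quadratic regime); larger residuals are down-weighted so
    the loss grows only linearly in the tails."""
    a = np.abs(r)
    # np.maximum avoids a division by zero for tiny residuals
    return np.where(a <= k, 1.0, k / np.maximum(a, k))
```

Multiplying each row of the Jacobian and residual by the square root of this weight turns one robust iteration into an ordinary weighted least-squares solve.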
The goal is to recover the trajectory of a vehicle as a parametric curve S(t)∈SE(3) as a function of time t.
We may assume the curve does not oscillate faster than some frequency (say, 10 Hz).
Now we should find a trajectory representation that satisfies the following properties:
For some finite period of time, the trajectory should be approximately correct and differentiable.
For any given real number t within the domain of the curve, we can efficiently evaluate S(t) in constant time.
Local perturbations δ(t) may be applied to the curve, so that Snew(t)=S(t)⊕δ(t), such that the perturbation δ(t) may be parameterised with a finite number of parameters, each with compact support.
The trajectory must be independent of the choice of the inertial reference frame.
Our solution to satisfy these requirements is a piecewise linear trajectory. The trajectory is represented as a sequence of elements Si that represent S(ti) for some ti sampled with even spacing at a high frequency (say, 100 Hz). To evaluate S(t), we use geodesic interpolation for ti≤t<ti+1.
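A sketch of such a geodesic interpolation using SciPy's Rotation; here rotation and translation are interpolated separately, the geodesic of a product metric on SO(3)×R3, a common simplification that may differ from the exact convention used here:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def interpolate_pose(R0, p0, R1, p1, s):
    """Interpolate between poses (R0, p0) at s=0 and (R1, p1) at s=1.
    Rotation follows the SO(3) geodesic via the log/exp (rotvec) maps;
    translation is interpolated linearly."""
    rel = (R0.inv() * R1).as_rotvec()        # log of the relative rotation
    R = R0 * Rotation.from_rotvec(s * rel)   # scale in the tangent, exp back
    p = (1.0 - s) * p0 + s * p1
    return R, p
```

Evaluating S(t) then amounts to finding the segment ti≤t<ti+1 and calling this with s=(t−ti)/(ti+1−ti), which is constant-time given even spacing.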
When applying perturbations, the curve δ is a smooth curve which may be thought of as a vector of infinite dimension.
To ensure that the problem is computationally tractable and that Snew remains twice-differentiable, we parameterise the perturbation δ by a finite vector ξ. The vector ξ is the concatenation of many six-dimensional vectors ξi∈se(3), such that
Notice that, since δ(t) only applies small local perturbations, it is possible to add together the ξi, treating them as vectors in R6. The scalar-valued function β is a basis function with compact support, which means that it is nonzero on a finite contiguous segment of t and zero everywhere else. A good choice is the piecewise polynomial of the cardinal cubic B-spline: β(t) = 2/3 − |t|² + |t|³/2 for |t| ≤ 1, β(t) = (2 − |t|)³/6 for 1 < |t| ≤ 2, and β(t) = 0 otherwise.
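This basis can be evaluated directly; a sketch of the standard cardinal cubic B-spline polynomial, with the support placed on (−2, 2) (a conventional centring; the indexing used here may differ):

```python
import numpy as np

def beta(t):
    """Cardinal cubic B-spline basis, centred at 0, support (-2, 2)."""
    a = np.abs(t)
    return np.where(a < 1, 2/3 - a**2 + a**3 / 2,
                    np.where(a < 2, (2 - a)**3 / 6, 0.0))
```

The integer shifts of this function form a partition of unity, which is what makes the spline interpolate smoothly between knots.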
The elements ξi∈se(3) are known as the spline knots. Since each knot is a 6×1 vector, the number of columns of J is 6×nknots where nknots is the number of spline knots.
As you can see, the derivative of the trajectory at any point t in time is a linear combination of the derivatives of up to four spline knots.
knowing that each of these blocks will contribute up to four blocks, weighted with scalar weights β, to the full Jacobian in which we are optimising the spline knots ξ.
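Assembling the full Jacobian might be sketched as follows; `scatter_jacobian`, the uniform knot spacing, and the basis indexing are illustrative assumptions:

```python
import numpy as np

def beta(t):
    """Cardinal cubic B-spline basis with support (-2, 2)."""
    a = abs(t)
    if a < 1:
        return 2/3 - a**2 + a**3 / 2
    if a < 2:
        return (2 - a)**3 / 6
    return 0.0

def scatter_jacobian(J_delta, t, knot_dt, n_knots):
    """Scatter an n x 6 block dF/d(delta) evaluated at time t into the full
    n x (6 * n_knots) Jacobian: each of the up to four knots whose basis
    function is nonzero at t receives the block scaled by its beta weight."""
    n = J_delta.shape[0]
    J = np.zeros((n, 6 * n_knots))
    u = t / knot_dt                       # time in knot-index coordinates
    i0 = int(np.floor(u))
    for i in range(i0 - 1, i0 + 3):       # the four knots with support at t
        if 0 <= i < n_knots:
            J[:, 6 * i:6 * i + 6] = beta(u - i) * J_delta
    return J
```

In practice one would store only the four nonzero blocks in a sparse structure rather than a dense matrix; the dense version above is just for clarity.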
The gravity constraint seeks to penalise the distance between the up vector of the car R(T(t))u and the true up vector utrue=[0,0,1]T.
For example, if the sensor were mounted perfectly upright, then u=utrue.
We can also use the accelerometer reading to obtain a different u at each point in time, as long as we remember to subtract the inertial acceleration.
Since the gravity constraint only depends on rotation and not on translation, we only care about the derivative with respect to ω(δ), which we hereafter write as ω.
We are aligning a “moving” or “map” point m to a “static” or “scene” point s by transforming the moving point with the pose T.
The matrix A can be used to store the uncertainty of the point s, i.e. an information matrix (the inverse of a covariance matrix).
If you are registering the point to a line, you can let A be 2×3. A common choice is to compute an orthonormal basis from the direction of the ray, e.g. using Duff et al.'s approach. This is commonly used in visual odometry.
If you are registering the point to a plane, you can let A be 1×3.
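All three cases share one residual form; a minimal sketch (the helper name and its argument layout are illustrative):

```python
import numpy as np

def registration_residual(R, t, m, s, A):
    """Residual A (R m + t - s): A is 3x3 for point-to-point,
    2x3 for point-to-line (rows spanning the directions orthogonal to the
    line), and 1x3 for point-to-plane (the plane normal)."""
    return A @ (R @ m + t - s)
```

The shape of A simply selects which components of the mismatch are penalised, so the same Jacobian machinery handles all three match types.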
The velocity constraint seeks to penalise deviations of the vehicle velocity from six-degree-of-freedom velocity estimates from another source.
For ease of implementation, the vehicle velocity is obtained by numerical differentiation, e.g. by evaluating the pose at times t1 and t2. Let h=1/(t2−t1).
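One way to sketch this numerical differentiation, treating the rotational and translational parts separately with SciPy's Rotation (an assumption; the exact log-map convention used here may differ):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def body_velocity(R1, p1, R2, p2, t1, t2):
    """Finite-difference 6-DoF velocity between the poses at t1 and t2.
    Angular velocity comes from the log (rotvec) of the relative rotation;
    linear velocity from the translation difference. Both are expressed in
    the body frame at t1 and scaled by h = 1/(t2 - t1)."""
    h = 1.0 / (t2 - t1)
    omega = h * (R1.inv() * R2).as_rotvec()   # angular velocity, rad/s
    v = h * R1.inv().apply(p2 - p1)           # linear velocity, m/s
    return omega, v
```

The residual is then the difference between this estimate and the externally supplied six-degree-of-freedom velocity.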
The regularisation constraint implements a basic Tikhonov regulariser. It seeks to dampen the problem to avoid divergent oscillations, overfitting, or other types of poor convergence.