分类算法与数学基础下的文章

非重叠艾伦方差与重叠艾伦方差

作者: xsh
时间: 2026-01-28
分类: 算法与数学基础,传感器标定
评论

https://www.allaboutcircuits.com/technical-articles/intro-to-allan-variance-analysis-non-overlapping-and-overlapping-allan-variance/

Learn about non-overlapping and overlapping Allan variance and how the Allan variance curve can be used to identify different types of random errors present in a signal.

Allan variance is a statistical analysis tool for identifying various noise types that exist in a signal. Developed in the mid-1960s, the Allan variance was used to measure the frequency stability of precision oscillators. Later, this technique was applied to other areas as well. One common use case of the Allan variance is identifying the random errors that affect microelectromechanical systems (MEMS) gyroscopes.

In this article, we’ll introduce two basic versions of the Allan variance and see how the Allan variance curve can be used to identify different types of random errors present in a signal. To have a better understanding of the Allan variance technique, let’s first take a look at the signal averaging technique that is closely related to the theory behind the Allan variance.

Signal Averaging for Noise Reduction

First, let's assume that we are measuring a near-DC value affected by white Gaussian noise with a variance of:

$σ_{n}^{2} = 10^{- 12}$

To reduce the noise power, we can repeat the measurement M times and average the measured values. Since the noise samples are not correlated, averaging suppresses the noise terms without affecting the value of the desired DC value. It can be shown that the noise variance of the averaged signal, $σ_{n, a v g}^{2}$ , is given by:

$σ_{n, a v g}^{2} = \frac{σ_{n}^{2}}{M}$

If we plot $σ_{n, a v g}^{2}$ against M as a log-log graph, we’ll have a line with a slope of -1. This is shown in Figure 1.

A graph showing variance vs number of averaged samples

Figure 1. A graph showing variance vs number of averaged samples

Averaging more and more samples, we can reduce the noise variance from 10^-12 to below 10^-15. Thus, we can determine the DC input value more accurately. Although averaging can be an effective solution in certain applications, it has its own limitations: it assumes that the noise samples are not correlated with each other.

Now assume that, in addition to being affected by white Gaussian noise, our signal also slightly drifts over time. This is shown in Figure 2.

A graph showing how the signal drifts slightly over time.

Figure 2. A graph showing how the signal drifts slightly over time.

If we apply signal averaging to the above data, we’ll obtain the brown curve below.

A graph showing the change when a signal average is undergone.

Figure 3. A graph showing the change when a signal average is undergone.

As you can see, the signal averaging technique loses its effectiveness when there is a drift in the signal. In this particular example, the variance is almost constant above about M = 200. In fact, depending on the type of the random drift affecting a given application, the variance of the averaged signal can even increase above certain values of M. Using a graph of $σ_{n, a v g}^{2}$ against M, we can determine how many samples should be averaged to minimize the noise variance.

The Allan variance is based on a similar idea. It calculates the root mean square value of the random drift error as a function of the averaging time. Interestingly, the graph of the Allan variance doesn’t just reveal the averaging time that minimizes the noise variance, it also gives us some useful information about the different types of random processes that contribute to the drift at the circuit output.

Non-overlapping and Overlapping Allan Variance

As mentioned earlier, Allan variance was first used to measure the frequency stability of precision oscillators and has spread out to other applications. For example, the Allan variance is widely used to examine the random errors of MEMS gyroscopes. In this case, the Allan variance gives the root mean square random drift error of the gyroscope as a function of the averaging time.

Since there are several different versions of the Allan variance, we’ll take a look at the non-overlapping and fully overlapping versions of the Allen variance that are commonly used to analyze the random errors of inertial sensors. The goal will be to use the notation suitable for analyzing the output of a gyroscope.

Non-overlapping Allan Variance

To calculate the Allan variance, we need to collect a large number of samples from the gyroscope when it is motionless. To accomplish this, let's assume that we have a dataset of N samples from the gyroscope output Ω(t) at multiples of the sampling time t₀. With the non-overlapping Allan variance technique, these N samples are divided into K disjoint groups of equal length where each group (or cluster) has N samples. If the sampling time is t₀, the total time duration of each group is T = nt₀.

Figure 4 shows the gyroscope output samples and how the clusters of the non-overlapping Allan variance are formed for n = 3.

Example gyroscope output samples and how non-overlapping Allan variance clusters are formed.

Figure 4. Example gyroscope output samples and how non-overlapping Allan variance clusters are formed.

In this example, each cluster has a time duration of 3t₀. The first cluster (k = 1) includes the first 3 samples at t₀, 2t₀, and 3t₀. The second cluster (k = 2) includes the samples at 4t₀, 5t₀, 6t_0, and so on. To find the Allan variance, we need to calculate the average of each cluster.

The average of the kth cluster is denoted by ${¯ ¯¯ ¯ Ω}_{k} (T)$ in Figure 4. Having the cluster averages, we can now apply the following equation to find the Allan variance:

$σ^{2} (T) = \frac{1}{2 (K - 1)} K - 1 \sum k = 1 ({¯ ¯¯ ¯ Ω}_{k + 1} (T) - {¯ ¯¯ ¯ Ω}_{k} (T))^{2}$

Equation 1.

Note that K denotes the total number of clusters.

Overlapping Allan Variance

The overlapping Allan variance tends to perform better than a non-overlapping version for large datasets (large values of N). Similar to the non-overlapping version, the N samples from the gyroscope output Ω(t) are divided into K groups of equal length where each group (or cluster) has n samples. However, in this case, the clusters overlap each other.

Figure 5 shows how the first 6 clusters of the overlapping Allan variance are formed for n = 3.

First six clusters of the overlapping Allan variance for n = 3.

Figure 5. First six clusters of the overlapping Allan variance for n = 3.

Note that, in the overlapping version, each cluster is specified by an index k which corresponds to the first sample in the group. In the example depicted above, each cluster has a time duration of 3t₀. The first cluster (k = 1) includes the first 3 samples at t₀, 2t₀, and 3t₀. The second cluster (k = 2) includes the samples at 2t₀, 3t₀, 4t_0, and so on. Thus, the clusters overlap each other.

The color of the short lines below the samples specifies what clusters each sample is included in. Again, to find the Allan variance, we need to calculate the average of each cluster. The average of the kth cluster is denoted by ${¯ ¯¯ ¯ Ω}_{k} (T)$ in Figure 5.

Having the cluster averages, we can now apply the following equation to find the Allan variance:

$σ^{2} (T) = \frac{1}{2 (N - 2 n + 1)} N - 2 n + 1 \sum k = 1 ({¯ ¯¯ ¯ Ω}_{k + n} (T) - {¯ ¯¯ ¯ Ω}_{k} (T))^{2}$

Equation 2.

For n = 3, the first term of the above summation (k = 1) involves the cluster averages ${_{1}} (T)$ and ${_{4}} (T)$ . Figure 6 shows these two clusters.

Example of the two overlapping Allan variance clusters.

Figure 6. Example of the two overlapping Allan variance clusters.

The next term of the summation (k = 2) requires ${_{2}} (T)$ and ${_{5}} (T)$ that are shown in Figure 7.

Figure 7.

Finally, taking the square root of the result obtained from either Equations 1 or 2, we obtain the value of the root Allan variance or the Allan deviation for a particular value of T.

Typical Allan Deviation Plot

The Allan deviation versus the cluster time (T) is commonly plotted in a log-log graph. Generally, the Allan variance curve is U-shaped. Figure 8 shows a typical Allan deviation curve encountered when analyzing the output of a stationary inertial sensor.

A general Allan deviation curve for a stationary inertial sensor.

Figure 8. A general Allan deviation curve for a stationary inertial sensor.

The horizontal axis (T) can take different units in time (e.g., microseconds, seconds, minutes, or hours). The unit of the vertical axis is the angular velocity (e.g., rad/s or °/h) for gyroscopes or acceleration (given in terms of g or m/s²) in the case of accelerometers.

As you can see, depending on the value of cluster time (T), the slope of the curve can take different values. Overall, the Allan variance creates a way to identify and quantify various noise terms within the data.

There are several different random errors, such as quantization noise, angle random walk, bias instability, rate random walk, and rate ramp, that can introduce errors to the gyroscope output. Among these, quantization noise, bias instability, and rate random walk are usually the most significant effects.

加速度计与重力方向判别

作者: xsh
时间: 2026-01-27
分类: 算法与数学基础,传感器标定
评论

李群李代数在SLAM系统中的使用

作者: xsh
时间: 2026-01-15
分类: 算法与数学基础
评论

这篇文章针对有一定SLAM基础的同学或者对李群李代数的应用感兴趣的数学专业同学,已经很小众了，但对于真正干这行并想要更深理解的同学可能会很有帮助，因此花了些时间整理发出来。要从理论推导到后面代码应用都读懂难度还是比较大，尽量讲解。

李群李代数其实已经算SLAM里最深的数学知识了(至少我目前接触到的)，而事实上的SLAM应用也只是用了它们中的很小一部分。这篇文章不会详细地介绍李群李代数，毕竟google或者百度一下lie group SLAM之类的已经有不少人写过相关博客或者论文了。这篇文章的重点是针对开源的SLAM系统，推导出和代码中应用一致的公式，以帮助相关专业的同学知道李群李代数的现实应用。下面这个链接是<视觉SLAM十四讲>的作者所写的关于李群李代数的入门视觉SLAM中的数学基础第三篇李群与李代数 - 半闲居士 - 博客园关于定义之类的我就不写了，提取出几个比较关键的部分。 李群关键点: 旋转矩阵

\mathbf{R}

和转换矩阵

\mathbf{T} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix}

分别属于SO(3)和SE(3)李群，我们重点针对SO(3)讲解。李群有几个(仅仅是这篇文章要使用的)重要性质. 性质1: SO(3)和SE(3)李群矩阵和它的转置互为逆矩阵

\mathbf{R}\mathbf{R}^T = \mathbf{1}

，行列式为1(有为-1的情况但不会在SLAM中出现)

det \mathbf{R} = 1

。 性质2: 加法不封闭性和乘法封闭性，即两个李群矩阵相加后一般就不再是李群了，也就是李群当中不存在加法。两个李群想乘之后仍然是李群。现在如果有一个物体旋转了角度

\mathbf{R}_1

，再此基础上我们如果再旋转一定角度，数学操作是对这个

\mathbf{R}_1

再乘上另一个旋转矩阵

\mathbf{R}_2

，最终得到的

\mathbf{R}_2\mathbf{R}_1

仍然是一个旋转矩阵。但是你如果把这两个旋转矩阵加起来

\mathbf{R}_1 + \mathbf{R}_2

则没有任何实际的物理意义。性质3:李群中的元素并不是独立的，一个旋转矩阵有9个元素，而实际旋转的自由度是3. 李代数关键点:每一个李群都有一个对应的李代数，我们记为小写的so(3)和se(3)。李代数的元素是线性独立的。一个旋转矩阵对应的李代数有3个元素。记一个李代数为

\phi = [\phi_1, \phi_2, \phi_3]^T

，定义如下操作

\phi^{\wedge} = \begin{bmatrix} 0 & -\phi_3 & \phi_2 \\ \phi_3 & 0 & \phi_1 \\ -\phi_2 & \phi_1 & 1 \end{bmatrix}

一个李群和它的李代数的对应关系为矩阵指数关系

\mathbf{R} = exp(\phi^{\wedge}) = \mathbf{I} + \phi^{\wedge} + \frac{1}{2!} (\phi^{\wedge})^2 + \frac{1}{3!} (\phi^{\wedge})^3 +... = \sum_{n=0}^{\infty}\frac{1}{n!} (\phi^{\wedge})^n \tag{1}

李代数是有加法封闭性的，即两个李代数相加

\phi_1+ \phi_2

仍可以得到一个李代数，对这个新的so(3)李代数进行指数操作可以得到一个新的旋转矩阵

exp((\phi_1+ \phi_2)^{\wedge})

。李群李代数之间的关系对旋转的求导有很大的帮助。第一个链接前言部分里的公式(2)所提到的非线性最小二乘就是SLAM问题的关键。我把那个公式再写一下

\underset{\mathbf{T}}{min} \ u(\mathbf{T}) = \sum_{i=1}^{N}|| \mathbf{q}_i - \mathbf{T}\mathbf{p}_i||^2 \tag{2}

已知两对匹配的3d点(实际应用中可能是图像的2d点和它现实中的3d点的匹配，或者2d图像点和它在另一图像中的对应2d点匹配)

\mathbf{p}和\mathbf{q}

，找到一个转换矩阵

\mathbf{T}

使他们匹配后点与点之间的距离最小。最小二乘都会涉及到对要求解的量的求导。即

\frac{ d(\mathbf{q}_i - \mathbf{T}\mathbf{p}_i)}{d\mathbf{T}}

\mathbf{p}和\mathbf{q}

是已知常量，所以关键在于

\frac{ d( \mathbf{T}\mathbf{p}_i)}{d\mathbf{T}} \tag{3}

这个式子乍一看还不简单？上下的\mathbf{T}约掉，结果自然就是

\mathbf{p}

，但是实际上它是一个转换矩阵，对一个转换矩阵，这么做是错误的。回想导数的定义

\frac{df(x)}{dx} = \lim_{\delta x \rightarrow 0} \frac{f(x+\delta x) - f(x)}{\delta x} \tag{4}

应用在式子3中就应该是

\frac{d(\mathbf{T}\mathbf{p}_i)}{d\mathbf{T}} = \lim_{\delta \mathbf{T} \rightarrow 0} \frac{(\mathbf{T}+\delta \mathbf{T})\mathbf{p} - \mathbf{T}\mathbf{p}}{\delta \mathbf{T}} \tag{5}

现在你应该看到问题来了，由于李群的加法不封闭性，

(\mathbf{T}+\delta \mathbf{T})

根本就没有意义，不再是李群。导数的定义对李群不再有意义，我们上面假设能分子分母直接划掉相同的变量得到结果是根据导数的定义得来的，即假设加法成立，我们的式5就可以得到

\frac{d(\mathbf{T}\mathbf{p}_i)}{d\mathbf{T}} = \lim_{\delta \mathbf{T} \rightarrow 0} \frac{(\mathbf{T}+\delta \mathbf{T})\mathbf{p} - \mathbf{T}\mathbf{p}}{\delta \mathbf{T}} = \lim_{\delta \mathbf{T} \rightarrow 0} \frac{\delta \mathbf{T}\mathbf{p}}{\delta \mathbf{T}} = \mathbf{p}

但很遗憾，不行。那么我们如何对转换矩阵求导呢？我们下面要讲解的应用中，把对转换矩阵的求导分开为对位移的求导和对旋转的求导(实际上有直接的公式可以应用不用分开，但是分开求会很直观)。我们前面写到

\mathbf{T} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0} & 1 \end{bmatrix}

，

\mathbf{p}

写为齐次点即

(\mathbf{p}, 1)^T

，代入式(3)，我们得到

\frac{ d( \mathbf{R}\mathbf{p}_i + \mathbf{t},1)^T}{d\mathbf{T}}

1作为常数部分我们不管，前半部分可以分别对旋转求导和对平移求偏导。

\frac{ \partial( \mathbf{R}\mathbf{p}_i + \mathbf{t})}{\partial \mathbf{t}} \\ \frac{ \partial( \mathbf{R}\mathbf{p}_i + \mathbf{t})}{\partial \mathbf{R}}

对位移的偏导非常好解决，位移是有加法的，求偏导时

\mathbf{R}\mathbf{p}_i

视为常亮，所以上式的位移部分可以直接得到答案那就是单位矩阵

\mathbf{I}

。一切的问题都来源于对旋转矩阵的偏导

\frac{ \partial( \mathbf{R}\mathbf{p}_i + \mathbf{t})}{\partial \mathbf{R}} = \frac{ \partial( \mathbf{R}\mathbf{p}_i )}{\partial \mathbf{R}}

旋转矩阵也是不存在加法的。那么我们该怎么办呢？接下来就是李代数的应用了。我们把上式等价于对旋转的李代数求导！为什么可以这么等价？我给出一个直观的解释。我们知道，平移之所以能很简单的求导是因为满足加法封闭性，即如果我们把对平移的求导应用到式(4)中，会有一个

\delta \mathbf{t}

，这个

\delta \mathbf{t}

趋于0表示针对当前的

\mathbf{t}

趋于不移动。当对旋转求导时，我们应该找到一个小量，"加"到旋转上，使物体对于当前旋转，不再旋转，只要我们设计出一个小量，满足这个条件，我们就可以针对这个小量求导。我们不能直接加上一个所有元素都趋于0的旋转小量

\delta\mathbf{R}

，但是旋转对应的李代数可以。假设针对当前的旋转的李代数

\mathbf{R} = exp(\phi^{\wedge})

，加上一个小量，

\mathbf{R}' = exp((\phi + \delta \phi)^{\wedge})

，当

\delta \phi

趋于0时，旋转不变，于是乎，我们可以针对这个小量求导。有点晦涩，多思考一下，这已经是我能想到的最直观的解释了，理论证明可无穷尽也。那么对旋转的求导可以变换为

\begin{align} \frac{ \partial( \mathbf{R}\mathbf{p}_i )}{\partial \mathbf{R}} & = \frac{ \partial( exp(\phi^{\wedge})\mathbf{p}_i )}{\partial exp(\phi^{\wedge})} & = \lim_{\delta \phi \rightarrow 0} \frac{exp((\phi + \delta \phi)^{\wedge})\mathbf{p} - exp(\phi^{\wedge}\mathbf{p})}{\delta \phi} \end{align} \tag{6}

这个式子是可以求解的。这个式子在很多有关SLAM的书籍中写到过。这里还是用大家很可能有的中文教材<视觉SLAM十四讲>来讲解的。上面的式子对应的是第一版十四讲4.3.3的推导结果。这是李代数求导的第一种方法。这种方法其实相对接下来要讲的第二种方法直观，因为至少我们是根据导数的定义，实实在在的有李代数的加法来求解的。第二种求导方法则不再拘泥于单纯意义上的加法了。旋转矩阵真的没有加法吗？其实对于旋转来说，乘法就是加法，我们要对一个旋转"加上"另一个旋转，实际上是对旋转矩阵做乘法。所以我们如果对当前的旋转量乘一个近乎于单位矩阵的旋转，其实就是对它"加"了一个小量，当这个旋转小量趋于单位矩阵时，就没有旋转发生。旋转矩阵趋于单位阵对应的李代数就趋于0向量。那么第二个求导方法就是针对当前的旋转乘上一个小的(近乎于单位矩阵)的旋转小量，并对它的李代数趋于0时求极限。

\begin{align} \frac{ \partial( \mathbf{R}\mathbf{p}_i )}{\partial \mathbf{R}} & = \frac{ \partial( exp(\phi^{\wedge})\mathbf{p}_i )}{\partial exp(\phi^{\wedge})} & = \lim_{\delta \varphi \rightarrow 0} \frac{exp(\varphi^{\wedge})exp(\phi^{\wedge})\mathbf{p} - exp(\phi^{\wedge}\mathbf{p})}{\delta \varphi} \end{align} \tag{7}

这就是十四讲第一版书上4.3.4扰动模型了。不过正如书上所说，"扰动"

exp(\varphi)

可以是左扰动也可以是右扰动。书上给出的求导结果是左扰动结果，下面我给出右扰动的推导过程以及结果。

\begin{align} \frac{\partial{ exp(\phi^{\wedge}) \mathbf{p} }}{\partial{\phi}} & = \underset{\varphi \to 0}{lim} \frac{exp(\phi^{\wedge}) exp(\varphi^{\wedge})\mathbf{p} - exp(\phi^{\wedge})\mathbf{p}}{\varphi} \\ & \overset{李群性质1}{\approx} \underset{\varphi\to0}{lim} \frac{exp(\phi^{\wedge}) (\mathbf{I}+\varphi^{\wedge})\mathbf{p} - exp(\phi^{\wedge})\mathbf{p}}{\varphi} \\ & = \underset{\varphi \to 0}{lim} \frac{exp(\phi^{\wedge}) \varphi^{\wedge} \mathbf{p}}{\varphi} \\ & \overset{李代数性质2}{=} - \underset{\varphi \to 0}{lim} \frac{exp(\phi^{\wedge}) \mathbf{p}^{\wedge}\varphi }{\varphi}\\ & = -exp(\phi^{\wedge}) \mathbf{p}^{\wedge} \\ & = -\mathbf{R} \mathbf{p}^{\wedge} \end{align}

这里"李群性质1"是当李代数很小时(所以我们称之为扰动)，

exp(\varphi^{\wedge}) \approx \mathbf{I} + \varphi^{\wedge}

，这里

\mathbf{I}

是3x3单位矩阵。"李代数性质2"是

\varphi^{\wedge} \mathbf{p} = - \mathbf{p}^{\wedge}\varphi

。读者可以自己证明。

下面是激动人心的实际应用

有了李群的导数，接下来就是实际应用部分了。我这里的实际应用不是说让你自己写几行代码去实现，而是去看看比较成熟的SLAM系统是在哪里用到上面的推导的。这需要你打开相关的论文和代码对比来看，需要些精力。我经常操作的是视觉惯性slam，好的开源库有VINS-FUSION,VINS-Mono(都来自于港科大，前者更新有更强的应用不过后者被更多人熟知),okvis等。okvis是多年前的作品，非常经典，但是代码难度比前面的大不少。综合了一下情况我决定使用VINS-Mono的代码，把李代数的求导结果和他们的代码里一一对应起来。代码在他们的github中 HKUST-Aerial-Robotics/VINS-Mono: A Robust and ... - GitHub 我们以VINS-Mono想要求解的reprojection error为例。 vins的原论文 VINS-Mono: A Robust and Versatile Monocular Visual ... - arXiv 式25把处于不同图像的对应的2d点投影到同一单位球上然后求残差。在vins-mono里projection_factor.cpp里Evaluate函数中包含了误差函数以及倒数的结果。首先来看前面几行几个参数

Eigen::Vector3d Pi(parameters[0][0], parameters[0][1], parameters[0][2]);    Eigen::Quaterniond Qi(parameters[0][6], parameters[0][3], parameters[0][4], parameters[0][5]);     Eigen::Vector3d Pj(parameters[1][0], parameters[1][1], parameters[1][2]);    Eigen::Quaterniond Qj(parameters[1][6], parameters[1][3], parameters[1][4], parameters[1][5]);     Eigen::Vector3d tic(parameters[2][0], parameters[2][1], parameters[2][2]);    Eigen::Quaterniond qic(parameters[2][6], parameters[2][3], parameters[2][4], parameters[2][5]);     double inv_dep_i = parameters[3][0];

这几行的parameter来源于传入的要优化的参数，具体来源于在estimator.cpp中的这两行

ProjectionFactor *f = new ProjectionFactor(pts_i, pts_j);problem.AddResidualBlock(f, loss_function, para_Pose[imu_i], para_Pose[imu_j], para_Ex_Pose[0], para_Feature[feature_index]);

其中para_Pose[imu_i]对应于parameter[0]，是IMUi时刻的位姿，para_Pose[imu_j]对应于parameter[1]，是IMUj时刻位姿，para_Ex_Pose[0]是相机和IMU的外参，对应parameter[2]。最后一个就对应inv_dep_i.即该像素点深度的倒数。这几个是想要被优化的量。随后几行

    Eigen::Vector3d pts_camera_i = pts_i / inv_dep_i;    Eigen::Vector3d pts_imu_i = qic * pts_camera_i + tic;    Eigen::Vector3d pts_w = Qi * pts_imu_i + Pi;    Eigen::Vector3d pts_imu_j = Qj.inverse() * (pts_w - Pj);    Eigen::Vector3d pts_camera_j = qic.inverse() * (pts_imu_j - tic);

对应vins的原论文式25第三个

\mathcal{P}_l^{c_j}

的求法。注意式子中的back projection函数

\pi_c^{-1}(\cdot)

在原文中它的作用是turns a pixel location into a unit vector using camera intrinsic parameters。具体参见论文图6. 代码中的pts_i对应的是公式中的

\pi_c^{-1}( \begin{bmatrix} u_l^{c_i} \\ v_l^{c_i} \end{bmatrix} )

这是论文中所说的指向球面的unit vector。用它除以深度的倒数(乘上深度)，就对应代码中的pts_i / inv_dep_i，获得一个在相机坐标系下的3d点

\frac{1}{\lambda}_l \pi_c^{-1}( \begin{bmatrix} u_l^{c_i} \\ v_l^{c_i} \end{bmatrix} )

这个3d点乘以相机到body(或者说IMU)坐标系下的转换矩阵得到在IMU坐标系下的坐标。

\mathbf{R}_c^b \frac{1}{\lambda}_l \pi_c^{-1}( \begin{bmatrix} u_l^{c_i} \\ v_l^{c_i} \end{bmatrix} + \mathbf{p}_c^b )

对应的代码中的qic * pts_camera_i + tic。剩下的式25的第3个式子的位姿变换和代码的每一行一一对应。上式用

\mathbf{T}_{b_i}^{\omega}

转到世界坐标系，再转到

j

图像所在的坐标系最后转到

j

相机下的unit vector即pts_camera_j.

\mathcal{P}_l^{c_j} = \mathbf{R}_b^c( \mathbf{R}_{\omega}^{b_j}( \mathbf{R}_{b_i}^{\omega}( \mathbf{R}_c^b \frac{1}{\lambda}_l \pi_c^{-1}( \begin{bmatrix} u_l^{c_i} \\ v_l^{c_i} \end{bmatrix} + \mathbf{p}_c^b ) + \mathbf{p}_{b_i}^{\omega} - \mathbf{p}_{b_j}^{\omega}) - \mathbf{p}_c^b \tag{8}

随后代码中

#ifdef UNIT_SPHERE_ERROR     residual =  tangent_base * (pts_camera_j.normalized() - pts_j.normalized());#else    double dep_j = pts_camera_j.z();    residual = (pts_camera_j / dep_j).head<2>() - pts_j.head<2>();#endif

用UNIT_SPHERE_ERROR才和式25的第一行对应，如果不用的话就和普通的pinhole相机模型的点对应。为了方便起见，我还是推导的不用UNIT_SPHERE_ERROR所得到的雅各比矩阵。因为我发现VINS, parameter.h中的#define UNIT_SPHERE_ERROR这一行被注释掉了。代码下一行 residual = sqrt_info * residual; okvis也有这一行，这是因为ceres能接受的最小二乘优化只能是最简单的形式

e^Te

，而带有协方差的优化的形式为

e^TP^{-1}e

，所以代码中会对

P^{-1}

做LLT分解，有

LL^T = P^{-1}

，那个sqrt_info自然就是

L^T

了。综上针对某一个像素点在

i,j

frame下的像素坐标我们的residual就是

\mathbf{r} = (\mathcal{P}_l^{c_j} /z_j - \widehat{\mathcal{P}}_l^{c_j} )_{1:2}

式子中

\widehat{\mathcal{P}}_l^{c_j}

来源于测量(比如我们通过特征点匹配的方法知道了i,j是对应的，那时得到的j的坐标就是通过测量得到的),

(\cdot)_{1:n}

表示取向量第1位到第n位。把上式具体写出来就是

\mathbf{r} = \begin{bmatrix} x_{c_j}/z_{c_j} - \widehat{u}_{c_j} \\ y_{c_j}/z_{c_j} - \widehat{v}_{c_j} \end{bmatrix} \tag{9}

residual需要求导的对象是body在

i

frame时的pose，即

(\mathbf{R}_{b_i}^{\omega}, \mathbf{p}_{b_i}^{\omega})

。 首先我们计算简单的对位置求导。

\frac{\partial{\mathbf{r}}}{\partial{\mathbf{p}_{b_i}^{\omega}}} = \frac{\partial{\mathbf{r}}}{\partial{\mathcal{P}_l^{c_j}}} \frac{\partial{\mathcal{P}_l^{c_j}}}{\partial{\mathbf{p}_{b_i}^{\omega}}}

其中

\mathbf{r}

为2维，

\mathcal{P}_l^{c_j}

为3d点

(x_{c_j}, y_{c_j}, z_{c_j})

，很容易得到

\frac{\partial{\mathbf{r}}}{\partial{\mathcal{P}_l^{c_j}}} = \begin{bmatrix} 1/z_{c_j} & 0 & -x_{c_j}/z_{c_j}^2 \\ 0 & 1/z_{c_j} & -y_{c_j}/z_{c_j}^2 \end{bmatrix} \tag{10}

然后(8)式对

\mathbf{p}_{b_i}^{\omega}

求导，得到

\frac{\partial{\mathbf{r}}}{\partial{\mathcal{P}_l^{c_j}}} = \mathbf{R}_b^c \mathbf{R}_{\omega}^{b_j}

对应代码中的

jaco_i.leftCols<3>() = ric.transpose() * Rj.transpose();

即

\frac{\partial{\mathbf{r}}}{\partial{\mathcal{P}_l^{c_j}}} = \mathbf{R}_b^c \mathbf{R}_{\omega}^{b_j} = (\mathbf{R}_c^b)^{-1} (\mathbf{R}_{b_j}^{\omega})^{-1} = (\mathbf{R}_c^b)^T (\mathbf{R}_{b_j}^{\omega})^T

这里用到了SO(3)群的基本性质，SO(3)下的矩阵的逆等于它的转置。 之后我们对旋转求导数

\frac{\partial{\mathbf{r}}}{\partial{\mathbf{R}_{b_i}^{\omega}}} = \frac{\partial{\mathbf{r}}}{\partial{\mathcal{P}_l^{c_j}}} \frac{\partial{\mathcal{P}_l^{c_j}}}{\partial{\mathbf{R}_{b_i}^{\omega}}}

导数的前一部分和式10一样，后一部分涉及到李群的求导，为书写方便我们先令

\frac{1}{\lambda}_l \pi_c^{-1}( \begin{bmatrix} u_l^{c_i} \\ v_l^{c_i} \end{bmatrix}) = \mathcal{P}_l^{c_i}

，因为左式本来就是

i

frame像素点投影到相机平面的3d点。后一部分

\begin{align} \frac{\partial{\mathcal{P}_l^{c_j}}}{\partial{\mathbf{R}_{b_i}^{\omega}}} & \overset{代入式8}{=} \frac{\partial{\mathbf{R}_b^c( \mathbf{R}_{\omega}^{b_j}( \mathbf{R}_{b_i}^{\omega}( \mathbf{R}_c^b \mathcal{P}_l^{c_i} + \mathbf{p}_c^b ) + \mathbf{p}_{b_i}^{\omega} - \mathbf{p}_{b_j}^{\omega}) - \mathbf{p}_c^b}}{\partial{\mathbf{R}_{b_i}^{\omega}}} \\ & = \frac{\partial{\mathbf{R}_b^c \mathbf{R}_{\omega}^{b_j} \mathbf{R}_{b_i}^{\omega} \mathbf{R}_c^b \mathcal{P}_l^{c_i} }}{\partial{\mathbf{R}_{b_i}^{\omega}}} \end{align}

现在只对

\mathbf{R}_{b_i}^{\omega}

求导数，那其他量视为是常量，我们令

C_1 = \mathbf{R}_b^c \mathbf{R}_{\omega}^{b_j}

，另外3d点

\mathbf{R}_c^b\mathcal{P}_l^{c_i}

得到的是body 坐标系下的3d点，我们记为

\mathcal{P}_l^{b_i}

。那么上式就变成了

\begin{align} \frac{\partial{\mathcal{P}_l^{c_j}}}{\partial{\mathbf{R}_{b_i}^{\omega}}} & = \frac{\partial{C_1 \mathbf{R}_{b_i}^{\omega} \mathcal{P}_l^{b_i} }}{\partial{\mathbf{R}_{b_i}^{\omega}}} \\ & = C1 \frac{\partial{ \mathbf{R}_{b_i}^{\omega} \mathcal{P}_l^{b_i} }}{\partial{\mathbf{R}_{b_i}^{\omega}}} \end{align} \tag{11}

现在的问题就是如何得到

\frac{\partial{ \mathbf{R}_{b_i}^{\omega} \mathcal{P}_l^{b_i} }}{\partial{\mathbf{R}_{b_i}^{\omega}}}

了,简单记为

\frac{\partial{ \mathbf{R} \mathcal{P} }}{\partial{\mathbf{R}}}

。那这儿很明显了，应用我们右扰动的推导结果，可以直接得到

\begin{align} \frac{\partial{\mathcal{P}_l^{c_j}}}{\partial{\mathbf{R}_{b_i}^{\omega}}} & = C_1\frac{\partial{ \mathbf{R}_{b_i}^{\omega} \mathcal{P}_l^{b_i} }}{\partial{\mathbf{R}_{b_i}^{\omega}}} \\ & \overset{因为用了李群性质1}{\approx} -C_1 \mathbf{R}_{b_i}^{\omega} (\mathcal{P}_l^{b_i})^{\wedge} \end{align}

代入

C_1 = \mathbf{R}_b^c\mathbf{R}_{\omega}^{b_j}

，我们就有了

这和代码中的

jaco_i.rightCols<3>() = ric.transpose() * Rj.transpose() * Ri * -Utility::skewSymmetric(pts_imu_i);

对应。总的来说代码第一个求导部分得到了残差对

i

frame处位姿的求导。

     if (jacobians[0])        {            Eigen::Map> jacobian_pose_i(jacobians[0]);             Eigen::Matrix jaco_i;            jaco_i.leftCols<3>() = ric.transpose() * Rj.transpose();\\对位移求导            jaco_i.rightCols<3>() = ric.transpose() * Rj.transpose() * Ri * -Utility::skewSymmetric(pts_imu_i);\\对旋转求导             jacobian_pose_i.leftCols<6>() = reduce * jaco_i;\\reduce为式10            jacobian_pose_i.rightCols<1>().setZero(); \\这里有一个0列，回顾前面可以发现我们在求偏导时忽略了一个1的        }

其他部分的推导我就不讲了，以后有时间的话我会写上IMU部分的推导，有预积分会麻烦很多。你自己如果有兴趣的话，你可以推导okvis的reprojection error部分，它也是用ceres优化，优化部分的结构和VINS-MONO相似。虽然他们的代码更难读懂，但是我已经推导过了，和前面的讲解一致，所以你也可以试一下。

李群和李代数的核心思想

作者: xsh
时间: 2026-01-15
分类: 算法与数学基础
评论

李群和李代数的核心思想

可以理解为专门用于矩阵旋转的东西，符合封结幺逆法则；
李群可以理解为旋转矩阵，李代数可以理解为旋转向量；
李群是连续群，李代数可以表出李群的导数，所以李代数表示的是李群的局部性质；
进而我们可以理解为：旋转向量表达了旋转矩阵的局部（旋转发生那一瞬间的领域内）性质；
由拉格朗日中值定理可知：导数控制函数。李代数控制李群， $ϕ$ 控制 $R$ ；^【1】

也就是说想要估计出函数值，我们可以研究该函数的导数，用来描述某个点领域内性质。故而我们需要建立对李群的求导模型，通过分析导数的性质来估计出相机在这一时刻（领域内）的位姿。

但是我们知道群是指只有一个运算的集合（我们选择矩阵乘法），所以李群不对加法封闭^【2】，但是我们知道李代数是建立在向量空间上的，支持加法运算。所以我们需要一种让李群映射到李代数的机制，然后通过对李代数求导，求出李群的导数。不过，对李代数求导后的结果非常复杂，所以我们需要寻找另外一种求导方式^【3】，这就是我们接下来所要介绍的内容。

【注】

^【1】：某个名牌大学考研的复试题——你知道导数的作用是什么吗？

^【2】：李群也是一种群。甭跟我扯什么鳄鱼不是鱼、日本人不是人。

^【3】：对谁求导不重要，因为我们总可以通过这个导数控制相同的函数。

李群的两种求导模型（都是映射到了李代数空间）

BCH公式线性化（将李群的变化与李代数的变化联系起来）；
对李代数求导的求导模型；（复杂）
1. 需要求出左右雅可比矩阵的逆；
对微扰动求导的扰动模型；（精简）
1. 不需要求出左右雅可比矩阵的逆；

这两种求导模型都是会有误差存在的

李群和李代数的基础符号

特殊正交群 $S O (3)$ ，特殊欧式群 $S E (3)$ ；
特殊正交群上的李代数 $s o (3)$ ，这里我们具象化为三维 $ϕ$ 向量或者反对称阵 $\hat{ϕ}$ ；
特殊欧式群上的李代数 $s e (3)$ ，这里我们具象化为六维 $ξ$ 向量或者四维方阵 $\hat{ξ}$ ；

$ρ$ 表示三维空间中的平移， $ϕ$ 表示三维空间中的旋转。
反对称矩阵与相应的三维向量： $\land$ 和 $\lor$ ；
向量 $\vec{a}$ 和 $\vec{ω}$ 都表示旋转向量的单位方向向量， $θ$ 表示旋转角， $\vec{ϕ}$ 表示旋转向量， $R$ 表示旋转矩阵；

《李群与李代数》在三维刚体运动（SLAM）中应用的详细讲解

作者: xsh
时间: 2026-01-15
分类: 算法与数学基础
评论

这是一个关于《李群与李代数》在三维刚体运动（SLAM）中应用的详细讲解。根据你提供的课件内容，我将这些知识点整理为逻辑严密的六个部分：动机、基本概念（群与代数）、指数映射、BCH公式与求导、扰动模型以及代码实践。

1. 为什么要引入李群与李代数？（动机）

在做SLAM或机器人状态估计时，我们核心任务是对位姿（旋转和平移）进行估计和优化。

传统方法的局限性：
- 三维旋转矩阵 $R$ 必须满足正交且行列式为1的约束（ $RR^T=I, \det(R)=1$ ）。
- 作为优化变量时，这些额外约束会使优化问题变得非常困难（有约束优化通常比无约束优化难解）。
李代数的优势：
- 李群（如旋转矩阵）没有定义良好的加法，难以求导。
- 李代数对应于李群单位元处的切空间（Tangent Space）。
- 在李代数上，我们可以把位姿变为向量形式，进行无约束优化，这对非线性优化（如Gauss-Newton法）至关重要。

2. 基础概念：李群 (Lie Group)

群 (Group) 是一种集合加上一种运算的代数结构。

四大性质： 封闭性、结合律、幺元、逆 ¹。
常见的群：
- 一般线性群 $GL(n)$ ： $n \times n$ 可逆矩阵。
- 特殊正交群 $SO(3)$ ： 即旋转矩阵群。 $R \in \mathbb{R}^{3\times 3}, RR^T=I, \det(R)=1$ 。
- 特殊欧氏群 $SE(3)$ ： 即变换矩阵群。包含旋转 $R$ 和平移 $t$ ， $T = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}$ 。

李群是指具有连续（光滑）性质的群。直观上看，刚体在空间中的运动是连续的，所以 $SO(3)$ 和 $SE(3)$ 都是李群。

3. 核心概念：李代数 (Lie Algebra)

每个李群都有与之对应的李代数，描述了群在单位元附近的局部性质（切空间）。李代数由一个集合 $\mathbb{V}$ 、数域 $\mathbb{F}$ 和二元运算（李括号 [,]）组成。

3.1 $\mathfrak{so}(3)$ ：对应三维旋转

定义： 这是一个三维向量 $\phi \in \mathbb{R}^3$ 。
反对称矩阵（Hat Operator $^\wedge$）：

将向量 $\phi = [\phi_1, \phi_2, \phi_3]^T$ 映射为一个反对称矩阵 $\Phi = \phi^\wedge$：

$\phi^\wedge = \begin{bmatrix} 0 & -\phi_3 & \phi_2 \\ \phi_3 & 0 & -\phi_1 \\ -\phi_2 & \phi_1 & 0 \end{bmatrix}$
物理意义： $\mathfrak{so}(3)$ 的物理意义就是旋转向量。

3.2 $\mathfrak{se}(3)$ ：对应三维变换

定义：这是一个六维向量 $\xi \in \mathbb{R}^6$。

$\xi = \begin{bmatrix} \rho \\ \phi \end{bmatrix}$
- $\rho \in \mathbb{R}^3$ ：平移部分（注意：虽然叫平移，但不直接等同于 $SE(3)$ 中的 $t$ ，二者差一个线性变换）。
- $\phi \in \mathbb{R}^3$ ：旋转部分（与 $\mathfrak{so}(3)$ 一致）。
扩展的Hat Operator：在 $\mathfrak{se}(3)$ 中，$\wedge$ 符号将其映射为 $4 \times 4$ 矩阵：

$\xi^\wedge = \begin{bmatrix} \phi^\wedge & \rho \\ 0^T & 0 \end{bmatrix}$

4. 桥梁：指数映射 (Exponential Map)

指数映射建立了**李代数（向量/矩阵）到李群（旋转/变换矩阵）**的联系。

R = \exp(\phi^\wedge)

4.1 SO(3) 的指数映射（罗德里格斯公式）

通过泰勒展开并利用反对称矩阵的性质（$a^\wedge a^\wedge = aa^T - I$），可以得到著名的罗德里格斯公式 (Rodrigues' Formula)：

\exp(\theta a^\wedge) = \cos\theta I + (1-\cos\theta)aa^T + \sin\theta a^\wedge

其中 $\theta$ 是旋转角度，$a$ 是旋转轴（单位向量）。

反向映射（对数映射）： 给定旋转矩阵 $R$ ，也可以求出对应的李代数 $\phi$ （即求转轴和转角）。

4.2 SE(3) 的指数映射

对于六维向量 $\xi$，其指数映射为：

\exp(\xi^\wedge) = \begin{bmatrix} R & J\rho \\ 0^T & 1 \end{bmatrix} = T

注意： 平移部分不是直接相加，而是多了一个左雅可比矩阵 $J$ 。 $J$ 描述了李代数上的平移分量如何转换到群上的平移分量。

5. 关键难点：求导与扰动模型

我们在优化中需要求导，但李群只有乘法没有加法： $\exp(\phi_1^\wedge)\exp(\phi_2^\wedge) \neq \exp((\phi_1+\phi_2)^\wedge)$ 。这导致无法直接定义导数。

5.1 BCH 公式 (Baker-Campbell-Hausdorff)

BCH 公式解决了“两个指数映射的乘积等于什么”的问题。

当其中一个量为小量（$\Delta \phi$）时，BCH 提供了线性近似：

\ln(\exp(\phi^\wedge)\exp(\Delta \phi^\wedge))^\vee \approx \phi + J_l^{-1}(\phi)\Delta \phi

这说明：在李群上左乘一个小量，等于在李代数上加上一个“带左雅可比逆”的小量。

5.2 两种求导思路

为了对位姿求导（例如求 $\frac{\partial (Rp)}{\partial R}$ ），有两种方法：

导数模型（不推荐）：
- 思路：在李代数上直接加一个小量 $\delta \phi$ ，即 $\exp((\phi + \delta\phi)^\wedge)$ 。
- 结果：由于 BCH 公式的存在，结果中会引入复杂的雅可比矩阵 $J_l$ ，计算极其繁琐。
扰动模型（Perturbation Model，推荐）：
- 思路：在李群上左乘（或右乘）一个微小扰动 $\exp(\varphi^\wedge)$ ，令 $\varphi \to 0$ 。
- 公式推导（左扰动为例）：
  
  $\lim_{\varphi \to 0} \frac{\exp(\varphi^\wedge)\exp(\phi^\wedge)p - \exp(\phi^\wedge)p}{\varphi}$
- 最终结果：
  
  $\frac{\partial (Rp)}{\partial \varphi} = -(Rp)^\wedge$
- 优势： 结果非常简洁，避开了复杂的雅可比矩阵 $J_l$ 。这也是视觉SLAM中通常使用的方法。

6. 实践：Sophus 库

课件最后介绍了 Sophus 库，这是一个基于 Eigen 的 C++ 库，专门用于处理李群李代数 ²。

核心功能：
- 支持 $SO(3)$ 和 $SE(3)$ 的构造与运算。
- 提供 exp() 和 log() 函数进行群与代数的转换。
- 支持 hat() (向量转矩阵) 和 vee() (矩阵转向量) 操作。
代码示例要点：
- Sophus::SO3::exp(phi)：将旋转向量转为旋转矩阵。
- Sophus::SE3 的构造通常先把旋转和平移定义好，注意平移在前还是旋转在前的约定（Sophus 内部通常是平移在前，旋转在后，但构造函数可能不同，需看具体版本）。
- 编译： 这是一个 CMake 工程，依赖 Eigen，需要标准的 cmake .. 和 make 流程 ³。

总结

李群 (Group): 旋转/变换矩阵本身，有约束，难优化。
李代数 (Algebra): 对应的向量形式，无约束，易优化。
映射: 指数映射 ( $\exp$ ) 和对数映射 ( $\ln$ ) 在两者间转换。
求导: 使用左乘扰动模型，将复杂的求导问题转化为简单的矩阵运算（ $-(Rp)^\wedge$ ），这是 SLAM 后端优化的数学基础。

下一步建议：

你是否需要我解释一下左扰动模型和右扰动模型的具体区别，或者用一个具体的数学例子（比如推导 $-(Rp)^\wedge$ 的过程）来加深理解？