
The Extended Kalman Filter is a variant of the Kalman Filter used when working with nonlinear systems. Because the standard Kalman Filter cannot be applied to nonlinear systems, the Extended Kalman Filter was created to solve that problem. Notably, it was used in the development of the navigation control systems aboard Apollo 11; like many other real-world systems, that system's behavior is not linear, so a traditional Kalman Filter could not be used. This post will walk you through an Extended Kalman Filter Python example so you can see how one can be implemented!

In the diagram below, you can see that the Extended Kalman Filter's general flow is exactly the same as the Kalman Filter's. The two main differences between the Kalman Filter and the Extended Kalman Filter are shown in red font. Beyond these, there are some lower-level differences that I will cover in more detail below.

This article assumes that the reader understands the core concepts of the Kalman Filter. If you want a refresher before reading on, try skimming my Kalman Filter Explained Simply article.

extended kalman filter algorithm overview diagram
Extended Kalman Filter Diagram

Python Example Overview

The Extended Kalman Filter Python example chosen for this article takes in measurements from a ground-based radar tracking a ship in a harbor and estimates the ship's position and velocity. The radar measurements are in a local polar coordinate frame, while the filter's state estimate is in a local Cartesian coordinate frame. The measurement-to-state relationship, i.e., polar to Cartesian, is nonlinear.

In this example, the ship travels in a straight line at a constant velocity of 20 meters/second, or about 45 miles per hour, and is located about 3250 meters (about 2 miles) away. The measurements are reported at 1 hertz, i.e., once per second, as range and azimuth values in the local polar frame; the input measurement data can be seen in the two plots below.

extended kalman filter input measurement range data vs actual range data plot
Range Measurements vs Actual Range of Ship
extended kalman filter input measurement azimuth data vs actual azimuth data plot
Azimuth Measurements vs Actual Azimuth of Ship

A simulation technique was used to generate these measurements. The code for simulating this ship behavior is beyond the scope of this post, but the plots above show the noisy measurements that are based on simulated position data.

On the Initialization of the Extended Kalman Filter

As can be seen in the Extended Kalman Filter diagram above, the first two measurements are needed before filtering can begin. This is because the output state estimate needs to be initialized. There are different approaches to filter initialization depending on your application, but for this example the first two position measurements are used to form the state estimate, which consists of position and velocity.

Logically, this makes sense. The first position measurement only tells you the ship's position at one point in time. Only after you receive the second measurement can you compute a velocity, i.e., the change in position divided by the time difference between the two measurements.

Extended Kalman Filter Python Example

Let’s take a look at the input, output, and system model equations before we dive into the code. This should make reading the code much easier.

Input and Output Equations

As mentioned above, the input measurements for this Extended Kalman Filter Python example are in the local polar coordinate frame. And these are the equations that represent these measurements.

    z_m = [ r_m  b_m ]^T,  with measurement covariance R_m, at time t_m

Local polar input measurement equations, i.e., range and azimuth

z is the measurement vector, R is the measurement covariance matrix, and t is the time at which the measurement was taken. The subscript m indicates measured quantities.

The above input measurements are filtered and a system state estimate is computed and output in the following form.

    x = [ x  y  vx  vy ]^T,  with state covariance P, at time T

Local Cartesian output state estimate, i.e., x and y position and velocity

x is the state vector, P is the state covariance matrix, and T is the time at which the state estimate is valid. In this example, this time will reflect the last measurement time that was used to compute the state estimate.

System Model Equations for Prediction Step

As mentioned above, the system model equations are where the primary differences between the Extended Kalman Filter and the traditional Kalman Filter lie. For this example, however, the A and Q matrices are no different from the matrices that would be used in a standard Kalman Filter. Those equations are the following.

    A = [ 1  0  dt  0
          0  1  0  dt
          0  0  1   0
          0  0  0   1 ]

    Q = diag(20, 20, 4, 4)

System model matrices for the prediction step: A and Q

The A matrix above is the state transition matrix. This matrix is used to extrapolate the state vector and state covariance from the last data time to align with the data time of the new measurement. This matrix was developed assuming the ship was traveling linearly at constant velocity i.e. using physics equations of motion.

The Q matrix is the system noise matrix. This represents the uncertainty in the system prediction model. For this example, the A matrix assumes the ship is traveling in a linear motion at a constant velocity. But in reality, this is not always true. The Q matrix is used to add error to the state estimate prediction to account for those differences in linear motion and small accelerations or decelerations. The values used in this example are not optimized. Optimization of these parameters is outside the scope of this post.

Here are the prediction equations from Step 3 in the diagram above.

    x_p = A x
    P_p = A P A^T + Q

Extended Kalman Filter prediction equations

System Model Equations for Estimation Step

The estimation process for the Extended Kalman Filter is performed in two steps. First, the Kalman Gain is computed. Second, the state vector and state covariance for the new measurement time are computed using the Kalman Gain. Both of these steps require the use of an H matrix. The H matrix is the state-to-measurement transition matrix, used to transform the predicted state vector and covariance matrix into the same form as the input measurement. For this Extended Kalman Filter example, the state and measurement relationship is nonlinear. Because of this nonlinear relationship, the H matrix differs from that of a traditional Kalman Filter: it is a Jacobian matrix. The derivation and applications of the Jacobian are outside the scope of this post; the important feature to note for now is that it is made up of first-order partial derivatives. Here is the H matrix for this example.
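If you want to convince yourself that an analytic Jacobian like this is correct, a standard sanity check is to compare it against finite differences. The sketch below is not part of the original tutorial; it assumes the azimuth convention used in this example (measured from the y-axis) with everything in radians:

```python
import numpy as np

def h(x):
    # Nonlinear state-to-measurement function: Cartesian (x, y) -> polar (range, azimuth).
    # Azimuth is measured from the y-axis, matching x = r*sin(b), y = r*cos(b).
    px, py = x[0], x[1]
    return np.array([np.sqrt(px**2 + py**2), np.arctan2(px, py)])

def H_jacobian(x):
    # Analytic Jacobian of h with respect to the state [x, y, vx, vy]
    px, py = x[0], x[1]
    den = px**2 + py**2
    den1 = np.sqrt(den)
    return np.array([[px/den1, py/den1, 0, 0],
                     [py/den, -px/den, 0, 0]])

def H_numeric(x, eps=1e-6):
    # Central finite differences, one state component at a time
    J = np.zeros((2, 4))
    for i in range(4):
        d = np.zeros(4)
        d[i] = eps
        J[:, i] = (h(x + d) - h(x - d)) / (2 * eps)
    return J

x = np.array([2000.0, 2500.0, 20.0, 0.0])
print(np.max(np.abs(H_jacobian(x) - H_numeric(x))))  # difference should be negligible
```

The two velocity columns are zero in both versions because range and azimuth depend only on position.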

    H = [ x/r      y/r      0  0
          y/r^2   -x/r^2    0  0 ],   where r = sqrt(x^2 + y^2)

State-to-measurement transition (Jacobian) matrix H

The partial derivatives were computed from the following polar-to-Cartesian relationships for range and azimuth.

    r = sqrt(x^2 + y^2)
    b = arctan(x / y)

Range and azimuth equations with respect to the local Cartesian frame

After the H matrix is computed, the Kalman Gain can be computed. The Kalman Gain equations that follow consist of two steps. First, compute the innovation matrix, S. Then compute the gain, K.

    S = H P_p H^T + R
    K = P_p H^T S^(-1)

Kalman Gain equations

The estimation steps use the Kalman Gain, K, to estimate the new system state as well as the associated state covariance matrix:

    x = x_p + K (z - h(x_p))
    P = P_p - K H P_p

One difference you can see here, compared to the traditional Kalman Filter estimation equations, is the h(x_p) function. This function represents the nonlinear state-to-measurement transformation.

h(x_p) is defined by the equation below for this example:

    h(x_p) = [ sqrt(x_p^2 + y_p^2),  arctan(x_p / y_p) ]^T

This should look familiar, because these are the same relationships used to compute the Jacobian H matrix.

Our Extended Kalman Filter tutorial is implemented in Python with these equations. Next, we will review the implementation details with code snippets and comments.

Python Implementation for the Extended Kalman Filter Example

In order to develop and tune a Python Extended Kalman Filter, you need the following source code functionality:

  • Input measurement data (real or simulated)
  • Extended Kalman Filter algorithm
  • Data collection and visualization tools
  • Test code to read in measurement data, execute filter logic, collect data, and plot the collected data

For this tutorial, multiple Python libraries are used to accomplish the functionality above. The NumPy library provides arrays, matrices, and linear algebra operations on those structures; Matplotlib is used to plot input and output data. Assuming these packages are installed, this is the code used to import them and assign the conventional short names.

import numpy as np
from numpy.linalg import inv
import matplotlib.pyplot as plt

Extended Kalman Filter Algorithm

The algorithm is implemented as a single function with two input arguments: the measurement data structure, z, and the update number, updateNumber. The update number is not strictly necessary, but it made this example easier to implement. It also serves as the time stamp for each measurement, since the measurements are reported once per second. I will update this in the future to be more robust.

There are three main processing threads through this function determined by the update number: initialization, re-initialization, and filtering. Here is the code:

def ekfilter(z, updateNumber):
    dt = 1.0
    j = updateNumber
    # Initialize State
    if updateNumber == 0:  # First Update
        # compute position values from measurements
        # x = r*sin(b)
        temp_x = z[0][j]*np.sin(z[1][j]*np.pi/180)
        # y = r*cos(b)
        temp_y = z[0][j]*np.cos(z[1][j]*np.pi/180)
        # state vector
        # - initialize position values
        ekfilter.x = np.array([[temp_x],
                               [temp_y],
                               [0],
                               [0]])
        # state covariance matrix
        # - initialized to zero for first update
        ekfilter.P = np.array([[0, 0, 0, 0],
                               [0, 0, 0, 0],
                               [0, 0, 0, 0],
                               [0, 0, 0, 0]])
        # state transition matrix
        # - linear extrapolation assuming constant velocity
        ekfilter.A = np.array([[1, 0, dt, 0],
                               [0, 1, 0, dt],
                               [0, 0, 1, 0],
                               [0, 0, 0, 1]])
        # measurement covariance matrix
        ekfilter.R = z[2][j]
        # system error matrix
        # - initialized to zero matrix for first update
        ekfilter.Q = np.array([[0, 0, 0, 0],
                               [0, 0, 0, 0],
                               [0, 0, 0, 0],
                               [0, 0, 0, 0]])
        # residual and kalman gain
        # - not computed for first update
        # - but initialized so they can be output
        residual = np.array([[0, 0],
                             [0, 0]])
        K = np.array([[0, 0],
                      [0, 0],
                      [0, 0],
                      [0, 0]])
    # Reinitialize State
    if updateNumber == 1:  # Second Update
        prev_x = ekfilter.x[0][0]
        prev_y = ekfilter.x[1][0]
        # x = r*sin(b)
        temp_x = z[0][j]*np.sin(z[1][j]*np.pi/180)
        # y = r*cos(b)
        temp_y = z[0][j]*np.cos(z[1][j]*np.pi/180)
        temp_xv = (temp_x - prev_x)/dt
        temp_yv = (temp_y - prev_y)/dt
        # state vector
        # - reinitialized with new position and computed velocity
        ekfilter.x = np.array([[temp_x],
                               [temp_y],
                               [temp_xv],
                               [temp_yv]])
        # state covariance matrix
        # - initialized to large values
        # - more accurate position values could be used based on the measurement
        #   covariance, but this example does not go that far
        ekfilter.P = np.array([[100, 0, 0, 0],
                               [0, 100, 0, 0],
                               [0, 0, 250, 0],
                               [0, 0, 0, 250]])
        # state transition matrix
        # - linear extrapolation assuming constant velocity
        ekfilter.A = np.array([[1, 0, dt, 0],
                               [0, 1, 0, dt],
                               [0, 0, 1, 0],
                               [0, 0, 0, 1]])
        # measurement covariance matrix
        # - provided by the measurement source
        ekfilter.R = z[2][j]
        # system error matrix
        # - adds ~4.5 meter std dev in x and y position to the state covariance
        # - adds 2 meters per second std dev in x and y velocity to the state covariance
        # - these values are not optimized but work for this example
        ekfilter.Q = np.array([[20, 0, 0, 0],
                               [0, 20, 0, 0],
                               [0, 0, 4, 0],
                               [0, 0, 0, 4]])
        # residual and kalman gain
        # - not computed for second update
        # - but initialized so they can be output
        residual = np.array([[0, 0],
                             [0, 0]])
        K = np.array([[0, 0],
                      [0, 0],
                      [0, 0],
                      [0, 0]])
    if updateNumber > 1:  # Third + Updates
        # Predict State Forward
        x_prime = ekfilter.A.dot(ekfilter.x)
        # Predict Covariance Forward
        P_prime = ekfilter.A.dot(ekfilter.P).dot(ekfilter.A.T) + ekfilter.Q
        # state to measurement transition matrix (Jacobian)
        x1 = x_prime[0][0]
        y1 = x_prime[1][0]
        x_sq = x1*x1
        y_sq = y1*y1
        den = x_sq + y_sq
        den1 = np.sqrt(den)
        ekfilter.H = np.array([[x1/den1, y1/den1, 0, 0],
                               [y1/den, -x1/den, 0, 0]])
        ekfilter.HT = ekfilter.H.T
        # measurement covariance matrix
        ekfilter.R = z[2][j]
        # Compute Kalman Gain
        S = ekfilter.H.dot(P_prime).dot(ekfilter.HT) + ekfilter.R
        K = P_prime.dot(ekfilter.HT).dot(np.linalg.inv(S))
        # Estimate State
        # temp_z = current measurement in range and azimuth
        temp_z = np.array([[z[0][j]],
                           [z[1][j]]])
        # compute the predicted range and azimuth
        # - convert the predicted cartesian state to polar range and azimuth
        pred_x = x_prime[0][0]
        pred_y = x_prime[1][0]
        sumSquares = pred_x*pred_x + pred_y*pred_y
        pred_r = np.sqrt(sumSquares)
        pred_b = np.arctan2(pred_x, pred_y) * 180/np.pi
        h_small = np.array([[pred_r],
                            [pred_b]])
        # compute the residual
        # - the difference between the measurement and the prediction
        #   for that data time
        residual = temp_z - h_small
        # Compute new estimate for state vector using the Kalman Gain
        ekfilter.x = x_prime + K.dot(residual)
        # Compute new estimate for state covariance using the Kalman Gain
        ekfilter.P = P_prime - K.dot(ekfilter.H).dot(P_prime)
    return [ekfilter.x[0], ekfilter.x[1], ekfilter.x[2], ekfilter.x[3], ekfilter.P, K, residual, updateNumber]
# End of ekfilter

Testing the Extended Kalman Filter Algorithm

In order to test this algorithm implementation, real data or simulated data is needed. For this tutorial, simulated data is generated and used to test the algorithm. As mentioned above, the simulation for this example is outside the scope of this post.

The test code below was used to exercise the algorithm. The getMeasurements() function returns an array of arrays of measurement information in the following form: [measurement_range, measurement_azimuth, measurement_covariance, measurement_time]

z = getMeasurements()
for iii in range(0, len(z[0])):
    # for each measurement, call the Extended Kalman Filter function
    f = ekfilter(z, iii)
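The getMeasurements() simulation itself is out of scope here, but if you want to run the filter end to end, a minimal hypothetical stand-in along these lines will do. The geometry (a ship starting about 3250 meters away, moving at a constant 20 m/s, sampled at 1 Hz) matches the scenario described above, while the exact track and noise levels are assumptions, not the original simulation:

```python
import numpy as np

def getMeasurements(n=30, seed=0):
    # Hypothetical stand-in for the ship simulation: constant-velocity
    # straight-line motion observed by a radar reporting range/azimuth at 1 Hz.
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)          # 1 Hz time stamps
    x = -2000.0 + 20.0 * t                 # x position, 20 m/s constant velocity
    y = 2560.0 * np.ones(n)                # y position held constant
    r_true = np.sqrt(x**2 + y**2)          # starts near 3250 m
    b_true = np.arctan2(x, y) * 180/np.pi  # azimuth from the y-axis, degrees
    sig_r, sig_b = 25.0, 0.5               # assumed 1-sigma measurement noise
    r_meas = r_true + rng.normal(0, sig_r, n)
    b_meas = b_true + rng.normal(0, sig_b, n)
    R = [np.diag([sig_r**2, sig_b**2]) for _ in range(n)]
    return [r_meas, b_meas, R, t]

z = getMeasurements()
print(len(z[0]), z[2][0].shape)   # 30 (2, 2)
```

The returned structure matches what ekfilter() indexes: z[0] and z[1] are the polar measurements, z[2][j] is the 2x2 measurement covariance, and z[3] carries the time stamps.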

Filter Results and Analysis

There are many different ways to analyze your filter’s performance. And every application will require different analysis because each will have its own performance requirements. Here are two plots that helped us understand if this filter was working.

The first plot is an error plot for the x position estimate. This plot includes the x position error, which is computed as the x position estimate minus the real x position that was used to generate the polar measurement. In addition to the x position error, the green lines represent the estimated error bounds computed from the state covariance estimate. You can see in this plot that sometimes the x position error falls outside the error bounds. Again, based on your application requirements, this may or may not be okay. The x position error starts to settle within the error bounds after update number 15.

x position error containment plot for extended kalman filter python example

Another plot used was the estimated velocity of the ship. It can be seen that the x and y velocity values vary during the first few updates but then begin to settle at the actual values used for measurement generation, i.e., an x velocity of 20 meters per second and a y velocity of 0 meters per second. This is a good sign that the filter is working, because it is able to accurately estimate the ship's velocity based only on position measurements in a different coordinate frame.

extended kalman filter results plots. x and y velocity estimates vs time

Extended Kalman Filters can diverge. This means that the filter's estimate grows uncontrollably and no longer represents the true system state. Researchers continue to look for ways to correct such divergence, but divergence correction is outside the scope of this post.

Summary

I wish you the best of luck in your filtering work as you move forward on your projects! Please leave a comment if you enjoyed this post!


Most tutorials for the Kalman Filter are difficult to understand because they require advanced math skills to follow how the Kalman Filter is derived. If you have tried to read Rudolf E. Kalman's 1960 paper, you know how confusing this concept can be. But do you need to understand how to derive the Kalman Filter in order to use it?

No. If you want to design and implement a Kalman Filter, you do not need to know how to derive it, you just need to understand how it works.

The truth is, anybody can understand the Kalman Filter if it is explained in small digestible chunks. This post simply explains the Kalman Filter and how it works to estimate the state of a system.

The big picture of the Kalman Filter

Let's look at the Kalman Filter as a black box. The Kalman Filter has inputs and outputs. The inputs are noisy and sometimes inaccurate measurements; the outputs are less noisy and often more accurate estimates. Those estimates can include system state parameters that were never measured or observed. That is the superpower of the Kalman Filter: it estimates system parameters that are not observed or measured.

In short, you can think of the Kalman Filter as an algorithm that can estimate observable and unobservable parameters with great accuracy in real-time. Estimates with high accuracy are used to make precise predictions and decisions. For these reasons, Kalman Filters are used in robotics and real-time systems that need reliable information.

What is the Kalman Filter?

Simply put, the Kalman Filter is a generic algorithm used to estimate system parameters. It can use inaccurate or noisy measurements of a variable to estimate the state of that variable, or of another unobservable variable, with greater accuracy. For example, Kalman Filtering is used to do the following:

  • Object Tracking – Use the measured position of an object to more accurately estimate the position and velocity of that object.
  • Body Weight Estimation on a Digital Scale – Use the measured pressure on a surface to estimate the weight of the object on that surface.
  • Guidance, Navigation, and Control – Use Inertial Measurement Unit (IMU) sensors to estimate an object's location, velocity, and acceleration, and use those estimates to control the object's next moves.

The real power of the Kalman Filter is not smoothing measurements. It is the ability to accurately estimate system parameters that cannot be measured or observed. More accurate estimates in systems that operate in real time give those systems greater control and thus more capability.

Kalman Filter Algorithm Overview

Kalman Filter algorithm process diagram

The process diagram above shows the Kalman Filter algorithm step by step. I know those equations are intimidating but I assure you this will all make sense by the time you finish reading this article. Let’s look at this process one step at a time. For your reference, here is a table of definitions that will be referred to throughout.

Variable | Description                     | Structure           | Category
---------|---------------------------------|---------------------|-------------
x        | state vector                    | n x 1 column vector | Output
P        | state covariance matrix         | n x n matrix        | Output
z        | measurement vector              | m x 1 column vector | Input
A        | state transition matrix         | n x n matrix        | System Model
H        | state-to-measurement matrix     | m x n matrix        | System Model
R        | measurement covariance matrix   | m x m matrix        | Input
Q        | process noise covariance matrix | n x n matrix        | System Model
K        | Kalman Gain                     | n x m matrix        | Internal

Kalman Filter Algorithm Reference Terms

The table above identifies the variables used in the algorithm. Each variable listed has a structure type and category. As this article continues, use the table as a reference.



Kalman Filter Radar Tracking Tutorial

This tutorial will go through the step-by-step process of a Kalman Filter being used to track airplanes and objects near airports. The output track states are displayed to the air traffic control operators monitoring the airspace.

Kalman Filter Tutorial Notation

Radars are not built equally. Each one has different capabilities and therefore provides different types of information to its supporting systems. For this example, the radar outputs its measurements in 2D Cartesian coordinates, x and y. These measurements are represented as a 2-by-1 column vector, z. The associated variance-covariance matrix for these measurements, R, is also provided by the radar, along with the time tag for when the measurement occurred, t. The subscript m denotes measured parameters, and the subscript k denotes the order of the measurement.

radar measurement equations - Kalman Filter

The Kalman Filter estimates the object's position and velocity based on the radar measurements. The estimate is represented by a 4-by-1 column vector, x. Its associated variance-covariance matrix is the 4-by-4 matrix P. Additionally, the state estimate has a time tag denoted as T.

Kalman Filter object state equations

Step 1: Initialize System State

Initializing the system state of a Kalman Filter varies across applications. In this tutorial, the Kalman Filter initializes the system state with the first measurement.

    x_k = [ x_m  y_m  0  0 ]^T                                 Eqn. 1-1
    P_k = state covariance built from the measurement's R      Eqn. 1-2

In this radar tracking example, the input measurements contain position only information. The output system state will contain the position and velocity of the object.

When the first measurement arrives, the only information known about the object is its position at that point in time. The system state estimate is set to the measured position after this first update, and the system state error covariance is set to the first measurement's position accuracy.

Initialize System State in Equations

These equations show the input and output values for this Kalman Filter after receiving the first measurement.

radar measurement 1 equations - Kalman Filter
initialize state and time tag equations - Kalman Filter
initialize state covariance equations - Kalman Filter

Step 2: Reinitialize System State

The system state estimate is reinitialized because a velocity estimate needs a second position measurement for computation.

    x_k = [ x_m  y_m  (x_m - x_prev)/dt  (y_m - y_prev)/dt ]^T                          Eqn. 2-1
    P_k = state covariance from measurement position accuracy and an approximated
          velocity accuracy                                                             Eqn. 2-2

Velocity is estimated with a linear approximation. As you most likely recall from high school physics, velocity is equal to the distance traveled divided by the time it took to travel that distance.

The updated system state estimate will be the second measurement’s position and the computed velocity. The updated system state error covariance will be the second measurement’s position accuracy and an approximated velocity accuracy. Note that this velocity accuracy approximation is something that can be tuned and adjusted after running data through your filter. There are different ways of approximating these values so if this doesn’t match your approximation, that’s okay!

Reinitialize System State in Equations for the Kalman Filter

These equations show the input and output values for this Kalman Filter after receiving the second measurement. Note the velocity variance terms in the state covariance matrix: these values are set to 10^4, indicating a large uncertainty in the velocity state values. In this example, the velocity units are meters per second.

measurement 2 equations
reinitialize state vector and time tag equations
reinitialize state covariance matrix equation

Quick Note on Initialization

Steps 1 and 2 used the first couple of measurements to initialize and re-initialize the system estimate. Each application of the Kalman Filter may do this differently, but the goal is a system state estimate that can be updated with the Kalman Filter equations as future measurements arrive.

Steps 3 through 6 demonstrate how measurements are filtered in and the state estimate is updated.

Step 3: Predict System State Estimate

When the third measurement is received, the system state estimate is propagated forward to time align it with the measurement. This alignment is done so that the measurement and state estimate can be combined.

    x_p = A x_(k-1)                 Eqn. 3-1
    P_p = A P_(k-1) A^T + Q         Eqn. 3-2

The system model is used to perform this prediction. In this example, a constant-velocity linear motion model is used to approximate the object's position change over a time interval. Note that a constant-velocity model does assume zero acceleration. Remember this, because it will resurface later.

The constant velocity linear motion model is something you may also remember from your high school physics class. The equation states that the position of an object is equal to its initial position plus its displacement over a specified time period assuming a constant velocity.

A state transition matrix represents these equations. This matrix is used to propagate both the state estimate and the state error covariance matrix. You may be wondering why the state error covariance matrix is propagated at all. The reason is that when a state estimate is propagated in time, the state at that future time step is inherently more uncertain, so the error covariance grows.
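That growth can be seen numerically. The sketch below uses the dt = 1 s constant-velocity transition matrix with illustrative P and Q values (my own choices, not from the tutorial), and shows the position variance picking up the velocity variance plus process noise:

```python
import numpy as np

dt = 1.0
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = np.diag([20.0, 20.0, 4.0, 4.0])            # illustrative process noise
P = np.diag([100.0, 100.0, 250.0, 250.0])      # illustrative current covariance

# Covariance prediction: P_p = A P A^T + Q
P_pred = A @ P @ A.T + Q

# Position variance grows from 100 to 100 + 250 + 20 = 370,
# because position uncertainty now includes velocity uncertainty.
print(P_pred[0, 0])   # 370.0
print(P_pred[2, 2])   # 254.0 (velocity variance only grows by Q)
```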

On the Q Matrix

The Q matrix represents process noise for the system model. The system model is an approximation, and throughout the life of a system state the model's accuracy fluctuates. The Q matrix represents this uncertainty and adds to the existing noise on the state. In this example, the object's actual accelerations and decelerations contribute to this error.

On the H Matrix

The Kalman Filter uses the state-to-measurement matrix, H, to convert the system state estimate from the state space to the measurement space. For some applications, this is a matrix of zeros and ones. For other applications that use the Extended Kalman Filter, the H matrix is populated with differential equations. To learn more about Extended Kalman Filters, check out my article on them here.

In this tutorial, the H matrix is a simple matrix that is set up to reduce the state estimate and error covariance to position only values rather than position and velocity.

Predict System State in Equations

measurement 3 equations
measurement 3 state time tag and delta equations
Kalman Filter state transition and process noise equations
Kalman Filter state vector prediction equation
Predicted Covariance Equations - Kalman Filter

Step 4: Compute the Kalman Gain

The Kalman Filter computes a Kalman Gain for each new measurement that determines how much the input measurement will influence the system state estimate. In other words, when a really noisy measurement comes in to update the system state, the Kalman Gain will trust its current state estimate more than this new inaccurate information.

This concept is the root of the Kalman Filter algorithm and why it works. It can recognize how to properly weight its current estimate and the new measurement information to form an optimal estimate.
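You can see this weighting behavior in a one-dimensional sketch. With a scalar state and H = 1, the gain reduces to K = P / (P + R), a pure trust weight between the current estimate and the new measurement:

```python
# One-dimensional illustration of the Kalman Gain as a trust weight (H = 1)
def kalman_gain(P, R):
    # P: variance of the current state estimate
    # R: variance of the incoming measurement
    return P / (P + R)

# Confident estimate (small P), noisy measurement (large R):
# gain near 0, so the filter keeps most of its current estimate.
print(kalman_gain(P=1.0, R=100.0))    # ~0.0099

# Uncertain estimate (large P), accurate measurement (small R):
# gain near 1, so the filter mostly adopts the measurement.
print(kalman_gain(P=100.0, R=1.0))    # ~0.9901
```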

    K = P_p H^T (H P_p H^T + R)^(-1)        Eqn. 4-1

Step 5: Estimate System State and System State Error Covariance Matrix

The Kalman Filter uses the Kalman Gain to estimate the system state and error covariance matrix for the time of the input measurement. After the Kalman Gain is computed, it is used to weight the measurement appropriately in two computations.

The first computation is the new system state estimate. The second computation is the system state error covariance.

    x_k = x_p + K (z_k - H x_p)             Eqn. 5-1
    P_k = P_p - K H P_p                     Eqn. 5-2
Kalman Filter Estimation Equations

The state estimate computed above is the only state history the Kalman Filter retains. As a result, Kalman Filters can be implemented on machines with tight memory constraints.



Next Steps

I hope this post allowed you to see how amazing the Kalman Filter is. And when it is broken up into parts, it is not that intimidating.

In conclusion, the Kalman Filter is a generic process for optimal state estimation. It is used in a variety of applications that require accurate estimation. So now that you know what it is and how it works, go out and use it in your projects!

If you enjoyed reading this post, please share it with your friends on your favorite social network! Thanks for reading!


I've been doing algorithms at a large internet company for quite a few years now, leading recommendation and search/ads teams, and I've interviewed close to a thousand candidates, from fresh graduates to seasoned veterans. Linear algebra is, how to put it, like internal kung fu. Day to day you can't see it or touch it, but the moment someone actually makes a move, an insider can tell at a glance whether the technique is all flash or carries real force.

A decade or so ago, when I was still grinding through textbooks in school, I too thought linear algebra was just solving equations, computing determinants, and finding eigenvalues. Only after being repeatedly hammered by reality at work did I understand the weight of my professor's words: "linear algebra is the cornerstone of modern mathematics."

So what does it actually look like to have "truly mastered" undergraduate linear algebra?

When you stop seeing a matrix as a table of numbers and start seeing it as an "action" or a "space", you have begun to get it.

Mastery means being able to switch freely among three perspectives:

  1. The system-of-equations view: the most primitive view. A pile of numbers; solve for the unknowns. This is where linear algebra starts, and the least important place to end. Stay here and you are just a calculator.
  2. The vector-space-and-transformation view: the core view. A vector is no longer three numbers (x, y, z) but an arrow, a point in space. A matrix is no longer m x n numbers but a function, an action, something that can rotate, stretch, or project a vector (or the entire space) somewhere else. This is the key to understanding everything.
  3. The data view: the view you meet most often in applications. A matrix is a data table: each row is a sample, each column is a feature. Operating on the matrix means transforming the data and extracting information from it.

"Mastery" means that when you see a matrix A, the "action" it represents immediately comes to mind. Does it flatten space (a singular matrix)? Merely rotate it (an orthogonal matrix)? Scale it along certain directions (a diagonal matrix)? Which directions matter most to this action (the eigenvectors)? How strong is the scaling (the eigenvalues)?

Enough abstraction; let's get specific, field by field.

Pure mathematics: abstraction, structure, and proof

For pure-math students, undergraduate linear algebra is the "tutorial level" for the big courses ahead: abstract algebra, functional analysis, differential manifolds. Your goal is not solving problems; it is understanding structure.

What does mastery look like here?

Thinking independently of coordinates. You need to understand that a vector space is an abstract set satisfying those eight golden axioms, and that bases and coordinates are just a "measuring system" we bolt onto the abstract space for convenience of understanding and computation. A linear transformation exists, in essence, independently of any coordinate system. In the equation Ax = b, changing the basis changes the appearance of A, x, and b, but the relationship among the transformation, the vector, and the resulting vector is invariant. Grasp this and you have risen above the lowly pleasures of "computation."

Being on familiar terms with the dual space. This is a hurdle; many people learn it and promptly forget it, but it is genuinely important. When you can naturally connect a vector space V with its dual space V* and understand what a linear functional really is, your understanding of the concept of "linearity" has gone up a level.

Having at least a first notion of the tensor product. Undergraduate courses need not go deep, but you should at least know what it is for: a way of constructing new linear spaces, a tool for "fusing" several linear spaces together. It shows up again and again later in physics and geometry.

The pure-math goal is this: handed any "space" satisfying the linear axioms, whether a function space or something stranger, you can analyze it within the linear-algebra framework. In your world, everything can be a vector.

Applied mathematics / computational mathematics

For applied people, linear algebra is the tool you eat with. Your focus is combining the theory with large-scale computation.

What does mastery look like here?

Matrix factorizations as muscle memory. LU, QR, SVD: you must know these cold. Not just how to compute them, but the geometric meaning of each and the scenarios where each applies. For example, why is LU factorization efficient for solving linear systems? Why use QR for least squares? Because it is built on orthogonalization, which is geometrically more stable.

SVD (singular value decomposition) must be thoroughly mastered. If you claim to be fluent in linear algebra but cannot explain the SVD clearly, the claim is basically empty. The SVD decomposes a complicated transformation into three elementary actions: rotate, scale, rotate again. The idea is beautiful. From dimensionality reduction (the machinery underneath PCA) and data compression (image compression) to recommender systems (matrix completion) and LSA in NLP, the SVD is everywhere. You should be able to connect the SVD formula to what it is doing in each of those applications.
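A minimal NumPy sketch of the low-rank idea (sizes and noise level are my own illustrative choices): build a matrix that is essentially rank 3, then check that truncating the SVD to three terms recovers almost everything, which is the Eckart-Young optimality in action:

```python
import numpy as np

rng = np.random.default_rng(0)
# A 50x40 matrix that is "really" rank 3, plus a little noise
M = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40)) \
    + 0.01 * rng.normal(size=(50, 40))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 3
# Best rank-3 approximation: keep the top 3 singular triplets
M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Nearly all the "energy" sits in the top 3 singular values
print(np.linalg.norm(M - M_k) / np.linalg.norm(M))  # tiny relative error
```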

Understanding the condition number and numerical stability. This is the watershed between applied and pure mathematics. In pure math, invertible is invertible. On a computer, a nearly singular (ill-conditioned) matrix lets a tiny input error blow up into a wildly wrong answer. Understanding the condition number means understanding how well your algorithm holds up against noise in real-world data.

I used to work on recommendation algorithms, where a core problem is completing the user-item rating matrix. The matrix is enormous and sparse; processing it directly is impossible. What to do? One classic approach uses a variant of the SVD (for example, FunkSVD). We assume that the real "information" behind the giant rating matrix is low rank, i.e., it can be approximated by the product of two much smaller matrices, R ≈ U * V.T. Here U can be read as each user's preference for latent factors (a movie's "comedy factor" or "action factor", say), and V as each item's loading on those factors. We find the best U and V by optimization, a process that is, in essence, the SVD's low-rank idea applied to data full of missing values. Without the low-rank approximation mindset, you could never come up with this method.
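A toy sketch of that idea, with illustrative sizes and hyperparameters rather than anything production-grade: factor a partially observed low-rank matrix by running SGD on the observed entries only, then check the reconstruction on the entries that were never seen:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, k = 30, 20, 2

# Hypothetical ground truth with low-rank preference structure
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_items, k))
R_full = U_true @ V_true.T

# Observe only ~40% of the entries, as in a sparse rating matrix
mask = rng.random((n_users, n_items)) < 0.4
obs = [(i, j, R_full[i, j])
       for i in range(n_users) for j in range(n_items) if mask[i, j]]

# FunkSVD-style factorization: fit U and V on observed entries via SGD
U = 0.1 * rng.normal(size=(n_users, k))
V = 0.1 * rng.normal(size=(n_items, k))
lr, reg = 0.02, 0.01
for epoch in range(200):
    for i, j, r in obs:
        err = r - U[i] @ V[j]
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * U[i] - reg * V[j])

# Reconstruction error on the held-out (never observed) entries
pred = U @ V.T
rmse = np.sqrt(np.mean((pred[~mask] - R_full[~mask])**2))
print(rmse)
```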

Statistics / data science: the geometry of data

Linear algebra in statistics is the language for describing and understanding relationships in data.

What does mastery look like here?

Projection is the core. Linear regression is, at bottom, projecting your target vector onto the subspace spanned by your feature vectors. Once you understand projection, you understand the geometric essence of least squares, and you know why the residual vector must be orthogonal to the feature space. These are geometric pictures, not a pile of partial-derivative formulas.
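A few lines make the geometry concrete (data here is synthetic, made up for illustration): fit least squares, then check that the residual is orthogonal to every column of the feature matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                            # feature vectors
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Least squares = project y onto the column space of X
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta          # the projection of y onto span(X)
resid = y - y_hat         # the part of y the subspace cannot explain

# The residual is orthogonal to every feature (every column of X)
print(X.T @ resid)        # ~[0, 0, 0]
```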

Quadratic forms and positive-definite matrices. The covariance matrix is the canonical positive semidefinite matrix; it defines the "shape" of a data distribution. The level sets of a multivariate Gaussian are ellipsoids, and the axes and lengths of those ellipsoids are determined by the eigenvectors and eigenvalues of the covariance matrix. Looking at a covariance matrix, you should be able to picture roughly what kind of "egg shape" the data forms.

Being able to derive PCA yourself. PCA (principal component analysis) is not a mere dimensionality-reduction tool; it is a perfect embodiment of linear-algebraic thinking. Statistically, it finds the directions of maximal variance; linear-algebraically, it is an eigendecomposition of the covariance matrix. The eigenvectors are the new coordinate axes (the principal components), and the eigenvalues are the variances of the data along those new axes. Only when you can unify those two views, and explain why we do it (to keep as much information as possible in as few dimensions as possible), do you really understand it.
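Here is that derivation in code, on made-up correlated 2-D data: eigendecompose the covariance matrix and check that the variance of the data projected onto the first principal component equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(3)
# Correlated 2-D data: mostly spread along one direction
Z = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0],
                                          [1.0, 0.5]])
Zc = Z - Z.mean(axis=0)   # center the data first

# PCA = eigendecomposition of the covariance matrix
C = np.cov(Zc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # sort descending by variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Variance along the first principal component = largest eigenvalue
proj = Zc @ eigvecs[:, 0]
print(proj.var(ddof=1), eigvals[0])         # the two numbers agree
```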

Computer science (especially AI/ML): magicians of high-dimensional space

On the CS side, especially in AI, linear algebra is the absolute lifeline. We spend every day with vectors of hundreds, thousands, even millions of dimensions.

What does mastery look like here?

Vectorized thinking. Your code should contain almost no for-loops over individual data elements; you process whole vectors and matrices at a time. This is not only about exploiting GPU parallelism; it is a shift in how you think. To compute the pairwise cosine similarities of a batch of vectors, what should come to mind is a single matrix multiplication, not two nested for-loops.
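For example, all pairwise cosine similarities as one matrix product:

```python
import numpy as np

def cosine_similarity_matrix(X):
    # Normalize the rows to unit length, then one matmul
    # replaces two nested loops over all pairs.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / norms
    return Xn @ Xn.T

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
S = cosine_similarity_matrix(X)
print(S[0, 1])   # orthogonal vectors: 0.0
print(S[0, 2])   # 45 degrees apart: ~0.707
```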

Knowing your products. The inner (dot) product expresses similarity and projection. The outer product can be used to update a neural network's weight matrix. The Hadamard (element-wise) product appears in all sorts of gating mechanisms (the LSTM, for example). You should know exactly which product to reach for, and when.
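The three products side by side, on two small example vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

inner = u @ v             # scalar: similarity / projection
outer = np.outer(u, v)    # 3x3 rank-1 matrix: a weight-update building block
hadamard = u * v          # element-wise: the gating operation in an LSTM

print(inner)              # 32.0
print(outer.shape)        # (3, 3)
print(hadamard)           # [ 4. 10. 18.]
```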

Matrix calculus. Training a neural network (backpropagation) is essentially one enormous chain rule: a multivariate function differentiated with respect to matrices (the weights). You don't need to derive it by hand every time, but you must understand how the gradient is passed back layer by layer and why the dimensions have to match. That is what lets you debug complicated network architectures.

One more example: word embeddings in NLP, such as Word2Vec. What it does is map each word from a sparse one-hot vector (whose dimension is the whole vocabulary size, hundreds of thousands) to a dense low-dimensional vector (say, 300-dimensional). The map is just one giant weight matrix, and the training objective is to place words with similar meanings close together in that space (so that vector('king') - vector('man') + vector('woman') is approximately vector('queen')). A seemingly magical language problem has been turned into a purely geometric problem in a high-dimensional vector space. Without this space-and-transformation way of thinking, you cannot even read the papers.

How do you get there? (with resources)

Talk without practice is empty. Here are the things that genuinely paid off for me back in the day.

1. Build your intuition: 3Blue1Brown's "Essence of Linear Algebra"

This has to come first. If this series had existed when I was in college, it would have saved me endless detours. Grant Sanderson uses animation to make the geometric intuition of linear algebra completely transparent. Watch it, and any textbook you pick up afterward will suddenly make sense: the determinant as the factor by which area or volume changes, eigenvectors as the directions a transformation leaves "unmoved", and so on. He explains the core ideas properly. This one is non-negotiable; everyone should watch it, repeatedly.


A note on 3Blue1Brown: video is, in the end, just one medium; true digestion and understanding still depend on notes and the written word. Quite a few organized note sets for the 3Blue1Brown series now exist, meant to help you review the material systematically, fill in details, and look things up later. Notably, some go beyond straight translation and add more careful derivations and background for the parts the videos skim over; a good bilingual set of notes is worth keeping at hand. (One caveat: 3Blue1Brown's explanations are brilliant, but not a cure-all. Their core value is building geometric intuition, not replacing the computational drill and proof discipline of conventional study. Pairing the videos with a more conventional geometric treatment, such as 《线性代数的几何意义》 from Xidian University Press, works remarkably well.)

2. Build a systematic framework: Gilbert Strang's "Introduction to Linear Algebra" and the MIT open course

Professor Strang's book is application-oriented, without so much mathematical boilerplate. He builds the entire edifice of linear algebra from the four fundamental subspaces (column space, null space, row space, left null space), a framework that helps enormously in understanding data and systems of equations. His recorded lectures are classics too; the old professor teaches with real passion. I have been through the book and the videos at least three times.

3. Pursue rigorous proof (optional, for math majors): Sheldon Axler's "Linear Algebra Done Right"

This book is a favorite of math majors. Its hallmark is starting straight from linear transformations and pushing determinants all the way to the end. It is a more modern, more abstract treatment that helps you understand the essence of linear transformations rather than sink into computational details. If you are not doing pure math you can skip it, but if you want to truly understand the mathematical structure behind linear algebra, it is required reading.

4. Get your hands dirty and build the wheels yourself

Don't just read. In a language you are comfortable with (Python + NumPy is ideal), implement matrix multiplication, LU decomposition, and QR decomposition by hand. You can even try to implement a simple SVD. The process forces you through every detail of each algorithm, and it is worth more than reading the book ten times. You will discover that what is a trivial transpose in theory can become a serious memory-access-efficiency problem in code. That is the gap between theory and practice.
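As a starting point, here is a classical Gram-Schmidt QR in plain NumPy. It is fine as a learning exercise; production code would use Householder reflections for better numerical stability:

```python
import numpy as np

def qr_gram_schmidt(A):
    # Classical Gram-Schmidt: orthonormalize the columns of A one by one.
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            # Subtract the component of column j along each earlier q_i
            R[i, j] = Q[:, i] @ A[:, j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, R = qr_gram_schmidt(A)
print(np.allclose(Q @ R, A))             # True: QR reproduces A
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: columns of Q are orthonormal
```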

"Mastering" undergraduate linear algebra absolutely does not mean being able to solve the hardest exercises in the textbook.


It means you have turned linear algebra from a "math course" into a "worldview."

You see a pile of data and think of its distribution and geometric structure in a multidimensional space.
You see a complex system and instinctively ask whether it can be approximated and described linearly.
You see an algorithm and look straight through the surface logic to the skeleton of matrix operations underneath.

Reach that state and, whether you go on to graduate study or to grind away in industry, this internal kung fu will serve you for life. The road is long, but the scenery is absolutely worth it.