Notes on Mobile Robot Localization in Autonomous Mobile Robots

Introduction

Autonomous robots need to move, and to move they need to know where they are and where they are going. This is the job of localization: given a map (probably constructed by humans), can the robot determine where it is? If the robot knows where it is and where it needs to go (a destination on the map), it can plan a path to get there. Path planning is a topic in itself; in this post I will collect my notes on localization from different sources (duly noted), along with experiments in software and hardware to cement my understanding of the topic.

The challenge of localization

Noise

If perception were perfect and the map were accurate, the problem of localization would be trivial. For example, if GPS were accurate to the resolution demanded by the robot (say 1 mm for a tabletop robot and perhaps 10 cm for an AGV) and we had a detailed map of the environment, we would know where we are and where to go. However, sensors are not accurate and there is inherent noise in their readings. That is, $x = h + \mathcal{N}(\mu, \sigma)$, where $h$ is the actual observation and $\mathcal{N}(\mu, \sigma)$ is noise, characterized here as Gaussian white noise. Sometimes the noise is not white, in which case it is called colored noise.

In particular, if each sample has a normal distribution with zero mean, the signal is said to be additive white Gaussian noise (Wikipedia). With a white noise audio signal, the range of frequencies between 40 Hz and 60 Hz contains the same amount of sound power as the range between 400 Hz and 420 Hz, since both intervals are 20 Hz wide. The frequency spectrum of pink noise, by contrast, is linear on a logarithmic scale: it has equal power in bands that are proportionally wide, so pink noise has as much power in the range from 40 to 60 Hz as in the band from 4000 to 6000 Hz (Wikipedia).

In addition to sensor noise, actuators introduce their own errors: wheels slip and friction coefficients are unknown. As a result, odometry based on dead reckoning is known to diverge from the true state over time, no matter how accurate the sensors are.

Aliasing

Another problem with sensor readings is that there is a many-to-one relationship between environmental features and sensor readings. The robot sees a single value which could correspond equally well to feature 1 or to feature 2, and it has no further information with which to discriminate between the two. Imagine using a sonar as the distance sensor: if we detect an obstacle, is it a wall or a human?

In effect, even if the sensors were error free (a purely theoretical construct), aliasing means a single reading does not carry enough information to fully discriminate between features. We must therefore develop techniques that take multiple sensor readings and fuse them to make out the underlying reality.

Error model for odometric position estimation

You can buy high quality sensors that limit the noise and improve the resolution, but error is not something that can be eliminated. Every sensor has an error model, and for proper localization it is important to characterize that model. This is shown for a differential drive robot below.

Pose can be represented as a vector

\begin{equation} p = \begin{bmatrix} x \cr y \cr \theta \end{bmatrix} \end{equation}

For a known starting pose and the control input (the distances moved by the left and right wheels) we can estimate the next pose. This can be written as

\begin{equation} p' = p + \begin{bmatrix} \Delta s \cos(\theta + \frac{\Delta \theta}{2}) \cr \Delta s \sin(\theta + \frac{\Delta \theta}{2}) \cr \Delta \theta \end{bmatrix} \end{equation}
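As a concrete illustration, here is a minimal Python sketch of this prediction step, assuming the usual differential drive relations $\Delta s = (\Delta s_r + \Delta s_l)/2$ and $\Delta \theta = (\Delta s_r - \Delta s_l)/b$ for wheelbase $b$; the function name and the numbers are purely illustrative.

```python
import numpy as np

def odometry_update(pose, ds_l, ds_r, b):
    """Predict the next pose of a differential drive robot from the wheel increments."""
    x, y, theta = pose
    ds = (ds_r + ds_l) / 2.0        # distance travelled by the robot centre
    dtheta = (ds_r - ds_l) / b      # change in heading for wheelbase b
    return np.array([x + ds * np.cos(theta + dtheta / 2.0),
                     y + ds * np.sin(theta + dtheta / 2.0),
                     theta + dtheta])

# Example: starting at the origin facing +x, the right wheel moves a little more
# than the left, so the robot drives forward and turns slightly to the left.
pose = odometry_update(np.array([0.0, 0.0, 0.0]), ds_l=0.09, ds_r=0.11, b=0.2)
```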

The update equation gives a rough estimate of where the robot is now. But how rough? What is the uncertainty in this estimate? To quantify it, we can derive the covariance matrix $\Sigma_{p'}$ if we know $\Sigma_p$ (assume zero to start with) and $\Sigma_\Delta$, the covariance of the control input, which is determined experimentally.

In other words, if we know the uncertainty in the control input and the uncertainty in the previous pose estimate, we can calculate the uncertainty in the current pose estimate.

We can use the error propagation law to arrive (after a fair amount of matrix manipulation and calculus) at

\begin{equation} \Sigma_{p'} = \nabla_p f \cdot \Sigma_p \cdot \nabla_p f^T + \nabla_{\Delta_{rl}} f \cdot \Sigma_{\Delta} \cdot \nabla_{\Delta_{rl}} f^T \label{eq:errormodel} \end{equation}

Error propagation law: if the outputs are a function of the inputs, $Y = f(X)$, then the uncertainty of the output is related to the uncertainty of the input as $C_Y = F_X \cdot C_X \cdot F_X^T$, where $C_X$ is the covariance of the input uncertainties, $C_Y$ is the covariance of the output uncertainties, and $F_X$ is the Jacobian matrix (the partial derivatives of $f$ with respect to the input variables $X$).
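A rough Python sketch of \eqref{eq:errormodel} for the differential drive update above might look as follows. The Jacobians come from differentiating the update equation with respect to the pose and the wheel increments; the diagonal control covariance with variances proportional to the absolute wheel travel (constants $k_r$, $k_l$) is a commonly used assumption that would be tuned experimentally.

```python
import numpy as np

def propagate_covariance(pose, sigma_p, ds_r, ds_l, b, k_r=0.01, k_l=0.01):
    """Propagate the pose covariance through the odometric update (error propagation law)."""
    theta = pose[2]
    ds = (ds_r + ds_l) / 2.0
    dtheta = (ds_r - ds_l) / b
    a = theta + dtheta / 2.0

    # Jacobian of the update with respect to the previous pose p = (x, y, theta)
    F_p = np.array([[1.0, 0.0, -ds * np.sin(a)],
                    [0.0, 1.0,  ds * np.cos(a)],
                    [0.0, 0.0,  1.0]])

    # Jacobian with respect to the wheel increments (ds_r, ds_l)
    F_d = np.array([[0.5 * np.cos(a) - ds / (2 * b) * np.sin(a),
                     0.5 * np.cos(a) + ds / (2 * b) * np.sin(a)],
                    [0.5 * np.sin(a) + ds / (2 * b) * np.cos(a),
                     0.5 * np.sin(a) - ds / (2 * b) * np.cos(a)],
                    [1.0 / b, -1.0 / b]])

    # Control covariance: variances assumed proportional to the absolute wheel travel
    sigma_delta = np.diag([k_r * abs(ds_r), k_l * abs(ds_l)])

    # Error propagation law: Sigma_p' = F_p Sigma_p F_p^T + F_d Sigma_delta F_d^T
    return F_p @ sigma_p @ F_p.T + F_d @ sigma_delta @ F_d.T

# Starting from zero uncertainty, one small step already produces a nonzero covariance.
sigma = propagate_covariance(np.array([0.0, 0.0, 0.0]), np.zeros((3, 3)), 0.11, 0.09, b=0.2)
```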

Now we have a measure of the uncertainty in the position estimate. With each time step the uncertainty grows, and it must be corrected using the onboard sensors.

Probabilistic map based localization

Map representation

A map is a representation of the environment. The fidelity of the map limits what the robot can achieve using this map. If the map is wrong then the robot's belief about the environment is wrong. If the resolution of the map is coarse then the robot will only achieve coarse goals. The question then is how should we represent the map?

Continuous: the 3D real world can be represented as a continuous 2D space in which (simple, convex) polygons represent the obstacles. Another approach is to represent the environment with infinite lines extracted from laser range finder readings.

Discrete: the real world can be tessellated into a grid of chosen resolution, and this resolution governs the exactness of operations on the map. One popular approach is the occupancy grid, where each cell is marked either filled or empty, generated for example from range finder hits on the environment. The obvious problem with occupancy grids is that the memory requirement grows with the size of the map and the chosen resolution, so fine grids may be untenable. This can be mitigated to some degree with adaptive tessellation.
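To make the memory trade-off concrete, here is a minimal, purely illustrative occupancy grid sketch in Python (binary cells only, no probabilistic updates):

```python
import numpy as np

class OccupancyGrid:
    """Fixed-resolution binary occupancy grid: cells are marked occupied by sensor hits."""
    def __init__(self, width_m, height_m, resolution_m):
        self.resolution = resolution_m
        rows = int(height_m / resolution_m)
        cols = int(width_m / resolution_m)
        self.cells = np.zeros((rows, cols), dtype=np.uint8)   # memory grows with area / resolution^2

    def mark_hit(self, x_m, y_m):
        """Mark the cell containing the world point (x_m, y_m) as occupied."""
        row, col = int(y_m / self.resolution), int(x_m / self.resolution)
        if 0 <= row < self.cells.shape[0] and 0 <= col < self.cells.shape[1]:
            self.cells[row, col] = 1

# A 10 m x 10 m area at 5 cm resolution already needs 200 x 200 = 40,000 cells.
grid = OccupancyGrid(10.0, 10.0, 0.05)
grid.mark_hit(2.3, 4.7)   # e.g. a range finder return at (2.3 m, 4.7 m) in map coordinates
```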

Belief representation

The other question is: how should the robot represent its own position on the map? Given the uncertainty of sensor readings and position estimates, it makes sense not to be too certain of one's position and instead maintain a "belief" about it in the form of a probability distribution. We don't know our errors exactly, but acknowledging that we have them is a good place to start correcting them (just like in life).

The localization problem

Given a map of some fidelity and an error model for the odometric position estimate, how can we use the onboard sensors to generate a belief over the poses the robot might be in? This is the localization problem. We predict the robot position using odometry and then use perception sensors to correct the estimate. The prediction update increases the uncertainty, as described by \eqref{eq:errormodel}, and the sensor measurements reduce it.

Kalman Filter

A typical robot has many sensors onboard, each with its own error model and uncertainty. How can we combine these uncertain, but still informative, measurements to arrive at a robust estimate? That is the problem of sensor fusion, and the Kalman filter is an effective solution.

In the basic Kalman filter the system is assumed to be linear with white Gaussian noise. We model the robot belief $bel(x_t)$, the odometric error model, and the measurement model as Gaussians with a mean ($\mu$) and variance ($\sigma^2$). In multivariate systems (as in practical robots) $\mu$ is a vector and instead of $\sigma^2$ we use a covariance matrix $\Sigma_t$.

If $x_1$ and $x_2$ are normally distributed random variables and $y = f(x_1, x_2)$ with $f$ linear, then $y$ is also normally distributed, with a mean and covariance that are combinations of the input means and covariances. Here $f$ represents the odometric position update, with the robot's previous position $x_{t-1}$ and the control signal $u$ as inputs.

If $q$ is the robot position, $p_1(q)$ (a Gaussian with mean $\hat{q}_1$ and variance $\sigma_1^2$) is the robot's belief after the position prediction, and $p_2(q)$ (a Gaussian with mean $\hat{q}_2$ and variance $\sigma_2^2$) is the belief after the sensor measurement, then the final posterior belief $p(q)$ is the product of the first two beliefs, which is also a Gaussian (with mean $\hat{q}$ and covariance $\hat{P}$):

\begin{align} \hat{q} &= \hat{q}_1 + K(\hat{q}_2 - \hat{q}_1) \nonumber \\ \hat{P} &= P - K \cdot \Sigma_{IN} \cdot K^T \label{eq:kalman} \end{align}

where $K = P(P+R)^{-1}$ is the Kalman gain, $(\hat{q}_2 - \hat{q}_1)$ is the innovation, $\Sigma_{IN}$ is the innovation covariance (in this direct fusion $\Sigma_{IN} = P + R$), and $P$ and $R$ are the covariances of the predicted and measured estimates, respectively.

The Kalman gain is a measure of how much weight to give the prediction versus the measurement.
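Here is a minimal one-dimensional sketch of \eqref{eq:kalman}; the numbers are made up and only meant to show how the gain pulls the estimate toward whichever belief is more certain.

```python
def fuse_1d(q1, P, q2, R):
    """Fuse a predicted position (mean q1, variance P) with a measurement (mean q2, variance R)."""
    K = P / (P + R)              # Kalman gain: how much to trust the measurement
    q = q1 + K * (q2 - q1)       # correct the prediction by the weighted innovation
    P_new = P - K * (P + R) * K  # posterior variance; here Sigma_IN = P + R
    return q, P_new

# Prediction says 2.0 m with variance 0.25; the sensor says 2.3 m with variance 0.05.
# The fused estimate (about 2.25 m, variance about 0.042) sits close to the more certain belief.
print(fuse_1d(2.0, 0.25, 2.3, 0.05))
```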

Process of Kalman filter localization

  1. Use odometry to predict the position $x_t$ from the previous position $x_{t-1}$ and the motion model, and propagate the associated uncertainty. This step increases the position uncertainty.
  2. Make an observation by collecting sensor data and extracting appropriate features, such as lines or points.
  3. Make a measurement prediction: given the predicted position $x_t$ and the map, generate the list of features the robot expects to see from that position.
  4. Match the predicted observations against what was actually observed.
  5. Use \ref{eq:kalman} to fuse the information from the matched features into a best estimate of the position.

Extended Kalman Filter

Real systems are not linear, as the basic Kalman filter assumes. In nonlinear cases we can linearize the system around the operating point. If $f$ is nonlinear then $y$ is no longer normally distributed, and a first order Taylor approximation is needed:

\begin{equation} y \approx f(\mu_1, \mu_2) + F_{x_1}(x_1 - \mu_1) + F_{x_2}(x_2 - \mu_2) \end{equation}

where $F_{x_1}$ and $F_{x_2}$ are the Jacobians of $f$ about $(\mu_1, \mu_2)$

Similarly, the covariance of $y$ is $\Sigma_y = F_{x_1} \Sigma_1 F_{x_1}^T + F_{x_2} \Sigma_2 F_{x_2}^T$.
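As an illustration, the sketch below linearizes a range and bearing measurement to a known landmark and pushes a pose covariance through the resulting Jacobian; the landmark, pose, and covariance values are made up.

```python
import numpy as np

def expected_range_bearing(pose, landmark):
    """Nonlinear measurement model h(pose): range and bearing to a known landmark."""
    dx, dy = landmark[0] - pose[0], landmark[1] - pose[1]
    return np.array([np.hypot(dx, dy), np.arctan2(dy, dx) - pose[2]])

def measurement_jacobian(pose, landmark):
    """Jacobian of the range/bearing model with respect to the pose (x, y, theta)."""
    dx, dy = landmark[0] - pose[0], landmark[1] - pose[1]
    q = dx ** 2 + dy ** 2
    r = np.sqrt(q)
    return np.array([[-dx / r, -dy / r,  0.0],
                     [ dy / q, -dx / q, -1.0]])

# Push the pose uncertainty through the linearized measurement model (first order):
pose = np.array([1.0, 2.0, 0.3])
landmark = np.array([4.0, 6.0])
sigma_pose = np.diag([0.02, 0.02, 0.01])
z_hat = expected_range_bearing(pose, landmark)   # expected measurement at the mean pose
H = measurement_jacobian(pose, landmark)         # F_x evaluated at the mean
sigma_z = H @ sigma_pose @ H.T                   # Sigma_y = F_x Sigma F_x^T
```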

Unscented Kalman Filter
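The unscented Kalman filter sidesteps linearization: instead of computing Jacobians, it propagates a small set of deterministically chosen sigma points through the nonlinear function and reconstructs the mean and covariance of the result from them.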

Simultaneous localization and mapping (SLAM)

Above we discussed how to localize against a map, perhaps one made by humans. Maps are expensive to make and they are subjective. The aim of SLAM is for the robot to build a map of an environment and localize itself in it using just the onboard sensors, that is, to recover both the robot path and the environment map with nothing but exploration and onboard sensing.

All sensors are noisy, and the lack of ground truth in the form of a map makes SLAM harder than localization alone. The errors introduced in the position estimate by odometry combine with the errors introduced by measurement. The map becomes correlated with the position estimate, and the reverse is also true when the robot tries to localize against imprecise map features. As the robot moves around and re-detects features it has seen before, a condition called loop closure, its uncertainty can be reduced.

If the robot pose at each instant is defined as $x_t$, then the robot path can be defined as the set $X_T = \{ x_0, x_1, x_2, \ldots, x_T \}$. The control inputs (or proprioceptive sensor readings) are $U_T = \{u_0, u_1, u_2, \ldots, u_T\}$, the true map is denoted as a set of features $M = \{m_0, m_1, m_2, \ldots, m_{n-1}\}$, and the observations/measurements are the set $Z_T = \{z_0, z_1, z_2, \ldots, z_T\}$.

The task at hand is to recover $M$ and $X_T$ given $U_T$ and $Z_T$, that is, formally, to estimate

\begin{equation} p(X_T, M | Z_T, U_T) \end{equation}

We need to know both the probabilistic motion model and the probabilistic measurement model to calculate the conditional probability above.

Extended Kalman filter SLAM

EKF SLAM is exactly like EKF localization except that we use an extended state vector which contains both the robot pose and the positions of all the map features:

\begin{equation} y_t = [x_t,m_0 \ldots m_{n-1}]^T \end{equation}

As the robot moves, the state vector and the associated covariance matrix are updated according to equations \ref{eq:kalman}. EKF SLAM has to track a much larger state vector and hence is more computationally expensive. The measurement model and update are exactly the same as in EKF localization, but the prediction of course only updates the robot pose, not the map features.

To start with, the covariance matrix is sparse (the off-diagonal elements are zero). As the robot moves, the robot pose and the features become correlated. The more observations are made, the more the correlations grow, and growing correlations are an indicator that EKF SLAM is working.
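A small sketch of how such an extended state and covariance could be laid out; the sizes and initial values are illustrative assumptions.

```python
import numpy as np

# Extended EKF SLAM state: the robot pose (x, y, theta) followed by n 2D landmark positions.
n_landmarks = 3
pose = np.array([0.0, 0.0, 0.0])
landmarks = np.zeros(2 * n_landmarks)            # (m0_x, m0_y, m1_x, m1_y, ...)
y = np.concatenate([pose, landmarks])            # full state vector of length 3 + 2n

# Full covariance: a pose block, landmark blocks, and pose-landmark cross terms that
# start at zero and grow as observations correlate the map with the pose estimate.
Sigma = np.zeros((y.size, y.size))
Sigma[:3, :3] = np.diag([0.01, 0.01, 0.005])     # initial pose uncertainty
Sigma[3:, 3:] = np.eye(2 * n_landmarks) * 1e6    # landmarks start out essentially unknown

# In the prediction step only the pose entries of y (and the rows and columns of Sigma
# that involve the pose) change; the landmark estimates themselves stay fixed.
```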

The main problem with EKF SLAM is that the computation grows quadratically with the number of features; various methods have been proposed to control this, for example submaps. Linearization creates its own inconsistencies because the underlying system is nonlinear. During loop closure, incorrect association of features can happen, especially due to sensor aliasing, which can lead to subpar EKF performance.

Visual SLAM

Instead of laser range finders, which measure both range and bearing, we can do SLAM with a bearing-only sensor such as a camera. We can also assume a constant velocity motion model and dispense with odometry sensors altogether. The state vector at time $t$ is composed of the pose, the velocity (in the original visual SLAM work a quaternion representation of orientation is used to avoid singularities), and the map features. The predicted pose is simply the integration of the constant velocity model, and in the measurement phase the camera pose is corrected based on the observed features, essentially the same EKF updates noted earlier.
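A minimal sketch of the constant velocity prediction, simplified here to a planar pose and heading rate rather than the full 3D quaternion state used in the original work:

```python
import numpy as np

def constant_velocity_predict(state, dt):
    """Predict the next camera state assuming the current velocity persists over dt."""
    x, y, theta, vx, vy, omega = state
    return np.array([x + vx * dt,         # integrate the linear velocity
                     y + vy * dt,
                     theta + omega * dt,  # integrate the angular velocity
                     vx, vy, omega])      # the velocity itself is assumed constant

state = np.array([0.0, 0.0, 0.0, 0.5, 0.0, 0.1])   # moving at 0.5 m/s, turning at 0.1 rad/s
state = constant_velocity_predict(state, dt=0.1)
```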

Graph SLAM

Treating the robot poses and the features as nodes of an elastic net, we can find a SLAM solution by computing the minimal-energy state of this net using common optimization techniques such as gradient descent. The edges between the nodes are modelled as soft constraints (like springs) which are relaxed to find a solution. The main advantage of graph SLAM is that the computation is linear in the number of features.
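A toy illustration of this idea: a one-dimensional pose graph with three odometry edges and one loop closure edge, relaxed by gradient descent on the sum of squared residuals. This is only a sketch of the spring analogy, not a full graph SLAM implementation.

```python
import numpy as np

# Toy 1D pose graph: four poses, odometry edges say each step moves +1.0, and a
# loop closure edge says pose 3 should be only 2.5 away from pose 0. Each edge
# acts like a spring pulling the connected nodes toward its measured displacement.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.5)]
x = np.array([0.0, 1.0, 2.0, 3.0])      # initial guess from raw odometry

def energy_gradient(x, edges):
    """Gradient of the total spring energy sum((x_j - x_i - z)^2) over all edges."""
    grad = np.zeros_like(x)
    for i, j, z in edges:
        r = (x[j] - x[i]) - z           # residual of this constraint
        grad[j] += 2 * r
        grad[i] -= 2 * r
    return grad

for _ in range(500):                    # relax the net with plain gradient descent
    g = energy_gradient(x, edges)
    g[0] = 0.0                          # anchor the first pose to fix the gauge freedom
    x -= 0.05 * g

print(x)  # converges to ~[0.0, 0.875, 1.75, 2.625]: the loop closure shrinks each step
```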

Particle filter SLAM

Also called Rao-Blackwellized particle filter SLAM, this approach represents the belief distribution by a set of random samples (particles) instead of a parametric form such as a Gaussian.

At every time step the particle filter maintains $K$ particles. Each particle contains an estimate of the robot path and an estimate of the position of each map feature (represented by a mean and covariance). That is, a particle is characterized as in \ref{eq:particle}:

\begin{equation} \label{eq:particle} X_t^{[k]}; (\mu_{t,0}^{[k]}, \Sigma_{t,0}^{[k]}), (\mu_{t,1}^{[k]}, \Sigma_{t,1}^{[k]}) \ldots \end{equation}

The main thing to note is that each map feature has its own mean and covariance and is updated with its own Kalman filter. As in the vanilla Kalman filter, when the robot moves the odometry is used to update the pose from $x_{t-1}^{[k]}$ to $x_t^{[k]}$. When the robot makes an observation, we determine the probability of observing $z_t$ given the particle $x_t^{[k]}$ and all previous observations $z_{0 \rightarrow t-1}$, as shown in \ref{eq:impfactor}:

\begin{equation} \label{eq:impfactor} w_t^{[k]} = p(z_t | x_t^{[k]}, z_{0 \rightarrow {t-1}}) \end{equation}

We then resample the set of particles according to the importance factor described above. The means and covariances of the features are updated according to the standard Kalman filter equations \ref{eq:kalman}. The main benefit of this approach is that we do not have to linearize the system as we have to in EKF. A minimal sketch of the particle structure and the resampling step follows.
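To make the particle structure of \ref{eq:particle} concrete, here is a minimal sketch of one particle and the importance resampling step; how the weights themselves are computed depends on the measurement model and is omitted here.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class Particle:
    """One hypothesis: a robot pose plus an independent (mean, covariance) filter per map feature."""
    pose: np.ndarray                              # x_t^[k] = (x, y, theta)
    features: dict = field(default_factory=dict)  # feature id -> (mu, Sigma), each its own filter
    weight: float = 1.0                           # importance factor w_t^[k]

def resample(particles):
    """Draw a new particle set with probability proportional to the importance weights."""
    weights = np.array([p.weight for p in particles])
    weights /= weights.sum()
    picks = np.random.choice(len(particles), size=len(particles), p=weights)
    return [Particle(particles[i].pose.copy(), dict(particles[i].features), 1.0) for i in picks]
```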