Humanoid State Estimation in RoboCup
This work presents a comprehensive pipeline for kinematic state estimation of humanoid robots in the RoboCup competition. The dynamic and sensor-limited environment of RoboCup poses significant challenges for accurate state estimation, including unstable walking surfaces, frequent collisions, and restrictions on external sensing modalities like LiDAR and GPS.
Introduction
For a humanoid robot, locomotion involves controlling the unactuated floating base to a desired location in the world. Before a control action can be applied, an accurate estimate of the position and orientation of the floating base is required. In the context of RoboCup, the artificial grass surface and collisions with other robots complicate stable walking, making state estimation challenging due to falls, sensor noise, and drift over time.
Kinematic state estimation in RoboCup can be broken down into two major areas:
- Odometry: Estimation of the robot's pose with respect to an inertial world frame
- Localization: Estimation of the robot's pose with respect to the soccer field frame
Odometry
Since the contact configuration of a robot during walking is constantly changing, we construct a representation of the system in a general world-fixed inertial frame. We consider two reference frames: a world-fixed inertial frame attached to the ground, and a body-fixed frame rigidly attached to the robot midway between its hip yaw joints.
The homogeneous transformation matrix capturing the relationship between these frames is given by:

$$\mathbf{H}_{wb} = \begin{bmatrix} \mathbf{R}_{wb} & \mathbf{r}_{b/w} \\ \mathbf{0}^\top & 1 \end{bmatrix}$$

where $\mathbf{R}_{wb}$ is the rotation matrix from the body frame to the world frame, and $\mathbf{r}_{b/w}$ is the position vector of the body frame with respect to the world frame.
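As a minimal illustration, assuming NumPy and the notation above, the transform can be assembled from its rotation and translation parts:

```python
import numpy as np

def make_homogeneous(Rwb: np.ndarray, r_bw: np.ndarray) -> np.ndarray:
    """Assemble the 4x4 homogeneous transform Hwb from the body-to-world
    rotation Rwb (3x3) and the body position r_bw (3,) in the world frame."""
    Hwb = np.eye(4)
    Hwb[:3, :3] = Rwb
    Hwb[:3, 3] = r_bw
    return Hwb
```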
To estimate the orientation of the floating base body frame, we use the Mahony filter, a simple and efficient approach for real-time attitude estimation. The Mahony filter has only two tuning parameters, the PI compensator gains $K_p$ and $K_i$, making the tuning process straightforward.
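The sketch below shows one common discrete-time form of the Mahony complementary filter on SO(3), assuming the accelerometer reading serves as the gravity reference and using first-order integration; the gains and initialisation are placeholders, not the tuned values used on the robot:

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix so that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

class MahonyFilter:
    """Minimal Mahony complementary filter with gyro bias estimation.
    Kp and Ki are the PI compensator gains mentioned above."""

    def __init__(self, Kp=1.0, Ki=0.1):
        self.Kp, self.Ki = Kp, Ki
        self.R = np.eye(3)        # body-to-world rotation estimate
        self.bias = np.zeros(3)   # gyro bias estimate

    def update(self, gyro, accel, dt):
        # Measured and predicted gravity directions, both in the body frame.
        v_meas = accel / np.linalg.norm(accel)
        v_pred = self.R.T @ np.array([0.0, 0.0, 1.0])
        # Innovation from the cross product of measured and predicted directions.
        error = np.cross(v_meas, v_pred)
        # PI compensation of the gyro rate.
        self.bias += -self.Ki * error * dt
        omega = gyro - self.bias + self.Kp * error
        # First-order integration of the rotation estimate.
        self.R = self.R @ (np.eye(3) + skew(omega) * dt)
        # Re-orthonormalise to stay on SO(3).
        u, _, vt = np.linalg.svd(self.R)
        self.R = u @ vt
        return self.R
```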
For floating-base translation estimation, we use an anchor point strategy. We select an anchor point located on the robot's foot sole and assume that this point remains grounded at a fixed position $\mathbf{r}_{a/w}$ in the world frame whenever it serves as the support foot. In the floating-base frame, the position $\mathbf{r}_{a/b}$ of this anchor point is known through forward kinematics, allowing continuous tracking of the floating-base translation relative to the world frame via $\mathbf{r}_{b/w} = \mathbf{r}_{a/w} - \mathbf{R}_{wb}\,\mathbf{r}_{a/b}$.
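A minimal sketch of this update, assuming the rotation estimate comes from the Mahony filter above and forward kinematics supplies the anchor position in the body frame (function names are illustrative):

```python
import numpy as np

def update_translation(r_anchor_world, r_anchor_body, Rwb):
    """Recover the floating-base translation from a grounded anchor point.

    r_anchor_world : anchor position in the world frame (held fixed while
                     this foot is the support foot)
    r_anchor_body  : anchor position in the body frame, from forward kinematics
    Rwb            : body-to-world rotation estimate (e.g. from the Mahony filter)
    """
    # r_{b/w} = r_{a/w} - Rwb @ r_{a/b}
    return r_anchor_world - Rwb @ r_anchor_body

def switch_anchor(r_body_world, r_new_anchor_body, Rwb):
    """When the support foot changes, fix the new anchor's world position
    using the current floating-base estimate."""
    return r_body_world + Rwb @ r_new_anchor_body
```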
Visual Landmark Detection
Our localization approach relies on visual landmarks detected using two computer vision methods:
- YOLOv8n: State-of-the-art real-time object detection for identifying objects and key landmarks
- Visual Mesh: Highly efficient semantic segmentation network specifically tuned for detecting field lines
The landmarks include YOLOv8n-detected goal posts, T, L, and X intersections, and field line points detected by the Visual Mesh.
Without loss of generality, through a combination of our camera model and the extrinsic matrix $\mathbf{H}_{wc}$, the pixel-based detections can be projected onto the field plane. A detection in world space is given by:

$$\mathbf{r}_{d/w} = \mathbf{r}_{c/w} - \frac{\mathbf{r}_{c/w} \cdot \hat{\mathbf{e}}_3}{\left(\mathbf{R}_{wc}\,\hat{\mathbf{u}}_c\right) \cdot \hat{\mathbf{e}}_3}\,\mathbf{R}_{wc}\,\hat{\mathbf{u}}_c$$

where $\hat{\mathbf{u}}_c$ is the unit vector associated with a pixel obtained through our camera model, $\mathbf{r}_{c/w}$ is the position of the camera in the world frame, $\mathbf{R}_{wc}$ is the rotation matrix from the camera frame to the world frame, and $\hat{\mathbf{e}}_3$ is the basis vector $(0, 0, 1)^\top$.
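A minimal sketch of this ray-to-ground-plane intersection, assuming the field plane is $z = 0$ in the world frame:

```python
import numpy as np

def project_to_field_plane(u_cam, r_cam_world, Rwc):
    """Project a pixel's unit ray onto the z = 0 field plane.

    u_cam       : unit vector of the pixel ray in the camera frame
    r_cam_world : camera position in the world frame
    Rwc         : camera-to-world rotation matrix
    """
    ray_world = Rwc @ u_cam
    if ray_world[2] >= 0.0:
        return None  # ray points away from the ground and never intersects it
    # Scale the ray so that its endpoint lies on the z = 0 plane.
    scale = -r_cam_world[2] / ray_world[2]
    return r_cam_world + scale * ray_world
```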
Performance Benchmarks
| Method | Simulation (i7-11850H) | Robot (i7-1260P) |
|---|---|---|
| YOLOv8n | 47 FPS | 66 FPS |
| Visual Mesh | 152 FPS | 259 FPS |
Localization
The localization problem can be formulated as estimating the pose of the field relative to the world frame. Due to the flat nature of the soccer field, this can be fully described by the transformation matrix:

$$\mathbf{H}_{fw}(\mathbf{x}) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & x \\ \sin\theta & \cos\theta & 0 & y \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

where $\mathbf{x} = (x, y, \theta)^\top$ is a vector containing the x-y translation and yaw rotation.
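A minimal sketch of constructing this transform from the three-dimensional state vector (the function name is illustrative):

```python
import numpy as np

def field_transform(state):
    """Build the 4x4 homogeneous transform H_fw from the state (x, y, theta)."""
    x, y, theta = state
    c, s = np.cos(theta), np.sin(theta)
    Hfw = np.eye(4)
    Hfw[:2, :2] = [[c, -s], [s, c]]
    Hfw[:2, 3] = [x, y]
    return Hfw
```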
We propose a localization method leveraging nonlinear optimization to compute the optimal state in real-time. Our framework employs the derivative-free algorithm COBYLA (Constrained Optimization BY Linear Approximations), integrating multiple cost components and constraints.
The optimization problem is given by:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \; J(\mathbf{x}) \quad \text{subject to} \quad \mathbf{x}_{lb} \le \mathbf{x} \le \mathbf{x}_{ub}$$

where $\mathbf{x}_{lb}$ and $\mathbf{x}_{ub}$ are the lower and upper bounds on the state vector $\mathbf{x}$.
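The results table below refers to NLopt; a minimal sketch of setting up such a bounded, derivative-free problem with the NLopt Python bindings follows, where the cost function `J`, the initial guess, and the bounds are placeholders rather than the exact implementation:

```python
import numpy as np
import nlopt

def solve_localization(J, x0, lower, upper):
    """Minimise the scalar cost J(x) over the state (x, y, theta)
    subject to box bounds, using the derivative-free COBYLA algorithm."""
    opt = nlopt.opt(nlopt.LN_COBYLA, 3)
    opt.set_lower_bounds(list(lower))
    opt.set_upper_bounds(list(upper))
    # NLopt objectives take (x, grad); grad is ignored by COBYLA.
    opt.set_min_objective(lambda x, grad: float(J(x)))
    opt.set_xtol_rel(1e-4)
    return opt.optimize(np.asarray(x0, dtype=float))
```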
Cost Function Components
The overall cost function is defined as:

$$J(\mathbf{x}) = J_{fl}(\mathbf{x}) + J_{lm}(\mathbf{x}) + J_{sc}(\mathbf{x})$$

with the following components:
- Field Line Alignment Cost $J_{fl}$: Measures how well the observed field line points align with the actual field lines (a combined code sketch follows this list):

  $$J_{fl}(\mathbf{x}) = \sum_{i=1}^{N_{fl}} d\!\left(\mathbf{H}_{fw}(\mathbf{x})\,\mathbf{r}_{i/w}\right)^2$$

  where $N_{fl}$ is the number of observed field line points, $\mathbf{r}_{i/w}$ represents the $i$-th field line point in the world frame, transformed into the field frame via $\mathbf{H}_{fw}(\mathbf{x})$, and $d(\cdot)$ is a function which provides the distance to the nearest field line using a precomputed distance map.
- Landmark Cost $J_{lm}$: Assesses the alignment of observed field line intersections and goal posts with their known positions:

  $$J_{lm}(\mathbf{x}) = \sum_{j=1}^{N_{lm}} \left\| \mathbf{r}_{j/f} - \mathbf{H}_{fw}(\mathbf{x})\,\mathbf{r}_{j/w} \right\|^2$$

  where $N_{lm}$ is the number of associated landmarks, $\mathbf{r}_{j/f}$ is the known position of the $j$-th landmark in the field frame, and $\mathbf{r}_{j/w}$ is the observed position of the $j$-th landmark in the world frame.
- State Change Cost $J_{sc}$: Penalizes significant deviations from the prior state estimate:

  $$J_{sc}(\mathbf{x}) = \left\| \mathbf{x} - \mathbf{x}_0 \right\|^2$$

  where $\mathbf{x}_0$ is the prior state estimate (initial guess).
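Putting the three terms together, a minimal sketch of the cost evaluation is shown below; the distance-map indexing, array layouts, and helper names (`field_transform` from the earlier sketch) are illustrative assumptions rather than the exact implementation:

```python
import numpy as np

def total_cost(state, field_points_w, landmarks_w, landmarks_f,
               distance_map, resolution, origin, x0):
    """Evaluate J(x) = J_fl(x) + J_lm(x) + J_sc(x) for a candidate (x, y, theta)."""
    Hfw = field_transform(state)  # from the earlier sketch
    R, t = Hfw[:2, :2], Hfw[:2, 3]

    # J_fl: squared distance of each observed field line point (world frame,
    # Nx2) to the nearest field line, looked up in a precomputed distance map.
    pts_f = field_points_w @ R.T + t
    idx = np.clip(((pts_f - origin) / resolution).astype(int),
                  0, np.array(distance_map.shape) - 1)
    J_fl = np.sum(distance_map[idx[:, 0], idx[:, 1]] ** 2)

    # J_lm: squared error between observed landmarks transformed into the
    # field frame and their known field positions (both Nx2).
    J_lm = np.sum(np.linalg.norm(landmarks_f - (landmarks_w @ R.T + t), axis=1) ** 2)

    # J_sc: penalise deviation from the prior state estimate x0.
    J_sc = np.sum((np.asarray(state) - np.asarray(x0)) ** 2)

    return J_fl + J_lm + J_sc
```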
Results
After each optimization step, the solution is filtered using a standard Kalman filter to smooth state estimates over time. Our method achieves the lowest RMSE of the compared approaches:
| Method | x [m] | y [m] | yaw [deg] |
|---|---|---|---|
| Particle Filter | 0.0563 | 0.0890 | 1.6180 |
| NLopt (field lines only) | 0.0503 | 0.0563 | 0.8389 |
| NLopt (all cost terms) | 0.0500 | 0.0559 | 0.8273 |
On average, the optimization routine and filtering step take only 2 milliseconds to complete, making it suitable for real-time applications on resource-constrained humanoid robot hardware.
Key Contributions
- Integrated odometry approach combining Mahony filter with anchor point strategy
- Real-time visual landmark detection using YOLOv8n and Visual Mesh
- Novel nonlinear optimization framework for localization with multiple cost terms
- Efficient implementation achieving sub-5ms computation time
- Robust performance in challenging RoboCup environments with limited sensors