Reinforcement Learning Boosts Exoskeleton Squat Stability

Robotic lower-extremity rehabilitation exoskeletons face a persistent challenge: maintaining stability and robustness during assisted motions for mobility-impaired users. Variations in disability level, unpredictable human-exoskeleton interaction forces, and external perturbations can destabilize conventional controllers and raise the risk of falls. To address this, researchers developed a reinforcement learning (RL)-based motion controller that enables collaborative squatting exercises with high efficiency, stability, and robustness.


The exoskeleton distinguishes itself with powered ankle actuation in both sagittal and frontal planes, plus multiple foot force sensors to directly measure ground reaction forces (GRFs) and calculate the center of pressure (CoP). CoP is a critical balance indicator, and here it is integrated into the controller’s state inputs and reward functions. This allows the RL policy to actively maintain balance throughout motion. To enhance robustness, training incorporates dynamics randomization and adversarial force perturbations, including large simulated human interaction forces.
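As a rough illustration of how a CoP estimate can be derived from foot force sensors, the sketch below takes the force-weighted average of the sensor positions under one foot and checks whether the result lies inside a rectangular stable region. The sensor layout, the region bounds, and the function names are assumptions made for this example; the article does not give the study's actual geometry or formulas.

```python
def center_of_pressure(sensor_xy, vertical_forces, min_load=1e-6):
    """Force-weighted mean of sensor positions, or None if the foot is unloaded."""
    total = sum(vertical_forces)
    if total < min_load:
        return None  # CoP is undefined without ground contact
    x = sum(f * p[0] for f, p in zip(vertical_forces, sensor_xy)) / total
    y = sum(f * p[1] for f, p in zip(vertical_forces, sensor_xy)) / total
    return (x, y)


def in_support_region(cop, x_range, y_range):
    """True if the CoP lies inside an axis-aligned rectangular region."""
    if cop is None:
        return False
    return x_range[0] <= cop[0] <= x_range[1] and y_range[0] <= cop[1] <= y_range[1]


# Hypothetical layout: four sensors per foot, positions in metres (foot frame,
# x pointing toward the toes, y pointing to the medial side).
SENSORS = [(0.00, 0.04), (0.00, -0.04), (0.20, 0.04), (0.20, -0.04)]

cop_even = center_of_pressure(SENSORS, [100.0, 100.0, 100.0, 100.0])  # even loading
cop_toe = center_of_pressure(SENSORS, [50.0, 50.0, 150.0, 150.0])     # weight on toes
```

With even loading the estimate sits at the geometric center of the four sensors, and shifting load toward the toes moves it forward. A controller could feed a short history of such values into the policy and penalize excursions outside the region.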

The mechanical design uses a lightweight 20.4 kg frame printed in Onyx reinforced with continuous carbon fiber. It offers 14 degrees of freedom (DoF), including powered hip, knee, and 2-DoF ankle joints per leg. Four high-capacity 3-axis force sensors per foot measure GRFs for precise CoP estimation. Hip and knee joints employ bevel gears for compactness, while ankle actuation uses parallel motors with universal and screw joints to achieve dorsiflexion/plantarflexion and inversion/eversion.

To simulate realistic conditions, the exoskeleton model is integrated with a full-body human musculoskeletal model comprising 284 musculotendon units. Passive muscle forces are modeled via Hill-type dynamics, and spring elements represent strap connections at hip, femur, and tibia, generating interaction forces during motion.

The RL controller is implemented as a multi-layer perceptron with three hidden layers, trained using Proximal Policy Optimization (PPO). Inputs include recent joint positions, velocities, action history, future target poses, and CoP history. Outputs are joint position targets processed through low-pass filtering and PD control to generate torques. The reward function blends pose, velocity, end-effector, root, center-of-mass, CoP, and torque terms, encouraging accurate motion imitation, balance maintenance, and energy efficiency.
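The output pipeline and reward blend described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the first-order filter form, the gains, the torque clipping, and the exponential shape of each reward term (a form common in motion-imitation work) are all choices made for this example.

```python
import math


class LowPassFilter:
    """First-order exponential smoothing of joint position targets."""

    def __init__(self, alpha, initial):
        self.alpha = alpha          # 0 < alpha <= 1; smaller means smoother
        self.state = list(initial)

    def update(self, target):
        self.state = [s + self.alpha * (t - s) for s, t in zip(self.state, target)]
        return self.state


def pd_torques(q_target, q, qd, kp, kd, torque_limit):
    """PD law toward the filtered target, clipped to an assumed actuator limit."""
    torques = []
    for qt, qi, qdi in zip(q_target, q, qd):
        tau = kp * (qt - qi) - kd * qdi
        torques.append(max(-torque_limit, min(torque_limit, tau)))
    return torques


def blended_reward(errors, weights, scales):
    """Weighted sum of exp(-scale * error^2) terms, one per reward component."""
    return sum(weights[k] * math.exp(-scales[k] * errors[k] ** 2) for k in weights)


# One control step with hypothetical numbers for a single joint.
lpf = LowPassFilter(alpha=0.5, initial=[0.0])
filtered = lpf.update([0.2])                      # raw policy output: 0.2 rad
tau = pd_torques(filtered, q=[0.0], qd=[0.5], kp=100.0, kd=10.0, torque_limit=30.0)

# Hypothetical weights and scales for three of the seven terms named above.
weights = {"pose": 0.5, "cop": 0.3, "torque": 0.2}
scales = {"pose": 2.0, "cop": 10.0, "torque": 0.01}
perfect = blended_reward({"pose": 0.0, "cop": 0.0, "torque": 0.0}, weights, scales)
```

Filtering the targets before the PD stage keeps the commanded torques from jumping when the policy output changes abruptly between steps, and a reward built from bounded terms reaches its maximum (the sum of the weights) only when every error is zero.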

Dynamics randomization during training varies parameters such as mass, friction, and sensor latency, preparing the policy for sim-to-real transfer. Numerical experiments test three scenarios: baseline squatting without perturbations, squatting under large random perturbations, and squatting with human-exoskeleton interaction forces.
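A minimal sketch of per-episode dynamics randomization, assuming uniform sampling, might look like the following. The parameter ranges, the perturbation sampler, and all names are hypothetical; the article states which kinds of parameters are varied but not their ranges or distributions.

```python
import math
import random

# Hypothetical ranges; the study's actual values are not given in the article.
RANGES = {
    "mass_scale": (0.8, 1.2),         # multiplier on nominal link masses
    "friction": (0.5, 1.2),           # ground friction coefficient
    "sensor_latency_s": (0.0, 0.04),  # delay applied to sensor readings
}


def sample_dynamics(rng, ranges=RANGES):
    """Draw one set of simulation parameters for a training episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def sample_perturbation(rng, max_force_n):
    """Random 3D force with uniformly random direction and magnitude <= max_force_n."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return [0.0, 0.0, 0.0]
    magnitude = rng.uniform(0.0, max_force_n)
    return [magnitude * d / norm for d in direction]


rng = random.Random(0)                      # seeded for reproducibility
params = sample_dynamics(rng)               # applied to the simulator at episode reset
hip_force = sample_perturbation(rng, 200.0)  # magnitude cap chosen for illustration
```

Resampling these values at every episode reset exposes the policy to a family of slightly different robots and environments, so it cannot overfit to one exact set of simulated dynamics.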

In the baseline case, the controller produced symmetric squatting cycles with average joint angle tracking error around 1.22°, keeping CoP within a defined stable region. Under perturbations up to 200 N at the hip and 100 N at femur and tibia, the controller maintained balance, with tracking error rising modestly to 2.64°. Even with forces 75% greater than training levels, CoP remained within safe bounds.

In the human-interaction scenario, strap forces and passive muscle dynamics introduced realistic disturbances. The controller sustained stable squatting with tracking error near 2.49°, peak torques well below actuator limits, and balanced CoP trajectories between left and right feet. Testing across 200 varied dynamic environments showed high rewards for end-effector tracking, joint position accuracy, and CoP stability, with no falls.

The study demonstrates that incorporating CoP feedback into RL-based control yields robust balance under diverse conditions. The approach avoids the limitations of trajectory tracking and model-based predictive control, offering adaptability without precise dynamics modeling. Motion imitation streamlines learning for different tasks, requiring only a reference trajectory rather than complex reward engineering.

While sim-to-real transfer remains challenging, dynamics randomization and, potentially, domain adaptation strategies may help bridge the gap. The framework’s versatility extends beyond squatting to motions like sit-to-stand, walking, and stair climbing, with minimal changes beyond generating new reference motions. By combining advanced sensing, lightweight mechanical design, and RL-based control, this work points toward exoskeletons capable of safe, independent operation for a wider range of rehabilitation activities.
