Reinforcement Learning Drives Ultrasound Microrobot Precision
Ultrasound-powered microrobots are emerging as a promising solution for precise manipulation at the microscale, with potential applications ranging from microassembly to targeted medical interventions. Traditional propulsion methods—chemical fuels, electric fields, or light—often suffer from limitations in biocompatibility, speed, or in vivo applicability. Magnetic actuation offers precision but requires complex fabrication. Ultrasound propulsion, by contrast, delivers strong forces, low power consumption, and safety for biological environments, yet has historically struggled with navigation control.

Recent work has addressed this challenge by integrating reinforcement learning into ultrasound microrobot systems. The experimental platform centers on a microfluidic channel fabricated from polydimethylsiloxane (PDMS) and flanked by four piezoelectric transducers (PZTs) that serve as actuators. Microbubbles introduced into the channel self-assemble into swarms under acoustic forces: secondary Bjerknes forces pull the bubbles into clusters, while primary radiation forces steer the resulting swarm.
The control architecture employs an in-house Python pipeline linking the function generator, microcontroller, camera, and PZTs. Images captured at 33 frames per second are processed to detect and track swarms in real time. This visual feedback enables a model-free reinforcement learning algorithm to steer the microswarm along arbitrary paths toward target points. Initially, the action space (defined by voltage amplitude, frequency, and PZT index) was too large for efficient convergence. Pruning it to a single dimension, the PZT index, and fixing the optimal voltage and resonance frequency for each transducer reduced the navigation problem to four possible actions, one per transducer.
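
The effect of this pruning on the size of the action space can be sketched in a few lines of Python. The voltage and frequency grids and the per-transducer operating points below are hypothetical placeholders, since the text does not list the actual values; only the count of four PZTs comes from the description above.

```python
from itertools import product

# Hypothetical grids for illustration only; the real values are not given in the text.
VOLTAGES_VPP = [5, 10, 15, 20]
FREQUENCIES_KHZ = [1000, 1500, 2000, 2500, 3000]
PZT_INDICES = [0, 1, 2, 3]  # four transducers, as described above

# Full action space: every (voltage, frequency, PZT) triple is a distinct action.
full_action_space = list(product(VOLTAGES_VPP, FREQUENCIES_KHZ, PZT_INDICES))

# Hypothetical fixed operating point (voltage, frequency) for each transducer.
OPERATING_POINT = {
    0: (20, 1500),
    1: (20, 1000),
    2: (20, 2000),
    3: (20, 2500),
}

# Pruned action space: the agent only chooses which PZT to fire.
pruned_action_space = list(PZT_INDICES)


def to_command(pzt_index):
    """Expand a pruned action (a PZT index) into a full drive command."""
    voltage, frequency = OPERATING_POINT[pzt_index]
    return {"pzt": pzt_index, "voltage_vpp": voltage, "frequency_khz": frequency}


print(len(full_action_space), len(pruned_action_space))  # 80 4
print(to_command(2))
```

With these placeholder grids the agent chooses among 4 actions instead of 80, and the drive settings are looked up rather than learned.
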
A comprehensive grid search was conducted to characterize swarm dynamics, producing nearly 100,000 images across 984 parameter combinations. Analysis revealed a resonance frequency for each PZT and a linear relationship between voltage and swarm velocity. These findings set the fixed operating point for each transducer, chosen to maximize swarm speed, which reduced computational complexity and improved reproducibility.
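
As a rough illustration of this kind of analysis, the sketch below picks a resonance frequency per PZT and fits a line to speed versus voltage. The measurements are made-up numbers, not data from the study, and taking "resonance" to mean the frequency with the highest mean swarm speed is an assumption; the function names are hypothetical.

```python
from collections import defaultdict

# Made-up measurements for illustration: (pzt, frequency_khz, voltage_vpp, speed_um_per_s)
MEASUREMENTS = [
    (0, 1000, 10, 12.0), (0, 1000, 15, 18.0), (0, 1000, 20, 24.0),
    (0, 1500, 10, 30.0), (0, 1500, 15, 45.0), (0, 1500, 20, 60.0),
    (1, 1000, 10, 25.0), (1, 1000, 15, 37.5), (1, 1000, 20, 50.0),
    (1, 1500, 10, 8.0), (1, 1500, 15, 12.0), (1, 1500, 20, 16.0),
]


def resonance_by_pzt(measurements):
    """For each PZT, return the frequency with the highest mean swarm speed."""
    speeds = defaultdict(list)
    for pzt, freq, _voltage, speed in measurements:
        speeds[(pzt, freq)].append(speed)
    best = {}  # pzt -> (frequency, mean speed)
    for (pzt, freq), values in speeds.items():
        mean_speed = sum(values) / len(values)
        if pzt not in best or mean_speed > best[pzt][1]:
            best[pzt] = (freq, mean_speed)
    return {pzt: freq for pzt, (freq, _mean) in best.items()}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


resonance = resonance_by_pzt(MEASUREMENTS)
print(resonance)  # {0: 1500, 1: 1000}

# Speed versus voltage for PZT 0 at its resonance frequency.
rows = [(v, s) for p, f, v, s in MEASUREMENTS if p == 0 and f == resonance[0]]
slope, intercept = fit_line([v for v, _ in rows], [s for _, s in rows])
print(slope, intercept)  # 3.0 0.0
```
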
The reinforcement learning implementation is based on Q-learning, designed to maximize cumulative reward in a finite Markov decision process. The spatially nonlinear nature of the microswarm-PZT system—affected by microbubble variability, contaminants, and evolving acoustic wavefronts—necessitated a hybrid approach. A global dynamics matrix (Qglobal) was constructed from experimental data to predict swarm movement across the channel. However, reliance on global dynamics alone led to swarms becoming trapped in local minima.
To address this, a local dynamics matrix (Qlocal) was introduced, updated in real time with a tunable learning rate to adapt to environmental changes. Combining Qglobal and Qlocal into a weighted matrix allowed the policy to balance generalized behavior with local adaptability. At each step, the policy activates the PZT that the combined matrix predicts will bring the swarm closest to the target position.
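
A minimal sketch of how such a hybrid scheme could work is given below. It is not the authors' implementation: it assumes each matrix stores one predicted displacement per PZT (ignoring the position dependence described above), that Qlocal is updated as an exponential moving average of observed displacements, and that the two are mixed with a single scalar weight. The class name, method names, and all numbers are hypothetical.

```python
import math

# Assumed global model: predicted displacement (dx, dy) per step for each PZT.
Q_GLOBAL = {0: (5.0, 0.0), 1: (-5.0, 0.0), 2: (0.0, 5.0), 3: (0.0, -5.0)}


class HybridDynamics:
    def __init__(self, q_global, learning_rate=0.5, global_weight=0.5):
        self.q_global = dict(q_global)
        self.q_local = dict(q_global)  # local estimate starts from the global one
        self.learning_rate = learning_rate
        self.global_weight = global_weight

    def predict(self, action):
        """Weighted mix of the global and local displacement estimates."""
        gx, gy = self.q_global[action]
        lx, ly = self.q_local[action]
        w = self.global_weight
        return (w * gx + (1 - w) * lx, w * gy + (1 - w) * ly)

    def update_local(self, action, observed):
        """Move the local estimate toward the displacement actually observed."""
        a = self.learning_rate
        lx, ly = self.q_local[action]
        self.q_local[action] = ((1 - a) * lx + a * observed[0],
                                (1 - a) * ly + a * observed[1])

    def choose_action(self, position, target):
        """Pick the PZT whose predicted step ends closest to the target."""
        def distance_after(action):
            dx, dy = self.predict(action)
            return math.hypot(position[0] + dx - target[0],
                              position[1] + dy - target[1])
        return min(self.q_global, key=distance_after)


model = HybridDynamics(Q_GLOBAL)
first = model.choose_action((0.0, 0.0), (20.0, 18.0))
model.update_local(first, (0.0, 0.0))  # the swarm did not move: it is stuck
second = model.choose_action((0.0, 0.0), (20.0, 18.0))
print(first, second)  # 0 2
```

In this toy run the global model alone would keep firing PZT 0. Once the local estimate records that the swarm failed to move, the combined prediction for PZT 0 shrinks and the policy switches to PZT 2, which is the kind of escape from a stuck state that the hybrid approach is meant to provide.
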
Demonstrations included navigating swarms to trace complex paths, such as spelling “ARSL” and “ETH,” and completing multiple laps of a circular trajectory. The hybrid dynamics approach reduced the likelihood of swarms becoming stuck while maintaining efficient long-distance navigation.
The experimental setup leveraged PDMS microfluidic channels produced via soft lithography, bonded after plasma pretreatment. Microbubble contrast agents, with diameters averaging 2.5 µm, were prepared from a lyophilized sulfur hexafluoride phospholipid powder and saline solution. Their stability and size distribution made them suitable for acoustic manipulation.
Image processing relied on background subtraction, Gaussian blurring, and Canny edge detection, followed by contour analysis to isolate swarms from contaminants. Tracking used OpenCV’s CSRT tracker (discriminative correlation filter with channel and spatial reliability), which kept swarm position data continuous from frame to frame.
This reinforcement learning framework is propulsion-agnostic and could be adapted to other ultrasound microrobot systems. Future directions include programmable control of swarm size, manipulation of living cells using standing acoustic wavefields, and integration with advanced imaging modalities such as ultrasound arrays, photoacoustic, or two-photon microscopy for 3D in vivo navigation. Such capabilities could enable microrobot tracking and manipulation within blood vessels, addressing a longstanding challenge in biomedical microrobotics.
