Author: Denis Avetisyan
A new imitation learning framework empowers underwater robots to perform complex tasks by intelligently adapting to challenging and variable lighting conditions.

Bi-AQUA combines bilateral control, lighting-aware action chunking, and transformer networks to enhance visuomotor policy learning and force feedback in underwater manipulation.
Underwater robot manipulation is hampered by variable lighting, color distortion, and limited visibility. This paper introduces ‘Bi-AQUA: Bilateral Control-Based Imitation Learning for Underwater Robot Arms via Lighting-Aware Action Chunking with Transformers’, an imitation learning framework that addresses these issues through explicit modeling of underwater lighting conditions. By integrating a hierarchical lighting adaptation mechanism within a bilateral control architecture, Bi-AQUA achieves robust performance in complex pick-and-place tasks. Could this approach pave the way for more autonomous and force-sensitive underwater robotic operations in challenging marine environments?
Illuminating the Challenge: Underwater Visual Perception
Robotic manipulation, a cornerstone of modern automation, is fundamentally reliant on precise visual feedback to understand and interact with the environment. However, this reliance faces a critical challenge when extended to underwater environments. The ocean’s inherent properties, variable lighting that shifts with depth and particulate matter, together with turbidity, severely degrade the quality of images captured by underwater robots. Light attenuation diminishes signal strength, while scattering and backscatter introduce noise and reduce contrast, obscuring details and making it difficult for standard computer vision algorithms to accurately identify objects, determine their position, and assess their physical properties. This visual impairment necessitates the development of novel perception strategies that can overcome the limitations imposed by underwater conditions, enabling robots to perform complex tasks reliably and autonomously.
The underwater environment presents a unique challenge to visual perception due to the way light behaves in water. As light travels through water, its intensity diminishes through attenuation, effectively reducing the range of visibility. Simultaneously, light scatters off suspended matter such as silt and plankton, diffusing the illumination and reducing image contrast. A specific component, backscatter, reflects light directly back toward the camera, creating a pervasive haze that further obscures details and introduces noise. These combined effects drastically degrade the performance of standard computer vision algorithms designed for clear, consistent lighting conditions, making object recognition and manipulation significantly more difficult. Consequently, robotic systems operating underwater must contend with severely compromised visual data, demanding specialized image processing techniques to filter noise and reconstruct a usable representation of the surroundings.
Effective underwater robotic manipulation transcends simple visual acquisition; it demands sophisticated computational frameworks capable of deciphering distorted imagery. Unlike terrestrial robots operating in controlled lighting, underwater systems encounter exponential attenuation of light and pervasive scattering from particulate matter, fundamentally altering the appearance of objects. Consequently, robots must employ algorithms that actively compensate for these distortions – not merely detecting edges and shapes, but interpreting them within the context of diminished contrast and pervasive noise. This requires a shift from passive image analysis to dynamic adaptation, where robotic systems continuously refine their perception models based on real-time feedback and an understanding of the prevailing optical conditions, enabling robust grasping and manipulation even in highly turbid environments.

Bi-AQUA: A Lighting-Aware Framework for Underwater Robotics
Bi-AQUA is an imitation learning framework developed to address the challenges of underwater robotic manipulation. The system uses bilateral control, a leader-follower scheme that combines position and force control, to improve both the precision and force sensitivity of robotic actions. This approach allows the robot to react to external contact forces while simultaneously maintaining accurate positional control, which is crucial for delicate tasks in unpredictable underwater environments. By learning from demonstrations and incorporating bilateral control, Bi-AQUA aims to enable robust and adaptable underwater manipulation, surpassing the limitations of traditional position-only control schemes.
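As a rough illustration of the leader-follower idea (a minimal sketch, not the paper's controller, and with illustrative gains `kp`, `kd`, `kf`), a simplified four-channel bilateral step can be written as: the follower servoes toward the leader's motion, while the force channel drives the sum of contact forces toward zero so the follower's contact forces are reflected back to the leader.

```python
import numpy as np

def bilateral_step(q_l, q_f, dq_l, dq_f, f_l, f_f, kp=50.0, kd=5.0, kf=1.0):
    """One step of a simplified 4-channel bilateral controller.

    Position channel: the follower tracks the leader's joint state.
    Force channel: the sum of leader and follower contact forces is
    driven toward zero, so forces felt by the follower arm are
    reflected back to the leader (operator) side.
    """
    tau_f = kp * (q_l - q_f) + kd * (dq_l - dq_f)  # follower tracks leader
    tau_l = -kf * (f_l + f_f)                      # force reflection to leader
    return tau_l, tau_f
```

During demonstration, the human moves the leader arm and feels the follower's contact forces; both position and force trajectories can then be logged as training data for the imitation policy.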
The Bi-AQUA framework utilizes a Lighting Encoder to address the significant impact of variable underwater lighting on robotic manipulation. This encoder processes RGB images to generate compact lighting embeddings, which are low-dimensional vector representations of the prevailing light conditions. These embeddings capture key characteristics of the underwater visual environment, providing the imitation learning process with crucial contextual information that would otherwise be absent. By explicitly representing lighting as a quantifiable feature, the system gains the ability to generalize across diverse underwater scenarios and improve the reliability of learned policies.
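Bi-AQUA's encoder is learned end-to-end; as a hedged stand-in, the toy encoder below compresses an image's illumination statistics into a fixed-size vector. The per-channel mean/std summary and the frozen random projection are illustrative assumptions, not the paper's architecture, but they show how an RGB image can be reduced to a compact lighting embedding.

```python
import numpy as np

def lighting_embedding(rgb, dim=8, seed=0):
    """Toy lighting encoder: summarize an HxWx3 image's illumination
    with per-channel mean and std, then project the 6-dim summary to
    a compact embedding of size `dim`."""
    stats = np.concatenate([rgb.mean(axis=(0, 1)), rgb.std(axis=(0, 1))])  # (6,)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, 6)) / np.sqrt(6)  # frozen random projection
    return W @ stats
```

Images dominated by different color casts (e.g. blue versus red illumination) map to distinct embeddings, which is the contextual signal the policy conditions on.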
The Bi-AQUA framework utilizes a Lighting Token to incorporate learned lighting embeddings into a Transformer architecture. This token, a learnable vector, is concatenated with the standard input token sequence of the Transformer. By prepending or appending this Lighting Token, the model receives explicit information regarding underwater illumination as part of its input context. The Transformer then processes this augmented sequence, allowing it to condition its subsequent action predictions on the encoded lighting conditions. This integration enables the robot to dynamically adjust its manipulation strategies based on observed changes in underwater visibility and color casts, without requiring retraining for each distinct lighting scenario.
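Mechanically, conditioning a Transformer on lighting can be as simple as projecting the embedding to the model width and prepending the result to the token sequence. The following sketch assumes illustrative names and shapes (`W_proj`, model width `d`); self-attention then lets every other token attend to the lighting context.

```python
import numpy as np

def prepend_lighting_token(tokens, light_emb, W_proj):
    """Prepend a lighting token to a Transformer input sequence.

    tokens:    (T, d) observation/action tokens.
    light_emb: (k,) compact lighting embedding from the encoder.
    W_proj:    (d, k) learned projection to the model width d.
    Returns a (T+1, d) sequence with the lighting token first.
    """
    light_token = W_proj @ light_emb
    return np.vstack([light_token[None, :], tokens])
```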
Experimental results demonstrate Bi-AQUA’s performance across varied underwater lighting scenarios. The framework achieved a 100% success rate in seven of the eight tested lighting conditions, indicating a high degree of operational reliability. Even when subjected to challenging blue light, which typically degrades performance in underwater robotics, Bi-AQUA maintained an 80% success rate. This suggests the explicit modeling of lighting conditions significantly improves the robot’s ability to perform manipulation tasks consistently, regardless of environmental illumination.

Refining Control: Advanced Methods for Robust Performance
A Conditional Variational Autoencoder (CVAE) is utilized to expand the range of possible robot actions beyond those explicitly demonstrated in training data. The CVAE operates by learning a latent distribution over action sequences, conditioned on observed states. This allows the robot to generate novel, yet plausible, actions by sampling from this learned distribution. By introducing stochasticity in action generation, the CVAE facilitates increased adaptability in dynamic and uncertain environments and supports more effective exploration of the state-action space, ultimately improving the robot’s ability to handle unforeseen circumstances and learn more robust policies.
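The core mechanics, sketched here as a generic CVAE rather than Bi-AQUA's exact network, are the reparameterization trick for differentiable sampling and a decoder conditioned on both the latent and the observed state (the linear decoder and weight names are illustrative assumptions):

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps. The reparameterization trick keeps
    sampling differentiable, so the encoder can be trained by gradient
    descent through the sampled latent."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode_action(z, state, W_z, W_s):
    """Toy linear decoder: an action conditioned on both the latent z
    (the stochastic 'style' of the motion) and the observed state."""
    return W_z @ z + W_s @ state
```

At test time, drawing different values of z from the prior yields distinct but plausible actions for the same state, which is what gives the policy its exploratory spread.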
Bi-ACT builds upon existing control methodologies by extending action chunking to bilateral control systems, in which a leader arm operated by a demonstrator and a follower arm interacting with the environment exchange position and force information. This extension is critical for force-sensitive visuomotor learning, as it allows the robot not only to process visual information but also to accurately sense and respond to forces exerted during interaction. The bilateral configuration enables the robot to learn manipulation tasks requiring force control, such as assembly or insertion, by correlating visual input with force feedback. The architecture facilitates control policies that account for both position and force errors, improving robustness and adaptability in dynamic environments.
The learning process is optimized through the implementation of the AdamW optimization algorithm, a variant of stochastic gradient descent incorporating weight decay for improved generalization. Simultaneously, Action Chunking is utilized to address the complexity of robotic manipulation by grouping individual actions into larger, more manageable units. This approach reduces the dimensionality of the action space, thereby accelerating learning and improving the stability of the visuomotor policy. By treating sequences of primitive actions as single, high-level actions, the algorithm minimizes the number of steps required for effective training and allows for the learning of more complex behaviors.
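The execution pattern behind action chunking can be sketched as follows: the policy is queried once every `chunk_size` steps, and its block of actions is played out open-loop until the next query. This is a minimal sketch; Bi-ACT-style implementations may additionally blend overlapping chunks, which is omitted here.

```python
import numpy as np

def execute_with_chunking(policy, obs_stream, chunk_size=4):
    """Query the policy once per chunk_size steps; each query returns a
    (chunk_size, action_dim) block whose rows are executed in order.
    Chunking cuts the number of policy queries by a factor of
    chunk_size and shortens the effective decision horizon."""
    actions = []
    chunk = None
    for t, obs in enumerate(obs_stream):
        if t % chunk_size == 0:
            chunk = policy(obs)          # one query per chunk
        actions.append(chunk[t % chunk_size])
    return np.array(actions)
```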
Comparative testing shows Bi-AQUA completing the task in 15.73 seconds, comparable to human teleoperation, which averaged 15.39 seconds in the same trials. Notably, Bi-AQUA outperforms the baseline Bi-ACT method, which required 20.17 seconds for the same task. These results suggest Bi-AQUA is computationally efficient without sacrificing performance relative to human operation.

Towards Autonomous Underwater Systems: Expanding the Horizon
Bi-AQUA represents a substantial advancement in underwater robotics, specifically addressing the challenges posed by fluctuating light conditions which commonly degrade the performance of manipulation tasks. The framework achieves improved robustness and precision by actively modeling the way light behaves in underwater environments, rather than relying on static assumptions or extensive pre-calibration procedures. This allows robotic manipulators to reliably grasp and interact with objects even as ambient light levels change, a critical capability for real-world applications like underwater infrastructure inspection and marine archaeology. By effectively compensating for light-induced distortions, Bi-AQUA enables more consistent and accurate performance, paving the way for truly autonomous underwater operations and reducing the need for constant human intervention.
Traditional underwater robotic systems often demand meticulous pre-calibration to compensate for the distortions and inconsistencies caused by fluctuating light conditions – a significant limitation in the unpredictable ocean environment. Bi-AQUA addresses this challenge by directly incorporating a model of lighting variability into its core algorithms. This innovative approach allows the system to adapt to changing illumination without requiring extensive, time-consuming adjustments before each deployment. Consequently, the framework substantially simplifies the process of operating underwater robots in real-world scenarios, reducing logistical burdens and enabling more flexible, responsive interventions for tasks like infrastructure inspection, environmental monitoring, and deep-sea exploration. The reduction in calibration needs not only streamlines operation but also increases the reliability of robotic manipulation in visually complex underwater settings.
The development of robust underwater robotics is poised to revolutionize how humans interact with the aquatic world, and this framework enables a significant leap toward truly autonomous operation. Previously challenging tasks – such as inspecting submerged infrastructure like pipelines and offshore platforms, performing intricate maintenance on underwater equipment, and conducting detailed exploration of previously inaccessible environments – can now be undertaken with greater efficiency and reduced risk. By empowering robots to adapt to dynamic lighting and perform complex manipulations independently, this approach minimizes the need for constant human intervention and opens doors for long-duration missions and data collection in challenging underwater settings. This advancement promises not only cost savings but also access to critical information and resources previously beyond reach, fundamentally altering the landscape of oceanographic research, resource management, and underwater security.
The continued development of Bi-AQUA envisions extending its capabilities beyond current limitations through increased computational scaling and sensor fusion. Researchers aim to address more intricate underwater environments, including those with turbulent currents, reduced visibility, and complex object geometries. Integrating data from diverse sensor modalities – such as sonar, inertial measurement units, and high-resolution cameras – will provide a more comprehensive understanding of the surroundings, enabling the robotic system to make more informed decisions and execute tasks with greater autonomy. This multi-sensor approach promises to significantly improve robustness and reliability in challenging real-world scenarios, paving the way for advanced underwater applications like autonomous infrastructure inspection and deep-sea exploration.

The presented Bi-AQUA framework embodies a systemic approach to underwater manipulation, recognizing that robustness isn’t achieved through isolated improvements but through holistic adaptation. This mirrors a core tenet of elegant design: understanding the interplay of components. As Ada Lovelace observed, “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.” Bi-AQUA doesn’t attempt to overcome the challenges of underwater environments with novel mechanics; rather, it skillfully orchestrates existing techniques, imitation learning and transformer networks, to address the specific problem of lighting variance, demonstrating that a well-structured system, informed by environmental feedback, yields significant performance gains.
What Lies Ahead?
The pursuit of robust underwater manipulation, as exemplified by Bi-AQUA, consistently reveals a fundamental truth: optimization in isolation is a transient victory. Adapting to the vagaries of underwater lighting is a necessary step, certainly, but it merely shifts the locus of fragility elsewhere within the system. The architecture of control, the interplay between visual feedback, force sensing, and learned policy, dictates the nature of future failures. A policy made robust to illumination merely shifts the problem to turbulence, sensor noise, or the inherent dynamics of the arm itself.
Future work must therefore move beyond feature-specific adaptations. The field needs to embrace a more holistic view, investigating methods that learn representations inherently resilient to any disturbance, not simply those explicitly modeled. This demands a deeper engagement with system identification – not to create a perfect model, but to understand the limitations of any model.
Ultimately, the true measure of progress will not be the ability to perform a single task reliably, but the capacity to degrade gracefully – to maintain some level of functionality even when faced with unforeseen circumstances. Bi-AQUA represents an advance, but it is a single node in a far more complex network of challenges. The system’s behavior over time will be the ultimate arbiter of success.
Original article: https://arxiv.org/pdf/2511.16050.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2025-11-22 01:10