Shedding New Light on Robotic Grasping

Author: Denis Avetisyan


A new dataset and synthesis strategy aim to improve robotic manipulation by systematically controlling and varying lighting conditions.

The Light Cubes system facilitates the collection of RoboLight, a novel dataset of robotic manipulation episodes captured under systematically varied lighting conditions, enabling controlled and repeatable data curation for robotic lighting research.

RoboLight provides a real-world dataset with linearly composable illumination, enabling robust training of vision-based robotic grasping systems.

Robust robotic manipulation demands perception that is resilient to varying illumination, yet existing datasets lack systematic control over lighting conditions. To address this, we introduce RoboLight: A Dataset with Linearly Composable Illumination for Robotic Manipulation, comprising both a real-world collection of 2,800 episodes and a synthetically expanded dataset of 196,000 episodes, all captured with calibrated, high-dynamic-range imaging and precisely controlled variations in color, direction, and intensity. Leveraging the linearity of light transport, RoboLight facilitates the study of illumination-invariant perception and enables the training of more robust robotic systems. But can this level of control truly bridge the gap between simulation and real-world performance?


Illumination: The Bottleneck of Robotic Vision

Robotic perception, the ability of a robot to “see” and interpret its environment, is demonstrably vulnerable to changes in illumination. While algorithms may perform flawlessly in controlled laboratory settings, performance frequently degrades when exposed to the unpredictable lighting of real-world scenarios. This susceptibility stems from the reliance of many computer vision systems on pixel intensities, which are directly affected by shadows, highlights, and varying light sources. Consequently, a robot capable of identifying objects under ideal conditions may fail to do so when faced with even minor alterations in ambient light. This limitation significantly hinders the deployment of robots in dynamic, unstructured environments, impacting applications ranging from autonomous navigation and manufacturing to search and rescue operations, and necessitates the development of more robust and illumination-invariant perception techniques.

The efficacy of robotic perception hinges on the quality of training data, and a significant limitation lies in the lack of systematic variation within existing datasets. Current collections frequently prioritize quantity over diversity, failing to adequately represent the full spectrum of illumination changes encountered in real-world scenarios. This deficiency results in policies that perform well under controlled laboratory conditions but exhibit brittle behavior when faced with novel lighting – a dimly lit room, harsh sunlight, or the complex interplay of shadows and reflections. Consequently, robots struggle to generalize their perceptual abilities, hindering their capacity for reliable operation in dynamic and unpredictable environments. Addressing this data gap is paramount to unlocking truly robust and adaptable robotic systems.

Reliable robotic manipulation and interaction fundamentally depend on consistent and accurate perception, and this is increasingly challenged by variations in illumination. A robot operating in a dynamic environment – transitioning from bright sunlight to shadow, or encountering artificial light sources – must be able to consistently identify and locate objects, determine their pose, and understand their physical properties. Failure to do so can lead to unsuccessful grasps, collisions, and an inability to complete intended tasks. Consequently, research focuses on developing perception algorithms and training datasets that specifically address the problem of illumination change, striving for robustness that mirrors human visual capabilities and enables robots to function effectively across a wider range of real-world conditions. This pursuit is not merely about improving existing systems, but about unlocking the full potential of robotic automation in unstructured and unpredictable environments.

The RoboLight-Real dataset, captured with the Light Cube, systematically varies illumination color (white, red, green, blue, purple), direction (all, front, rear, left, right, left+right), and intensity (1400, 700, and 140 lux) to provide a comprehensive resource for studying lighting effects on object recognition.

RoboLight: A Systematically Varied Dataset for Robust Perception

The RoboLight dataset consists of 2,800 episodes captured in real-world conditions and a significantly larger set of 196,000 synthetically generated episodes. This combination yields a total dataset size of 198,800 episodes, designed to provide both realistic data and extensive variation for training and evaluating robotic perception and learning algorithms. The large scale of RoboLight addresses the data requirements of modern machine learning models, while systematic variation ensures robustness and generalization capability across diverse lighting and environmental conditions.

RoboLight’s synthetic data generation is based on the principle of Linearity of Light Transport, which states that the combined effect of multiple light sources is the sum of the effects of each individual light source. This allows for the creation of new lighting conditions by linearly combining existing, captured lighting conditions without requiring physically rendering each new scene. Specifically, RoboLight decomposes scenes into albedo, shading, and lighting components; new lighting conditions are then created by scaling and adding lighting components from the existing dataset. This approach enables scalable data augmentation, significantly increasing dataset size and variability without the computational cost of full scene rendering, and provides a robust means of generating diverse training data for machine learning algorithms.
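The additive property described above can be sketched in a few lines of NumPy. This is a hypothetical illustration of the linearity of light transport, not the RoboLight codebase: array shapes, source names, and the `relight` helper are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-source HDR "basis" captures: one image per individual light
# (e.g. front, rear, left), shape (H, W, 3), in linear radiance.
H, W = 4, 4
basis = {
    "front": rng.uniform(0.0, 1.0, (H, W, 3)),
    "rear":  rng.uniform(0.0, 1.0, (H, W, 3)),
    "left":  rng.uniform(0.0, 1.0, (H, W, 3)),
}

def relight(weights):
    """Synthesize a novel lighting condition as a non-negative
    linear combination of the single-source captures -- no
    physical re-rendering of the scene is required."""
    out = np.zeros((H, W, 3))
    for name, w in weights.items():
        out += w * basis[name]
    return out

# A dimmed front light combined with a full-strength left light.
novel = relight({"front": 0.5, "left": 1.0})

# Linearity check: two sources switched on together equal the
# sum of each source captured alone.
lhs = relight({"front": 1.0, "rear": 1.0})
rhs = basis["front"] + basis["rear"]
assert np.allclose(lhs, rhs)
```

Because the combination is a simple weighted sum over linear-radiance images, a handful of real captures can be expanded into a continuum of plausible lighting conditions, which is what makes the 2,800-to-196,000 episode expansion tractable.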

Data acquisition for RoboLight utilizes a custom-built Light Cube integrating a Bluetooth Controlled Lighting System, enabling precise and repeatable control over illumination parameters during image capture. The system employs 16-bit RAW High Dynamic Range (HDR) images to maximize radiometric accuracy and preserve a wide range of light intensities. This approach minimizes quantization errors and supports accurate rendering and analysis of lighting effects, crucial for training robust vision algorithms. The RAW format ensures that images are captured without lossy compression or color grading, retaining the full dynamic range of the scene illumination.

RoboLight-Synthetic generates novel images by smoothly interpolating between real high dynamic range images from RoboLight-Real, creating sequences where only the initial and final frames are authentic.

Evaluating Robustness Through Targeted Manipulation Tasks

The RoboLight dataset incorporates three primary manipulation tasks – Sparkling Sorting, RGB Stacking, and Donut Hanging – specifically engineered to assess the impact of lighting changes on policy performance. Sparkling Sorting requires the manipulation of small, reflective objects, increasing sensitivity to specular highlights. RGB Stacking involves assembling blocks of primary colors, allowing for evaluation under varying color temperatures and casts. Donut Hanging utilizes a deformable object and a pegboard, introducing challenges related to visual perception of shape and pose under different illumination conditions. These tasks, when combined with controlled lighting variations, provide a granular method for isolating and quantifying the effects of lighting on robotic manipulation policies.

The image acquisition process employs a custom High Dynamic Range (HDR) Image Processing Pipeline to capture and process visual data. This pipeline is designed to preserve a wide dynamic range, enabling the capture of detail in both bright and dark regions of a scene. Utilizing HDR Image Format allows for a greater representation of luminance levels compared to standard image formats. This is achieved through multi-exposure capture and subsequent tone mapping, resulting in images with improved contrast and reduced clipping in both highlights and shadows, which is critical for robust policy learning under varying lighting conditions.
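The multi-exposure merge and tone mapping described above can be sketched as follows. This is a minimal, generic HDR pipeline (hat-weighted exposure fusion plus a global Reinhard operator), offered as an illustration of the ideas in the text rather than the paper's actual processing chain; all function names and parameters are assumptions.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Recover a relative radiance map from bracketed exposures:
    each shot is normalized by its exposure time, and pixels are
    averaged with a hat weight that distrusts clipped values."""
    acc = np.zeros_like(images[0], dtype=np.float64)
    wsum = np.zeros_like(acc)
    for img, t in zip(images, exposure_times):
        # Hat weighting: 1.0 at mid-gray, 0.0 at fully dark/clipped.
        w = 1.0 - np.abs(img - 0.5) * 2.0
        acc += w * (img / t)
        wsum += w
    return acc / np.maximum(wsum, 1e-8)

def tone_map(hdr):
    """Global Reinhard operator: compresses highlights into [0, 1)
    while preserving detail in the shadows."""
    return hdr / (1.0 + hdr)

rng = np.random.default_rng(1)
radiance = rng.uniform(0.0, 8.0, (4, 4, 3))      # "true" scene radiance
times = [1 / 100, 1 / 25, 1 / 6]                 # bracketed exposures
shots = [np.clip(radiance * t, 0.0, 1.0) for t in times]

hdr = merge_exposures(shots, times)
assert np.allclose(hdr, radiance)                # radiance recovered
ldr = tone_map(hdr)                              # display-ready image
```

The point of the exercise: values clipped in any single exposure are recovered from the others, so the merged image retains detail in both highlights and shadows before tone mapping.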

Policy evaluation was conducted using the Diffusion Policy framework and quantified through the Lighting Robustness Benchmark. This benchmark assesses performance consistency across a spectrum of lighting conditions, revealing that policies trained on the RoboLight-Synthetic dataset achieve comparable success rates to those trained directly on real-world image data. Specifically, performance metrics demonstrate no statistically significant difference in task completion rates between the two training methodologies when subjected to varied illumination, indicating the synthetic data effectively transfers to real-world scenarios for lighting-invariant robotic manipulation.

The HDR image processing pipeline transforms a raw RAW16 image through sequential denoising, lens shading and white balance corrections, and color/gamma adjustments to produce a final PNG image suitable for policy training.

Augmenting Reality: Scaling Data with Visual Condition Scaling

Visual Condition Scaling addresses the challenge of training robust artificial intelligence systems when data is limited by ingeniously manipulating the High Dynamic Range (HDR) image format. This technique doesn’t simply rely on acquiring more images; instead, it extracts a rich spectrum of visual conditions – representing variations in lighting, exposure, and even simulated weather – directly from existing HDR captures. By intelligently altering these parameters within the HDR data, a single image effectively becomes a source for generating numerous distinct, yet realistic, training scenarios. This process expands the dataset’s diversity without the need for costly and time-consuming real-world data collection, ultimately allowing AI models to generalize more effectively to unseen conditions and perform reliably across a wider range of environments.
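Because an HDR image stores linear radiance, scaling it by a gain before tone mapping is enough to simulate brighter or dimmer capture conditions from a single image. The sketch below illustrates that idea under stated assumptions: the `scale_condition` helper, the Reinhard tone mapper, and the specific gain values are illustrative, not the paper's implementation, and the mapping from gain to lux level is not specified by the source.

```python
import numpy as np

def tone_map(hdr):
    """Simple global Reinhard operator producing an LDR image in [0, 1)."""
    return hdr / (1.0 + hdr)

def scale_condition(hdr, gain):
    """Simulate a different illumination intensity by scaling the
    linear HDR radiance; gain > 1 brightens, gain < 1 dims."""
    return tone_map(gain * hdr)

rng = np.random.default_rng(2)
hdr = rng.uniform(0.0, 4.0, (4, 4, 3))   # one linear HDR capture

dim    = scale_condition(hdr, 0.1)       # simulated dim scene
normal = scale_condition(hdr, 0.5)       # simulated ambient light
bright = scale_condition(hdr, 1.0)       # simulated bright scene

# Dimming never increases any pixel's tone-mapped value.
assert np.all(dim <= normal) and np.all(normal <= bright)
```

One real capture thus yields a family of training images spanning dim to bright conditions, which is the scalability argument the section makes.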

A robust artificial intelligence policy isn’t simply about excelling in ideal conditions; it requires reliable performance across a spectrum of real-world variations. Training policies with a limited dataset often results in brittle behavior when confronted with novel lighting or visual circumstances. This work addresses this limitation by intentionally broadening the training data with diverse lighting scenarios. By exposing the policy to a wider range of visual conditions – from bright sunlight to deep shadow, or stark contrast to muted tones – the system learns to generalize beyond the specifics of its initial training. This proactive approach cultivates a more resilient policy, capable of maintaining consistent performance even when faced with unpredictable or challenging visual input, ultimately improving its real-world applicability and reducing the potential for unexpected failures.

Efforts to deploy artificial intelligence in real-world scenarios often encounter the “sim-to-real gap,” where policies trained in simulated environments struggle to generalize to the complexities of authentic data. Researchers are actively addressing this challenge through strategies that combine the strengths of both data types; real-world data provides crucial fidelity, while synthetic data offers scalability and control. A promising approach involves augmenting limited real-world datasets with procedurally generated, yet realistic, synthetic examples. Techniques like Visual Condition Scaling further refine this process, enabling the creation of diverse visual conditions from existing data, effectively expanding the training set and enhancing a policy’s robustness to variations encountered in unpredictable real-world lighting and atmospheric conditions. This combined strategy aims to produce AI systems capable of seamless adaptation and reliable performance beyond the constraints of their initial training environment.

High dynamic range post-processing effectively scales visual conditions to simulate environments ranging from bright ambient light to high-exposure and dim illumination.

The creation of RoboLight prioritizes directness in addressing a core challenge within robotic manipulation: reliable perception under varying illumination. The dataset’s strength lies not in elaborate complexity, but in the controlled, linear composition of lighting conditions. This mirrors a fundamental tenet of efficient design. As Edsger Dijkstra observed, “Simplicity is a prerequisite for reliability.” RoboLight embodies this principle, offering a focused resource for advancing robotic vision systems, eschewing superfluous detail in favor of a clear, dependable foundation for research. The linearity of light transport, central to the data synthesis, further reinforces this commitment to structural honesty.

What’s Next?

The construction of RoboLight, while a necessary step, merely clarifies the scope of the problem. The linearity of light transport, exploited here, is an assumption, not a universal truth, and its failure modes in increasingly complex scenes remain largely uncharted. Future work must confront the inevitable: real-world light isn’t always predictably additive. The dataset’s value isn’t in the illumination itself, but in forcing a reckoning with the limitations of current perception pipelines.

A truly robust system will require more than synthetic variation. The current emphasis on HDR imaging, while technically sound, risks becoming a local optimum. The goal isn’t simply to capture more light, but to understand its interaction with surfaces at a fundamental level. Intuition suggests that disentangling material properties from illumination conditions remains the central, and perhaps intractable, challenge. The current reliance on data synthesis, though pragmatic, obscures this core difficulty.

The proliferation of datasets, while superficially encouraging, often delays true progress. The temptation to add more examples, rather than refine existing algorithms, is a persistent hazard. A more fruitful avenue may lie in the development of principled models of light and surface interaction – models that generalize beyond the specific conditions captured in any dataset, however meticulously constructed. Code should be as self-evident as gravity; a simple, elegant solution is always preferable to a complex, data-hungry one.


Original article: https://arxiv.org/pdf/2603.04249.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-05 15:10