Picking the Future: A Smarter Tomato Harvest

Author: Denis Avetisyan


Researchers have developed a complete robotic system that combines advanced perception and a novel soft-rigid gripper to gently and reliably harvest ripe tomatoes.

The robotic system successfully executes a complete tomato picking cycle through a coordinated sequence of actions: approaching the cluster, isolating the target fruit with a conical separator, precisely cutting the stem with a micro-servo cutter, gently grasping the tomato with auxetic fingers and a latex basket, transferring it away from the plant, and finally releasing it into a punnet.

This work details a hybrid robotic system integrating semantic segmentation, keypoint detection, and force control for automated tomato harvesting.

Despite advances in agricultural robotics, delicate fruit harvesting remains a significant challenge due to the need for gentle manipulation and robust perception in complex environments. This is addressed in ‘A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection’, which details a complete system integrating a soft-rigid hybrid gripper, deep learning-based vision, and closed-loop force control. Experimental results demonstrate an 80% success rate with low grasp forces, validating the approach for reliable harvesting in cluttered conditions. Could this integrated system pave the way for more adaptable and efficient robotic solutions across a wider range of agricultural tasks?


The Precarious Balance of Harvest Automation

The agricultural sector currently faces substantial economic pressures stemming from a reliance on manual tomato harvesting. For decades, skilled laborers have been essential for selectively picking ripe fruit without causing damage, a task demanding both precision and speed. However, dwindling labor pools and increasing wage costs are significantly impacting the profitability of tomato production, particularly in regions dependent on seasonal workers. This trend isn’t simply a matter of economics; consistent labor shortages threaten the stability of the food supply chain, forcing growers to seek innovative solutions beyond traditional practices. The escalating costs associated with finding and retaining harvest labor are prompting a critical reevaluation of current methods and accelerating investment in automated alternatives, despite the inherent technical challenges.

Current robotic harvesting systems frequently struggle with the delicate nature of ripe produce, leading to unacceptable levels of damage. These systems often rely on rigid grippers or lack the sophisticated perception necessary to distinguish between ripe fruit and leaves or branches. This results in bruising, crushing, or incomplete harvests – issues stemming from an inability to adapt to the varied shapes, sizes, and fragility of natural produce. While advancements in machine vision and soft robotics offer potential solutions, achieving a gentle yet firm grasp, coupled with precise localization, remains a key obstacle to widespread adoption and a significant driver of ongoing research in agricultural automation.

The economic viability of automated tomato harvesting hinges on a robot's ability to replicate the delicate touch of a human picker. Tomatoes, unlike many other harvested crops, are remarkably susceptible to bruising, which not only mars their appearance but also accelerates decay and reduces marketable yield. A harvesting robot therefore needs perception that can judge ripeness, and end-effectors (the robot's "hands") that conform to the fruit's shape and apply just enough force for detachment, without punctures or pressure marks. Current research tackles this with soft robotics, using flexible materials and sensor feedback to build grippers that "feel" the fruit, and with machine learning to refine grasping strategies over time. Precision and gentleness are paramount here because they directly determine waste and profitability.

The duration of each stage in the tomato picking cycle (approach, separation, cutting, grasping, departure, and release) varies with tomato diameter, which in turn affects overall cycle time.

A Compliant Grip: Engineering for Delicacy

The Hybrid Gripper architecture integrates a Rigid Exoskeleton, which provides structural support and efficient force transmission, with Auxetic Structures that form compliant contact surfaces. The exoskeleton, typically built from materials such as aluminum or high-strength polymers, directs actuator forces to the grasping points. The auxetic elements are characterized by a negative Poisson's ratio: they expand laterally when stretched instead of thinning, and they bend into a dome-like (synclastic) shape rather than a saddle. Both properties help them wrap around a curved fruit and spread contact over a larger area. This combination lets the gripper exert substantial gripping force while limiting stress concentrations on delicate items and improving grasp stability across varying shapes and sizes. The auxetic elements are placed in the gripper's fingers and contact pads, where compliant deformation matters most.
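The sign convention behind "negative Poisson's ratio" can be made concrete with its definition, nu = -eps_lateral / eps_axial. The minimal sketch below compares a conventional elastomer with a hypothetical auxetic pattern; the ratios are illustrative values, not measurements of the gripper's fingers.

```python
def lateral_strain(axial_strain: float, poisson_ratio: float) -> float:
    """Lateral strain from the definition nu = -eps_lateral / eps_axial."""
    return -poisson_ratio * axial_strain

# Illustrative values only: a 10 % stretch of a conventional elastomer
# (nu close to 0.5) versus a hypothetical auxetic pattern with nu = -0.5.
for label, nu in [("conventional", 0.5), ("auxetic", -0.5)]:
    print(label, f"{lateral_strain(0.10, nu):+.3f}")
```

The conventional material narrows (negative lateral strain) while the auxetic one widens by the same amount, which is the behavior the fingers rely on to spread contact.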

The gripper's mechanical design used the Virtual Work Principle to relate actuator torque to the resulting grasping force. Equating the virtual work done by the actuators, $\tau^{T}\delta q$, with the work done at the contact, $F^{T}\delta x$, and substituting $\delta x = J\,\delta q$ (where the Jacobian $J$ maps joint velocities to end-effector velocities) gives $\tau = J^{T} F$: the transpose of the Jacobian maps end-effector forces to joint torques. With this relation, the analysis determined the minimum torque each actuator needs to produce a desired force vector on the grasped object, given the gripper's geometry and kinematic constraints. Sizing torque this way limits energy consumption and keeps grasping stable and controlled without exceeding the object's fragility limits.
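For a one-degree-of-freedom drive, the Jacobian collapses to a single number, which makes the virtual-work relation easy to check by hand. The sketch below assumes an ideal, frictionless Scotch yoke whose slider travel is x = r sin(theta); the crank radius, angle convention, and numbers are hypothetical, not taken from the paper.

```python
import math

def yoke_jacobian(crank_radius_m: float, theta_rad: float) -> float:
    """dx/dtheta for an ideal Scotch yoke with slider travel x = r * sin(theta)."""
    return crank_radius_m * math.cos(theta_rad)

def required_torque(grip_force_n: float, crank_radius_m: float, theta_rad: float) -> float:
    """Servo torque (N*m) from virtual work: tau * d_theta = F * dx, i.e. tau = J^T F.

    Friction and finger compliance are ignored (assumption).
    """
    return yoke_jacobian(crank_radius_m, theta_rad) * grip_force_n

# Hypothetical numbers: 2 N grip force, 10 mm crank, crank at 30 degrees.
tau = required_torque(2.0, 0.010, math.radians(30))
print(f"required torque: {tau * 1000:.1f} mN*m")  # prints: required torque: 17.3 mN*m
```

As theta approaches 90 degrees, cos(theta) and hence the required torque fall toward zero: the yoke nears dead centre, where a small servo torque can resist a large slider force.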

Force Sensing Resistors (FSRs) are integrated into the gripper's contact surfaces to provide continuous, real-time feedback on grasping force. An FSR's electrical resistance falls as the applied force rises (nonlinearly, so each sensor needs calibration), and the control system uses this reading to modulate actuator torque. This feedback loop is crucial for delicate produce like tomatoes: it lets the gripper, whose auxetic fingers conform to the fruit's shape, maintain a secure hold at minimal pressure and so limit bruising and deformation. A microcontroller processes the FSR data and adjusts the grasping force dynamically based on the measured contact pressure.
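Turning a raw FSR reading into a force estimate is a short calculation. The sketch below assumes a common voltage-divider wiring (FSR from the supply to the ADC pin, a fixed resistor to ground) and a 10-bit ADC, and uses the rough approximation that force is proportional to FSR conductance. The resistor value, supply voltage, and calibration constant are hypothetical; a real sensor should be calibrated against known loads.

```python
def fsr_conductance(adc_counts: int, adc_max: int = 1023,
                    vcc: float = 5.0, r_fixed: float = 10_000.0) -> float:
    """FSR conductance (siemens) from an ADC reading.

    Assumed wiring: Vcc -> FSR -> ADC pin -> r_fixed -> GND, so
    Vout = Vcc * r_fixed / (r_fixed + R_fsr) and 1/R_fsr = Vout / (r_fixed * (Vcc - Vout)).
    """
    # Clamp so an unloaded sensor reads 0 S and a saturated reading avoids division by zero.
    counts = min(max(adc_counts, 0), adc_max - 1)
    v_out = vcc * counts / adc_max
    return v_out / (r_fixed * (vcc - v_out))

def fsr_force(adc_counts: int, k_n_per_siemens: float = 1.0e4) -> float:
    """Approximate force in newtons, assuming force is proportional to conductance.

    k_n_per_siemens is a hypothetical calibration constant.
    """
    return k_n_per_siemens * fsr_conductance(adc_counts)

print(f"{fsr_force(512):.2f} N")  # prints: 1.00 N
```

Working in conductance rather than resistance keeps the no-contact case finite (zero) and makes the force estimate increase with load.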

This hybrid gripper design integrates a micro-servo-actuated cutter, auxetic soft fingers for delicate handling, and a conical separator to enable both secure grasping and controlled object release via a Scotch-yoke mechanism.

Seeing the Harvest: Perception for Precise Localization

The system uses the Detectron2 deep learning framework to perform both semantic segmentation and keypoint detection on tomato images. Semantic segmentation classifies each pixel of an image into a category, here separating tomato plants, leaves, and fruit from the background. Keypoint detection, building on the segmentation output, then locates specific points of interest, such as the center and boundary of each tomato and the attachment point of the stem. Together, the two give both an overall understanding of the scene and precise locations for individual tomatoes within it.
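Downstream of the network, each predicted fruit mask still has to be reduced to numbers a planner can use. A minimal NumPy sketch, assuming a binary mask (True marks fruit pixels) such as one instance mask; the function name and the equivalent-circle diameter are illustrative choices, not the paper's pipeline.

```python
import numpy as np

def mask_centroid_and_diameter(mask: np.ndarray) -> tuple[tuple[float, float], float]:
    """Return the (row, col) centroid and equivalent-circle diameter of a binary mask, in pixels."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        raise ValueError("mask contains no fruit pixels")
    centroid = (float(rows.mean()), float(cols.mean()))
    # Diameter of a circle with the same pixel area: A = pi * d^2 / 4.
    diameter = float(np.sqrt(4.0 * rows.size / np.pi))
    return centroid, diameter

# Synthetic test image: a filled disc of radius 20 px centred at row 50, column 60.
yy, xx = np.mgrid[0:100, 0:120]
disc = (yy - 50) ** 2 + (xx - 60) ** 2 <= 20 ** 2
centre, d = mask_centroid_and_diameter(disc)
print(centre, round(d, 1))
```

A pixel diameter like this only becomes a physical size once camera distance is known, but it is already enough to rank fruit or flag masks that are implausibly small.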

Keypoint detection runs after semantic segmentation and uses its pixel-wise classifications to find specific points of interest on each tomato. Segmentation establishes the regions occupied by fruit and stem; keypoint detection refines this by locating discrete coordinates for critical features such as the fruit center, the stem attachment point, and the stem tip. This localizes the tomato and its stem more precisely than segmentation alone, supplying the data needed for robotic manipulation and yield estimation.
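Once keypoints are available, a cutting target can be derived from them with simple geometry. The sketch below assumes two pedicel keypoints in image coordinates (a hypothetical fruit-attachment point and a stem-side point), places the cut a chosen fraction along the segment between them, and returns the pedicel's image-plane angle for aligning a cutter. The keypoint names and the 0.5 default are illustrative, not taken from the paper.

```python
import math

def pedicel_cut_target(attach_xy, stem_xy, fraction=0.5):
    """Interpolate a cut point along the pedicel and return it with the pedicel angle.

    attach_xy: (x, y) of the fruit-attachment keypoint in pixels (hypothetical name).
    stem_xy:   (x, y) of the stem-side keypoint in pixels (hypothetical name).
    fraction:  0 cuts at the fruit, 1 at the stem-side keypoint.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    ax, ay = attach_xy
    sx, sy = stem_xy
    cut = (ax + fraction * (sx - ax), ay + fraction * (sy - ay))
    # Image y grows downward, so a stem above the fruit gives a negative angle.
    angle_deg = math.degrees(math.atan2(sy - ay, sx - ax))
    return cut, angle_deg

cut, angle = pedicel_cut_target((320, 240), (320, 200), fraction=0.5)
print(cut, angle)  # cut at (320.0, 220.0); pedicel pointing straight up in the image
```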

The system is trained on the Rob2Pheno Dataset, a publicly available collection of annotated tomato images grown in varied field conditions. The dataset spans multiple growing seasons and diverse environmental factors, such as differing lighting, plant densities, and levels of occlusion. Training on this variety is meant to help the system generalize to new, unseen plants and settings, which models trained on smaller or synthetic datasets often struggle to do. The Rob2Pheno dataset consists of over $10^4$ annotated images, enough for robust deep learning model training and validation.

The perception module accurately segments tomato instances by ripeness and identifies keypoints defining the pedicel and fruit center.

The Integrated Harvest: Performance and Precision

The robotic harvesting system uses a closed-loop Proportional-Integral-Derivative (PID) force controller to grasp ripe tomatoes gently. The controller continuously compares the force the gripper exerts on the fruit, as measured by the FSRs, with a reference force, and adjusts the gripper actuation to cancel any deviation, keeping the hold firm but gentle. Because it reacts to measured force rather than to a fixed closing position, it can accommodate variation in fruit size, firmness, and position, avoiding the crushing or bruising that less sophisticated grippers risk. This force regulation is central to preserving fruit quality and shelf life, and by extension to reducing waste and making automated harvesting economically viable.
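A discrete force loop of this kind fits in a few lines. In the sketch below, the gains, sample time, reference force, and servo-angle limits are hypothetical, and the "plant" is a toy first-order model rather than an identified model of the real gripper. The output clamp mirrors the idea of mapping controller output into a safe servo-angle range, and conditional integration serves as a simple anti-windup.

```python
from dataclasses import dataclass

@dataclass
class ForcePID:
    kp: float
    ki: float
    kd: float
    dt: float              # sample time in seconds
    angle_min: float       # safe servo-angle limits in degrees (hypothetical)
    angle_max: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, f_ref: float, f_meas: float) -> float:
        """Return a servo-angle command from reference and measured grip force (N)."""
        error = f_ref - f_meas
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        candidate_integral = self.integral + error * self.dt
        raw = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        angle = min(max(raw, self.angle_min), self.angle_max)
        if angle == raw:
            # Accumulate the integral only while the output is unsaturated (anti-windup).
            self.integral = candidate_integral
        return angle

# Toy plant (assumption): grip force relaxes toward 0.05 N per degree of servo angle.
pid = ForcePID(kp=20.0, ki=40.0, kd=0.0, dt=0.02, angle_min=0.0, angle_max=90.0)
force = 0.0
for _ in range(500):
    angle = pid.step(f_ref=1.5, f_meas=force)
    force += 0.2 * (0.05 * angle - force)
print(f"final force: {force:.3f} N")
```

The integral term is what removes the steady-state error here; with a proportional term alone the toy plant would settle below the 1.5 N reference.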

The system's robotic arm navigates the complexities of tomato harvesting through trajectory planning based on Particle Swarm Optimization (PSO). This method lets the robot find efficient, collision-free paths for reaching and picking fruit, even within dense foliage. PSO maintains a swarm of particles, each representing a candidate arm trajectory; at every iteration each particle moves toward a blend of the best solution it has found so far and the best found by the whole swarm, with candidates scored on distance, energy consumption, and obstacle avoidance. The resulting trajectories are not simply direct routes: they are computed to maximize harvesting speed while minimizing the risk of damaging the plant or the fruit, which contributes to the system's gentle handling of produce.
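To make the PSO update concrete, the sketch below runs a textbook global-best PSO over a single 2-D via-point so that a start-via-goal path skirts a circular obstacle. Everything here is a simplified stand-in: the geometry is hypothetical, the cost is path length plus an intrusion penalty (no energy term), and the inertia weight w and acceleration coefficients c1, c2 are commonly used values. It is not the paper's planner.

```python
import numpy as np

def path_cost(via, start, goal, obstacle_c, obstacle_r, penalty=100.0):
    """Length of the polyline start -> via -> goal plus a penalty for entering the obstacle."""
    pts = np.array([start, via, goal])
    length = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    # Sample both segments and penalise how far each sample lies inside the obstacle disc.
    t = np.linspace(0.0, 1.0, 50)[:, None]
    samples = np.vstack([start + t * (via - start), via + t * (goal - via)])
    depth = np.clip(obstacle_r - np.linalg.norm(samples - obstacle_c, axis=1), 0.0, None)
    return float(length + penalty * depth.sum())

def pso(cost, dim, bounds, n_particles=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation over the box [lo, hi]^dim."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))   # particle positions
    v = np.zeros_like(x)                               # particle velocities
    p_best = x.copy()                                  # best position seen by each particle
    p_cost = np.array([cost(p) for p in x])
    g_best = p_best[np.argmin(p_cost)].copy()          # best position seen by the swarm
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, lo, hi)
        costs = np.array([cost(p) for p in x])
        improved = costs < p_cost
        p_best[improved] = x[improved]
        p_cost[improved] = costs[improved]
        g_best = p_best[np.argmin(p_cost)].copy()
    return g_best, float(p_cost.min())

# Hypothetical scene: reach from (0, 0) to (1, 0) around a disc of radius 0.2 at (0.5, 0).
start, goal = np.array([0.0, 0.0]), np.array([1.0, 0.0])
obstacle_c, obstacle_r = np.array([0.5, 0.0]), 0.2
best_via, best_cost = pso(
    lambda p: path_cost(p, start, goal, obstacle_c, obstacle_r), dim=2, bounds=(-1.0, 2.0)
)
print("via-point:", best_via, "cost:", round(best_cost, 3))
```

The swarm should settle on a via-point just outside the obstacle, above or below it; a real arm planner would optimize joint-space waypoints and add terms such as energy or joint limits to the cost.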

The culmination of these advancements is the fully integrated Tomato Harvesting System, which achieved an 80% success rate with low grasp forces in controlled laboratory settings. The system completes an average harvesting cycle in 24.3 seconds, and its force control and optimized trajectory planning are aimed at limiting the bruising and other damage that rough handling causes. This level of automation points toward a more efficient and sustainable approach to harvesting, with the potential to reduce waste and improve yield in commercial tomato production.
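The two reported figures also give a rough throughput estimate. This back-of-the-envelope calculation assumes, for illustration only, that every attempt (successful or not) takes the average 24.3 s cycle and that failed picks are not retried; it is not a result reported by the authors.

```python
CYCLE_TIME_S = 24.3   # average cycle time reported for the system
SUCCESS_RATE = 0.80   # laboratory success rate reported for the system

# Assumption: every attempt takes one full average cycle and failures are not retried.
attempts_per_hour = 3600.0 / CYCLE_TIME_S
picks_per_hour = attempts_per_hour * SUCCESS_RATE
print(f"{attempts_per_hour:.1f} attempts/h, {picks_per_hour:.1f} successful picks/h")
# prints: 148.1 attempts/h, 118.5 successful picks/h
```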

A Simulink controller regulates grasp force by comparing measured force from an FSR sensor to a reference value and mapping the discrete PID output to a safe servo-angle range for a Scotch-yoke gripper.

The presented system prioritizes a nuanced interaction with a delicate subject, the tomato, mirroring a core tenet of effective design. It achieves this through a convergence of perception and action, employing semantic segmentation and keypoint detection to inform a gentle yet firm grip. This echoes Tim Berners-Lee's sentiment: "The web as I envisaged it, we have not seen it yet. The future is still so much bigger than the past." The ambition within this research isn't merely automation, but a reimagining of how robots can interact with the physical world, moving beyond brute force to embrace adaptability and precision: a future where technology complements, rather than dominates, natural processes. The focus remains resolutely on stripping away unnecessary complexity to reveal an elegant solution.

The Road Ahead

The presented system, while a functional convergence of perception and manipulation, merely clarifies the inherent difficulty of the task. Gentle grasping, it transpires, is not solved by more sensors or refined algorithms, but by a deeper reckoning with the fragility of the fruit itself. Future iterations will undoubtedly refine the semantic segmentation and keypoint detection, chasing diminishing returns in pixel accuracy. The true leverage, however, lies in accepting the imprecision inherent in biological systems: a tomato is not a perfect sphere, nor a rigid body.

The broader promise of auxetic structures remains largely untapped. Beyond simple compliance, their potential for adaptive grasping (a gripper that molds to the tomato rather than attempting to contain it) deserves focused investigation. This necessitates a shift in control paradigms, moving beyond force regulation toward a more nuanced understanding of material interaction. Less control, perhaps, is the most direct path to reliable harvesting.

Ultimately, the question is not whether a robot can pick a tomato, but why it should. The elegance of a biological solution (a hand, a practiced eye) lies in its economy of means. The pursuit of automation should not be an exercise in replicating complexity, but in stripping it away, revealing the essential simplicity at the heart of the task.


Original article: https://arxiv.org/pdf/2512.03684.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-12-05 00:06