On-tree fruit image segmentation comparing Mask R-CNN and Vision Transformer models. Application in a novel algorithm for pixel-based fruit size estimation

Jaime Giménez Gallego, Jesus Martinez-del-Rincon, Juan D. González-Teruel, Honorio Navarro Hellin, Pedro J. Navarro, Roque Torres Sanchez*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review


Abstract

In situ automatic fruit monitoring is of great interest for more accurate and cost-efficient decision making in agriculture, and computer vision-based tools are essential for this purpose. Deep Learning techniques have shown good performance in fruit detection and segmentation. Recently, new models based on the Transformer architecture have emerged with promising potential and zero-shot inference capability. In this paper, a Deep Learning model, Mask R-CNN, was trained for on-tree pomegranate fruit segmentation and compared with foundation models based on the Vision Transformer: Grounding DINO and the Segment Anything Model. Mask R-CNN achieved better performance, according to the F1 score and AP metrics, and a lower computational cost, according to prediction time. One of the most interesting applications derived from fruit segmentation is fruit size estimation. However, segmented fruit masks are frequently incomplete due to occlusions, so estimating fruit size from images is not straightforward. In this work, we also propose a novel algorithm to estimate and monitor fruit size in pixel units from the automatically generated masks. A median relative error of 1.39% was obtained, demonstrating the potential and feasibility of future fully automatic fruit size estimators.
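As an illustration of the reported error metric, a median relative error over a set of size estimates could be computed along the following lines. This is a minimal sketch, not the authors' code: it assumes the relative error is defined as |estimate - reference| / reference, and the function name and the pixel values are hypothetical.

```python
from statistics import median


def median_relative_error(estimated_px, reference_px):
    """Return the median of |estimate - reference| / reference, in percent.

    Assumption: this is the usual definition of relative error; the
    abstract does not spell out the exact formula used in the paper.
    """
    if len(estimated_px) != len(reference_px) or not estimated_px:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [
        abs(est - ref) / ref * 100.0
        for est, ref in zip(estimated_px, reference_px)
    ]
    return median(errors)


# Hypothetical fruit diameters in pixels, for illustration only
# (these are not data from the paper).
estimated = [102.0, 95.0, 120.0]
reference = [100.0, 100.0, 125.0]
print(median_relative_error(estimated, reference))
```

The median is less sensitive than the mean to a few badly occluded fruits with large errors, which is one plausible reason for reporting it.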
Original language: English
Article number: 109077
Number of pages: 16
Journal: Computers and Electronics in Agriculture
Volume: 222
Early online date: 24 May 2024
Publication status: Early online date - 24 May 2024
