A Computer Vision Tomato Pest Assessment and Prediction Tool

A high-yielding crop such as tomato, with its high economic returns, can greatly increase smallholder farmers' income when well managed. However, it is constrained by the recent invasion of the tomato pest Tuta absoluta, which is devastating tomato yields. Look at the situation in tomato fields in the highly affected areas of the Arusha [Arusha- mp4 video] and Morogoro regions.

Denis Pastory, team selfie - researcher and field assistant in the field.

To tackle this challenge, our work focuses on early detection and control initiatives that use computer vision techniques to strengthen phytosanitary capacity and systems and to help curb the devastation caused by Tuta absoluta. It should be noted that Tuta absoluta control still relies on slow, inefficient manual identification and, in a few cases, on the support of a limited number of agricultural extension officers.

Our initial work involved fieldwork and in-house experiments to collect data in the areas most affected by Tuta absoluta. We collected image data in the Arusha and Morogoro regions of Tanzania.

As with any computer vision task, getting the right images for the task at hand can be challenging. For our use case, we had to generate our own image data. To accumulate enough data for model training, we have been collecting data since June 2018 and have run four (4) in-house experiments in the target areas. The whole data collection process is shown in this link.

The data collection process involved taking images of tomato plants inoculated with Tuta absoluta larvae during the first two (2) weeks of growth after the transplanting date. Images of each plant were taken daily. These images are RGB (Red, Green, Blue) photos of both high and low resolution. For the high-resolution images, we used a Canon EOS KISS X7 camera with a resolution of 5184 x 3456 pixels; for the low-resolution images, we used a mobile phone camera (set to low resolution).

Fig: Image of the P.I. at one of the in-house experiment sites in Arusha.

In our first in-house experiment, we encountered a challenge with the data collection process. The inoculated tomatoes were tagged with a red ribbon. Tagging species or target organisms is a common practice in fields such as entomology. We came to realize that these tagged images could not be included in the dataset for training our models, and we therefore had to exclude them.

To meet our objectives, we worked on a Convolutional Neural Network (CNN) based model for binary classification that can identify tomatoes affected and not affected by Tuta absoluta, using state-of-the-art CNN architectures (VGG16, VGG19, ResNet50, InceptionV3). The results of this task were promising. Preprocessing was limited primarily to selecting suitable images for training the CNN model.

We are confident that the images we collected are representative of real small-scale farmers’ fields. The dataset contained more images of healthy tomato leaves than of leaves inoculated with Tuta absoluta, which implies class imbalance. To reduce the bias our CNN model might develop towards images with no Tuta absoluta, we selected the number of samples per class so as to create balanced classes during model training.
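One simple way to create balanced classes is random undersampling of the larger class. The sketch below is only an illustration of that idea, not necessarily the exact procedure we followed; the function name, the file names and the label strings are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_undersampling(samples, seed=0):
    """Return a shuffled subset in which every class has as many
    items as the smallest class.

    samples: list of (image_path, label) pairs (hypothetical layout).
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    # Size of the smallest class decides how many items each class keeps.
    n_keep = min(len(items) for items in by_label.values())

    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_keep))
    rng.shuffle(balanced)
    return balanced


# Made-up example: 10 healthy images and 4 inoculated images.
samples = [(f"healthy_{i}.jpg", "healthy") for i in range(10)]
samples += [(f"tuta_{i}.jpg", "tuta") for i in range(4)]

balanced = balance_by_undersampling(samples)
counts = Counter(label for _, label in balanced)
print(counts)
```

Undersampling discards some healthy images, so with a small dataset it may be worth comparing it against alternatives such as class weighting.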

The image data collection was intended to cover the main tomato-growing regions in Tanzania most affected by Tuta absoluta, though we ended up obtaining data from only two main areas. Our team is confident that the collected data can serve as a representative case for the situation in Tanzania. We also had to adapt to the local agronomic practices of the two areas.

For instance, we collected data on the commonly grown tomato varieties. The in-house experiments were also carried out following the cropping calendars of the two respective regions. To cover the two main growing seasons in Arusha, we carried out three experiments there, plus one experiment in Morogoro.

During CNN model training, following a typical approach for early pest or disease detection models, we focused on identifying affected and non-affected plants. We have successfully developed this type of binary classification model to identify tomatoes affected and not affected by Tuta absoluta.

We further developed a multiclass classification model to classify tomatoes at three levels of damage, i.e. low, high and no damage. This approach brought us much closer to our original idea. The model results showed us that, for an early detection system that determines damage at an early stage, a quantification-based model is much better suited than a binary classification model.

For instance, the results of the multiclass model showed that highly damaged tomatoes are more easily identified than tomatoes with low damage. Yet it is best to identify tomato damage at an early stage, i.e. at the low damage level, in order to enable early control measures for Tuta absoluta.
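A per-class recall breakdown is one way to see this kind of gap between damage levels. The snippet below is a minimal sketch; the labels and predictions are made up purely for illustration and are not our experimental results, and the function name is hypothetical.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of the true examples of each class that were predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Made-up labels for the three damage levels (NOT real results).
y_true = ["none", "none", "low", "low", "low", "low",
          "high", "high", "high", "high"]
y_pred = ["none", "none", "low", "none", "none", "low",
          "high", "high", "high", "high"]

recall = per_class_recall(y_true, y_pred)
for label, value in recall.items():
    print(f"{label}: {value:.2f}")
```

In this toy example the "low" class has the lowest recall because half of its images are mistaken for undamaged plants, which is the sort of pattern that overall accuracy alone would hide.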

At this point, we need to redefine the model classification approach. The objective is early identification, so if a simple classification model cannot perform that task, the objective is at risk. With that in mind, we are now working on models that can identify Tuta absoluta mine density, a quantification method based on instance segmentation.
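To make the idea concrete, here is a minimal sketch of how a density figure could be derived once an instance segmentation model has produced one binary mask per mine plus a mask for the leaf. The definition of density used here (mine count and mined fraction of leaf area), the function name and the mask layout are all assumptions for illustration; our final metric may differ.

```python
def mine_density(leaf_mask, mine_masks):
    """Summarise mines on one leaf.

    leaf_mask:  2D list of 0/1, 1 where the pixel belongs to the leaf.
    mine_masks: list of 2D lists of 0/1, one per detected mine,
                each with the same shape as leaf_mask.
    """
    leaf_area = sum(sum(row) for row in leaf_mask)
    if leaf_area == 0:
        raise ValueError("leaf mask is empty")

    # Count leaf pixels covered by at least one mine (overlaps counted once).
    mined_pixels = 0
    for r, row in enumerate(leaf_mask):
        for c, is_leaf in enumerate(row):
            if is_leaf and any(mask[r][c] for mask in mine_masks):
                mined_pixels += 1

    return {
        "mine_count": len(mine_masks),
        "mined_fraction": mined_pixels / leaf_area,
    }


# Toy 4x4 example: the whole image is leaf, with two small mines.
leaf = [[1, 1, 1, 1] for _ in range(4)]
mine_a = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mine_b = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]

result = mine_density(leaf, [mine_a, mine_b])
print(result)
```

Unlike a class label, a continuous value such as the mined fraction can register a small increase in damage, which is what an early detection system needs.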