Kavod Technologies
How FarmX Increased Crop Yields by 28% Using Computer Vision

David Kimani
March 15, 2026 · 12 min read

Lead ML Engineer

David leads the machine learning team at Kavod Technologies, specializing in computer vision and precision agriculture systems. Previously at DeepMind.

The Problem: Invisible Crop Disease

Across Sub-Saharan Africa, smallholder farmers lose an estimated 30–40% of their harvests every year to crop diseases that could have been caught early. The challenge is not that solutions don't exist — it's that solutions have never been built for the conditions these farmers actually work in: intermittent connectivity, extreme heat, heterogeneous crop varieties, and limited access to agronomists.

When we started building FarmX, we set ourselves a single north-star metric: reduce preventable crop loss by at least 20% within the first two growing seasons. Eighteen months later, our pilot across 1,200 farms in Kenya, Ghana, and Nigeria delivered a 28% average increase in usable yield. This post explains how we got there.

Designing the Computer Vision Pipeline

Data Collection at Scale

The first — and hardest — part of any ML project is data. We needed tens of thousands of labeled images of diseased crops under real field conditions, not the pristine lab photos that dominate existing academic datasets like PlantVillage.

We built a lightweight data-collection app (React Native + offline-first SQLite sync) that agronomist partners used to photograph crops during routine field visits. Each image was geotagged, timestamped, and tagged with:

  • Crop type (maize, cassava, tomato, cowpea, sorghum)
  • Growth stage (seedling, vegetative, flowering, fruiting)
  • Disease label (fall armyworm, maize lethal necrosis, cassava mosaic, late blight, etc.)
  • Severity score (0–4 scale)

Over six months we collected 147,000 labeled images across five countries. Critically, we captured images at multiple times of day, in varying weather, and with different smartphone cameras — because that's what real usage looks like.
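To make the labeling schema concrete, here is a minimal sketch of what one collected record might look like. The field names and validation are illustrative, not the actual FarmX app schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class LabeledImage:
    """One record captured by the field app (names are illustrative)."""
    image_path: str
    lat: float
    lon: float
    captured_at: datetime
    crop_type: str        # e.g. "maize", "cassava"
    growth_stage: str     # "seedling" | "vegetative" | "flowering" | "fruiting"
    disease_label: str    # e.g. "fall_armyworm", or "healthy"
    severity: int         # 0 (healthy) .. 4 (severe)

    def __post_init__(self):
        if not 0 <= self.severity <= 4:
            raise ValueError("severity must be on the 0-4 scale")

record = LabeledImage(
    image_path="field_photos/0001.jpg",
    lat=-0.42, lon=36.95,
    captured_at=datetime(2025, 4, 3, 9, 30, tzinfo=timezone.utc),
    crop_type="maize",
    growth_stage="vegetative",
    disease_label="fall_armyworm",
    severity=2,
)
```

Validating severity at capture time keeps bad labels out of the training set rather than discovering them during a training run.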

Model Architecture

We evaluated several architectures before settling on a two-stage approach:

  1. Stage 1 — Detection: A YOLOv8-nano model identifies regions of interest (individual leaves, fruit clusters) in each frame. We chose the nano variant because our target deployment is on-device (mobile phones and Raspberry Pi-class edge nodes attached to IoT sensor stations).
  2. Stage 2 — Classification: Each cropped region is passed to an EfficientNet-B2 classifier fine-tuned on our proprietary dataset. The classifier outputs a disease label and a confidence score.

# Simplified inference loop (farmx.models is our internal wrapper API)
from farmx.models import load_detector, load_classifier

detector = load_detector("yolov8n-farmx.pt")
classifier = load_classifier("effnet-b2-farmx.pt")

def analyze_frame(image):
    # Stage 1: detect candidate regions (leaves, fruit clusters)
    regions = detector.predict(image, conf=0.45)
    results = []
    for region in regions:
        # Stage 2: classify each cropped region
        crop = image[region.bbox]
        label, confidence = classifier.predict(crop)
        # Only surface high-confidence detections to the farmer
        if confidence > 0.80:
            results.append({
                "disease": label,
                "confidence": confidence,
                "bbox": region.bbox,
                "severity": estimate_severity(crop, label),
            })
    return results

We quantized both models to INT8 using TensorFlow Lite, reducing combined model size from 210 MB to 38 MB while losing less than 1.2% mAP on our test set.
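The core idea behind INT8 quantization can be sketched in a few lines of NumPy. This is a toy symmetric per-tensor scheme to illustrate the storage/accuracy trade-off, not the TensorFlow Lite converter we actually used:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map floats to [-127, 127]."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.27], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# INT8 storage is 4x smaller than float32; per-weight rounding error
# is bounded by scale / 2
```

Real converters add per-channel scales, zero points for asymmetric ranges, and a representative calibration dataset, which is why the accuracy loss in practice can be kept very small.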

Training Pipeline

Training runs on a GPU cluster we manage via Kubernetes. Each experiment is tracked in MLflow, and we use DVC for dataset versioning so every model checkpoint can be traced back to the exact images it was trained on.

Key hyperparameters that mattered most during tuning:

  • Image resolution: 640 × 640 for detection, 224 × 224 for classification
  • Augmentations: Random brightness/contrast jitter (critical for handling harsh African sunlight), random crop, horizontal flip, and a custom "dust overlay" augmentation that simulates the haze present on many field photos
  • Class weighting: We applied inverse-frequency weighting because some diseases (e.g., cassava brown streak) were far rarer in our dataset than others
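The inverse-frequency weighting mentioned above is straightforward to compute. A minimal sketch (class names are illustrative):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by N / (num_classes * count), so rarer classes
    contribute proportionally more to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

labels = (["cassava_mosaic"] * 60
          + ["late_blight"] * 30
          + ["brown_streak"] * 10)
weights = inverse_frequency_weights(labels)
# brown_streak, 6x rarer than cassava_mosaic, gets 6x its loss weight
```

The normalization by the number of classes keeps the average weight near 1, so the effective learning rate is roughly unchanged.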

Handling Class Imbalance

Our dataset was heavily skewed — cassava mosaic disease alone accounted for 31% of positive samples. To combat this we combined three strategies:

  • Oversampling minority classes during training
  • Focal loss instead of standard cross-entropy
  • Synthetic data generation using a fine-tuned Stable Diffusion model conditioned on disease type and crop species

The synthetic data alone improved recall on rare diseases by 14 percentage points without degrading precision on common diseases.
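For readers unfamiliar with focal loss, here is the standard single-example formulation (Lin et al.), not our internal implementation. The (1 − p)^γ factor is what down-weights easy, already-correct examples:

```python
import math

def focal_loss(p: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one positive example with predicted probability p.
    gamma=0 recovers plain cross-entropy; larger gamma focuses training
    on hard examples."""
    return -alpha * (1.0 - p) ** gamma * math.log(p)

# A confident correct prediction contributes almost nothing to the loss...
easy = focal_loss(0.95)
# ...while a badly misclassified example keeps nearly full weight.
hard = focal_loss(0.10)
```

In an imbalanced dataset the majority class supplies most of the easy examples, so focal loss shifts gradient signal toward the rare diseases without explicit resampling.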

Integration with IoT Sensors

Computer vision tells you what is happening to a crop. To predict when disease is likely to strike, we fuse visual data with environmental telemetry from our FarmX Sensor Stations — solar-powered devices deployed at the edge of each field.

Each station collects:

  • Soil moisture (capacitive probe, every 15 min)
  • Air temperature and humidity (DHT22, every 5 min)
  • Leaf wetness (resistive sensor, every 5 min)
  • Rainfall (tipping bucket, event-driven)

Sensor data streams to our backend via MQTT over a lightweight LoRaWAN gateway that covers a 5 km radius. The backend runs a gradient-boosted tree ensemble (XGBoost) that combines the last 48 hours of environmental readings with regional disease-pressure data to output a 72-hour disease risk score for each field.
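Before the readings reach the model they are aggregated over the 48-hour window into per-field features. A minimal sketch of that step, with illustrative feature names and a humidity-based proxy for leaf wetness rather than our real feature set:

```python
from statistics import mean

def window_features(readings, hours=48):
    """Aggregate recent telemetry into model features.

    `readings` is a list of (hours_ago, soil_moisture, temp_c, humidity)
    tuples; field names and window choices are illustrative.
    """
    recent = [r for r in readings if r[0] <= hours]
    return {
        "soil_moisture_mean": mean(r[1] for r in recent),
        "temp_max": max(r[2] for r in recent),
        "humidity_mean": mean(r[3] for r in recent),
        # crude proxy: count readings with near-saturated air
        "high_humidity_readings": sum(1 for r in recent if r[3] > 90),
    }

readings = [(1, 0.30, 31.0, 95.0), (24, 0.25, 28.0, 88.0),
            (60, 0.10, 40.0, 99.0)]          # 60h-old reading is dropped
features = window_features(readings)
```

The resulting dictionary becomes one row in the feature matrix fed to the booster.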

┌────────────┐  LoRaWAN  ┌──────────┐  MQTT  ┌─────────────────┐
│ Sensor     │ ────────> │ Gateway  │ ─────> │ Cloud Ingest    │
│ Station    │           └──────────┘        │ (Kafka + Flink) │
└────────────┘                               └────────┬────────┘
                                                      │
                                       ┌──────────────▼──────────────┐
                                       │ Risk Score Engine (XGBoost) │
                                       └──────────────┬──────────────┘
                                                      │
                                       ┌──────────────▼──────────────┐
                                       │ Alert Service (SMS / Push)  │
                                       └─────────────────────────────┘

When the risk score exceeds a configurable threshold, the farmer receives an SMS alert (we support USSD fallback for feature phones) with a plain-language recommendation: which fields to inspect, what to look for, and which intervention to apply.
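The alerting step itself is a simple threshold check. A sketch under assumed names; the threshold value and message wording are illustrative, not FarmX's production copy:

```python
def build_alert(field_name: str, risk: float, threshold: float = 0.7):
    """Return an SMS-ready alert string when the 72h risk score crosses
    the configured threshold, else None."""
    if risk < threshold:
        return None
    return (
        f"FarmX alert: high disease risk ({risk:.0%}) in {field_name} "
        f"over the next 72h. Inspect lower leaves for early lesions."
    )

msg = build_alert("North plot", 0.85)
quiet = build_alert("North plot", 0.40)   # below threshold -> no alert
```

Keeping the threshold configurable per region lets agronomists tune the precision/recall trade-off of alerts: too many false alarms and farmers stop reading them.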

Results from the Pilot

We ran a controlled pilot across three growing seasons (two maize, one cassava) with 1,200 participating farms and a matched control group of 400 farms that received only traditional extension services.

| Metric | Control | FarmX Farms | Delta |
|---|---|---|---|
| Usable yield (tonnes/ha) | 2.1 | 2.7 | +28% |
| Disease detection lead time | ~14 days (visual) | ~3 days (alert) | −11 days |
| Pesticide usage | Baseline | −18% | Cost saving |
| Farmer satisfaction (NPS) | n/a | 72 | — |

The yield improvement was not uniform — farms in high-humidity regions (western Kenya, southern Ghana) saw the largest gains because fungal diseases are both more prevalent and more responsive to early intervention.

What's Next

We are expanding the model to cover 12 additional crop species by Q3 2026, and we're partnering with two national agricultural research institutes to create an open, high-quality labeled dataset that the entire AgTech community can build on. On the infrastructure side, we're rolling out FarmX Sensor Station v2 with an integrated camera module, eliminating the need for smartphone-based image capture entirely.

If you're working in AgTech or precision agriculture and want to collaborate, reach out at partnerships@farmx.co or explore the platform at farmx.co.

#machine-learning #computer-vision #agriculture #farmx
