
How to detect a single tree from a drone imagery of a dense forest?

Are you sitting in front of your computer thinking, “Ohh… I wish I knew how to teach my computer to recognise objects” or “How can I do that on my own data?!”? Do you think it is difficult, and that training a model on your custom data is something super complicated? Then STOP! Uff… I’m glad that you came here. Just focus and read on.


Spring, summer, autumn, winter… No matter the season, trees live all year round. Can you imagine how many trees there are in a forest? Do you know that efficient forest management requires detailed and up-to-date information? At a time when high-resolution images/videos are available, there is an excellent potential for automatic and cost-effective accurate forest inventory and analysis.

This article is addressed to you if you are a person who wants to know how to recognise objects in your own data collection, or if you are a forester/grower who wants or even needs help to get to grips with your woodland.

Are you ready? Let’s start.


At first sight, it seems natural for us to recognise each tree in the image and count them. But what if you have a thousand such images, or one long video where the scene keeps moving? Your eyes will give up long before you finish! But hey… we do object detection nowadays. Easy to say, harder to do? What type of neural network should you choose?

Have you ever heard about R-CNN, Fast R-CNN and Faster R-CNN? If you’re interested in getting familiar with the evolution of object detection approaches, the best thing you can do is read about these three solutions in that order. For the purposes of the presented work, we’ll focus on the Faster R-CNN network. Are you wondering why? Keep reading…

Faster R-CNN in a pill

Faster R-CNN consists of two modules:

  • a deep convolutional network that proposes regions,
  • and a region-based convolutional network detector that uses the proposed regions,

which together form a single network for object detection.

Complete Faster R-CNN architecture. Source
  1. The discussed network takes an image as input to a convolutional network, which produces a convolutional feature map.
  2. It then uses a separate network, called the Region Proposal Network (RPN), to predict region proposals (bounding boxes) that may contain objects.
  3. Given this list of possibly relevant objects and their locations in the original image, the algorithm extracts, via RoI Pooling, a fixed-size set of features for each of those proposals.
  4. Finally, a region-based convolutional network (R-CNN) both classifies or discards the objects in the bounding boxes and adjusts the bounding box coordinates to better fit each object.
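To make step 3 less abstract, here is a minimal, pure-Python sketch of RoI (max) pooling: whatever the size of the proposed region, it gets pooled into a fixed-size grid, so the detector head always receives inputs of the same shape. The 2×2 output size and the toy feature map are illustrative choices of mine; the real network pools into a larger grid.

```python
import math

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool a rectangular region of a 2D feature map into a fixed
    output_size grid, so every proposal yields features of the same shape."""
    x0, y0, x1, y1 = roi                  # half-open box in feature-map coords
    oh, ow = output_size
    rh, rw = y1 - y0, x1 - x0
    pooled = []
    for i in range(oh):
        # Row range covered by output bin i (bins may overlap slightly).
        ys = y0 + math.floor(i * rh / oh)
        ye = y0 + math.ceil((i + 1) * rh / oh)
        row = []
        for j in range(ow):
            xs = x0 + math.floor(j * rw / ow)
            xe = x0 + math.ceil((j + 1) * rw / ow)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled

# Toy 4x4 "feature map" with values 0..15.
fmap = [[4 * y + x for x in range(4)] for y in range(4)]
```

Pooling the whole 4×4 map with a 2×2 output simply takes the maximum of each quadrant; a smaller, off-centre proposal is pooled into the same 2×2 shape.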

It’s time to detect objects! Don’t worry… These are only 5 simple steps

Where to start? Just follow the steps below.

Step 0. Clean your head and set up an environment

Take three deep breaths and configure your environment. This is always the most important thing to do before moving on. All you need is to install TensorFlow and its Object Detection API (if you haven’t done it yet), then run:

python object_detection/builders/

and make sure that all the tests pass.

If you aren’t familiar with the .bashrc file, or for some reason you don’t want to modify it, go to the models/research directory and enter in the console:

export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim

Note that you will have to run this command every time you open a new console. And that’s it. Your workstation is ready! Yay!

Step 1. Playing with the data

As you probably know, data processing is the crucial part of the whole analysis. Remember: your model is only as good as your data. And you really need a lot of images to train your model properly. But we assume that you’re a really ambitious and hardworking person who’s not afraid of any challenge, so don’t panic!

How do you tell the network what objects it is supposed to look for in the images?
Point them out!
But how?
Simple. Annotate your images.

Images annotation

You need to collect images and label them all. Depending on the objects you’re interested in, this can be a tough task, as in the presented example. To obtain a set of images, we simply split a video into frames. Easy, right? Because we resized all images to 960×540 pixels for memory reasons, it’s very important to start from high-quality inputs.

To train the object detection model, we need to know each image’s width and height and, for every object, its class together with the respective xmin, xmax, ymin and ymax of its bounding box. What are these parameters? Simply put, they describe the frame that captures where our class/object sits in the image. Ugh… it sounds like really tedious work. Thankfully, there are programs that help to create bounding boxes, like LabelImg or RectLabel (macOS only). They will create an .xml file for each labelled image.
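For orientation, LabelImg saves its annotations in the PASCAL VOC .xml layout, which can be read back with the Python standard library. The sketch below, with a made-up single-tree annotation, shows where the width, height and box coordinates live:

```python
import xml.etree.ElementTree as ET

# A made-up LabelImg-style (PASCAL VOC) annotation with one object.
SAMPLE_XML = """<annotation>
  <size><width>960</width><height>540</height><depth>3</depth></size>
  <object>
    <name>fruit tree</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>230</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return (width, height, boxes) from a VOC annotation string."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append({
            "class": obj.find("name").text,
            "xmin": int(bb.find("xmin").text),
            "ymin": int(bb.find("ymin").text),
            "xmax": int(bb.find("xmax").text),
            "ymax": int(bb.find("ymax").text),
        })
    return width, height, boxes
```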

We collected around 40 training images including 20–40 objects in each.

Once you finish annotating, export or create two additional files:

  1. trainval.txt, which contains the list of image and corresponding .xml file names
  2. map.pbtxt, which contains the list of classes/objects to detect:
    item {
      id: 1
      name: 'fruit tree'
    }
Labels conversion to the TFRecord format

The TensorFlow Object Detection API requires all the data used for training and validation to be in its specific TFRecord format. Now you may wonder why. Well, TFRecord is TensorFlow’s own binary storage format, and by using binary data you save a lot of disk space as well as the time needed to load the data from disk and process it. If you want to dig deeper, check out this article. We can generate a TFRecord using the following script, adapted from the TensorFlow authors’ detector example:

python object_detection/

You’ll end up with two files:

  • train.record, which in our case holds 70% of the training examples,
  • and val.record, which holds the remaining 30%.
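The 70/30 split itself is nothing exotic; a sketch like the following (a shuffled split with a fixed seed, which is my choice for reproducibility, not something the API mandates) is all that happens before the two record files are written:

```python
import random

def split_examples(names, train_frac=0.7, seed=42):
    """Shuffle the annotated example names and cut them into the two
    groups that will become train.record and val.record."""
    names = list(names)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * train_frac)
    return names[:cut], names[cut:]
```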

Step 2. Each house has a foundation, download yours…

Did you know that TensorFlow provides a collection of detection models pre-trained on datasets such as COCO? You can find the list here. Moreover, TensorFlow has released an excellent resource for training your own object detection models, along with a large variety of models pre-trained for different machine learning tasks. Of course, you can either train your own model from scratch or save time and use a pre-trained one. How? You keep everything except the last layer, which holds the original model’s classes, and replace that layer with your own classes. By doing this, you reuse all the feature detectors trained in the previous model and use those features to try to detect your new classes. Amazing, right? Want to know something even better? Since we’re only retraining the last layer, you don’t need a GPU (though it can speed things up).

For the presented problem, we used the faster_rcnn_resnet101_coco model. After downloading, please extract the files and move the .ckpt files, along with the .config file, to your project directory.

Note: You can use whatever model you want/need. Just be sure that you have the corresponding config file. Always check what a given file looks like and make sure that all parameters are set correctly (e.g. the paths to your files, or the maximum number of detected boxes per class).
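To give you a feel for it, the key fields of such a config file look roughly like the fragment below. The paths and values are placeholders matching this walkthrough, and the real file contains many more settings, elided here:

```
model {
  faster_rcnn {
    num_classes: 1          # only 'fruit tree' in our case
    ...
  }
}
train_config {
  fine_tune_checkpoint: "model.ckpt"   # the extracted pre-trained weights
  ...
}
train_input_reader {
  tf_record_input_reader { input_path: "train.record" }
  label_map_path: "map.pbtxt"
}
```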

Step 3. Ready, Steady, Retrain!

To start training, simply run the following command in your terminal:

python object_detection/ \
    --logtostderr \
    --train_dir=<any_dirname> \
    --pipeline_config_path=<path_to_config_file>

If you observe that the loss has settled at a more or less constant value, or has started rising, stop training by pressing Ctrl+C.

Note: Depending on your hardware, the training process can take up to several hours or even days. If you’re impatient or simply don’t want to wait that long, I strongly recommend running the calculations on a high-powered platform (e.g. Google Cloud Platform), or at least using a GPU (if possible).

Step 4. Export your new model and test it!

It is really good practice to check your model every, let’s say, 4–5k steps to make sure that you’re on the right path. Just check the train dir (defined in the command above), where you should see .ckpt files:

  • model.ckpt-STEP_NUMBER.data-00000-of-00001
  • model.ckpt-STEP_NUMBER.index
  • model.ckpt-STEP_NUMBER.meta

To convert the checkpoint files into a frozen graph, run:
python object_detection/ \
    --input_type image_tensor \
    --pipeline_config_path <path_to_config_file> \
    --trained_checkpoint_prefix model.ckpt-STEP_NUMBER \
    --output_directory output_inference_graph

and you should see the new output_inference_graph directory with a frozen_inference_graph.pb file.

Yay! The end is coming so fast! Now it’s only fun. Testing time!

To validate the model’s performance, we used as test images randomly selected frames from the video, different from the ones used for training.

All you need is the following command to run your object detection model on all the images in the test_images directory and output the results in the output/test_images directory:

python object_detection/

Note: Check paths in the script!
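One practical detail when inspecting the raw results: the Object Detection API reports each detection box in normalized [ymin, xmin, ymax, xmax] coordinates, so to draw or measure boxes on our 960×540 frames they have to be scaled back to pixels. A minimal conversion sketch (the function name is my own):

```python
def to_pixel_box(norm_box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] detection box
    to (xmin, ymin, xmax, ymax) in pixel coordinates."""
    ymin, xmin, ymax, xmax = norm_box
    return (round(xmin * image_width), round(ymin * image_height),
            round(xmax * image_width), round(ymax * image_height))
```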

It’s clear that the model isn’t ideal yet; the amount of training examples was probably insufficient. Still, our detector was able to find most of the tree crowns, with results varying depending on the altitude at which the drone was flying.


Step 5. Admire your masterpiece. You’re awesome!