Tracking – OpenCV Basics #4

In the previous post, we learned a little bit about colour spaces and utilised the power of HSV thresholding to detect our desired objects. However, our detection method, like many others, had one major flaw: it only identified a type of object, not individual instances. This distinction can be crucial in certain applications. To address the limitation, we need to combine our preferred detection procedure with a tracking method. This approach will enable us to follow selected objects across frames and, depending on the implementation, could also help us when detection fails for any reason.

There are many ways to track an object, but first we’ll explore a very simple one that we can implement ourselves: the centroid tracking algorithm. The concept is straightforward. We start with a list of detected bounding boxes and, for each one, calculate its centroid, which is the centre point of the rectangle. Next, we take all the current centroids and the centroids from the previous frame, and calculate the distance between each pair of current and previous centroids. With this information, we can update the positions of our objects: based on the calculated distances, we assign the object IDs from the previous frame to the closest centroids in the current frame. If there are any unassigned centroids, we register them and assign new IDs. Additionally, if an object is not detected for a set number of consecutive frames, we need to handle its deletion.

Now that we understand the concept of the algorithm, we can add it to our colour detection code. We’ll base our implementation on a proven one from PyImageSearch and write our tracker as a completely separate class, so that it’ll be much easier to use.

 

from collections import OrderedDict

import numpy as np
from scipy.spatial import distance


class CentroidTracker:
    def __init__(self, max_frames_for_deleting=24):
        self.next_id = 0
        self.detected_objects = OrderedDict()
        self.frames_missing = OrderedDict()
        self.max_frames_for_deleting = max_frames_for_deleting

 

From the initialization of the class, we can clearly see the components mentioned when explaining the algorithm. As an input parameter, we pass in the number of frames an object can be missing before we delete it from our tracker. Additionally, our tracker will store an ID number to assign to the next object that comes into the frame and two ordered dictionaries. In the first dictionary, we’ll store all of our currently detected objects, using their IDs as keys and their centre coordinates as values. In the second dictionary, we’ll store the number of frames each object has been missing.

    def add_object(self, centroid):
        self.detected_objects[self.next_id] = centroid
        self.frames_missing[self.next_id] = 0
        self.next_id += 1

 

The method for adding a new object to our tracker takes in its centroid and adds it to our dictionary of detected objects using the next available ID in our tracker. Using the same ID, we also initialise the object in the second dictionary that tracks the number of frames it has been missing from our stream (starting at 0). Since we used the next available ID, we’ll need to increment it for the next object.

 

    def delete_object(self, object_id):
        del self.detected_objects[object_id]
        del self.frames_missing[object_id]

 

The method for deleting an object takes its ID as a parameter and deletes it from both dictionaries.

 

    def update_on_frame(self, bboxes):
        if len(bboxes) == 0:
            for object_id in list(self.frames_missing.keys()):
                self.frames_missing[object_id] += 1
                if self.frames_missing[object_id] > self.max_frames_for_deleting:
                    self.delete_object(object_id)

            return self.detected_objects

 

The most interesting activities happen in the method that updates our tracker for each frame. We start by checking if any bounding boxes were detected in the current frame. If none were detected, we add a missing frame count to each object in the dictionary, delete any objects that have surpassed our limit, and return the rest.

 

        new_centroids = np.zeros((len(bboxes), 2))
        for (i, (x, y, width, height)) in enumerate(bboxes):
            centroid_x = int((x + (x + width)) / 2)
            centroid_y = int((y + (y + height)) / 2)
            new_centroids[i] = (centroid_x, centroid_y)

 

If we detect objects in the current frame, we process each bounding box to calculate its centre point by finding the midpoint between its corner coordinates.
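To make the arithmetic concrete, here is the same midpoint calculation applied to a single made-up bounding box in (x, y, width, height) format:

```python
# A hypothetical bounding box, purely for illustration.
x, y, width, height = 40, 60, 100, 80

# The centroid is the midpoint between the left/right and top/bottom edges.
centroid_x = int((x + (x + width)) / 2)   # (40 + 140) / 2 = 90
centroid_y = int((y + (y + height)) / 2)  # (60 + 140) / 2 = 100

print((centroid_x, centroid_y))  # (90, 100)
```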

 

        if len(self.detected_objects) == 0:
            for i in range(0, len(new_centroids)):
                self.add_object(new_centroids[i])

 

 

Next, we check if there were any detected objects in the previous frame. If there were none, this pass of the algorithm is straightforward: we simply add all new objects to the tracker’s dictionaries.

 

        else:
            previous_object_ids = list(self.detected_objects.keys())
            previous_centroids = list(self.detected_objects.values())

            distances = distance.cdist(previous_centroids, new_centroids)

            rows = distances.min(axis=1).argsort()
            cols = distances.argmin(axis=1)[rows]

 

If there were objects detected in the previous frame, we compare them with the current ones. We separate the IDs and centroid values, then compute every pairwise distance between old and new centroids with scipy.spatial.distance.cdist. The output is a convenient NumPy array: taking the minimum of each row and sorting by those minima orders the existing objects by how close their nearest new centroid is, and for each of those rows we then pick the column index of that nearest new centroid.
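A tiny, self-contained demonstration of this step may help. The coordinates below are made up; note how the row/column indexing pairs each previous centroid with its nearest new one even though the detections arrive in a different order:

```python
import numpy as np
from scipy.spatial import distance

# Two centroids from the previous frame and two from the current one.
previous_centroids = np.array([[10.0, 10.0], [100.0, 100.0]])
new_centroids = np.array([[102.0, 98.0], [12.0, 11.0]])

# distances[i, j] = Euclidean distance between previous centroid i
# and new centroid j.
distances = distance.cdist(previous_centroids, new_centroids)

# Order the previous-frame indices by how close their nearest match is...
rows = distances.min(axis=1).argsort()
# ...and, for each of those rows, take the index of that nearest centroid.
cols = distances.argmin(axis=1)[rows]

print([(int(r), int(c)) for r, c in zip(rows, cols)])  # [(0, 1), (1, 0)]
```

Previous centroid 0 is matched to new centroid 1 and vice versa, which is exactly what the greedy matching loop in the next snippet consumes.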

 

            used_rows = set()
            used_cols = set()

            for (row, col) in zip(rows, cols):
                if row in used_rows or col in used_cols:
                    continue

                object_id = previous_object_ids[row]
                self.detected_objects[object_id] = new_centroids[col]
                self.frames_missing[object_id] = 0
                used_rows.add(row)
                used_cols.add(col)

            unused_rows = set(range(0, distances.shape[0])).difference(used_rows)
            unused_cols = set(range(0, distances.shape[1])).difference(used_cols)

 

Now it’s time to actually match our objects. We start by initialising two sets to track rows and columns that we have already used. Then, looping through combinations of rows and columns, we first check if the combination has been used already. If not, the new input centroid has the smallest distance to an existing one, so we can update its entry in our dictionary of detected objects and set the frames it was missing to 0. After this loop, there might still be unused centroids, so we’ll add them to their own sets.
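The matching loop can be traced on a small made-up distance matrix: two previous centroids against three new ones, so one new centroid is necessarily left over:

```python
import numpy as np

# Fabricated distances: rows are previous centroids, columns are new ones.
distances = np.array([[1.0, 50.0, 60.0],
                      [55.0, 2.0, 40.0]])

rows = distances.min(axis=1).argsort()
cols = distances.argmin(axis=1)[rows]

used_rows, used_cols = set(), set()
matches = []
for row, col in zip(rows, cols):
    if row in used_rows or col in used_cols:
        continue
    matches.append((int(row), int(col)))
    used_rows.add(row)
    used_cols.add(col)

# Column 2 never gets matched, so it would be registered as a new object.
unused_cols = set(range(distances.shape[1])).difference(used_cols)
print(matches, unused_cols)  # [(0, 0), (1, 1)] {2}
```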

 

            if distances.shape[0] >= distances.shape[1]:
                for row in unused_rows:
                    object_id = previous_object_ids[row]
                    self.frames_missing[object_id] += 1

                    if self.frames_missing[object_id] > self.max_frames_for_deleting:
                        self.delete_object(object_id)

            else:
                for col in unused_cols:
                    self.add_object(new_centroids[col])

        return self.detected_objects

 

Next, we check whether the number of previously tracked objects is greater than or equal to the number of new centroids. If it is, we go through the unused rows, increment each object’s count of disappeared frames, and delete any that exceed the limit. If there are more new centroids than tracked objects, any unused new centroids are registered as new objects.

 

With the implementation ready, we can now add it to our detection code by initialising the tracker at the beginning of the programme, then updating it on every frame and visualising the results like this:

        objects = centroid_tracker.update_on_frame(bboxes)
        if objects is not None:
            for object_id, coords in objects.items():
                frame = cv2.circle(frame, (int(coords[0]), int(coords[1])), 5, (0, 0, 255), -1)
                cv2.putText(frame, f"Object {object_id}", (int(coords[0]) + 10, int(coords[1])), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)

 

Now we can identify two separate objects during processing.
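For convenience, here are the snippets above assembled into one runnable file, followed by a small synthetic check (made-up box coordinates, no camera needed) that IDs survive between frames even when the detections arrive in a different order:

```python
from collections import OrderedDict

import numpy as np
from scipy.spatial import distance


class CentroidTracker:
    def __init__(self, max_frames_for_deleting=24):
        self.next_id = 0
        self.detected_objects = OrderedDict()
        self.frames_missing = OrderedDict()
        self.max_frames_for_deleting = max_frames_for_deleting

    def add_object(self, centroid):
        self.detected_objects[self.next_id] = centroid
        self.frames_missing[self.next_id] = 0
        self.next_id += 1

    def delete_object(self, object_id):
        del self.detected_objects[object_id]
        del self.frames_missing[object_id]

    def update_on_frame(self, bboxes):
        # No detections: age every tracked object, dropping stale ones.
        if len(bboxes) == 0:
            for object_id in list(self.frames_missing.keys()):
                self.frames_missing[object_id] += 1
                if self.frames_missing[object_id] > self.max_frames_for_deleting:
                    self.delete_object(object_id)
            return self.detected_objects

        # Convert each (x, y, width, height) box into its centre point.
        new_centroids = np.zeros((len(bboxes), 2))
        for i, (x, y, width, height) in enumerate(bboxes):
            new_centroids[i] = (int((x + (x + width)) / 2),
                                int((y + (y + height)) / 2))

        if len(self.detected_objects) == 0:
            for i in range(len(new_centroids)):
                self.add_object(new_centroids[i])
        else:
            previous_object_ids = list(self.detected_objects.keys())
            previous_centroids = list(self.detected_objects.values())

            distances = distance.cdist(previous_centroids, new_centroids)
            rows = distances.min(axis=1).argsort()
            cols = distances.argmin(axis=1)[rows]

            # Greedily pair each old object with its nearest new centroid.
            used_rows, used_cols = set(), set()
            for row, col in zip(rows, cols):
                if row in used_rows or col in used_cols:
                    continue
                object_id = previous_object_ids[row]
                self.detected_objects[object_id] = new_centroids[col]
                self.frames_missing[object_id] = 0
                used_rows.add(row)
                used_cols.add(col)

            unused_rows = set(range(distances.shape[0])).difference(used_rows)
            unused_cols = set(range(distances.shape[1])).difference(used_cols)

            if distances.shape[0] >= distances.shape[1]:
                for row in unused_rows:
                    object_id = previous_object_ids[row]
                    self.frames_missing[object_id] += 1
                    if self.frames_missing[object_id] > self.max_frames_for_deleting:
                        self.delete_object(object_id)
            else:
                for col in unused_cols:
                    self.add_object(new_centroids[col])

        return self.detected_objects


# Synthetic check: two boxes appear, then move slightly in the next frame.
tracker = CentroidTracker(max_frames_for_deleting=2)
tracker.update_on_frame([(10, 10, 20, 20), (200, 200, 20, 20)])
objects = tracker.update_on_frame([(205, 203, 20, 20), (13, 12, 20, 20)])

# The boxes were given in swapped order, yet each keeps its original ID.
print(sorted(objects.keys()))  # [0, 1]
```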

 

This tracking implementation is simple and effectively demonstrates the main purpose of tracking. However, it has a significant downside: it requires bounding boxes from the detection algorithm for every frame. In more complex applications, detection methods can become quite complicated, making it infeasible to run them on every frame due to computing power constraints. Other popular tracking algorithms are typically less demanding on processing resources, can compensate for detection gaps in some frames, and still provide unique identities for the objects. Fortunately, OpenCV offers an extended tracking API with many ready-to-use algorithms. A simple example of using one of them would look like this:

        trackers = []
        for bbox in bboxes:
            tracker = cv2.TrackerCSRT_create()
            tracker.init(frame, bbox)
            trackers.append(tracker)

        for idx, tracker in enumerate(trackers):
            success, bbox = tracker.update(frame)
            if success:
                p1 = (int(bbox[0]), int(bbox[1]))
                p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
                cv2.rectangle(frame, p1, p2, (0, 255, 255), 2, 1)
                cv2.putText(frame, f"Object {idx + 1}", (p1[0], p1[1] - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)

 

Since there is a large library of algorithms to choose from, it is highly encouraged to experiment with them to determine which ones are best for specific applications. Some of these algorithms, in theory, can track objects completely on their own after receiving initial bounding box information. In practice, it’s usually best to periodically incorporate ground truth from detection, as trackers’ outputs tend to drift away from the actual objects over time.
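How often to refresh from detection depends on the application, but the scheduling itself can be as simple as a frame-counter check. A minimal sketch (the 30-frame interval is an arbitrary assumption; in a real loop you would re-run detection and re-initialise the trackers whenever it fires):

```python
# Run the (expensive) detector every `interval` frames and rely on the
# lightweight trackers for the frames in between.
def should_redetect(frame_index, interval=30):
    # Frame 0 always runs detection so the trackers can be initialised.
    return frame_index % interval == 0

schedule = [i for i in range(91) if should_redetect(i)]
print(schedule)  # [0, 30, 60, 90]
```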
