In our previous posts, we mainly focused on processing grayscale images. Even when the source material is in color, it often makes sense to convert it to grayscale to reduce the computational cost of further processing. The gains are substantial because we cut the data volume threefold (from three RGB channels to a single intensity channel). However, depending on the use case, the color of an object can carry a lot of information, so it is often worth not only retaining it but making it a central step in our pipeline. A common example is keying colors from a video stream (like replacing a green screen), so we’ll focus on detecting an object by its color in a live image from a web camera.
Let’s start by receiving a video stream from our camera using OpenCV:
```python
import cv2

def main():
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Couldn't open the camera!")
        return
    while True:
        ret, frame = cap.read()
        if not ret:
            print("Couldn't read a frame!")
            break
        cv2.imshow('Video Stream', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):  # break on 'q' key
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```
First, we select a camera with cv2.VideoCapture(index). Then, in a continuous while loop, we read and display the current frame. The loop is broken by pressing the ‘q’ key, after which the capture is released and the window is destroyed.
With those basics out of the way, we can talk about color spaces. They are specific representations of color, and there are quite a large number of them, each serving its specific purpose better than others. The most basic and popular one is RGB. By additive color mixing, it describes what kind of light needs to be emitted to produce our chosen color. By mixing red, green, and blue in different proportions, our possibilities are quite extensive, and RGB neatly corresponds to how our computer displays show us the results of our work. If RGB itself isn’t enough, we can always fall back on a number of color spaces based on it, like RGBA, sRGB, Adobe RGB, etc. A great example of a space with a totally different purpose is CMYK (cyan, magenta, yellow, key/black). Created mainly for printing on white paper, it uses subtractive color mixing to determine what inks need to be used so that the light reflected from our page and through the inks will give us our desired color.
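To make the subtractive idea concrete, here is a naive RGB-to-CMYK conversion sketch. This formula is illustrative only: real print workflows use ICC color profiles rather than this direct arithmetic, and the function name is ours.

```python
# Naive RGB -> CMYK conversion (illustrative; real printing uses ICC profiles).
def rgb_to_cmyk(r, g, b):
    # Normalize 0-255 channels to [0, 1].
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(r, g, b)          # black key: how dark the color is overall
    if k == 1.0:                    # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)   # cyan ink absorbs red light
    m = (1.0 - g - k) / (1.0 - k)   # magenta ink absorbs green light
    y = (1.0 - b - k) / (1.0 - k)   # yellow ink absorbs blue light
    return c, m, y, k

print(rgb_to_cmyk(255, 0, 0))  # pure red -> (0.0, 1.0, 1.0, 0.0)
```

Note how pure red needs no cyan at all: cyan is the ink that would absorb the red light we want reflected.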
RGB is great and seems easy to conceptualize, but it gets trickier when we want to imagine specific colors using it. We know that red and green will give us yellow, but how do we represent a whole spectrum of yellows? Different tones and brightness levels are much tougher for us mere humans to represent. That’s where the HSV color space comes in. It has a cylindrical geometry where Hue corresponds to the angle around the central axis, Saturation to the distance from the axis, and Value (or sometimes “brightness”) to the distance along the axis. In practice, hue represents the type of color we want to describe. It starts at red at 0°, goes around the circle through all the colors, and wraps back to red at 360°. Knowing the degree values of each color, along with their saturation values, gives us a great idea of what kind of color is in question and makes it much easier to limit ourselves to a range of yellows, for example. The last component of the space corresponds to how bright or dark the color is, but in color detection, tuning it is not that useful, as we usually want to find our color independently of lighting conditions.
OpenCV provides a simple function to convert from BGR (OpenCV stores channels in blue-green-red order, i.e., RGB reversed) to HSV. Remember that for 8-bit images OpenCV scales hue to 0–179 so it fits in a byte, which is why the trackbar below caps H at 179. To tune the values of an HSV-defined color, we can also use our favorite trackbars. With the tuned values, we can then create a binary mask using the thresholding function inRange:
```python
import cv2
import numpy as np

def trackbar_callback(value):
    pass  # values are polled with getTrackbarPos, so nothing to do here

cap = cv2.VideoCapture(0)

cv2.namedWindow('trackbars')
cv2.createTrackbar('low H', 'trackbars', 0, 179, trackbar_callback)
cv2.createTrackbar('low S', 'trackbars', 0, 255, trackbar_callback)
cv2.createTrackbar('low V', 'trackbars', 0, 255, trackbar_callback)
cv2.createTrackbar('high H', 'trackbars', 179, 179, trackbar_callback)
cv2.createTrackbar('high S', 'trackbars', 255, 255, trackbar_callback)
cv2.createTrackbar('high V', 'trackbars', 255, 255, trackbar_callback)

while True:
    ret, frame = cap.read()
    if not ret:
        print("Couldn't read a frame!")
        break
    hsv_img = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    low_h = cv2.getTrackbarPos('low H', 'trackbars')
    low_s = cv2.getTrackbarPos('low S', 'trackbars')
    low_v = cv2.getTrackbarPos('low V', 'trackbars')
    high_h = cv2.getTrackbarPos('high H', 'trackbars')
    high_s = cv2.getTrackbarPos('high S', 'trackbars')
    high_v = cv2.getTrackbarPos('high V', 'trackbars')
    low_limits = np.array([low_h, low_s, low_v])
    high_limits = np.array([high_h, high_s, high_v])
    mask = cv2.inRange(hsv_img, low_limits, high_limits)
```
As a result of these simple operations, we obtained really good masks. The mask for the red car could still be improved: red is a special case, appearing at both the beginning and the end of the hue range, so ideally we would threshold two separate ranges and combine the resulting masks. The yellow car’s mask, however, came out nearly perfect. To clean up and slightly extend the mask, we’ll use a morphological operation called dilation, which expands the white regions by adding pixels at their boundaries (here with a 3×3 kernel). The processed mask gives us much smoother contours, which we extract with the findContours function we discussed previously. For a nicer visualisation, we can also combine the mask with the initial frame.
```python
mask = cv2.inRange(hsv_img, low_limits, high_limits)
kernel = np.ones((3, 3), "uint8")
mask = cv2.dilate(mask, kernel)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
result = cv2.bitwise_and(frame, frame, mask=mask)
cv2.drawContours(frame, contours, -1, (0, 255, 0), 3)
```
The most well-known way of marking a detected object is with bounding boxes. These are rectangles that fully enclose the shape of the object, usually described by the x and y coordinates of one corner along with a width and height. Since we already have our external contours, OpenCV provides a straightforward way to convert them into bounding boxes using the boundingRect function. With an additional check on the contour area, the final piece of code will look something like this:
```python
for contour in contours:
    area = cv2.contourArea(contour)
    if area > 400:  # skip small noise regions
        x, y, w, h = cv2.boundingRect(contour)
        frame = cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 255), 2)
        cv2.putText(frame, "Yellow Car", (x, y), cv2.FONT_HERSHEY_SIMPLEX,
                    1.0, (255, 0, 255))
```