Machine vision has seen significant growth in recent years. Advancements in hardware and improvements in algorithms have made image-based analysis much easier and more effective. While the term “AI” is quite popular nowadays, what often lies behind it are programs that don’t use large neural networks but rather clever methods of processing images or videos.
One of the most popular libraries for computer vision and related machine learning is Open Source Computer Vision, or OpenCV for short. It provides over 2,500 optimized algorithms compatible with multiple platforms, including desktop (Windows, Linux, macOS, FreeBSD, OpenBSD) and mobile (Android, iOS). It is also fairly easy to adopt, as it offers interfaces in multiple programming languages (C++, Python, Java, and MATLAB/Octave), and it is designed to work in both offline and real-time applications.
To showcase the power of this library, we can demonstrate how easily it can solve the problem of detecting lines on the road (a feature useful for various autonomous vehicles or robots). For simplicity of visualization, we’ll perform all processing on a static image, but the same process can be applied to a video stream as well. We’ll be using Python, and after installing OpenCV (with NumPy included for good measure) via pip, we can load our image with a single command.
import cv2
import numpy as np

img = cv2.imread('opencv_1/road.jpg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
After reading our .jpg file, we converted it to the grayscale color space. Processing a color image requires more computational power, so it's often better to work with single-channel grayscale images. Of course, sometimes the information carried by color is very helpful and worth retaining, but even then it's worth considering a different color space (for example, HSV can be very useful, as it represents the hue of a color in a single, separate channel).
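If color does matter for a given task, switching to HSV is a one-line conversion. The snippet below is a minimal sketch (the variable names are just for illustration) showing how the hue channel can be separated out:

hsv_img = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
hue, saturation, value = cv2.split(hsv_img)  # the hue channel alone is often enough for color-based masking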
To make further processing easier, we can now apply a filtering algorithm. The most standard solution is a Gaussian filter, which blurs the image by passing a normal-distribution mask over it. This helps reduce background noise but unfortunately also softens the edges of our lines. For this reason, it's worth considering a bilateral filter. Gaussian weighting remains its main component, but the averaging is effectively applied only to pixels whose values are similar to the central pixel of the mask. This preserves sharp intensity changes, keeping the edges intact. OpenCV has ready-made functions for both of these filters, so it's easy to test them out for each use case.
bifilter = cv2.bilateralFilter(src=gray_img, d=7, sigmaColor=75, sigmaSpace=75)
gauss = cv2.GaussianBlur(src=gray_img, ksize=(7, 7), sigmaX=0)
Effects of bilateral (left) and Gaussian filters (right).
Image source: https://unsplash.com/photos/road-between-field-of-trees-G-YAJ61qIuU
Retaining our edges in the best possible shape will be helpful for the next step of the processing. For the main edge detection, we'll use the Canny edge detector, which produces a line one pixel wide at each edge. First, the algorithm uses gradient magnitudes with non-maximum suppression to keep only the strongest response across each edge. Next, it applies thresholding with hysteresis to determine which edges in our picture are strong enough to retain and which should be ignored.
Configuring the two hysteresis thresholds correctly is part of fine-tuning our application. Fortunately, OpenCV has built-in functionality for creating windows with trackbars, making it much quicker to tune the algorithm for each unique use case. Here is a snippet that makes it possible (in the final application, we'll use just the Canny function with the values we choose).
def trackbar_callback(value):
    print(f'Current value: {value}')

cv2.namedWindow('test_canny')
cv2.createTrackbar('min', 'test_canny', 0, 255, trackbar_callback)
cv2.createTrackbar('max', 'test_canny', 0, 255, trackbar_callback)

while True:
    key_code = cv2.waitKey(10)
    if key_code == 27:  # escape key
        break
    minvalue = cv2.getTrackbarPos('min', 'test_canny')
    maxvalue = cv2.getTrackbarPos('max', 'test_canny')
    test_canny = cv2.Canny(bifilter, minvalue, maxvalue)
    cv2.imshow('test_canny', test_canny)

cv2.destroyAllWindows()
While the road in the image used for this tutorial takes up most of the picture, that does not always have to be the case. If the road's placement in the frame remains consistent, we can use a bitwise operation with a mask to eliminate unwanted edges. We'll apply a logical AND between a mask covering our Region of Interest (ROI) and the output of the Canny algorithm to focus on the edges on the road.
canny = cv2.Canny(bifilter, 40, 180)

# build a trapezoidal ROI mask covering the lower part of the frame (from 40% of the height down)
mask = np.zeros_like(canny)
rows, cols = canny.shape[:2]
bottom_left = [cols * 0, rows * 1]
bottom_right = [cols * 1, rows * 1]
top_left = [cols * 0.01, rows * 0.4]
top_right = [cols * 0.99, rows * 0.4]
vertices = np.array([[bottom_left, top_left, top_right, bottom_right]], dtype=np.int32)
cv2.fillPoly(mask, vertices, 255)

# keep only the edges inside the ROI
canny_bitwise = cv2.bitwise_and(canny, mask)
Finally, we can detect the lines in our processed image. To do this, we'll use the HoughLinesP function from OpenCV. The Hough Line Transform is a feature extraction technique used to detect shapes in an image (in this case lines, but it's also popular for circle detection). It works by transforming points from the image space into a parameter space, where each point in the image corresponds to a sinusoidal curve in the Hough parameter space.
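Concretely, in the standard parameterization a line is described by rho = x·cos(theta) + y·sin(theta), where rho is the perpendicular distance of the line from the origin and theta is the angle of that perpendicular. A single edge pixel (x, y) therefore traces a sinusoid in (rho, theta) space, and edge pixels lying on the same line produce curves that all cross at that line's (rho, theta) pair.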
For each edge pixel detected by the Canny algorithm, the Hough Transform casts votes along the corresponding curve in the parameter space. When many curves intersect at the same parameters, it indicates the presence of a line in the image. The probabilistic variant, HoughLinesP, makes the whole process more efficient by randomly sampling points from the edge-detected image and processing only that subset, which significantly reduces the computational load.
As output, the HoughLinesP function returns the start and end coordinates of each line segment it detects. As with the Canny algorithm, we can adjust the parameters, which include the distance and angle resolutions, the threshold for the minimum number of votes required to accept a line, the minimum length of a line, and the maximum gap allowed between line segments for merging them into a longer line. Tuning these parameters is crucial, and they need to be adjusted for each application. Here, we will use parameters like these:
lines = cv2.HoughLinesP(
    image=canny_bitwise,
    rho=1,               # distance resolution of the accumulator, in pixels
    theta=np.pi/180,     # angle resolution of the accumulator: 1 degree
    threshold=100,       # minimum number of votes needed to accept a line
    minLineLength=25,    # segments shorter than this are rejected
    maxLineGap=75        # maximum gap between segments to merge them into one line
)
Now we can iterate over the detected lines and do whatever we need with them. Here we'll visualize them by drawing them on our original color image like this:
lines_list = []
for line in lines:
    x1, y1, x2, y2 = line[0]
    cv2.line(img, (x1, y1), (x2, y2), (255, 255, 0), 2)
    lines_list.append([(x1, y1), (x2, y2)])  # use the lines for further processing
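To preview or save the annotated image, the standard OpenCV display and write functions are enough. The snippet below is only a minimal sketch (the window name and output filename are examples), and it also notes one caveat: HoughLinesP returns None when no lines are detected, so the loop above deserves a guard in a real application.

# note: cv2.HoughLinesP returns None when nothing is detected,
# so in a real application wrap the loop above in "if lines is not None:"
cv2.imshow('detected_lines', img)          # preview the annotated image
cv2.waitKey(0)                             # wait for any key press before closing
cv2.destroyAllWindows()
cv2.imwrite('road_lines_result.jpg', img)  # example output filename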