Introduction
The goal of the project is to create a pipeline able to identify cars on the road. To achieve that, a series of steps is performed and analyzed, and the following paragraphs address each point of the project. The code consists of the following Python files:
**Top level files**: These files are executed on their own
- `vehicle_train_classifier`: Creates the features and trains an SVC classifier for vehicle detection, which is saved into a pickle file
- `vehicle_tracking_pipeline`: The main pipeline for the vehicle detection, using the classifier from the previous file
- `vehicle_tracking_pipeline_image`: The same pipeline applied to a single image

**Functions**:
- `vehicle_detection_functions`: All the necessary functions required for the vehicle detection. Details follow throughout the report
- `heatmap_functions`: The heatmap function required for eliminating false positives and duplicates in the detection
- The current document (this writeup)
In the `vehicle_train_classifier` file, the training images for cars and notcars are first extracted. Then, in lines 32-55, the `extract_features` function creates the feature array, consisting of HOG, spatial and colour histogram features. The function is located in the `vehicle_detection_functions` file, lines 101-151.
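To make the feature construction concrete, here is a minimal sketch of what a single-image feature extractor in this spirit could look like. It is not the project code: the function name `extract_features_single` and all parameter values (spatial size, histogram bins, HOG settings) are illustrative assumptions.

```python
import cv2
import numpy as np
from skimage.feature import hog

def extract_features_single(img_bgr, spatial_size=(32, 32), hist_bins=32,
                            orient=9, pix_per_cell=8, cell_per_block=2):
    """Illustrative single-image feature vector: HOG + spatial + colour histograms."""
    # Convert to YUV; the report uses channel 0 for the HOG features
    yuv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YUV)

    # HOG features on channel 0 only
    hog_features = hog(yuv[:, :, 0], orientations=orient,
                       pixels_per_cell=(pix_per_cell, pix_per_cell),
                       cells_per_block=(cell_per_block, cell_per_block),
                       feature_vector=True)

    # Spatial features: the down-sampled image, flattened
    spatial_features = cv2.resize(yuv, spatial_size).ravel()

    # Colour histogram features, one histogram per channel
    hist_features = np.concatenate(
        [np.histogram(yuv[:, :, c], bins=hist_bins, range=(0, 256))[0]
         for c in range(3)])

    return np.concatenate([hog_features, spatial_features, hist_features])
```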
For the HOG features, the code is located between lines 133-146 and uses the function `get_hog_features`. The YUV colour space was used, channel 0 only. An implementation example on a test image is shown below:
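For reference, a minimal sketch of how such a HOG visualization could be produced with skimage is shown below; the test image path and parameter values are assumptions, not the project settings.

```python
import cv2
import matplotlib.pyplot as plt
from skimage.feature import hog

# Hypothetical test image path; any road image works
img = cv2.imread('test_images/test1.jpg')
yuv = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)

# visualize=True also returns an image of the HOG cells for inspection
features, hog_image = hog(yuv[:, :, 0], orientations=9,
                          pixels_per_cell=(8, 8), cells_per_block=(2, 2),
                          visualize=True, feature_vector=True)

plt.imshow(hog_image, cmap='gray')
plt.show()
```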
I tried several combinations of HOG parameters, most of which were proposed in the lectures. One important choice is the colour space. RGB is not the most suitable, and I would expect HSV, LUV or YCrCb to work better. I tried many combinations, but none provided a dominant performance over the others. I ended up with YUV, channel 0 only, as it was performing reasonably well. I skipped the remaining channels as well as the colour histograms, as they were not adding much improvement but were costly in terms of computation.
After extracting the features in `vehicle_train_classifier`, the feature and label vectors are created. The feature vectors are scaled so that each component contributes similarly to the final result, and an SVC classifier is trained. The code is located between lines 66-85. The resulting classifier is then saved to a pickle file and used afterwards for the video pipeline.
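A minimal sketch of this training step is shown below, assuming `car_features` and `notcar_features` are lists of vectors produced by a feature extractor like the one sketched earlier; the variable names, split ratio and pickle file name are illustrative, not the exact project values.

```python
import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

# e.g. car_features = [extract_features_single(img) for img in car_images]
X = np.vstack((car_features, notcar_features)).astype(np.float64)
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

# Scale each feature component so it contributes comparably to the result
X_scaler = StandardScaler().fit(X)
X_scaled = X_scaler.transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2)

svc = LinearSVC()
svc.fit(X_train, y_train)
print('Test accuracy:', svc.score(X_test, y_test))

# Save classifier and scaler for the video pipeline
with open('svc_pickle.p', 'wb') as f:
    pickle.dump({'svc': svc, 'scaler': X_scaler}, f)
```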
From now on, the file used is `vehicle_tracking_pipeline`. I avoided using a fixed-size window, since closer cars appear bigger, and went for a multiscale window approach using three scales, implemented between lines 31-51: smaller windows for distant cars, and medium and larger ones for closer cars. An overlap of 0.5 in x and y was selected, which is a reasonable trade-off between having too many windows and missing some detections.
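As an illustration, here is a minimal sketch of how such a multiscale window list could be generated; the window sizes in pixels and the y-ranges of each scale are assumptions, not the exact project values.

```python
def slide_window(x_range, y_range, window_size, overlap=0.5):
    """Generate window corner coordinates over a region of the image."""
    step = int(window_size * (1 - overlap))
    windows = []
    for y in range(y_range[0], y_range[1] - window_size + 1, step):
        for x in range(x_range[0], x_range[1] - window_size + 1, step):
            windows.append(((x, y), (x + window_size, y + window_size)))
    return windows

# Three scales: small windows only near the horizon, larger ones lower in the image
windows = (slide_window((0, 1280), (400, 500), 64) +
           slide_window((0, 1280), (400, 550), 96) +
           slide_window((0, 1280), (400, 660), 128))
```

The key design point is that the smallest, most numerous windows only cover a narrow band near the horizon, which keeps the total window count manageable.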
The drawn boxes are shown in the following figure:
Using the 3-level scaling gives a total of 165 windows to search. Executing the script `vehicle_tracking_pipeline_image`, some representative results for the test images are shown below. This is the final result, which excludes the false positives; details of the intermediate steps are addressed in the next part. To optimise performance, the smaller and more numerous windows are restricted to the upper half of the search area rather than being spread throughout the image.
Here’s a link to my video result
The pipeline is implemented between lines 60-80 in the `vehicle_tracking_pipeline` file. The script performs the following steps.
Initially, the hot windows are extracted. These are the windows that the classifier identifies as containing a car; the raw detections on the test images are shown below, after a short sketch of this search step:
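This sketch reuses the hypothetical `extract_features_single` from earlier and assumes the classifier was trained on 64x64 patches; it is illustrative, not the exact code from `vehicle_tracking_pipeline`.

```python
import cv2

def search_windows(img_bgr, windows, clf, scaler):
    """Return the 'hot' windows, i.e. those the classifier labels as a car."""
    hot_windows = []
    for (x1, y1), (x2, y2) in windows:
        # Crop the window and resize to the assumed training patch size
        patch = cv2.resize(img_bgr[y1:y2, x1:x2], (64, 64))
        features = extract_features_single(patch)
        features = scaler.transform(features.reshape(1, -1))
        if clf.predict(features)[0] == 1:
            hot_windows.append(((x1, y1), (x2, y2)))
    return hot_windows
```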
There are several successful windows and one false positive. In order to unite the correct windows and eliminate the false positive, a heatmap is applied using a minimum threshold of two. The resulting image looks as below:
That way the false positive is eliminated, and the final result is the one shown in the previous diagram.
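A minimal sketch of this heatmap step, in the spirit of `heatmap_functions`, is shown below; the function name is hypothetical, and only the threshold of two comes from the report.

```python
import numpy as np
from scipy.ndimage import label

def heatmap_boxes(image_shape, hot_windows, threshold=2):
    """Combine overlapping hot windows and drop false positives via a heatmap."""
    heat = np.zeros(image_shape[:2], dtype=np.float32)
    for (x1, y1), (x2, y2) in hot_windows:
        heat[y1:y2, x1:x2] += 1          # each window votes for its pixels

    heat[heat < threshold] = 0           # reject areas with too few votes

    labels, n_cars = label(heat)         # connected regions -> one box per car
    boxes = []
    for car in range(1, n_cars + 1):
        ys, xs = np.nonzero(labels == car)
        boxes.append(((xs.min(), ys.min()), (xs.max(), ys.max())))
    return boxes
```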
Certainly there can be improvements to the final result. I understand the process, but there are some points where the current implementation shows weaknesses. For example, when the cars are detected, the detection window does not always keep the same shape: sometimes it gets larger and other times narrower. There are also some dead intervals when a car is not detected although it is clearly visible. I attribute that failure to the choice of the transformation, although I saw the same failure with all the methods I tried. The choice of the transformation parameters used to extract the image features should also affect the results. I reckon that, in a more robust implementation, history should also be taken into account, meaning that when a car is detected in a frame and is not at the edge of the image, it should be detected with near certainty in the next frame too.
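As a sketch of that idea, the heatmaps of the last few frames could be accumulated before thresholding, so that a single missed frame does not drop a previously detected car; the class name and parameter values below are assumptions and not part of the current implementation.

```python
from collections import deque
import numpy as np

class HeatHistory:
    """Accumulate heatmaps over the last few frames before thresholding."""

    def __init__(self, n_frames=5, threshold=8):
        self.frames = deque(maxlen=n_frames)
        self.threshold = threshold

    def update(self, heat):
        # Add the current frame's heatmap and threshold the running sum
        self.frames.append(heat)
        summed = np.sum(list(self.frames), axis=0)
        summed[summed < self.threshold] = 0
        return summed
```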
Finally, I do not have a reference for the speed of the implementation, but the current one is slow for a real-time application. On a relatively fast computer it was achieving about 4 it/s. Adding more windows or larger feature sets could drop that number to 2 or even 1.7. A faster implementation in a faster language such as C should provide better processing speed.