Naviground is a navigation system that can be installed in both manned and unmanned ground vehicles. It allows navigation in structured and unstructured environments. I took part in the development of the perception system, in particular in detecting the environment with cameras.
Vision system
Although the navigation system already had LIDAR and RADAR sensors, for several reasons we wanted a perception system based only on cameras.
Although the price of LIDAR and RADAR has dropped a lot in recent years, they are still more expensive than cameras.
LIDAR and RADAR are active sensors (they emit an electromagnetic wave and measure its reflection), so in a war environment they make the vehicle detectable.
Since it is an autonomous vehicle, the processing cannot run on a very powerful machine, so if the processing of the large amount of data that LIDAR and RADAR generate can be avoided, all the better.
To detect the environment, we used three types of neural networks:
Semantic segmentation networks
They classify which class each pixel of the image belongs to, producing a segmentation mask.
Object detection networks
They detect and classify the objects in the image; we used a YOLO-based detector.
Depth estimation
A neural network can estimate the depth of each pixel of the image, so we can obtain the distance to each object.
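As a rough illustration of how these three outputs can be combined (a hedged sketch, not the actual Naviground code; `run_detector` and `run_depth` are hypothetical stand-ins for the real models), the distance to each detected object can be estimated by looking up the depth map inside its bounding box:

```python
import numpy as np

def object_distances(image, run_detector, run_depth):
    """Estimate the distance to each detected object.

    run_detector(image) -> list of (class_id, x1, y1, x2, y2) boxes in pixels
    run_depth(image)    -> HxW array with per-pixel depth (e.g. in meters)
    """
    depth = run_depth(image)
    results = []
    for class_id, x1, y1, x2, y2 in run_detector(image):
        # Use the median depth inside the box as a robust distance estimate
        box_depth = depth[y1:y2, x1:x2]
        results.append((class_id, float(np.median(box_depth))))
    return results
```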
Training
Our problem was that, since the vehicle had to work in both structured and unstructured environments, the pre-trained networks did not suit us, so we had to train the segmentation and object detection networks ourselves.
Dataset
We had hours of video recorded during tests in environments like this, so we created a dataset from them.
We created an algorithm that, using an unsupervised classifier, grouped the images into several clusters, where the images in each cluster were similar to each other. We then kept only a few images from each cluster, which gave us a dataset of heterogeneous images.
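A minimal sketch of the idea, assuming k-means over downscaled grayscale thumbnails as the unsupervised classifier (the real algorithm and features may have been different):

```python
import glob
import cv2
import numpy as np
from sklearn.cluster import KMeans

def select_diverse_frames(frame_dir, n_clusters=50, per_cluster=3):
    """Cluster extracted video frames and keep a few frames per cluster."""
    paths = sorted(glob.glob(f"{frame_dir}/*.jpg"))

    # Very simple feature: a 32x32 grayscale thumbnail flattened to a vector.
    feats = []
    for p in paths:
        gray = cv2.cvtColor(cv2.imread(p), cv2.COLOR_BGR2GRAY)
        feats.append(cv2.resize(gray, (32, 32)).flatten() / 255.0)
    feats = np.array(feats)

    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(feats)

    selected = []
    for c in range(n_clusters):
        # Within a cluster the images are similar, so any few of them will do.
        idx = np.where(labels == c)[0][:per_cluster]
        selected.extend(paths[i] for i in idx)
    return selected
```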
Labeler
Labeling objects for YOLO, although tedious, is a relatively fast and easy process.
However, labeling images for semantic segmentation, where every pixel has to be labeled, is a slow and tedious process. As none of the existing segmentation labeling tools convinced us, we built our own. It turned out so well that it was reused in other projects, and there was even talk of commercializing it.
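For reference, a YOLO-style label file only needs one line per object (class id plus a normalized bounding box), which is why box labeling is so much lighter than painting a per-pixel mask. The file name below is just an illustrative example:

```text
# image_0001.txt — one line per object: class x_center y_center width height (all in [0, 1])
0 0.512 0.430 0.220 0.310
2 0.105 0.780 0.090 0.120
```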
Training image generation
One of the problems we had was that all the training images were taken during the day, in sunny weather, without rain, etc. To make the networks more robust we needed more varied images. But that means someone has to go out at night, wait for rain to get rainy images, wait for snow (which is even harder to come by), and so on.
At that time there were already many good image generation networks, so we could generate images with new environmental conditions, but the problem was that they would still have to be labeled, and for segmentation that requires a lot of time.
So I built a pipeline that used generative AI to modify the environmental conditions of the images we had already labeled, giving us images under different environmental conditions without having to spend time labeling them again.
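A minimal sketch of that idea, assuming a Stable Diffusion img2img pipeline from the diffusers library (the models and settings actually used in the project may have differed): keeping the strength low preserves the scene layout, so the existing segmentation mask and boxes remain valid.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Assumed model choice for illustration; any img2img-capable model would work.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")

def reweather(image_path, prompt="the same road scene at night, heavy rain"):
    init = Image.open(image_path).convert("RGB").resize((768, 512))
    # strength controls how far the output drifts from the input; keeping it
    # low preserves the geometry, so the original labels still apply.
    out = pipe(prompt=prompt, image=init, strength=0.35, guidance_scale=7.0)
    return out.images[0]
```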
Optimization with TensorRT
Since this had to work in a vehicle, it could not use a powerful computer, so an embedded device, a Jetson Orin, was used. It was therefore important to optimize the neural networks to make inference as fast as possible.
I optimized them with TensorRT, making them run up to 40% faster in some cases.
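A typical way to do this kind of conversion, sketched here under the assumption of an ONNX export followed by TensorRT's trtexec tool (the actual workflow may have differed), looks like this; the tiny model below is only a placeholder for the real networks:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the real segmentation/detection networks.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 8, 1),
).eval()

dummy = torch.randn(1, 3, 512, 512)  # assumed input resolution

torch.onnx.export(model, dummy, "model.onnx", opset_version=17,
                  input_names=["input"], output_names=["output"])

# On the Jetson, build the engine with TensorRT's trtexec tool, e.g.:
#   trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
# Lower precision (FP16, or INT8 with calibration) is usually where most of
# the latency improvement comes from.
```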