In September of 2007, Axion Racing engaged Fixstars to assist with the introduction of a Sony PS3 running Yellow Dog Linux to replace one of the on-board Dell servers for real-time, stereoscopic vision. Fixstars' Bill Mueller had just ten days to pull off a very hard task ... and he did. The following is Bill's account of his work, start to finish.
Two Logitech QuickCam Pro 5000s are connected via USB to the PS3 running Yellow Dog Linux 5.0.2. The PS3 is connected to Spirit's Dell server rack over 100Mbit Ethernet. The PS3 captures the images, processes the data, and sends a message to Spirit indicating the presence, distance, and general direction of any obstacle.
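The article does not describe Spirit's actual wire format, but the presence/distance/direction message could be sketched as a small fixed-size packet. Everything here (the field layout, the `!Bff` encoding, the function names) is a hypothetical illustration, not the real protocol:

```python
import struct

# Hypothetical message layout -- one byte for presence, a float for
# distance in feet, and a float for bearing in degrees (negative = left,
# positive = right), in network byte order.
MSG_FORMAT = "!Bff"

def pack_obstacle_msg(present, distance_ft, bearing_deg):
    """Pack an obstacle report for transmission over the Ethernet link."""
    return struct.pack(MSG_FORMAT, 1 if present else 0, distance_ft, bearing_deg)

def unpack_obstacle_msg(payload):
    """Decode the packet back into (present, distance_ft, bearing_deg)."""
    present, distance_ft, bearing_deg = struct.unpack(MSG_FORMAT, payload)
    return bool(present), distance_ft, bearing_deg
```

A fixed binary layout like this keeps the per-cycle network cost trivially small next to the camera and processing times discussed later.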
Stereo Vision Concept
The two cameras are placed side by side, and pictures from the right and left cameras are taken simultaneously. This data is fed into an algorithm that measures the apparent shift of objects between the two pictures and outputs a disparity map. The greater the shift, the higher the value in the disparity map. How is this useful?
The object is seen by both cameras, but because the cameras are separated, they see slightly different views of the same object. As an object comes closer, the left and right cameras' views differ more and more. As an object moves into the distance, both cameras see nearly the same view of its front.
You can test this yourself by placing your hand 6 inches from your face. Alternate closing your left and right eye. Notice your hand appears to shift right and left. Now move your hand out far away and repeat. It doesn't shift as much now, right? Same idea here.
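The idea above can be sketched as a toy one-dimensional block matcher: for each pixel on a left-image scanline, find the horizontal shift into the right scanline that minimizes the sum of absolute differences (SAD). This is only an illustration of the principle; the actual algorithm used (Birchfield's p2p, mentioned below) works on full images and handles occlusion:

```python
def disparity_1d(left, right, window=1, max_disp=4):
    """Toy SAD block matcher over one pair of scanlines.
    Returns one disparity value per pixel; larger values mean the
    feature shifted further between the views, i.e. a closer object."""
    n = len(left)
    disp = [0] * n
    for x in range(window, n - window):
        best_cost, best_d = float("inf"), 0
        # a point at left[x] appears at right[x - d] for disparity d
        for d in range(0, min(max_disp, x - window) + 1):
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-window, window + 1))
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp
```

For example, a bright feature at index 5 in the left scanline that appears at index 3 in the right one yields a disparity of 2 at that pixel.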
Now that we have the disparity map, we can calculate the relative distance of an object by the amount of disparity calculated.
Not really impressive in the raw state. So I overlaid the images and processed the disparity map (left). Two things become immediately apparent: a) the closest objects (girl and boy) have a very distinct outline in the disparity map, which is exactly the behavior we are looking for; b) there is a lot of noise from the high-contrast garage and wall. This is partially due to the wall being only 3 ft behind the boy, but it also demonstrates the importance of the physical camera positioning.
The Logitech QuickCam Pro 5000 cameras were installed using the Linux UVC (uvcvideo) V4L2 camera driver.
The images were captured using fswebcam. The disparity map generation relied heavily on the previous work of Stan Birchfield.
The image manipulation and conversions came out of the latest release of Netpbm. Many other utilities were tried, and they will be the topic of howtos in the future.
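One capture/convert cycle with these tools can be sketched as a pair of shell commands: fswebcam grabs a JPEG from the V4L2 device, and Netpbm's jpegtopnm and ppmtopgm turn it into a greyscale PGM for the matcher. The flags shown are typical fswebcam/Netpbm usage, not the exact invocation used on Spirit; this helper just builds the command strings:

```python
def capture_commands(device, out_base, width=320, height=240):
    """Build the shell commands for one capture/convert cycle.
    device is the V4L2 node (e.g. /dev/video0); out_base names the
    output files. Resolution and flags are illustrative defaults."""
    jpg = f"{out_base}.jpg"
    pgm = f"{out_base}.pgm"
    grab = f"fswebcam -d {device} -r {width}x{height} --no-banner {jpg}"
    convert = f"jpegtopnm {jpg} | ppmtopgm > {pgm}"
    return [grab, convert]
```

Writing the intermediate files to disk is simple and robust, though as noted below, that extra disk I/O at the conversion stage costs some cycle time.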
Initial Lab Tests
The first tests were conducted in my home office; the primary effort was testing all the available packages to find which one came closest to what was required. The screenshot of the initial tests shows capturing data and playing with format conversions: the live feed is from both the left and right cameras; near the bottom left are the disparity map and line-fitting experiments; on the right is a little script I wrote to capture images, do the conversion, and feed the result to the p2p algorithm. I first tried the line-fitting and edge-detect portion of the p2p example code, but it really didn't give me what I needed.
Performance benchmarking yielded cycling at 2 Hz:
0.45 s to open a connection to the cameras
< 0.1 s to process the images/disparity maps (PPU only)
The algorithm is highly vectorizable with little effort, as it is one big matrix calculation. The camera is by far the weak link. We could add another two cameras to cut the cycle time in half, or try bursting data from the camera, but I chose to do single open/read/close transactions for stability. Removing the extra disk I/O at the conversion stages would speed things up as well.
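The cycle budget from those numbers is easy to check: camera open time dominates, so halving it roughly doubles the rate. A small back-of-the-envelope helper (the 0.05 s processing figure is an assumed value within the "< 0.1 s" measurement):

```python
def cycle_rate(open_s=0.45, process_s=0.05):
    """Cycles per second for one open + process sequence.
    With ~0.45 s camera open and well under 0.1 s of processing,
    the loop lands at roughly 2 Hz, dominated by the open cost."""
    return 1.0 / (open_s + process_s)
```

This is why extra cameras help: alternating captures across two pairs effectively halves the per-cycle open cost without touching the processing side.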
The First Road Test
My first road test was fun. I rented a Dewalt generator and tossed it into the bed of my truck for power, then strapped a piece of wood across the hood to hold the PS3, cameras, and other required networking gadgets. I established a remote X session to my iBook, and I was off.
Adding Object Direction Detection
Once it was apparent the system was doing object detection, the next step was to get a direction and a distance measurement from the captured images. This was done by calculating the amount of disparity across the image and filtering out the ambient noise. I tried stretching the source image from the left camera across the bottom for reference.
The yellow map shows calculated distance in feet. The red area displays how good the data is (higher value == less noise detected). The blue area displays actual object detection: values above zero mean no object detected, while values below zero mean a collision is imminent.
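A minimal sketch of turning a disparity map into that kind of direction/detection signal, assuming a simple scheme of my own: score each column by its mean disparity, split the image into left/center/right thirds for a coarse bearing, and report a value that goes negative when a region's disparity exceeds an obstacle threshold (matching the sign convention of the blue trace). The thresholds are illustrative, not Spirit's actual tuning:

```python
def detect_direction(disparity_map, obstacle_disp=6.0):
    """Collapse a 2-D disparity map (list of rows) into per-region
    detection values: positive = clear, negative = obstacle too close."""
    rows, cols = len(disparity_map), len(disparity_map[0])
    # mean disparity of each image column
    col_mean = [sum(row[c] for row in disparity_map) / rows for c in range(cols)]
    third = cols // 3
    regions = {
        "left": col_mean[:third],
        "center": col_mean[third:2 * third],
        "right": col_mean[2 * third:],
    }
    # detection signal per region: obstacle threshold minus mean disparity
    return {name: obstacle_disp - sum(vals) / len(vals)
            for name, vals in regions.items()}
```

Feeding it a map with a strong disparity blob in the middle yields a negative center value and positive left/right values, i.e. "obstacle dead ahead."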
As you can see, the backs of the chairs and my head are detected fairly well (as indicated by note 1). Note 2 shows where there was too much noise and we got no data. This is an image of completely raw data; subsequent tests included a filter to smooth it out a little.
One point to note here: high-contrast background noise fakes out the algorithm a bit. To the far right, you can see a lot of bouncy data where the edge of the closet is detected. As noted previously, this can be reduced with proper calibration of the cameras but not completely eliminated.
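The filter actually used on Spirit isn't described, but a simple moving average is one common way to damp that kind of bouncy signal, so treat this as an illustrative stand-in:

```python
def smooth(signal, window=3):
    """Moving-average filter with edge handling: each sample is replaced
    by the mean of its neighborhood, clipped at the signal boundaries."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out
```

A single-sample spike like `[0, 9, 0]` averages down to `[4.5, 3.0, 4.5]`, trading a little spatial resolution for a much steadier detection trace.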
Camera mounting and calibration are essential for this system to work. The mounts used for the first road test were the stock "balls" and could not be positioned well. So we ripped out the camera guts and built new mounts from aircraft aluminum ... basically just a bracket with a guard to protect the camera aperture. This design was chosen for easy mounting on the vehicle. The three set screws in the base allowed us to fine-tune the aim of the camera on all three axes.
San Diego Road Test
I flew down to San Diego to meet the engineers behind Axion Racing and Spirit. I was impressed by both their technical skill and, well, coolness. Their passion for this vehicle was very evident and contagious. The first item was to mount the PS3; it was installed on top of the server rack currently housing six Dell servers. Next, the cameras: to get a good view of the road and for additional protection, they were mounted on the underside of the trailer hitch. Two stabilizing brackets on the hitch protect them from harder impacts.
The rest of the time was spent interfacing to Spirit's existing control system, aiming the cameras, and tuning the filters to produce the cleanest signal. When I left, Spirit was receiving good data from the PS3 vision system but required further tuning before the race.
Special thanks to Banks Integration Group for field testing resources and their support of Bill's adventures in robotic vision.