RobotVision
Visual Attention, 3D reconstruction from vision
People
- Julio Vega Pérez (julio [dot] vega [at] urjc [dot] es)
- José María Cañas Plaza (jmplaza [at] gsyc [dot] es)
- Eduardo Perdices (eperdes [at] gsyc [dot] es)
Development
- Jde Version: jde-4.2.1
- SVN Repository: source code
- Trac: the same Trac already used for JDE development; all the tickets relevant to this project belong to the "robotvision" milestone
- Source License: GPLv3
- Document License: Creative Commons Attribution-Share Alike 3.0 Unported License
- Tags: robot, guide, navigation, vision, recognition
- Technology: C, C++, JDE suite, OpenGL
Documentation
- Gonzalo Abella's Master Thesis: 3D Visual Attention Project PDF
- Roberto Calvo's Master Thesis: 3D Visual Attention Project PDF
- Javier Martín's Master Thesis: Face Detection and Tracking PDF
- Julio Vega's Technical Report on VisualSonar: Floor 3D Recognition PDF
Hardware
(not available yet)
Greatest hits
(not available yet)
Timeline
- 2010.07.13
Log-polar images and movement analysis. See more details
- 2010.03.26
Designed new GUI. See more details
- 2010.03.23
Pioneer following arrows from one side to the opposite. See more details
- 2010.03.23
Space recognition with a spiral movement, with a big threshold. See more details
- 2010.03.12
Space recognition with a spiral movement. See more details
- 2010.03.10
Security window. See more details
- 2010.02.26
Consistent 3D memory. See more details
- 2010.02.25
Pioneer around lab, recognizing environment. See more details
- 2010.02.17
Pioneer around lab following arrows. See more details
- 2010.02.04
Algorithm following arrows. See more details
- 2010.02.01
Algorithm detect and pay attention to arrows. See more details
- 2009.12.16
Algorithm detect and pay attention to faces and parallelograms. See more details
- 2009.11.26
Running with noise. See more details
- 2009.11.25
Parallelograms reconstruction. See more details
- 2009.11.24
Corridor clearer reconstruction. See more details
- 2009.11.20
Corridor corner reconstruction. See more details
- 2009.11.19
Corridor reconstruction. See more details
- 2009.11.11
Whole floor reconstruction using single segments. See more details
- 2009.11.10
First segment merge implementation. See more details
- 2009.11.02
Floor 3D reconstruction using lines. See more details
- 2009.10.28
Complete floor 3D reconstruction. See more details
- 2009.10.28
Camera extrinsics parameters manual adjustments. See more details
- 2009.10.22
Step by step camera extrinsics and intrinsics measurements. See more details
- 2009.10.09
Problem: pantilt oscillations. See more details
- 2009.10.08
3D Floor reconstruction with camera autocalibration. See more details
- 2009.10.02
Systematic floor reconstruction. See more details
Pioneer Visual Sonar
2010.07.13. Log-polar images and movement analysis
To make the attention system algorithm faster, we are going to work with log-polar images. As a first step, we have a covert attention system guided by movement. That way, we simulate the mechanism of the human eye (retina and fovea) and its basic behavior: we pay attention to where we see movement.
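As an illustration of the log-polar idea, here is a minimal sketch that maps a pixel to a (ring, sector) cell around a fixation point. The function name, the grid sizes and the choice of starting the rings at radius 1 are assumptions made for this example, not the project's implementation.

```python
import math

def to_log_polar(x, y, cx, cy, max_radius, n_rings, n_sectors):
    """Map pixel (x, y) to a (ring, sector) cell of a log-polar grid centred at (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r < 1.0:
        return 0, 0  # centre of the fovea; rings start at radius 1 to avoid log(0)
    # Rings are spaced logarithmically: thin near the centre, wide at the periphery.
    ring = int(n_rings * math.log(r) / math.log(max_radius))
    ring = min(ring, n_rings - 1)
    # Sectors split the full turn into equal angular slices.
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi) * n_sectors) % n_sectors
    return ring, sector

# Example: a 10 x 16 grid centred on a 320x240 image.
print(to_log_polar(150, 125, 160, 120, 100, 10, 16))
```

Because the cells grow with the distance to the centre, the periphery is summarized by far fewer cells than the fovea, which is where the speed-up would come from.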
2010.03.26. Designed new GUI
We now have a new GUI that is more useful and makes it easier to see whatever you want to inspect. Here is a screenshot.
2010.03.23. Pioneer following arrows from one side to the opposite
In this video, we show the Pioneer's behavior with our attention system. It has to follow arrows located on the RoboCup football field, moving back and forth in a way similar to the famous Ping-Pong game.
2010.03.23. Space recognition with a spiral movement, with a big threshold
Here we repeat the previous experiment, but now we want to produce continuous lines, so we've had to increase the parallelism threshold so that the segments merge with each other. The main problem in this case is the noise we get.
2010.03.12. Space recognition with a spiral movement
In this video we see the robot's spiral movement across the RoboCup football field. As we can expect, its estimations contain odometry errors. Even so, the memory stays coherent around the robot, correcting previous erroneous estimations.
2010.03.10. Security window
A security window has been incorporated into the robot. We can see its 3D representation.
2010.02.26. Consistent 3D memory
Now we can see how the robot's 3D memory is consistent with the real environment.
2010.02.25. Pioneer around lab, recognizing environment
Here we can see the representation inside the robot's 3D memory after it has been moving for a few minutes. It is an amazing image because the odometry errors aren't significant: the estimated segments and the real segments fit properly.
2010.02.17. Pioneer around lab following arrows
In these videos we can see how our robot goes around the lab avoiding obstacles, following arrows and detecting faces. The attention system is well built, so the robot only pays attention to those things it considers important for its navigation. While the robot is moving, it keeps gathering information about the environment and building its memory. In the first video, we see the Pioneer robot from an external camera. In the second one, we see the robot's memory, from the on-board camera.
2010.02.04. Algorithm following arrows
In this video, our algorithm is detecting all the objects around the robot. Now, arrows let the robot know where the goal is. When there are many arrows around the robot, only the nearest arrow has the main influence on the robot's navigation.
2010.02.01. Algorithm detect and pay attention to arrows
Now the algorithm is able to detect faces, parallelograms and arrows, even with a lot of noise. That way, our robot will be able to navigate by following these landmarks.
2009.12.16. Algorithm detect and pay attention to faces and parallelograms
In this interesting video, we can see how our system reconstructs and follows detected faces and parallelograms. For now, we only have one set of attention system elements (parallelograms and faces) with saliency and liveliness dynamics. Each of them has a centroid point used to center the image on it.
2009.11.26. Running with noise
Here we can see how the application behaves with a lot of noise. We've run the parallelogram recognition with a very low threshold in the Hough Transform line detection, in order to show the robustness of our algorithm.
2009.11.25. Parallelograms reconstruction
We introduce the concept of hypotheses in order to detect partially seen parallelograms and reconstruct them. Here we can see lots of parallelograms on the floor and how the robot is able to detect and reconstruct some of them.
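One simple way to hypothesize a partially seen parallelogram, assuming three consecutive vertices have been detected, is to predict the fourth from the fact that opposite sides are equal vectors. The helper below is a hypothetical sketch of that geometric step only; the source does not say how its hypotheses are built.

```python
def complete_parallelogram(a, b, c):
    """Given three consecutive vertices a, b, c, return the missing vertex d.

    In a parallelogram a-b-c-d the side a->d equals the side b->c,
    so d = a + c - b. Works for 2D or 3D points given as tuples.
    """
    return tuple(ai + ci - bi for ai, bi, ci in zip(a, b, c))

# Three visible corners on the floor plane; the fourth is predicted.
print(complete_parallelogram((0, 0), (4, 0), (5, 3)))  # (1, 3)
```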
2009.11.24. Corridor clearer reconstruction
Now we can see a clearer corridor reconstruction. We have just a single line for each side of the corridor. The horizontal lines correspond to the corridor doors.
2009.11.20. Corridor corner reconstruction
One more step. Today, the Pioneer goes through the corridor with free rotational movement, in order to keep reconstructing while it turns around a corner.
2009.11.19. Corridor reconstruction
Using the latest development, the Pioneer is able to reconstruct the department corridor. The main difficulty lies in the reflections on the floor, so we've tuned many filter parameters in order to avoid them. We can see the result in the next two images.
2009.11.11. Whole floor reconstruction using single segments
Here we've improved the merge function in order to get the longest segment for every direction in the world. The robot is now able to get the whole 3D floor reconstruction using single lines.
2009.11.10. First segment merge implementation
At this point, we've converted our world to segments. Furthermore, we want to merge overlapping and repeated lines. The first implementation is a good step because:
- We have single, non-repeated segments in the world.
- We check parallel segments and we keep the segment memory correctly.
This is the result. Next, we want to improve the merge function in order to get the longest segment for every direction in the world.
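A segment merge of this kind can be sketched as follows, assuming 2D floor segments and two hypothetical tolerances (`max_angle` for parallelism, `max_dist` for the distance between the supporting lines). It is an illustration under those assumptions, not the project's merge function.

```python
import math

def merge_segments(s1, s2, max_angle=math.radians(5), max_dist=0.05):
    """Merge two 2D segments ((x1, y1), (x2, y2)) if nearly parallel and close.

    Returns the longest segment covering both, or None if they do not match.
    """
    (ax, ay), (bx, by) = s1
    ux, uy = bx - ax, by - ay
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm          # unit direction of s1
    (cx, cy), (dx, dy) = s2
    vx, vy = dx - cx, dy - cy
    # Parallelism test: |sin(angle)| between both directions.
    if abs(ux * vy - uy * vx) / math.hypot(vx, vy) > math.sin(max_angle):
        return None
    # Both endpoints of s2 must lie close to the line through s1.
    for px, py in s2:
        if abs((px - ax) * uy - (py - ay) * ux) > max_dist:
            return None
    # Project the four endpoints on s1's direction and keep the extremes.
    ts = [(px - ax) * ux + (py - ay) * uy for px, py in (*s1, *s2)]
    t_min, t_max = min(ts), max(ts)
    return ((ax + t_min * ux, ay + t_min * uy),
            (ax + t_max * ux, ay + t_max * uy))

print(merge_segments(((0, 0), (2, 0)), ((1, 0.01), (3, 0.01))))
```

Note that this sketch does not limit the gap between the two segments along their common direction; a real implementation would probably need such a check to avoid joining distant collinear lines.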
2009.11.02. Floor 3D reconstruction using lines
Until now we had been using points to represent the floor lines. Now, in order to reduce the image processing time, we want to work with lines. So we've segmented the borders detected in the image and we obtain a set of lines, drawn with the OpenGL GL_LINES primitive.
2009.10.28. Complete floor 3D reconstruction
Using the latest development, I've added the Pioneer robot's movement in order to reconstruct the whole floor.
In the next videos we can see:
- 1 The distance covered by the robot.
- 2 The 3D floor reconstruction on its virtual 3D-memory.
2009.10.28. Camera extrinsic parameters manual adjustments
Because the last results were not perfect, I've added new sliders to the frontera GUI; with them, I've manually adjusted the camera extrinsic parameters and now the result is perfect.
We can see the correspondence between the three points of view; each of them is painted in a different color.
2009.10.22. Step by step camera extrinsics and intrinsics measurements
As I said last time, we had several problems with pantilt oscillations. Furthermore, the depth estimations weren't very precise. So we decided to obtain the camera extrinsic and intrinsic parameters step by step.
1) Using the extrinsics schema, and knowing the absolute position of the camera in the world, we determined the correct camera intrinsic parameters (u0, v0 and roll). At that point we became aware of the progeo coordinate system (more info); ours is definitely a different one. This is the result:
2) After that, we mounted the camera on top of the pantilt device and measured its position again. This is the result.
3) The third step was to use the mathematical model, but with only a single RT matrix. We had to correct the pantilt position, the optical center and the tilt angle given by the pantilt encoders... We did that again and again until we got this result.
4) Once we knew every parameter (PANTILT_BASE_HEIGHT, ISIGHT_OPTICAL_CENTER, TILT_HEIGHT, CAMERA_TILT_HEIGHT, PANTILT_BASE_X, PANTILT_BASE_Y), we continued with the rest of the RT matrices until we had the whole system. This is the result.
5) Finally, in order to check that the mathematical model was correct, we moved the pantilt with saccadic movements, and we were able to confirm the pantilt oscillation problem that I described last time. We got these sequences.
2009.10.09. Problem: pantilt oscillations
We've realized that pantilt movements are not uniform: at each iteration, the pantilt does not reach the same position as in the previous iteration. In this figure, we can see the oscillations on the pan axis (blue line) as it moves toward the left and right sides; the final positions reached are different. On the other hand, the tilt axis (red line) isn't moving, so its position is always the same. The values are expressed in radians.
2009.10.08. 3D Floor reconstruction with camera autocalibration
Now we've introduced the RT matrix concept in order to calculate relative positions. This way, we can know the camera position in the world and its focus of attention (foa). We have the following RT matrices:
- Robot position relative to world coordinates (translation on X & Y axis and rotation around Z axis)
- Pantilt base position relative to robot position (translation on Z axis)
- Tilt height position relative to pantilt base (translation on Z axis and rotation around Z axis)
- Tilt axis relative to tilt height (rotation around Y axis)
- Camera optical center (translation on X & Z axis)
- Focus of attention relative to camera position (translations on X axis)
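The chain of RT matrices listed above can be sketched with 4x4 homogeneous transforms. This is a minimal illustration: the function and parameter names are hypothetical, the numeric values in the usage line are made up, and the sign convention for the tilt (positive tilt looks down) is an assumption.

```python
import math

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def chain(*matrices):
    """Multiply the matrices left to right, from the world frame outwards."""
    result = translation(0, 0, 0)  # identity
    for m in matrices:
        result = matmul(result, m)
    return result

def origin_of(m):
    """Position of the frame origin described by m, in world coordinates."""
    return (m[0][3], m[1][3], m[2][3])

def camera_and_foa(robot_x, robot_y, robot_theta, pan, tilt,
                   base_height, tilt_height, center_x, center_z, foa_dist):
    world_T_camera = chain(
        translation(robot_x, robot_y, 0), rot_z(robot_theta),  # robot in world
        translation(0, 0, base_height),                        # pantilt base on robot
        translation(0, 0, tilt_height), rot_z(pan),            # tilt height and pan
        rot_y(tilt),                                           # tilt axis
        translation(center_x, 0, center_z),                    # camera optical center
    )
    world_T_foa = matmul(world_T_camera, translation(foa_dist, 0, 0))
    return origin_of(world_T_camera), origin_of(world_T_foa)

# Robot at (1000, 2000) mm facing +X, pantilt centred, foa 1 m ahead.
print(camera_and_foa(1000, 2000, 0, 0, 0, 300, 100, 20, 50, 1000))
```

Keeping each joint as its own matrix means a single parameter, such as the optical center offset, can be changed and the whole chain recomputed without touching the rest.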
Because we don't know exactly where the optical center is on the iSight camera (about 100 mm long), I've tested several positions in order to get the best match between real and virtual coordinates. The following images correspond to different optical centers: -10, -20, -30, -40, -50, -60, -70, -80 and -90 mm, measured from the image plane to the bottom of the physical camera. Finally, we can conclude that the best optical center estimation is at the -20 mm position.
(Images, in order: -10 mm, -20 mm, -30 mm, -40 mm, -50 mm, -60 mm, -70 mm, -80 mm, -90 mm)
2009.10.02. Systematic floor reconstruction
Here we can see the three-view floor reconstruction. In this case, we've manually established three marks corresponding to the three different focuses of attention (foa). That way, we can recalibrate the camera for these three positions.
Previous works
You can see previous versions on the guiderobot project website.
Nao Visual Sonar
2009.11.20. RANSAC algorithm
Another way to calculate the RT of the robot's movement is to use the homography and the RANSAC algorithm. This algorithm is able to compute a good fit to a data set even when it contains many wrong matches and noise.
To test this algorithm, we have developed a simple program that obtains a line from a set of points, based on the positions of the points in space. We have placed many points off the main line to check whether they affect the final result:
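A line-fitting test like the one described can be sketched as below. The threshold, the iteration count and the fixed random seed are assumptions chosen for the example; the original test program is not shown in this page.

```python
import math
import random

def fit_line(p, q):
    """Line through p and q as (a, b, c), with a*x + b*y + c = 0 and a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    n = math.hypot(a, b)
    a, b = a / n, b / n
    return a, b, -(a * x1 + b * y1)

def ransac_line(points, threshold=0.1, iterations=200, rng=None):
    """Return the line supported by the most points and those inlier points."""
    rng = rng or random.Random(0)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)       # minimal sample: two points
        if p == q:
            continue                       # duplicated point, no line defined
        a, b, c = fit_line(p, q)
        # With a unit normal, |a*x + b*y + c| is the point-to-line distance.
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers

# 20 points on y = 2x + 1 plus 10 points placed off the main line.
points = [(x, 2 * x + 1) for x in range(20)]
points += [(x, 2 * x + 6 + x % 3) for x in range(0, 20, 2)]
line, inliers = ransac_line(points)
print(line, len(inliers))
```

Because each candidate line is scored only by how many points agree with it, the points off the main line cannot pull the result away, unlike in a plain least-squares fit.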
2009.11.10. Tracking points
Since we already know how to calculate a 3D position from a pixel in the image, we can take a real point in the image and calculate its relative 3D position, then after some time take the same real point and calculate its relative 3D position again. Comparing the two measurements tells us how many centimetres the robot has moved.
But to know whether the point we are looking at is the same in both images, we need to track its movement in the image. This is not a trivial matter, so we have decided to use this feature from another project in the URJC Robotics Group, developed by Luis Miguel López Ramos (http://jde.gsyc.es/index.php/Lm.lopez-pfc-teleco).
If we track only one point, our measurement may contain a lot of error and the possible movement won't be unique. For instance, if the point moves from (10, 10) to (20, 20) in the image, the robot may have moved in X-Y, but it may also have turned a bit. So if we want a precise measurement, we need to track as many points as we can.
Once we have calculated the 3D position of X points, we need to calculate an RT matrix to know what the robot's movement has been. There are many ways of calculating this matrix but, to begin with, we have used the SVD algorithm (singular value decomposition).
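To illustrate how an RT can be recovered from tracked points, the sketch below estimates a planar rotation and translation from matched point pairs. It deliberately swaps the SVD solution mentioned above for the closed-form least-squares solution that exists in 2D, and it assumes the motion is restricted to the floor plane; the function name is hypothetical.

```python
import math

def estimate_rigid_2d(before, after):
    """Least-squares rotation theta and translation (tx, ty) mapping before -> after."""
    n = len(before)
    pcx = sum(x for x, _ in before) / n
    pcy = sum(y for _, y in before) / n
    qcx = sum(x for x, _ in after) / n
    qcy = sum(y for _, y in after) / n
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(before, after):
        # Work with coordinates relative to each centroid.
        px, py, qx, qy = px - pcx, py - pcy, qx - qcx, qy - qcy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation carries the rotated centroid onto the target centroid.
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return theta, tx, ty

before = [(0, 0), (1, 0), (0, 1)]
after = [(2, 3), (2, 4), (1, 3)]   # rotated 90 degrees, then shifted by (2, 3)
print(estimate_rigid_2d(before, after))
```

With a single pair, both sums are zero and the rotation cannot be determined, which matches the ambiguity described above. If the points are static and expressed relative to the robot, the robot's own movement is the inverse of the estimated transform.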
In the next video, we show our first implementation. The tracking is still rather poor, and some points can be lost when the robot is moving, but if we track enough points, we can calculate the robot movement:
2009.10.27. Painting environment
The next step in the Visual Sonar schema has been to paint the environment of the robot in order to detect the objects around it. The first approach to this new feature is to calculate the position of every edge in the image and show it inside an OpenGL simulated world.
To detect the characteristic points of the environment, we have used a Canny filter to detect the edges in the image, and we have calculated the relative 3D position of every edge.
In the next image we can see our new feature working after calculating the edges in several images:
We have programmed the Nao robot to take an image at 9 different camera positions, as we can see in the next video:
2009.10.23. Ball Position
We have developed a JdeRobot schema to calculate the relative position of a ball using only one camera of the Nao humanoid robot. We can select one of the robot's cameras (top or bottom) and calculate the position of an object from the robot odometry and the calibrated parameters of the camera.
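A single camera can give a position when the object is known to lie on the floor: the pixel is back-projected as a ray and intersected with the ground plane. The sketch below assumes an ideal pinhole camera described only by a focal length `f` in pixels, a principal point (`u0`, `v0`), a height and a downward tilt. These names and the axis conventions are assumptions for illustration, not the schema's actual interface.

```python
import math

def pixel_to_ground(u, v, f, u0, v0, cam_height, tilt):
    """Floor point (x forward, y left) seen at pixel (u, v), or None if above the horizon."""
    # Ray in the camera frame (x forward, y left, z up).
    # Image u grows to the right and v grows downwards.
    dx, dy, dz = f, -(u - u0), -(v - v0)
    # Rotate the ray by the tilt around the Y axis (positive tilt looks down).
    c, s = math.cos(tilt), math.sin(tilt)
    wx = c * dx + s * dz
    wy = dy
    wz = -s * dx + c * dz
    if wz >= 0:
        return None  # the ray never reaches the floor
    scale = cam_height / -wz  # stretch the ray until it drops cam_height
    return scale * wx, scale * wy

# Camera 1 m high, looking 45 degrees down: the image centre sees the floor 1 m ahead.
print(pixel_to_ground(160, 120, 300, 160, 120, 1.0, math.pi / 4))
```

For a ball, the intersection plane could be raised to the height of the ball centre instead of the floor itself.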
In the next images, we use this schema and the naobody driver to calculate the ball position in Webots using both cameras:
Top camera:
Bottom camera:
We can check that the ball positions obtained with both cameras are the same. To find the ball, we have developed an algorithm that uses a double-threshold color histogram over columns and rows.
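One possible reading of a double-threshold histogram, assumed here for illustration, is a hysteresis scheme: count ball-colored pixels per column and per row, require the peak to reach a high threshold, and then grow the span while the counts stay above a low threshold. The input is a binary mask that is 1 where the pixel matches the ball color; all names and threshold values are hypothetical.

```python
def span_above(hist, low, high):
    """Run of bins around the peak with counts >= low; None if the peak is below high."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    if hist[peak] < high:
        return None
    start = end = peak
    while start > 0 and hist[start - 1] >= low:
        start -= 1
    while end < len(hist) - 1 and hist[end + 1] >= low:
        end += 1
    return start, end

def find_ball(mask, low=1, high=3):
    """Centre (x, y) of the dominant blob in a binary mask, or None."""
    cols = [sum(row[x] for row in mask) for x in range(len(mask[0]))]
    rows = [sum(row) for row in mask]
    xs, ys = span_above(cols, low, high), span_above(rows, low, high)
    if xs is None or ys is None:
        return None
    return (xs[0] + xs[1]) / 2, (ys[0] + ys[1]) / 2

mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],   # isolated noise pixel
]
print(find_ball(mask))  # (3.0, 2.0)
```

The high threshold keeps isolated noise pixels from starting a detection, while the low threshold lets the span reach the thinner edges of the ball.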
If we use this algorithm on the real Nao, there are some problems caused by the camera calibration parameters. To solve them, we have used the "calibrator" and "extrinsics" schemas to obtain the intrinsic parameters of both cameras.
In the next video, you can watch the final version of the ball detector using the real bottom camera:
To find out why there are some errors when we calculate the position of the ball, we have decided to overlay on the real image the theoretical image with all the lines that we should see. This new feature has allowed us to calibrate the intrinsic parameters of the cameras better. In the next video we show the ball position again with the new intrinsic parameters:
Intelligent Followface
2009.09.14. Systematic search
In this video, we've tested the systematic search around the scene, in order to guarantee that the system will explore the whole scene around it. Thus, we'll search for faces combining random search with systematic search. Now we're sure that no face will be out of range.
2009.09.09. Following faces around scene, with saliency and liveliness dynamics
Here we can see a visual attention mechanism. Now our algorithm chooses the next fixation point in order to track several objects around the robot simultaneously. This behavior is based on two related measurements, liveliness and saliency. The attention is shared among detected faces and new exploration points once the forced scene exploration time is over. Moreover, this time depends on how many faces are detected: if we have several detected faces, this time will be long...
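The interplay between saliency and liveliness can be sketched as follows. The update rules are assumptions made for this example (saliency grows while an element is not attended and resets when it is; liveliness drops when an attended element is no longer observed), and the gain and decay values are hypothetical. The page does not give the actual equations.

```python
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    saliency: float = 0.0
    liveliness: float = 1.0

def attention_step(elements, observed, saliency_gain=1.0, liveliness_decay=0.2):
    """Pick the most salient element as the fixation point, then update the dynamics.

    observed is the set of element names actually seen when looking at the target.
    Returns the chosen target and the elements that are still alive.
    """
    if not elements:
        return None, []
    target = max(elements, key=lambda e: e.saliency)
    for e in elements:
        if e is target:
            e.saliency = 0.0                      # just attended: least urgent now
            if e.name in observed:
                e.liveliness = 1.0                # confirmed, fully alive again
            else:
                e.liveliness -= liveliness_decay  # not found where expected
        else:
            e.saliency += saliency_gain           # waiting elements gain urgency
    survivors = [e for e in elements if e.liveliness > 0.0]
    return target, survivors

faces = [Element("face_a", saliency=2.0), Element("face_b", saliency=1.0)]
first, faces = attention_step(faces, observed={"face_a"})
second, faces = attention_step(faces, observed={"face_b"})
print(first.name, second.name)  # face_a face_b
```

With rules of this kind the gaze alternates among the known elements instead of staying on one, and elements that repeatedly fail to be confirmed are eventually forgotten.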
2009.09.03. Following faces around scene
Now, as we said last time, we have a continuous space in which the pan-tilt unit gazes according to the most salient object. Sometimes we'll have to introduce some virtual faces to explore new zones... And when we find a face, we stop there watching it. The next step is, instead of stopping, to follow that face...
2009.08.31. Following multiple faces from different scene perspectives
Here is the latest version of this "intelligent followface". We've decided to change our point of view: now, instead of having three parts in the scene, we'll have a continuous space in which the pan-tilt unit gazes according to the most salient object. Sometimes we'll have to introduce some virtual faces to explore new zones...
2009.08.25. Following multiple faces
2009.08.19. Following one face
3D Visual Attention
(not available yet)
3D Visual Reconstruction
(not available yet)
Followface
(not available yet)










































