Development of an Uncalibrated Visual Space Mouse


Student Project

by Tobias Peter Kurpjuhn

Computer Vision and Robotics Laboratory
Beckman Institute
University of Illinois at Urbana-Champaign

August 1998

Advisors:
Prof. Seth Hutchinson, UIUC
Prof. Kevin Nickels, Trinity University
Dipl.-Ing. Alexa Hauck, LPR


Abstract

In many areas of daily life we face complex tasks that must be carried out under conditions unfavorable for human beings. For example, heavy weights may have to be lifted or the environment may be dangerous, so the assistance of a machine is needed. Some of these tasks, however, still require the presence of a human, because their complexity is beyond what an autonomous robot system can handle. There is therefore a need for robot systems that are controlled by a human.

A highly intuitive control device would be a system that is instructed by watching and imitating the human user, with the hand as the main controlling element. Such an interface would be very comfortable and would let the user move a robot system in the most natural way. We call this system the visual space mouse.

The visual space mouse can be divided into two main parts: image processing and robot control. The role of image processing is to operate on the video signal received from a video camera and to extract the desired information from it. The role of robot control is to transform electronic commands into movements of the manipulator.

Proposed approach

The purpose of this project was to develop a system that controls a robot by observing the human user and directly converting hand gestures into movements of the manipulator. The hand serves as the primary controlling element that determines the motion and position of the robot gripper. The user is observed with a single ordinary greyscale camera, without any kind of calibration. The manipulator is a PUMA 560 robot with six degrees of freedom and a gripper.

We use the image processing language VEIL for image processing. A special feature of VEIL is blobs. A blob is defined as a brighter region in the image plane within a darker environment. The hand is detected and tracked as such a blob, which carries the characteristic values of the image of the hand. These values are then passed to the control part of the program to affect the actual position of the manipulator.
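To make the idea of a blob and its characteristic values concrete, the following is a minimal sketch in Python with NumPy. It is not VEIL code and does not reproduce the VEIL blob interface; the function name `blob_features`, the fixed `threshold`, and the choice of features (centroid, bounding-box width and height, in-plane angle from second-order moments) are assumptions made only for illustration.

```python
import numpy as np

def blob_features(image, threshold=128):
    """Describe the bright region of a greyscale image.

    Assumes a single bright blob on a darker background. The threshold
    value is illustrative and not taken from the report.
    """
    mask = np.asarray(image) > threshold
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None  # no bright pixels, so no blob was found

    # Centroid of the bright pixels (x = column, y = row).
    cx, cy = cols.mean(), rows.mean()

    # Axis-aligned extent of the bright region, in pixels.
    width = int(cols.max() - cols.min() + 1)
    height = int(rows.max() - rows.min() + 1)

    # In-plane orientation of the principal axis, computed from the
    # second-order central moments of the bright pixels.
    dx, dy = cols - cx, rows - cy
    mu20, mu02, mu11 = (dx * dx).mean(), (dy * dy).mean(), (dx * dy).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)

    return {"cx": float(cx), "cy": float(cy),
            "width": width, "height": height, "angle": float(angle)}

# Synthetic test image: a bright rectangle on a dark background.
frame = np.zeros((10, 10), dtype=np.uint8)
frame[2:6, 3:9] = 255
features = blob_features(frame)
```

In this sketch the width, height and angle correspond to the three quantities that the text below describes as robustly detectable in the image plane.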

In the mapping from the three-dimensional hand in the world to a blob in a two-dimensional plane, a lot of information is lost. In particular, rotations that do not lie in the image plane cannot be resolved well. Any rotation about an axis parallel to the image plane merely changes the height and the width of the object. The sign of such a rotation is especially hard to determine. This is a limitation of 2D image analysis in general. Only three quantities of an object in a plane are robustly detectable: height, width and one rotation in the image plane.
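The sign ambiguity can be illustrated with a small numerical sketch. It assumes an idealized flat object and an orthographic projection, neither of which is stated in the report; the function name `projected_width` is hypothetical.

```python
import math

def projected_width(true_width, tilt_deg):
    """Width seen by the camera for a flat object tilted by tilt_deg about
    an axis parallel to the image plane (orthographic approximation,
    assumed here only for illustration)."""
    return true_width * abs(math.cos(math.radians(tilt_deg)))

# Tilting towards or away from the camera by the same angle yields the
# same projected width, so the sign of the rotation cannot be recovered.
width_forward = projected_width(10.0, 30.0)
width_backward = projected_width(10.0, -30.0)
```

Under these assumptions both tilts produce an identical width in the image, which is why a single view gives the magnitude of such a rotation at best and not its direction.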

Controlling a manipulator with six degrees of freedom using only three values is therefore very difficult or even impossible. To handle this problem while keeping the user interface intuitive, a state machine was implemented.

The state machine consists of three different levels: two control levels and one transition level. The control levels are used to move the manipulator. The transition level connects the two control levels and affects the gripper of the robot arm.

Whenever the flat hand is facing the camera, as shown above, the state machine of the controlling unit is in one of the two control levels. In each control level the manipulator can be moved in a plane by moving the hand in the up-down direction or the forward-backward direction. The control levels differ in the orientation of the plane in which the manipulator can be moved. The plane of control level 1 is orthogonal to the plane of control level 2.

To change the control level, the hand has to be turned so that the side of the flat hand is facing the camera. In this mode the hand can be moved within the field of view of the camera without affecting the manipulator. This mode is called the transition mode. When the hand is turned back so that the flat hand is facing the camera again, the state machine of the control unit moves into the other control level.
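The switching behaviour described above can be sketched as a small state machine. This is an illustrative reconstruction, not the report's implementation: the class and method names are hypothetical, the boolean input `flat_hand_facing_camera` stands in for whatever test the image processing actually performs, and starting in control level 1 is an assumption.

```python
from enum import Enum

class Level(Enum):
    CONTROL_1 = 1
    CONTROL_2 = 2
    TRANSITION = 3

class SpaceMouseStateMachine:
    """Two control levels connected by one transition level."""

    def __init__(self):
        # Assumption: the system starts in control level 1.
        self.level = Level.CONTROL_1
        self._last_control = Level.CONTROL_1

    def update(self, flat_hand_facing_camera):
        """Advance the state machine for one observation of the hand."""
        if self.level is Level.TRANSITION:
            if flat_hand_facing_camera:
                # Leaving the transition level: enter the control level
                # that was NOT active before the hand was turned.
                if self._last_control is Level.CONTROL_1:
                    self.level = Level.CONTROL_2
                else:
                    self.level = Level.CONTROL_1
                self._last_control = self.level
        elif not flat_hand_facing_camera:
            # Side of the hand faces the camera: enter the transition level.
            self.level = Level.TRANSITION
        return self.level

    def manipulator_moves(self):
        """Hand motion affects the manipulator only in a control level."""
        return self.level is not Level.TRANSITION

# Flat hand, turn to the side, turn back: level 1 -> transition -> level 2.
machine = SpaceMouseStateMachine()
trace = [machine.update(facing) for facing in (True, False, True, False, True)]
```

Remembering the last active control level is one simple way to guarantee that each pass through the transition level toggles between the two planes.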

With the two planes described previously, only a cube-shaped region in front of the arm can be reached. A rotation about the z-axis rotates this cube, so that the whole area around the manipulator becomes reachable. This rotation is initiated simply by rotating the hand in the image plane. The gripping gesture is also part of the transition level. Placing the gripping gesture in the transition level has the advantage that movements of the hand have no effect on the manipulator itself, so the gripper stays fixed during the gripping gesture.
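The effect of the rotation about the z-axis on the reachable region can be sketched with a standard rotation of a displacement vector. The function name `rotate_about_z` and the convention that commanded displacements are expressed as (x, y, z) tuples in a frame whose z-axis is the rotation axis are assumptions for illustration; the report does not specify the coordinate frames.

```python
import math

def rotate_about_z(displacement, angle_rad):
    """Rotate an (x, y, z) displacement about the z-axis by angle_rad."""
    x, y, z = displacement
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # The z component is unchanged; x and y rotate in the horizontal plane.
    return (c * x - s * y, s * x + c * y, z)

# A motion commanded "forward and up" in the unrotated workspace ...
step = (1.0, 0.0, 0.5)
# ... points sideways and up after the workspace is turned by 90 degrees.
rotated_step = rotate_about_z(step, math.pi / 2)
```

Applying the accumulated rotation angle to every commanded displacement in this way turns the fixed cube in front of the arm into a region that can be swept around the manipulator.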

Experiment

An experiment was performed to validate the functions of the system. The task was to assemble a house out of three randomly placed wooden pieces.

Several people were chosen to perform this experiment without any training. Each person was able to finish the task successfully. The experiment showed that the state machine with its two separate control levels posed no problem for the participants. The biggest problem was the gesture for the gripping movement. It became obvious that the chosen gripping gesture was unnatural to perform.

Conclusion

The main goal of this project was to combine an image processing unit with a control unit to obtain a convenient, image-based control system for a manipulator: the visual space mouse. This goal was achieved. As the experiment demonstrated, a person is able to handle simple manipulation tasks using the visual space mouse system as a remote tool.

However, it also became obvious that the possibilities for controlling a manipulator with six degrees of freedom using just one greyscale camera as input are very limited, because only three dimensions can be robustly observed in the video signal.


Download the entire report (compressed postscript, 2.3M).

Videos:

These videos are in the Quicktime movie format using Cinepak compression. They may be viewed with xanim with the Cinepak library built in.


Last modified: Mon Aug 17, 1998.