This page presents Nic_2, a Lego Mindstorms robot that localizes a sound source in space. It was developed in cooperation with Claude Baumann. The robot combines quasi-simultaneous recordings from two microphones with head movements to determine the direction of a single sound source. It uses only one RCX, plus an additional board that multiplexes the rotation sensors and amplifies the microphone signals. It samples sound at 36 kHz on both amplifier channels for 8.33 ms (2x300 measurements) and then applies a correlation method to deduce the time difference of arrival (TDOA) of the two signals, the so-called interaural time difference. Using novel theorems (Binaural model for artificial sound localization based on interaural time delays and movements of the interaural axis, Kneip/Baumann, published in JASA), the robot needs only two measurements, one before and one after a rotation of the interaural axis, to deduce the exact direction of the sound.

Lego parts (mechanics and processing unit) are hardly the most professional or efficient equipment, so the ability to determine the direction of incoming sound with such precision must be credited to the deterministic theorems developed in the scope of this project and presented in the JASA publication mentioned above. The following video shows the robot in action and gives more details. Note that the sound source is situated to the left of the camera:

Lego robots that localize the direction of sound already have a long history at the Convict Episcopal of Luxembourg. A detailed listing of the previous work may be found here. However, none of these robots was able to deterministically localize the spatial direction of a continuous sound source. Nic_2 is the first completely autonomous Lego robot able to solve this problem. The page also presents Nic_3, an analogue of Nic_2 whose mechanics have been built with the new Lego NXT toolkit.

 

Hardware overview

Nic_2 has all three rotational degrees of freedom: it can rotate the interaural axis around the indicated x, y and z axes. Moreover, Nic_2's head rotations preserve the origin, that is, the midpoint between the two microphones. The x- and y-axes are each driven by a pair of shafts twisted against each other, so that the gears are kept under torsion, which eliminates dead zones when the direction of rotation is reversed. These measures reduce the error to less than 3° for any degree of freedom. They also increase friction, so the mechanics had to be geared down sufficiently. The LEGO rotation sensors are coupled directly to the motors. Nic_2 is equipped with two electret microphones that point in the y-direction and are separated by 16 cm. Because the microphones are omnidirectional, Nic_2 does not automatically solve the back-front ambiguity: provided the signal's frequency is not too high, sound waves arriving from the rear are not perceived as weaker. A general threshold for the signal strength is fixed in software.

The RCX offers the user only 3 input ports and 3 output ports. Because the declared goal was a low-cost device with a single microcontroller system, some analog electronics had to be added to amplify both audio signals (wired to input ports 2 and 3) and to multiplex the 3 rotation sensors, which are therefore connected together to the remaining input port 1. (A microcontroller system with more input ports would make the multiplexing unnecessary.) The three motors are controlled directly by the RCX. The multiplexer with amplifier may be regarded as a combination of the following two projects:

-microphone amplifier
-RCX rotation sensor multiplexer

 

Software overview

Nic_2’s functionality can be summarized as: Turn towards the sound direction! As mentioned above, Nic_2 uses omnidirectional microphones, which do not automatically solve the back-front ambiguity. Sound source locations are therefore restricted to the front region, which allows the problem to be solved by applying only Theorems A and C. In essence, the robot takes one measurement before and one after a rotation about the y-axis (determining the interaural time delay each time) and then uses Theorem C of the JASA article to determine the exact direction of the sound source mathematically, that is, its azimuth and elevation. The correlation method used to determine the phase shift between the amplified signals is a variant of the SAD algorithm (sum of absolute differences method).
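The correlation step can be illustrated with a plain sum-of-absolute-differences search. This is only a minimal sketch of the general SAD idea, not the variant running on the RCX, whose details are not given on this page. The function names, the two-tone test signal and the search range of 17 samples (roughly the largest delay possible for a 16 cm microphone spacing at 36 kHz, assuming a speed of sound of 343 m/s) are assumptions made for illustration.

```python
import math

def sad_lag(left, right, max_lag):
    """Return the lag (in samples) with the smallest mean absolute difference.

    A positive result means `right` is a delayed copy of `left`.
    """
    n = len(left)
    best_lag, best_cost = 0, float("inf")
    for lag in range(-max_lag, max_lag + 1):
        # Indices i for which both left[i] and right[i + lag] exist.
        start = max(0, -lag)
        stop = min(n, n - lag)
        cost = sum(abs(left[i] - right[i + lag]) for i in range(start, stop))
        cost /= (stop - start)  # the overlap length depends on the lag
        if cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag

def delayed_pair(delay, n=300, fs=36_000):
    """Hypothetical test signal: two tones, `right` lags `left` by `delay` >= 0 samples."""
    source = [math.sin(2 * math.pi * 800 * k / fs)
              + 0.5 * math.sin(2 * math.pi * 1900 * k / fs)
              for k in range(n + delay)]
    return source[delay:delay + n], source[:n]

left, right = delayed_pair(5)
print(sad_lag(left, right, max_lag=17))  # prints 5
```

Dividing the lag in samples by the sampling rate gives the interaural time difference in seconds. SAD needs only subtractions and comparisons, which is presumably attractive on a small processor like the RCX.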

The step by step program flow looks like the following:

-Robot initialized with x-axis in horizontal position
-Tilt around y = 30° // could be any angle
-Sample audio signals
-Determine ITD0 // operating the correlation method between both audio signals
-Tilt back around y=-30° // move x-axis back to horizontal position
-Sample audio signals
-Determine ITD1
-Calculate elevation // applying Theorem C
-Calculate azimuth // applying Theorem A
-Calculate altitude // see paper for more details
-Rotate around z=(90°- azimuth) // after rotation sound source is located in median plane
-Nod altitude // after nodding, direction coincides with y-axis

When the robot is finally asked to turn its head towards the sound, it must rotate the head around the z-axis and the x-axis, so only 2 degrees of freedom are active. The third degree of freedom has already been used to perform the y-rotation and determine the elevation without ambiguity. In the resulting head position, the y-axis points towards the sound source.
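The theorems themselves are not reproduced on this page, so the following is not Theorem A or C from the paper. It is a sketch of the underlying geometry under a simple far-field assumption: the ITD equals (d/c) times the cosine of the angle between the interaural axis and the source direction. The speed of sound (343 m/s), the sign conventions, the function names and the angle definitions (azimuth measured from the x-axis in the xy-plane, altitude measured from the xy-plane) are all assumptions chosen for illustration; the paper's conventions may differ.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
MIC_DISTANCE = 0.16     # m, microphone separation stated on this page

def simulate_itds(azimuth_deg, altitude_deg, tilt_deg=30.0):
    """Far-field ITDs (seconds) with the axis tilted about y, then horizontal."""
    az, alt, th = map(math.radians, (azimuth_deg, altitude_deg, tilt_deg))
    sx = math.cos(alt) * math.cos(az)
    sz = math.sin(alt)
    k = MIC_DISTANCE / SPEED_OF_SOUND
    itd0 = k * (sx * math.cos(th) - sz * math.sin(th))  # axis = (cos th, 0, -sin th)
    itd1 = k * sx                                       # axis = (1, 0, 0)
    return itd0, itd1

def direction_from_two_itds(itd0, itd1, tilt_deg=30.0):
    """Recover (azimuth, altitude) in degrees, assuming the source is in front (y >= 0)."""
    th = math.radians(tilt_deg)
    a0 = SPEED_OF_SOUND * itd0 / MIC_DISTANCE
    a1 = SPEED_OF_SOUND * itd1 / MIC_DISTANCE
    sx = a1
    sz = (a1 * math.cos(th) - a0) / math.sin(th)
    # The front-region restriction picks the non-negative root for sy.
    sy = math.sqrt(max(0.0, 1.0 - sx * sx - sz * sz))
    azimuth = math.degrees(math.atan2(sy, sx))
    altitude = math.degrees(math.asin(max(-1.0, min(1.0, sz))))
    return azimuth, altitude

itd0, itd1 = simulate_itds(azimuth_deg=60.0, altitude_deg=20.0)
azimuth, altitude = direction_from_two_itds(itd0, itd1)
print(round(azimuth, 3), round(altitude, 3))
```

With these conventions, a z-rotation of 90° minus the azimuth followed by a nod of the altitude angle brings the y-axis onto the source, which matches the last two steps of the program flow. The two ITDs fix two components of the direction vector, and the restriction to the front region fixes the sign of the third.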

The software was written with Ultimate Robolab, a programming environment for the graphical creation of powerful RCX firmware. This environment also provided the mathematical library needed to evaluate the trigonometry of the theorems.

 

Experimental Results

Tests were made in an ordinary reverberant room of 50 m³ volume. The approximate distance to the sound source was 1 m. The sound was emitted by a small mono radio playing music, representing an immobile audio source. Depending on the position of the sound source relative to the robot head's reference frame, the sound waves arrive at the microphones with different delays. The final results of the complete measurements for several sound source locations are shown below. As we can see, errors stay below 10 degrees, which is a very good performance given the hardware constraints of the system.

 

Notes

-The robot presented here does not resolve every ambiguity of the sound localization problem when the microphones have no directional sensitivity. Applying only Theorem C of the JASA article (rotation about the y-axis) leaves the "front-back" ambiguity unsolved. Combining it with a rotation about the z-axis would resolve the remaining ambiguity. Other scenarios can be imagined in which the "up-down" ambiguity is already resolved (no sound source can lie below ground level); a single rotation about the z-axis would then be sufficient. Some of these ideas, as well as the use of translatory movements to determine the distance to the sound source, are also presented in the JASA paper, so it is certainly worth a look for anyone interested in sound localization.

-A detailed error analysis is also provided in the paper. It shows that the sampling rate is a decisive factor for the angular error, because it defines the time resolution with which the interaural time difference can be determined.

-Another important point is that the robot is convincing only for static sound sources. The reason is the time it takes to localize: it is long enough that a change in source location caused by movement of the source cannot be neglected. There are two main reasons for the long duration:

Movement of the interaural axis
Low processing power of the RCX

-Through its movement, the robot is essentially imitating a multiple-microphone setup. A simple rotation of a pair of microphones about its center, for example, imitates a rectangular setup of 4 microphones. The multiple-microphone solution would have the advantage of measuring both interaural time delays at the same time, thus coming closer to measuring dynamic sound sources. A small device with four microphones, more processing power and a higher sampling rate would certainly be capable of following a moving sound source. The theorems presented in the JASA publication deliver the necessary mathematics for such a device. The fact that these equations are well suited to small embedded systems certainly motivates this idea.
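To put rough numbers on the sampling-rate note above, the following sketch converts the figures stated on this page (36 kHz, 300 samples per channel, 16 cm microphone spacing) into time and angle resolution. It assumes a speed of sound of 343 m/s and a simple far-field model; it is a back-of-the-envelope illustration, not the error analysis from the paper, and the function names are made up for this example.

```python
import math

SAMPLE_RATE = 36_000     # Hz, from this page
SAMPLES_PER_CHANNEL = 300
MIC_DISTANCE = 0.16      # m, from this page
SPEED_OF_SOUND = 343.0   # m/s, assumed value

def window_ms():
    """Length of one recording window in milliseconds."""
    return 1000.0 * SAMPLES_PER_CHANNEL / SAMPLE_RATE

def max_itd_samples():
    """Largest possible delay between the microphones, in sample periods."""
    return MIC_DISTANCE / SPEED_OF_SOUND * SAMPLE_RATE

def angle_for_lag(lag):
    """Far-field angle from the median plane (degrees) for a lag in samples."""
    return math.degrees(math.asin(lag * SPEED_OF_SOUND / (SAMPLE_RATE * MIC_DISTANCE)))

print(f"window: {window_ms():.2f} ms")
print(f"largest delay: {max_itd_samples():.1f} samples")
print(f"one-sample step near the median plane: {angle_for_lag(1):.1f} degrees")
```

Under these assumptions the whole range of directions maps onto fewer than 17 sample periods on each side, and a single sample corresponds to a few degrees near the median plane, which shows why a higher sampling rate directly improves angular accuracy.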

 
