NAOqi Audio - Overview | API

What it does

ALAudioSourceLocalization identifies the direction of any sufficiently loud sound heard by NAO.

How it works

The sound wave emitted by a source close to NAO is received at slightly different times on each of its four microphones. For example, if someone talks to the robot on its left side, the signal first reaches the left microphone, a fraction of a millisecond later the front and rear ones, and finally the right microphone.
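To make these timing differences concrete, the following Python sketch computes, for a distant (far-field) source, when a plane sound wave reaches each microphone relative to the centre of the head. The microphone coordinates in `MICS` are hypothetical values chosen for illustration, not NAO’s real layout, and the speed of sound is taken as 343 m/s.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, in air at about 20 degrees C

# Hypothetical microphone positions in metres, in a head frame with x forward,
# y to the robot's left and z up. These are NOT NAO's real coordinates.
MICS = {
    "front": (0.05, 0.0, 0.0),
    "rear": (-0.05, 0.0, 0.0),
    "left": (0.0, 0.05, 0.0),
    "right": (0.0, -0.05, 0.0),
}


def direction(azimuth, elevation):
    """Unit vector from the head centre towards the source (angles in radians,
    azimuth positive to the left, elevation positive upwards)."""
    return (math.cos(elevation) * math.cos(azimuth),
            math.cos(elevation) * math.sin(azimuth),
            math.sin(elevation))


def arrival_times(azimuth, elevation, mics=MICS, c=SPEED_OF_SOUND):
    """Arrival time of a far-field plane wave at each microphone, in seconds,
    relative to its arrival at the head centre (negative means earlier)."""
    u = direction(azimuth, elevation)
    # A microphone displaced by p towards the source hears it p.u / c earlier.
    return {name: -sum(pk * uk for pk, uk in zip(p, u)) / c
            for name, p in mics.items()}


if __name__ == "__main__":
    # Source on the robot's left: azimuth +90 degrees, elevation 0.
    times = arrival_times(math.pi / 2, 0.0)
    for name, t in sorted(times.items(), key=lambda item: item[1]):
        print(f"{name:5s} {t * 1e6:+7.1f} us")
```

With this hypothetical 10 cm spacing between the left and right microphones, a source on the left reaches the left microphone about 146 µs before the head centre and the right microphone about 146 µs after it, so the whole left-to-right difference is under 0.3 ms.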

These differences, known as ITDs (Interaural Time Differences), can be mathematically related to the current location of the emitting source. By solving the corresponding equations every time a noise is heard, the robot can retrieve the direction of the source (azimuth and elevation angles) from the ITDs measured on its four microphones.
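The inversion step can be sketched under a far-field (plane-wave) assumption: for every microphone i, the projection of its offset from a reference microphone onto the unknown source direction u equals minus the speed of sound times its arrival-time difference, (p_i - p_ref)·u = -c (t_i - t_ref). With four non-coplanar microphones this gives three linear equations in the three components of u. The Python sketch below solves them with Cramer's rule and converts u to azimuth and elevation. The microphone coordinates are hypothetical, and this is a textbook time-difference-of-arrival formulation, not necessarily the solver used inside ALAudioSourceLocalization.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, in air at about 20 degrees C

# Hypothetical, non-coplanar microphone layout in metres (x forward, y to the
# robot's left, z up). These are NOT NAO's real microphone coordinates.
MICS = [
    ("left", (0.0, 0.05, 0.0)),
    ("right", (0.0, -0.05, 0.0)),
    ("front", (0.05, 0.0, 0.02)),
    ("rear", (-0.05, 0.0, 0.02)),
]


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate (coplanar) microphone layout")
    return [det3([row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]) / d
            for j in range(3)]


def locate(times, mics=MICS, c=SPEED_OF_SOUND):
    """Estimate (azimuth, elevation) in radians from per-microphone arrival
    times in seconds (any offset common to all microphones cancels out)."""
    ref_name, p_ref = mics[0]
    t_ref = times[ref_name]
    # One equation per non-reference microphone: (p_i - p_ref) . u = -c (t_i - t_ref)
    a = [[p[k] - p_ref[k] for k in range(3)] for _, p in mics[1:]]
    b = [-c * (times[name] - t_ref) for name, _ in mics[1:]]
    ux, uy, uz = solve3(a, b)
    norm = math.sqrt(ux * ux + uy * uy + uz * uz)  # differs from 1 with noisy ITDs
    ux, uy, uz = ux / norm, uy / norm, uz / norm
    return math.atan2(uy, ux), math.atan2(uz, math.hypot(ux, uy))


def simulate(azimuth, elevation, mics=MICS, c=SPEED_OF_SOUND):
    """Ideal far-field arrival times, used here only to exercise locate()."""
    u = (math.cos(elevation) * math.cos(azimuth),
         math.cos(elevation) * math.sin(azimuth),
         math.sin(elevation))
    return {name: -sum(pk * uk for pk, uk in zip(p, u)) / c for name, p in mics}


if __name__ == "__main__":
    az, el = locate(simulate(math.radians(60), math.radians(10)))
    print(f"azimuth {math.degrees(az):.1f} deg, elevation {math.degrees(el):.1f} deg")
```

With real, noisy ITDs the solution of the three equations is no longer a unit vector, which is why the estimate is renormalised before the angles are computed.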

The result of this computation is regularly updated in ALMemory under the key ALAudioSourceLocalization/SoundLocated, formatted as follows:

[ [time(sec), time(usec)],
  [azimuth(rad), elevation(rad), confidence],
  [Head Position[6D]]
]
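A client reading this key will usually want to unpack the nested list into named fields. The Python sketch below does that for a value already retrieved from ALMemory (with the NAOqi Python SDK this would typically be `ALProxy("ALMemory", robot_ip, 9559).getData("ALAudioSourceLocalization/SoundLocated")`, where `robot_ip` is a placeholder). The `SoundLocation` type and `parse_sound_located` helper are hypothetical names, and the sample value in the demo is made up.

```python
from collections import namedtuple

SoundLocation = namedtuple(
    "SoundLocation",
    ["timestamp", "azimuth", "elevation", "confidence", "head_position"])


def parse_sound_located(value):
    """Unpack a raw ALAudioSourceLocalization/SoundLocated value:
    [[sec, usec], [azimuth, elevation, confidence], [6 head-position values]].
    Any additional trailing fields are ignored."""
    (sec, usec), angles, head = value[0], value[1], value[2]
    azimuth, elevation, confidence = angles[:3]
    return SoundLocation(
        timestamp=sec + usec * 1e-6,  # seconds, as a single float
        azimuth=azimuth,              # radians
        elevation=elevation,          # radians
        confidence=confidence,
        head_position=list(head[:6]),
    )


if __name__ == "__main__":
    # Made-up sample value, for illustration only.
    sample = [[1300000000, 250000], [0.52, -0.10, 0.81],
              [0.0, 0.0, 0.43, 0.0, 0.05, 0.0]]
    loc = parse_sound_located(sample)
    print(loc.timestamp, loc.azimuth, loc.confidence)
```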

Performance and Limitations


The angles provided by NAO’s sound source localization engine match the real position of the source with an average error of about 20 degrees, which is satisfactory in many practical situations. Note that the best theoretically achievable accuracy depends on the spatial configuration of the microphones and on the sample rate of the measured signal; on NAO it is about 10 degrees.
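The dependence on microphone spacing and sample rate can be illustrated with a back-of-the-envelope calculation for a single microphone pair. For a far-field source at angle θ from broadside, the delay between two microphones spaced d apart is d·sin(θ)/c; if delays are only measured to the nearest sample, the smallest angle that can be told apart from broadside satisfies sin(θ) = c/(f_s·d). The spacing and sample rates below are hypothetical, and real localizers can interpolate below one sample, so this is a rough illustration rather than NAO’s actual figure.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second


def min_resolvable_angle_deg(spacing_m, sample_rate_hz, c=SPEED_OF_SOUND):
    """Smallest angle from broadside (degrees) whose delay between one pair of
    microphones reaches a single sample period, i.e. sin(theta) = c / (fs * d).
    Returns 90.0 when one sample already exceeds the largest possible delay."""
    ratio = c / (sample_rate_hz * spacing_m)
    if ratio >= 1.0:
        return 90.0
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.10  # hypothetical distance between two microphones, metres
    for fs in (16000, 48000):  # hypothetical sample rates, Hz
        print(f"{fs} Hz: {min_resolvable_angle_deg(spacing, fs):.1f} degrees")
```

For a 10 cm spacing this gives roughly 12 degrees at 16 kHz and 4 degrees at 48 kHz: a smaller spacing or a lower sample rate makes the smallest resolvable angle larger.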

A sound source can be successfully located at a distance of up to several meters from NAO, depending on the situation (reverberation, background noise, etc.). Once launched, this feature constantly uses 10% of the CPU, and up to 20% for a few milliseconds while the location of a sound is being computed.


The performance of NAO’s sound source localization engine is limited by how clearly the sound source can be heard above the background noise, so noisy environments naturally reduce the reliability of the module's outputs. The module also detects and locates any loud sound; it cannot by itself filter out sound sources that are not human. Finally, only one sound source can be located at a time. When NAO faces several loud noises simultaneously, the module may become less reliable and will most likely output only the direction of the loudest source.
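Because each detection comes with a confidence value, an application can mitigate some of these limitations on its own side, for example by discarding low-confidence detections and averaging the azimuths of the remaining ones. The sketch below is not part of ALAudioSourceLocalization; `robust_azimuth` is a hypothetical helper, its `min_confidence` default of 0.5 is an arbitrary value that would need tuning, and a circular mean is used because azimuths wrap around at ±π.

```python
import math


def robust_azimuth(detections, min_confidence=0.5):
    """Circular mean (radians) of the azimuths of detections whose confidence
    is at least min_confidence, or None if no usable detection remains.

    detections: iterable of (azimuth_rad, confidence) pairs.
    min_confidence: hypothetical threshold; tune it for the environment.
    """
    sx = sy = 0.0
    kept = 0
    for azimuth, confidence in detections:
        if confidence >= min_confidence:
            # Sum unit vectors so that angles near +pi and -pi average correctly.
            sx += math.cos(azimuth)
            sy += math.sin(azimuth)
            kept += 1
    if kept == 0 or math.hypot(sx, sy) < 1e-9:
        return None  # nothing kept, or the directions cancel out
    return math.atan2(sy, sx)


if __name__ == "__main__":
    recent = [(3.10, 0.9), (-3.12, 0.8), (0.40, 0.2)]  # made-up detections
    print(robust_azimuth(recent))  # the low-confidence 0.40 rad entry is ignored
```

Averaging cannot separate simultaneous sources: if two loud sources alternate, the mean points somewhere between them, so a short averaging window is advisable.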

Getting started

Use the Sound Tracker box in Choregraphe, after setting NAO’s stiffness to 1 (to enable head movements).
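The core of a sound-tracking behaviour is converting a detection into a head command. The Python sketch below computes a `HeadYaw` target from the current yaw and a detected azimuth. It assumes that the azimuth is measured relative to the head orientation at detection time and is positive to the robot’s left (the same sign convention as `HeadYaw`), and it uses NAO's published `HeadYaw` range of about ±119.5 degrees; check both against the documentation for your robot and NAOqi version. The helper names are hypothetical, and this is not a description of how the Sound Tracker box is implemented.

```python
import math

# Approximate HeadYaw joint range published for NAO (about +/-119.5 degrees).
# Treat these limits as an assumption and check them for your robot model.
HEAD_YAW_MIN = -2.0857  # radians
HEAD_YAW_MAX = 2.0857   # radians


def wrap_angle(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def head_yaw_target(current_yaw, azimuth):
    """HeadYaw command (radians) that points the head towards a detected sound.

    Assumes `azimuth` is measured relative to the head orientation at the time
    of the detection and is positive to the robot's left, like HeadYaw.
    The result is clamped to the joint range, so sounds far behind the robot
    only turn the head as far as the joint allows.
    """
    target = wrap_angle(current_yaw + azimuth)
    return max(HEAD_YAW_MIN, min(HEAD_YAW_MAX, target))


if __name__ == "__main__":
    print(head_yaw_target(0.3, 0.5))   # within range: roughly 0.8
    print(head_yaw_target(1.5, 1.2))   # clamped to HEAD_YAW_MAX
```

On a robot, the result could then be sent with the NAOqi Python SDK, for example with `motion = ALProxy("ALMotion", robot_ip, 9559)`, `motion.setStiffnesses("Head", 1.0)` and then `motion.setAngles("HeadYaw", target, 0.2)`, where 0.2 is the fraction of the maximum joint speed.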

Use Cases

Here are some possible applications (from the simplest to the most ambitious) that can be built on NAO’s ability to locate sound sources.

Case 1: Noisy event localization

ALAudioSourceLocalization can be used to bring a person into the camera's field of view by turning NAO's head toward the sound they make (as in the example above). Subsequent vision-based features can then work on relevant images (for example, images showing a person). This is therefore of interest for the following tasks:

  • Human Detection, Tracking and Recognition
  • Noisy Object Detection, Tracking and Recognition

Case 2: Audio Source Separation

ALAudioSourceLocalization can be used to improve the signal-to-noise ratio in a specific direction (this is known as audio source separation), which can greatly enhance subsequent audio-based algorithms such as:

  • Speech Recognition in a specific direction
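One common way to favour a given direction is delay-and-sum beamforming: each channel is shifted by the delay with which the wavefront from that direction reaches it, so the target signal lines up across the microphones and adds coherently while independent noise does not. The source does not say which technique NAOqi uses, so the Python sketch below is only an illustration; it assumes integer-sample steering delays (which could be derived from the source direction and the microphone geometry), and all signal parameters in the demo are hypothetical.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer steered by per-channel integer delays.

    channels: equal-length lists of samples, one per microphone.
    delays:   non-negative delay, in samples, with which the wavefront from
              the steering direction reaches each microphone.
    Each channel is advanced by its delay so the target signal lines up,
    then the channels are averaged; samples past the end count as zero.
    """
    n = len(channels[0])
    out = []
    for k in range(n):
        acc = 0.0
        for samples, d in zip(channels, delays):
            if k + d < n:
                acc += samples[k + d]
        out.append(acc / len(channels))
    return out


if __name__ == "__main__":
    random.seed(0)
    n = 2000
    delays = [0, 3, 5, 2]  # hypothetical steering delays, in samples
    clean = [math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    # Each microphone hears the delayed source plus its own Gaussian noise.
    channels = [[(clean[k - d] if k >= d else 0.0) + random.gauss(0.0, 1.0)
                 for k in range(n)] for d in delays]
    out = delay_and_sum(channels, delays)
    m = n - max(delays)  # samples to which every channel contributes
    noise_one = sum((channels[0][k] - clean[k]) ** 2 for k in range(m)) / m
    noise_out = sum((out[k] - clean[k]) ** 2 for k in range(m)) / m
    print(f"noise power: single microphone {noise_one:.2f}, "
          f"beamformed {noise_out:.2f}")
```

With four microphones and independent noise on each, averaging leaves roughly a quarter of the noise power (about a 6 dB gain); correlated noise and reverberation in real rooms reduce this benefit.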

Case 3: Multimodal applications

These applications can also be combined, making NAO’s sound source localization a basic building block for sophisticated applications such as:

  • Remote Monitoring / Security applications (NAO could track noises in an empty flat, take pictures and record sounds in relevant directions, etc.)
  • Entertainment applications (by knowing who is speaking and understanding what is being said, NAO could take part in a wide variety of games with humans)