SoftBank Robotics documentation What's new in NAOqi 2.8?


NAOqi Emotion - Overview

What it does

When ALAutonomousLife is on, ALMood estimates the emotion of the human(s) in front of the robot, their attention towards the robot, and the ambiance around the robot.

As a user of the service, you can query the underlying representation of the emotional perception to obtain a set of emotional descriptors (e.g. positivity, negativity, attention).

You can also connect to ALMood’s signals and properties to know whether the human in front of the robot is positive, negative or neutral, whether they are unengaged, semiEngaged or fullyEngaged towards the robot, and whether the ambiance around the robot is calm or excited.

How it works

ALMood works by estimating the mood of the focused user, as provided by ALUserSession.

ALMood’s estimation of valence (i.e. positivity or negativity) builds upon various extractors and memory keys; in particular, it currently uses head angles from ALGazeAnalysis, expression properties and smile information from ALFaceCharacteristics, acoustic voice emotion analysis from ALVoiceEmotionAnalysis, and semantic information from the words spoken by the user.
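Combining several valence sources can be pictured as a confidence-weighted average. The sketch below is a hypothetical illustration only: the `Observation` type, the source names, the [-1, 1] valence scale and the averaging rule are all assumptions, not ALMood's actual fusion algorithm, which this page does not describe.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Observation:
    """One hypothetical extractor reading (not an ALMood type)."""
    source: str        # illustrative names, e.g. "smile", "voice", "semantics"
    valence: float     # assumed scale: -1 (negative) .. +1 (positive)
    confidence: float  # 0 .. 1


def fuse_valence(observations: List[Observation]) -> Tuple[float, float]:
    """Confidence-weighted mean of valence readings.

    Returns (valence, confidence), where confidence is the mean of the
    input confidences. Returns (0.0, 0.0) if no reading carries any weight.
    """
    total_weight = sum(o.confidence for o in observations)
    if total_weight == 0:
        return 0.0, 0.0
    valence = sum(o.valence * o.confidence for o in observations) / total_weight
    return valence, total_weight / len(observations)
```

With a strong smile (0.8, confidence 0.9), a slightly negative voice tone (-0.2, confidence 0.3) and mildly positive words (0.5, confidence 0.6), the fused valence is roughly 0.53 with a confidence of about 0.6: the confident smile dominates the weak voice cue.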

ALMood’s estimation of attention (i.e. unengaged, semiEngaged or fullyEngaged) takes into account both the head orientation and the eye gaze direction of the user to calculate their level of focus on the interaction. For example, attention will be high if the user’s head is tilted upwards but their eyes are looking downwards at the robot. The same features are used to estimate attentionZone (e.g. LookingAtRobot, LookingLeft), allowing the robot to know in fine detail where the user is looking.
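The head/eye example can be made concrete with a small sketch. It assumes, hypothetically, that gaze direction is approximated by adding the eye-in-head pitch to the head pitch (in degrees, 0 meaning aimed straight at the robot, positive meaning upwards) and that fixed thresholds separate the three attention levels. Only the vertical component is modelled; the function names and thresholds are illustrative and do not come from ALMood.

```python
def gaze_pitch(head_pitch_deg: float, eye_pitch_deg: float) -> float:
    """Approximate gaze pitch as head pitch plus eye-in-head pitch.

    Degrees; 0 means aimed straight at the robot, positive means upwards.
    """
    return head_pitch_deg + eye_pitch_deg


def attention_level(head_pitch_deg: float, eye_pitch_deg: float,
                    full_max_deg: float = 10.0, semi_max_deg: float = 25.0) -> str:
    """Map the gaze offset to an attention label (hypothetical thresholds)."""
    offset = abs(gaze_pitch(head_pitch_deg, eye_pitch_deg))
    if offset <= full_max_deg:
        return "fullyEngaged"
    if offset <= semi_max_deg:
        return "semiEngaged"
    return "unengaged"
```

A head tilted 20° up with eyes rotated 18° down gives a 2° offset and `attention_level(20, -18)` returns `"fullyEngaged"`, whereas head angle alone (`attention_level(20, 0)`) would only give `"semiEngaged"`. This is why using both features matters.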

ALMood’s estimation of ambiance (i.e. calm or excited) takes into account the general sound level from ALAudioDevice and the amount of movement in front of the robot.
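As a rough mental model of the calm/excited decision, the sketch below blends a sound level and a movement amount into one agitation score. It is hypothetical: both inputs are assumed to be already normalised to [0, 1], and the equal weighting and 0.5 threshold are illustrative parameters, not values used by ALMood or ALAudioDevice.

```python
def ambiance(sound_level: float, movement: float,
             sound_weight: float = 0.5, threshold: float = 0.5) -> str:
    """Classify the surroundings as 'calm' or 'excited'.

    Both inputs are assumed normalised to [0, 1]; the weight and threshold
    are hypothetical tuning parameters.
    """
    for name, x in (("sound_level", sound_level), ("movement", movement)):
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {x}")
    agitation = sound_weight * sound_level + (1.0 - sound_weight) * movement
    return "excited" if agitation > threshold else "calm"
```

For example, `ambiance(0.1, 0.2)` returns `"calm"` and `ambiance(0.9, 0.7)` returns `"excited"`.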

ALMood retrieves information from the above extractors and combines it into high-level and low-level extractors. Users can access the underlying representation of the emotional perception through a two-level key space: consolidated information keys (e.g. valence, attention) and more intermediate information keys (e.g. smile, laugh).
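One way to picture the two-level key space is as two dictionaries of value/confidence pairs. This is a hypothetical data model for illustration: `KeyValue`, `PersonMood` and the lookup order are assumptions, not the structure ALMood actually returns.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class KeyValue:
    value: Any         # a label or a score, depending on the key
    confidence: float  # 0 .. 1


@dataclass
class PersonMood:
    # Consolidated (high-level) keys, e.g. "valence", "attention".
    consolidated: Dict[str, KeyValue] = field(default_factory=dict)
    # Intermediate (low-level) keys, e.g. "smile", "laugh".
    intermediate: Dict[str, KeyValue] = field(default_factory=dict)

    def get(self, key: str) -> KeyValue:
        """Look a key up in the consolidated level first, then the intermediate one."""
        if key in self.consolidated:
            return self.consolidated[key]
        return self.intermediate[key]  # raises KeyError if the key is unknown
```

Keeping the two levels separate lets a behavior read only the consolidated keys in the common case, and drill into the intermediate ones (such as a smile score) when it needs to explain a decision.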

The calculation of high-level keys takes into account the observation at a given moment, the previous emotional state, and the ambient and social context (e.g. a noisy environment, a smile held too long, the user profile).
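Blending the current observation with the previous state can be sketched with exponential smoothing. This is a swapped-in, hypothetical technique used only to illustrate the idea: the `alpha` weight and the rule of halving it in a noisy context are assumptions, not ALMood's documented behavior.

```python
def update_estimate(previous: float, observation: float,
                    alpha: float = 0.3, noisy: bool = False) -> float:
    """Exponentially smooth a score toward the new observation.

    alpha (0..1) is how much weight the new observation gets; in a noisy
    context it is halved so the previous state dominates. Both alpha and
    the halving rule are hypothetical.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    a = alpha / 2 if noisy else alpha
    return (1.0 - a) * previous + a * observation
```

Starting from a neutral estimate of 0.0, a single fully positive reading of 1.0 moves the estimate to 0.3, or only to 0.15 when the context is flagged as noisy. A single spurious cue therefore cannot flip the state on its own.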

All emotional key values are associated with a confidence score between 0 and 1 that indicates how reliable the estimate is.
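A typical way to consume a value and its confidence score is to report a label only when the confidence is high enough. In the sketch below, the 0.5 confidence cut-off, the ±0.2 neutral band and the [-1, 1] valence scale are hypothetical choices; the four output labels match the emotional reaction values used by ALMood.

```python
def label_valence(valence: float, confidence: float,
                  min_confidence: float = 0.5, dead_zone: float = 0.2) -> str:
    """Turn a valence score in [-1, 1] and its confidence into a label."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < min_confidence:
        return "unknown"  # too unreliable to report a polarity
    if valence > dead_zone:
        return "positive"
    if valence < -dead_zone:
        return "negative"
    return "neutral"
```

For example, `label_valence(0.7, 0.9)` is `"positive"`, but the same valence with a confidence of 0.2 yields `"unknown"`.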

Basic Emotions

You can query the focused person’s basic emotional reaction over a 3-second window.

The analysis starts when the method ALMood::getEmotionalReaction is called.

An emotional reaction value can be:

  • “positive”
  • “neutral”
  • “negative”
  • “unknown”
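How ALMood summarises the 3-second window is not documented here. As a purely hypothetical illustration, assuming one label per second (the 1 Hz update rate mentioned under Activated during Autonomous Life), a majority vote that ignores "unknown" samples and falls back to "neutral" on ties would look like this:

```python
from collections import Counter
from typing import Iterable


def reaction_over_window(samples: Iterable[str]) -> str:
    """Combine per-second labels from a 3-second window into one reaction.

    'unknown' samples are ignored; a tie between the most frequent labels
    falls back to 'neutral'. This voting rule is a hypothetical example.
    """
    counts = Counter(s for s in samples if s != "unknown")
    if not counts:
        return "unknown"
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "neutral"
    return ranked[0][0]
```

With this rule, `["positive", "positive", "neutral"]` gives `"positive"`, `["positive", "negative", "unknown"]` gives `"neutral"`, and three `"unknown"` samples give `"unknown"`.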

Emotional descriptors

Person emotion:

The following descriptors provide mood data about the focused user.

  • Valence: whether the person’s mood is positive or negative.
  • Attention: the amount of attention the focused person gives to the robot.

Environment emotion:

  • Excitement/Agitation: indicates the activity level of the environment, whether it’s calm or excited.

Perceived stimuli

In its present state, the module reacts to the following stimuli:

Person emotion:

  • Smile degree
  • Facial expressions (neutral, happy, angry, sad)
  • Head attitude (angles), relative to the robot
  • Gaze patterns (evasion, attention, diversion)
  • Utterance acoustic tone
  • Linguistic semantics of speech
  • Touch sensor input

Environment emotion:

  • Energy level of noise
  • Movement detection

Activated during Autonomous Life

When ALAutonomousLife is on, ALMood updates once per second (1 Hz). When ALAutonomousLife is off, ALMood data is not available.
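Because the data only exists while Autonomous Life runs and only changes once per second, a client can check the life state first and then sample at 1 Hz. This sketch assumes `ALAutonomousLife.getState()` reports `"disabled"` when Autonomous Life is off; `read_mood` is a hypothetical caller-supplied function (for example a wrapper around whichever ALMood query you use), and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, List


def sample_mood(life: Any, read_mood: Callable[[], Any], samples: int = 3,
                period_s: float = 1.0,
                sleep: Callable[[float], None] = time.sleep) -> List[Any]:
    """Read mood data once per period, matching ALMood's 1 Hz update rate.

    `life` is expected to expose getState() like ALAutonomousLife;
    `read_mood` is any caller-supplied function returning mood data.
    """
    if life.getState() == "disabled":
        raise RuntimeError("ALMood data is not available while Autonomous Life is off")
    readings = []
    for i in range(samples):
        readings.append(read_mood())
        if i < samples - 1:
            sleep(period_s)  # polling faster than 1 Hz would mostly repeat values
    return readings
```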

This module subscribes to, and incorporates data from, the various extractors described in How it works.

Getting started

To discover ALMood, download and try the following Choregraphe behavior: sample_get_mood.crg

This sample shows the main steps for using ALMood:

  • The robot makes a joke or comment.
  • The robot detects the resulting mood (positive/negative/neutral) of the person during the 3 seconds that follow.
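The same two steps can be sketched in Python with the NAOqi `qi` SDK. This is a minimal sketch written for Python 3, not the content of sample_mood_get.crg: it assumes `ALTextToSpeech.say()` returns once speech has finished and that `ALMood.getEmotionalReaction()` returns one of the four reaction strings after its 3-second analysis. The spoken line, the helper name and the default URL are illustrative.

```python
from typing import Any


def joke_and_read_reaction(tts: Any, mood: Any,
                           line: str = "Why did the robot go on holiday? To recharge!") -> Any:
    """Say a line, then ask ALMood for the listener's reaction.

    Assumes tts.say() returns once speech has finished and that
    mood.getEmotionalReaction() returns after its 3-second analysis window.
    """
    tts.say(line)
    return mood.getEmotionalReaction()


def main(url: str = "tcp://127.0.0.1:9559") -> None:
    """Run the sample against a robot (requires the NAOqi Python SDK)."""
    import qi  # imported here so the helper above can be used without the SDK

    app = qi.Application(["mood_sample", "--qi-url=" + url])
    app.start()
    session = app.session
    reaction = joke_and_read_reaction(session.service("ALTextToSpeech"),
                                      session.service("ALMood"))
    print("Reaction:", reaction)
```

Call `main()` on the robot itself, or pass the robot's address from a remote machine, e.g. `main("tcp://<robot-ip>:9559")`. Autonomous Life must be on for ALMood to return data.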


  • Since most of the inputs come from ALPeoplePerception extractors, the confidence of the emotional extractors will be low if the face is not clearly visible.
  • The ambiance descriptor is best used when the robot is not speaking.