ALTextToSpeech API


Namespace : AL

#include <alproxies/altexttospeechproxy.h>

Methods

void ALTextToSpeechProxy::disableNotifications()

Deprecated since version 2.0: This method cannot be used anymore. Notifications are now always enabled as they are required.

Disables the publishing of notifications in ALMemory during the synthesis (disabled by default). Please refer to ALTextToSpeechProxy::enableNotifications() for further information.

void ALTextToSpeechProxy::enableNotifications()

Deprecated since version 2.0: This method cannot be used anymore. Notifications are now always enabled as they are required.

Enables notifications publishing in ALMemory during the synthesis (disabled by default). Once enabled, the following notifications are generated:

  • ALTextToSpeech/CurrentBookMark: indicates the occurrence of the bookmarks that are placed (using “mrk=number” number being an integer [0 - 65535]) in the string that needs to be synthesized, see Acapela Mobility Text TAGS for further information.
  • ALTextToSpeech/CurrentSentence: indicates the sentence that is currently synthesized.
  • ALTextToSpeech/CurrentWord: indicates the word that is currently synthesized.
  • ALTextToSpeech/PositionOfCurrentWord: indicates the index, in the current sentence, of the word that is currently synthesized.
  • ALTextToSpeech/TextStarted: indicates if a sentence is currently synthesized.
  • ALTextToSpeech/TextDone: indicates when the current sentence synthesis is done.
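
As a rough illustration of how these keys might be read, the sketch below polls them through an ALMemory proxy. The helper name read_tts_notifications is hypothetical. It assumes that memory exposes getData(key) as ALMemory does, and that reading a key which has not been written yet raises an exception, which is reported here as None.

```python
# Notification keys listed above, published in ALMemory during the synthesis.
TTS_KEYS = [
    "ALTextToSpeech/CurrentBookMark",
    "ALTextToSpeech/CurrentSentence",
    "ALTextToSpeech/CurrentWord",
    "ALTextToSpeech/PositionOfCurrentWord",
    "ALTextToSpeech/TextStarted",
    "ALTextToSpeech/TextDone",
]


def read_tts_notifications(memory):
    """Return a dict mapping each notification key to its current value.

    `memory` is assumed to expose getData(key), like an ALMemory proxy.
    Keys that cannot be read are reported as None.
    """
    values = {}
    for key in TTS_KEYS:
        try:
            values[key] = memory.getData(key)
        except Exception:
            # Assumption: reading a key that was never written raises.
            values[key] = None
    return values
```

On a robot, memory would typically be created with ALProxy("ALMemory", IP, PORT).
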
std::vector<std::string> ALTextToSpeechProxy::getAvailableLanguages()

Returns the list of the languages currently installed on the system.

Example: [‘French’, ‘Chinese’, ‘English’, ‘German’, ‘Italian’, ‘Japanese’, ‘Korean’, ‘Portuguese’, ‘Spanish’]

Returns:List of installed languages (language names are given in English)

altexttospeech_getavailablelanguages.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_getavailablelanguages.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)

lang = tts.getAvailableLanguages()
print "Available languages: " + str(lang)
std::vector<std::string> ALTextToSpeechProxy::getAvailableVoices()

Returns the list of the voices currently installed on the system. Each voice is given in English.

Returns:Voices Installed
std::string ALTextToSpeechProxy::getLanguage()

Returns the language currently used by the text to speech engine.

Example: ‘French’

The returned value is one of the available languages.

For further details, see: ALTextToSpeechProxy::getAvailableLanguages().

Returns:Current language used by the text to speech engine
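
Because getLanguage() reports the language in use, it can serve to restore the previous language after a temporary switch. The following is a minimal sketch of that idea; the context manager temporary_language is a hypothetical helper, not part of the API, and tts stands for any object exposing getLanguage() and setLanguage().

```python
from contextlib import contextmanager


@contextmanager
def temporary_language(tts, language):
    """Switch `tts` to `language`, then restore the previous language."""
    previous = tts.getLanguage()
    tts.setLanguage(language)
    try:
        yield tts
    finally:
        # Restore the original language even if the body raised.
        tts.setLanguage(previous)
```

Typical use would be: with temporary_language(tts, "French"): tts.say("voiture").
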
float ALTextToSpeechProxy::getParameter(const std::string& parameter)

Returns the value of one of the text to speech engine parameters. The available parameters are: “pitchShift”, “doubleVoice”, “doubleVoiceLevel” and “doubleVoiceTimeShift”. Please refer to ALTextToSpeechProxy::setParameter() for details about these parameters.

Parameters:
  • parameter – Name of the parameter
Returns:

Value of the specified parameter
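
One possible use of getParameter() is to save the four engine parameters before changing them and to put them back afterwards. The sketch below assumes only the parameter names listed above; snapshot_parameters and restore_parameters are hypothetical helper names.

```python
# Parameter names documented for getParameter().
ENGINE_PARAMETERS = (
    "pitchShift",
    "doubleVoice",
    "doubleVoiceLevel",
    "doubleVoiceTimeShift",
)


def snapshot_parameters(tts):
    """Read every documented engine parameter into a dict."""
    return dict((name, tts.getParameter(name)) for name in ENGINE_PARAMETERS)


def restore_parameters(tts, snapshot):
    """Write back the values previously captured by snapshot_parameters()."""
    for name, value in snapshot.items():
        tts.setParameter(name, value)
```
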

std::vector<std::string> ALTextToSpeechProxy::getSupportedLanguages()

Returns the list of all supported languages.

Example: [‘French’, ‘Chinese’, ‘English’, ‘German’, ‘Italian’, ‘Japanese’, ‘Korean’, ‘Portuguese’, ‘Spanish’]

Returns:List of supported languages (language names are given in English)
import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_getsupportedlanguages.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)

lang = tts.getSupportedLanguages()
print "Supported languages: " + str(lang)
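
Comparing the two lists can show which languages are supported but not currently installed. This is only a sketch: languages_not_installed is a hypothetical helper, and tts is any object exposing getSupportedLanguages() and getAvailableLanguages().

```python
def languages_not_installed(tts):
    """Return the supported languages that are not installed, in order."""
    installed = set(tts.getAvailableLanguages())
    return [lang for lang in tts.getSupportedLanguages()
            if lang not in installed]
```
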
std::string ALTextToSpeechProxy::getVoice()

Returns the voice currently used by the text to speech engine.

Returns:Name of the current voice
float ALTextToSpeechProxy::getVolume()

Gets the current gain applied to the signal synthesized by the text to speech engine. The default value is 1.0.

Returns:Volume [0 - 1]
void ALTextToSpeechProxy::loadVoicePreference(const std::string& preferencesFileSuffix)

Loads a voice and the related set of voice parameters defined in an XML file contained in the preferences folder. The name of the XML file must be of the form ALTextToSpeech_Voice_preferencesFileSuffix. The official voice in each language is defined in this way. Please refer to the Tutorial for further details.

Parameters:
  • preferencesFileSuffix – Name of the voice preference file

altexttospeech_loadvoicepreference.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_loadvoicepreference.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)
    
# Loads the set of voice parameters contained in the ALTextToSpeech_Voice_NaoOfficialVoiceEnglish.xml file
tts.loadVoicePreference("NaoOfficialVoiceEnglish")

tts.say("Voice preference loaded")
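
The relation between the suffix passed to loadVoicePreference() and the XML file name can be sketched as follows. The .xml extension is taken from the comment in the example above; the helper names are hypothetical.

```python
PREFERENCE_PREFIX = "ALTextToSpeech_Voice_"
PREFERENCE_EXTENSION = ".xml"


def voice_preference_filename(suffix):
    """Build the preference file name that corresponds to `suffix`."""
    return PREFERENCE_PREFIX + suffix + PREFERENCE_EXTENSION


def voice_preference_suffix(filename):
    """Extract the suffix to pass to loadVoicePreference() from a file name."""
    if not (filename.startswith(PREFERENCE_PREFIX)
            and filename.endswith(PREFERENCE_EXTENSION)):
        raise ValueError("not a voice preference file: %r" % filename)
    return filename[len(PREFERENCE_PREFIX):-len(PREFERENCE_EXTENSION)]
```
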
std::string ALTextToSpeechProxy::locale()

Returns the locale associated with the current language set on the robot. The format is xx_XX (examples: en_US, fr_FR, ja_JP, de_DE, ...)

Returns:The current locale associated to the current language.
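
As an illustration of the xx_XX format, the returned string can be split into its two parts. The helper split_locale is hypothetical, and it assumes that both parts are exactly two characters long, as in the examples above.

```python
def split_locale(locale):
    """Split a locale such as 'en_US' into ('en', 'US')."""
    language_code, separator, country_code = locale.partition("_")
    if not separator or len(language_code) != 2 or len(country_code) != 2:
        raise ValueError("unexpected locale format: %r" % locale)
    return language_code, country_code
```
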
void ALTextToSpeechProxy::say(const std::string& stringToSay)

Says the specified string of characters.

Uses the language defined using ALTextToSpeechProxy::setLanguage() if any, or the default language defined in the robot’s web page.

Parameters:
  • stringToSay – Text to say, encoded in UTF-8.

altexttospeech_say.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_say.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)
    
# Says a test string
tts.say("This is a sample text!")
void ALTextToSpeechProxy::say(const std::string& stringToSay, const std::string& language)

Says the specified string of characters in the specified language.

Parameters:
  • stringToSay – Text to say, encoded in UTF-8.
  • language – Language (English name).

altexttospeech_altexttospeech_say2.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_altexttospeech_say2.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)

# Sets the language to English
tts.setLanguage("English")

tts.say("Let me teach you some French words.")
tts.say("In French, we say")
tts.say("voiture", "French")
tts.say("for car")
void ALTextToSpeechProxy::sayToFile(const std::string& stringToSay, const std::string& fileName)

Works similarly to ALTextToSpeechProxy::say() but the synthesized signal is recorded into the specified file instead of being sent to the robot’s loudspeakers. The signal is encoded with a sample rate of 22050 Hz (European languages) or 16000 Hz (Asian languages), format S16_LE, 1 channel.

Parameters:
  • stringToSay – Text to be synthesized, encoded in UTF-8.
  • fileName – file where the synthesized signal should be recorded (can be either a .raw file or a .wav file).

altexttospeech_saytofile.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_saytofile.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)

# Synthesizes a test string and saves it into a .raw file
tts.sayToFile("This is a sample text, written in a file!", "/tmp/sample_text.raw")

# Synthesizes another test string and saves it into a .wav file
tts.sayToFile("This is another sample text", "/tmp/sample_text.wav")
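
Since a .raw file carries no header, the signal format stated above (S16_LE, 1 channel, 22050 Hz or 16000 Hz) is needed to interpret it. The sketch below shows how that format could be used to compute the duration of a raw recording and to wrap it in a WAV container with Python's standard wave module. Both helper names are hypothetical, and the file is assumed to have been copied from the robot to the machine running the script.

```python
import wave

BYTES_PER_SAMPLE = 2  # S16_LE: signed 16-bit little-endian samples
CHANNELS = 1


def raw_duration_seconds(num_bytes, sample_rate=22050):
    """Duration of a mono S16_LE recording of `num_bytes` bytes."""
    return num_bytes / float(BYTES_PER_SAMPLE * CHANNELS * sample_rate)


def raw_to_wav(raw_path, wav_path, sample_rate=22050):
    """Copy headerless S16_LE mono samples into a WAV file."""
    with open(raw_path, "rb") as raw_file:
        frames = raw_file.read()
    wav_file = wave.open(wav_path, "wb")
    try:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(BYTES_PER_SAMPLE)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    finally:
        wav_file.close()
```

For a recording made with an Asian language, sample_rate=16000 would be passed instead of the default.
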
int ALTextToSpeechProxy::sayToFileAndPlay(const std::string& stringToSay)

Deprecated since version 1.22: Due to technical improvements, there is no point in generating a file and playing it afterwards. If you nevertheless need to do so, you can use ALTextToSpeechProxy::sayToFile() and then ALAudioPlayerProxy::playFile().

Works similarly to ALTextToSpeechProxy::sayToFile() but also sends the synthesized signal to the robot’s loudspeakers.

Parameters:
  • stringToSay – Text to say, encoded in UTF-8.
Returns:

Id of the task. Can be used to interrupt it.
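
The replacement suggested in the deprecation note could look like the sketch below. It assumes that tts is an ALTextToSpeech proxy and player an ALAudioPlayer proxy; say_to_file_then_play is a hypothetical helper name.

```python
def say_to_file_then_play(tts, player, text, file_name):
    """Synthesize `text` into `file_name`, then play that file."""
    # First step: record the synthesized signal into the file.
    tts.sayToFile(text, file_name)
    # Second step: play the recorded file on the loudspeakers.
    player.playFile(file_name)
```
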

void ALTextToSpeechProxy::setLanguage(const std::string& language)

Sets the language currently used by the text to speech system. Each NAOqi restart will however reset that setting to the default language that can be set on the robot’s web page.

Parameters:
  • language – Language (English name).
void ALTextToSpeechProxy::setLanguageDefaultVoice(const std::string& language, const std::string& voice)

Sets the voice to be used by default with a specified language.

Parameters:
  • language – the language among those available on your robot
  • voice – the voice among those available for this language on your robot
void ALTextToSpeechProxy::setParameter(const std::string& parameter, const float& value)

Sets parameters of the text to speech engine.

Parameters:
  • parameter – Name of the parameter
  • value – Value of the parameter

The available parameters are specific to the speech engine:

All languages:

  • pitchShift – applies a pitch shift to the voice. The value indicates the ratio between the new fundamental frequency and the original one (examples: 2.0 is an octave above, 1.5 is a fifth above). Acceptable range is [1.0 - 4]. 0 disables the effect.
  • doubleVoice – adds a second voice to the first one. The value indicates the ratio between the fundamental frequency of the second voice and that of the first one. Acceptable range is [1.0 - 4]. 0 disables the effect.
  • doubleVoiceLevel – sets the gain of the additional voice compared to the original one. Acceptable range is [0 - 4]. 0 disables the effect.
  • doubleVoiceTimeShift – sets the delay (seconds) between the doubled voice and the original one. Acceptable range is [0 - 0.5].

Japanese only (parameter – acceptable range):

  • volume – [0.00001 - 2.0]
  • speed – [0.5 - 4.0]
  • pitch – [0.5 - 2.0]
  • emph – [0.0 - 2.0]
  • pauseMiddle – [80.0 - 300.0]
  • pauseLong – [300.0 - 2000.0]
  • pauseSentence – [80.0 - 10000.0]
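
The acceptable ranges of the parameters common to all languages can be checked before calling setParameter(). The sketch below encodes only those four ranges, including the rule that 0 disables an effect; is_valid_parameter is a hypothetical helper, and how the engine itself reacts to out-of-range values is not specified here.

```python
# Acceptable ranges of the parameters available in all languages.
PARAMETER_RANGES = {
    "pitchShift": (1.0, 4.0),
    "doubleVoice": (1.0, 4.0),
    "doubleVoiceLevel": (0.0, 4.0),
    "doubleVoiceTimeShift": (0.0, 0.5),
}

# Parameters for which the value 0 disables the effect.
ZERO_DISABLES = ("pitchShift", "doubleVoice", "doubleVoiceLevel")


def is_valid_parameter(name, value):
    """Return True if `value` is acceptable for the parameter `name`."""
    if name not in PARAMETER_RANGES:
        return False
    if value == 0 and name in ZERO_DISABLES:
        return True
    low, high = PARAMETER_RANGES[name]
    return low <= value <= high
```
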

altexttospeech_setparameter.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_setparameter.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)

# Applies a pitch shift to the voice
tts.setParameter("pitchShift", 1.5)
#Deactivates double voice
tts.setParameter("doubleVoice", 0.0)

tts.say("Pitch shift and double voice changed")
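
Because pitchShift and doubleVoice are expressed as frequency ratios, it can help to convert between ratios and musical semitones. The sketch below assumes equal temperament (twelve semitones per octave), which is not stated in this documentation; the function names are hypothetical.

```python
import math


def ratio_to_semitones(ratio):
    """Convert a frequency ratio to semitones (equal temperament)."""
    return 12.0 * math.log(ratio, 2)


def semitones_to_ratio(semitones):
    """Convert semitones to a frequency ratio (equal temperament)."""
    return 2.0 ** (semitones / 12.0)
```

Under this assumption a ratio of 2.0 corresponds to 12 semitones (one octave), and a ratio of 1.5 to about 7 semitones.
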
void ALTextToSpeechProxy::setVoice(const std::string& voiceID)

Changes the voice used by the text-to-speech engine. The voice identifier must belong to the installed voices, that can be listed using the ALTextToSpeechProxy::getAvailableVoices() method.

Parameters:
  • voiceID – Name of the voice

altexttospeech_setvoice.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_setvoice.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)

#Changes the basic voice of the synthesis
tts.setVoice("Kenny22Enhanced")

tts.say("Voice changed to Kenny")
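
Since the voice identifier must belong to the installed voices, a script may want to check the list before switching. A minimal sketch follows; set_voice_if_available is a hypothetical helper, and tts is any object exposing getAvailableVoices() and setVoice().

```python
def set_voice_if_available(tts, voice_id):
    """Switch to `voice_id` only if it is installed.

    Returns True if the voice was set, False otherwise.
    """
    if voice_id in tts.getAvailableVoices():
        tts.setVoice(voice_id)
        return True
    return False
```
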
void ALTextToSpeechProxy::setVolume(const float& volume)

Sets the current gain applied to the signal synthesized by the text to speech engine. The default value is 1.0.

Parameters:
  • volume – Gain

altexttospeech_setvolume.py

import sys
from naoqi import ALProxy

if (len(sys.argv) < 2):
    print "Usage: 'python texttospeech_setvolume.py IP [PORT]'"
    sys.exit(1)

IP = sys.argv[1]
PORT = 9559
if (len(sys.argv) > 2):
    PORT = int(sys.argv[2])  # ALProxy expects an integer port
try:
    tts = ALProxy("ALTextToSpeech", IP, PORT)
except Exception,e:
    print "Could not create proxy to ALTextToSpeech"
    print "Error was: ",e
    sys.exit(1)

#Changes the volume
tts.setVolume(0.5)
tts.say("Volume set to 50%")
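
When the gain comes from user input, it may be worth clamping it before calling setVolume(). The sketch below assumes that the accepted range is [0 - 1], the range documented for the value returned by getVolume(); set_volume_clamped is a hypothetical helper.

```python
def set_volume_clamped(tts, volume):
    """Clamp `volume` to [0.0, 1.0], apply it, and return the value used."""
    # Assumption: setVolume() accepts the same [0 - 1] range as getVolume().
    clamped = max(0.0, min(1.0, float(volume)))
    tts.setVolume(clamped)
    return clamped
```
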
void ALTextToSpeechProxy::stopAll()

This method stops the current and all the pending tasks immediately.

std::string ALTextToSpeechProxy::getLanguageEncoding(const std::string& languageName)

Deprecated since version 1.22: due to technical improvements, this method is not useful anymore. Do not use.

Events

Event: "ALTextToSpeech/CurrentBookMark"
callback(std::string eventName, int value, std::string subscriberIdentifier)

Indicates the occurrence of the bookmarks that are placed (using “mrk=number” number being an integer [0 - 65535]) in the string that needs to be synthesized.

For further information, see Acapela Mobility Text TAGS.

Parameters:
  • eventName (std::string) – “ALTextToSpeech/CurrentBookMark”
  • value – Current bookmark.
  • subscriberIdentifier (std::string) –
Event: "ALTextToSpeech/CurrentSentence"
callback(std::string eventName, std::string value, std::string subscriberIdentifier)

Indicates the sentence that is currently synthesized.

Parameters:
  • eventName (std::string) – “ALTextToSpeech/CurrentSentence”
  • value – Current sentence.
  • subscriberIdentifier (std::string) –
Event: "ALTextToSpeech/CurrentWord"
callback(std::string eventName, std::string value, std::string subscriberIdentifier)

Indicates the word that is currently synthesized.

Warning

Not available for Japanese engine.

Parameters:
  • eventName (std::string) – “ALTextToSpeech/CurrentWord”
  • value – Current word.
  • subscriberIdentifier (std::string) –
Event: "ALTextToSpeech/PositionOfCurrentWord"
callback(std::string eventName, int value, std::string subscriberIdentifier)

Indicates the word that is currently synthesized by its index in the current sentence. Index 0 refers to the first word of the sentence.

Warning

Not available for Japanese engine.

Parameters:
  • eventName (std::string) – “ALTextToSpeech/PositionOfCurrentWord”
  • value – Current word position.
  • subscriberIdentifier (std::string) –
Event: "ALTextToSpeech/Status"
callback(std::string eventName, AL::ALValue value, std::string subscriberIdentifier)

Raised when the status of a task changes.

Parameters:
  • eventName (std::string) – “ALTextToSpeech/Status”
  • value

    [idOfConcernedTask, status]

    Where:

    • idOfConcernedTask is the ID of the task concerned by the event.
    • status can be “enqueued”, “started”, “thrown”, “stopped” or “done”.
  • subscriberIdentifier (std::string) –
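
A callback receiving this event has to unpack the [idOfConcernedTask, status] pair. The sketch below shows one way to do that and to reject unexpected status strings; parse_status is a hypothetical helper and is independent of how the callback is registered.

```python
# Status strings documented for the "ALTextToSpeech/Status" event.
KNOWN_STATUSES = ("enqueued", "started", "thrown", "stopped", "done")


def parse_status(value):
    """Unpack an event value of the form [idOfConcernedTask, status]."""
    task_id, status = value
    if status not in KNOWN_STATUSES:
        raise ValueError("unknown task status: %r" % (status,))
    return task_id, status
```
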
Event: "ALTextToSpeech/TextStarted"
callback(std::string eventName, bool value, std::string subscriberIdentifier)

Raised when the current sentence synthesis starts.

Parameters:
  • eventName (std::string) – “ALTextToSpeech/TextStarted”
  • value – True if the current speaking task is in progress.
  • subscriberIdentifier (std::string) –
Event: "ALTextToSpeech/TextDone"
callback(std::string eventName, bool value, std::string subscriberIdentifier)

Raised when the current sentence synthesis is done.

Parameters:
  • eventName (std::string) – “ALTextToSpeech/TextDone”
  • value – True if the current speaking task is done.
  • subscriberIdentifier (std::string) –
Event: "ALTextToSpeech/TextInterrupted"
callback(std::string eventName, bool value, std::string subscriberIdentifier)

Raised when the current sentence synthesis is interrupted, for example by ALTextToSpeechProxy::stopAll().

Parameters:
  • eventName (std::string) – “ALTextToSpeech/TextInterrupted”
  • value – True if the current speaking task is interrupted.
  • subscriberIdentifier (std::string) –

Signals

qi::Signal<qi::os::timeval> ALTextToSpeech::synchroTTS

During a dialog session, the robot switches between a speaking phase and a listening phase. This signal is raised only at the end of the last sentence of the speaking phase, to indicate the beginning of the listening phase. The time value indicates how much sound remains to be played before the actual end of the speech.