Reading non-ASCII text
Suppose you have your robot configured to speak in French, and you want it to say some sentences from a data file.
Doing so is a bit trickier than it sounds, because you have to take care of the encoding.
Example
First, download the following files and put them on the robot, in the same directory: coffee_en.txt, coffee_fr_utf-8.txt and coffee_fr_latin9.txt.
The coffee_en.txt file contains the string “I like coffee”; the files coffee_fr_utf-8.txt and coffee_fr_latin9.txt hold its French translation, “J’aime le café”, so it’s best if your robot can speak French in addition to English :)
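If the sample files are not at hand, equivalent ones can be generated locally. The sketch below is an assumption about their contents, based only on the description above: the helper name write_sample_files is hypothetical, the text uses a plain ASCII apostrophe, and no trailing newline is written. It also illustrates why the English file can be read with any of the three encodings while the French files differ.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

ENGLISH = u"I like coffee"
FRENCH = u"J'aime le caf\u00e9"  # \u00e9 is "é"


# Hypothetical helper: writes the three sample files into `directory`
# and returns a dict mapping each file name to its full path.
def write_sample_files(directory):
    samples = [
        ("coffee_en.txt", ENGLISH, "ascii"),
        ("coffee_fr_utf-8.txt", FRENCH, "utf-8"),
        ("coffee_fr_latin9.txt", FRENCH, "latin9"),
    ]
    paths = {}
    for name, text, encoding in samples:
        path = os.path.join(directory, name)
        with open(path, "wb") as fp:
            fp.write(text.encode(encoding))  # text -> bytes in the chosen encoding
        paths[name] = path
    return paths


sample_dir = tempfile.mkdtemp()
sample_paths = write_sample_files(sample_dir)

# ASCII-only text has the same bytes in all three encodings...
english_bytes = [ENGLISH.encode(enc) for enc in ("ascii", "utf-8", "latin9")]
# ...but "é" is one byte in Latin-9 and two bytes in UTF-8.
french_utf8 = FRENCH.encode("utf-8")
french_latin9 = FRENCH.encode("latin9")
```

The files end up in a temporary directory here; on a robot you would pass the directory that holds the script instead.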
Let’s have a closer look at the script:
#! /usr/bin/env python
# -*- encoding: UTF-8 -*-

"""Example: Non ascii Characters"""

import qi
import argparse
import sys
import codecs


def say_from_file(tts_service, filename, encoding):
    with codecs.open(filename, encoding=encoding) as fp:
        contents = fp.read()
        # warning: print contents won't work
        to_say = contents.encode("utf-8")
    tts_service.say(to_say)


def main(session):
    """
    This example uses non ascii characters.
    """
    # Get the service ALTextToSpeech.
    tts_service = session.service("ALTextToSpeech")

    try:
        tts_service.setLanguage('French')
    except RuntimeError:
        print "No French pronunciation because French language is not installed. Pronunciation will be incorrect."
    say_from_file(tts_service, 'coffee_fr_utf-8.txt', 'utf-8')
    say_from_file(tts_service, 'coffee_fr_latin9.txt', 'latin9')

    tts_service.setLanguage('English')
    # the string "I like coffee" is encoded the exact same way in these three
    # encodings
    say_from_file(tts_service, 'coffee_en.txt', 'ascii')
    say_from_file(tts_service, 'coffee_en.txt', 'utf-8')
    say_from_file(tts_service, 'coffee_en.txt', 'latin9')


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ip", type=str, default="127.0.0.1",
                        help="Robot IP address. On robot or Local Naoqi: use '127.0.0.1'.")
    parser.add_argument("--port", type=int, default=9559,
                        help="Naoqi port number")

    args = parser.parse_args()
    session = qi.Session()
    try:
        session.connect("tcp://" + args.ip + ":" + str(args.port))
    except RuntimeError:
        print ("Can't connect to Naoqi at ip \"" + args.ip + "\" on port " + str(args.port) +".\n"
               "Please check your script arguments. Run with -h option for help.")
        sys.exit(1)
    main(session)
First, notice how we do not use open but codecs.open, specifying the encoding.
Also notice what happens to the result of the read: codecs.open decodes the bytes of the file, so the object returned by fp.read is a unicode object, and we need to encode it back to get a str object encoded in ‘UTF-8’, usable by the TTS proxy.
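The decode-then-encode round trip can be tried without a robot. The following is a minimal sketch, not part of the original example: utf8_bytes_from_file is a hypothetical stand-in for say_from_file that returns the UTF-8 bytes instead of sending them to the text-to-speech service, and the demonstration file is a temporary one created on the spot.

```python
import codecs
import os
import tempfile


# Hypothetical stand-in for say_from_file: same decoding and encoding
# steps, but the result is returned rather than spoken.
def utf8_bytes_from_file(filename, encoding):
    with codecs.open(filename, encoding=encoding) as fp:
        contents = fp.read()         # decoded text: unicode (Python 2) or str (Python 3)
    return contents.encode("utf-8")  # byte string: str (Python 2) or bytes (Python 3)


# Write a small Latin-9 file to a temporary location for the demonstration.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as fp:
    fp.write(b"J'aime le caf\xe9")   # 0xE9 is "é" in Latin-9

to_say = utf8_bytes_from_file(demo_path, "latin9")
os.remove(demo_path)
```

The single Latin-9 byte 0xE9 comes back as the two UTF-8 bytes 0xC3 0xA9, which is the form the script hands to the TTS service.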
Trying to run print contents won’t work because Python will try to encode the string using the current locale of the robot, which is ‘ASCII’, leading to this error:
Traceback (most recent call last):
  File "non_ascii.py", line 22, in <module>
    main()
  File "non_ascii.py", line 18, in main
    say_from_file(filename)
  File "non_ascii.py", line 10, in say_from_file
    print contents
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 13: ordinal not in range(128)
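The same failure can be reproduced anywhere by encoding the text to ASCII explicitly, which is roughly what Python 2's print does under an ASCII locale. The sketch below is illustrative only; the variable names are made up for this example, and it also shows two standard error handlers that yield something printable on an ASCII-only terminal.

```python
contents = u"J'aime le caf\u00e9"

# Strict ASCII encoding fails on the accented character.
try:
    contents.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Two ways to get ASCII-safe bytes for debugging output.
escaped = contents.encode("ascii", "backslashreplace")  # keeps the code point visible
replaced = contents.encode("ascii", "replace")          # substitutes "?"
```

These handlers are only meant for debugging output; the text sent to the robot should still be encoded in UTF-8 so that no character is lost.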
Finally, notice that regardless of the file encoding, everything gets encoded to ‘UTF-8’ before being sent to the text-to-speech proxy.
Going further
If you are not sure whether your file is UTF-8 encoded, you can use something like:
# inside a function such as say_from_file
with codecs.open(filename, encoding="utf-8") as fp:
    try:
        contents = fp.read()
    except UnicodeDecodeError:
        print filename, "is not UTF-8 encoded"
        return
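Instead of giving up when the file is not UTF-8, one possible extension is to try a short list of candidate encodings in order. This is only a sketch under stated assumptions: read_with_fallback is a hypothetical helper, the candidate list is a guess at what your data files may contain, and it decodes the raw bytes directly rather than going through codecs.open. Keep in mind that Latin-9 accepts any byte sequence, so it can only serve as a last resort and cannot confirm that the file really is Latin-9.

```python
import os
import tempfile


# Hypothetical helper: try each candidate encoding in order and return
# the decoded text together with the name of the encoding that worked.
def read_with_fallback(filename, encodings=("utf-8", "latin9")):
    with open(filename, "rb") as fp:
        raw = fp.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("%s matches none of: %s" % (filename, ", ".join(encodings)))


# Demonstration file: valid Latin-9, but not valid UTF-8.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as fp:
    fp.write(b"J'aime le caf\xe9\n")

text, used_encoding = read_with_fallback(demo_path)
os.remove(demo_path)
```

The returned text can then be encoded to UTF-8 and passed to the text-to-speech service exactly as in say_from_file.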