SoftBank Robotics documentation What's new in NAOqi 2.8?

Messaging system

DISCLAIMER This document describes the messaging system. I don’t know all the internals of that, so the knowledge described here is very sparse and maybe inaccurate.

High-level view

Sessions

To use the messaging layer, you need a Session. Sessions are connected to each other and make an undirected graph that we call a galaxy of sessions. On a galaxy of sessions, there is a list of services and the service with id 1 is the ServiceDirectory which contains that list.

The ServiceDirectory runs on a session which is the entry point for other sessions to connect. When a session exposes a service, it listens on some other port so that other sessions can connect to it when they need to contact the service.

                  +-------------+
        +---------+  Service    |
        |         |  Directory  |
        |         +------+------+
        |                |
        |                |
        |                |        +-------------+
        |                |        |             |
        |                +--------+ SomeService |
        |                |        |             |
        |                |        +-------------+
        |                |
+-------+------+         |        +--------------+
|   Client     |         +--------+ OtherService |
|   session    +------------------+              |
+--------------+                  +--------------+

There is no real difference between a client session and a service session. In this diagram, it is just that the Client session does not expose any service, but apart from that, it is a normal session. A session can call other services whether it hosts services or not.

Address system

There is a three-part address system in qimessaging used to call functions, trigger events, etc.

Services have a associated service ID. A service can give one or more objects (by returning them). When a service gives an object, this object gets the same service ID but a different object ID.

For example, the service directory is always service 1. If you have a service 2, that service is addressed as 2.1. If that service gives you an object, that object will have ID 2, which means its address is 2.2, the next will have ID 3. The object ID 1 is always reserved for the service itself.

There is a third part on addresses which is the function ID. Every function, event and property has an ID. User functions usually start at ID 100. IDs below 100 are hidden and reserved for qimessaging’s usage.

Exchanging objects

It is possible to exchange objects between a session and a service by returning it from a function call or by giving it as an argument. When doing that, the object will be wrapped by qimessaging objects.

You can see in the diagram below how an object is exposed from the server on the right to the client on the left. Keep in mind that what I call client or server in this section has no real importance as they can be swapped. You can pass objects from service to client or from client to service and you will reach the state described in this diagram.

                      Network
   Client                /       Service
                         \
+------------------+     /    +-----------------+
|                  |     \    |                 |
|  RemoteObject    +----------+  BoundObject    |
|                  |     /    |                 |
+---------^--------+     \    +--------+--------+
          |              /             |
          |              \             |
+---------+--------+     /    +--------v--------+
|                  |     \    |                 |
|   AnyObject      |     /    |   AnyObject     |
|                  |     \    |                 |
+------------------+     /    +-----------------+
                         \

When the object is exposed from the server, it will be wrapped in a BoundObject that will own a reference to the AnyObject to keep it alive.

When the object is received by the client, you get an AnyObject which points to a DynamicObject.

If the same object is returned twice for example, two BoundObjects will be created and two RemoteObjects will be created on the remote side. They will be independent and have different addresses.

However, several calls to qi::Session::service("MyService") on a client session will return shared references to the same RemoteObject. The client session communicates with the service directory only for the first call, and then returns the cached RemoteObject, as long as the service is not replaced.

Lifetime

As said before, BoundObject owns a reference to the AnyObject so that the RemoteObject can still contact it later.

There are three cases which cause that reference to be dropped:

  • When the RemoteObject dies, it will call the hidden method terminate() which will abort all pending calls and ultimately destroy the BoundObject and release the reference.
  • When the connection with the client is lost (legitimately or because of an error), all BoundObjects exposed to it will be destroyed, releasing their reference to the AnyObject.
  • When the service is unregistered, all addresses will become invalid, thus the BoundObjects will also be destroyed, releasing their reference.

Using a RemoteObject asynchronously

If the RemoteObject is a service obtained from the session, like the almotion object in the following example, it is kept alive by the session and the object can be used without caring about lifetime issues.

// return robot's pose with respect to ref
qi::Future<AL::Math::Pose2D> getRobotPose(qi::SessionPtr session,
                                          const AL::Math::Pose2D &refPose)
{
  qi::AnyObject almotion = session->service("ALMotion").value();
  // no need to handle `motion`'s lifetime during the async call,
  // `session` keeps it alive already.
  return almotion.async<void>("getRobotPosition").andThen(
      [refPose](const std::vector<float> &xya)
      {
        refPose.inverse() * AL::Math::Pose2D(xya);
      });
}

However, if the RemoteObject is obtained from a remote call, its lifetime must be specifically handled. This can be done using a continuation, as demonstrated in the following example, where the robotFrame object lifetime is extended for the duration of the computeTransform call. Beware with this pattern however, if the getRobotPose function is called often, and the computeTransform call is long to finish, RemoteObjects might be created faster than they are destroyed, their accumulation leading to loss of performance and responsiveness and an increase in memory consumption.

qi::Future<qi::geomety::TransformTime> getRobotPose(qi::SessionPtr session,
                                                    qi::AnyObject refFrame)
{
  qi::AnyObject actuation = session->service("Actuation").value();
  qi::AnyObject robotFrame = actuation.call<qi::AnyObject>("robotFrame");
  auto computingTf =
      robotFrame.async<qi::geomety::TransformTime>("computeTransform",
                                                   refFrame);
  // capture robotFrame in a no-op continuation in order to ensure it is
  // kept alive until the `computeTransform` remote call is finished.
  // Otherwise, the call would be aborted, and `computingTf`
  // would end up in "RemoteObject destroyed" error, even though the actual
  // remote AnyObject representing the robot frame is kept alive by the
  // Actuation service.
  computingTf.connect(
      [robotFrame](qi::Future<qi::geometry::TransformTime>) {});
  return computingTf;
}

Sending an object through a RemoteObject or BoundObject

RemoteObject and BoundObject both inherit from ObjectHost which is a class that has a list of BoundObject. They both inherit from it because you also need the RemoteObject to hold BoundObject-s when sending an object as an argument.

Since a RemoteObject does not have an ID to refer to (it is not a service), when sending an object through it, it must create an address. The service ID of the address it forges is the one of the service it sends the object to. To avoid having two objects with the same IDs on the service, it starts counting object IDs from 2^31.

I don’t know what happens when object IDs overflow, probably bad things that they only talk about in books.

Lower-level view

Messages

In the end, qimessaging all comes down to the exchange of qi::Message. A message is a structure with different fields:

  • a magic number to identify qimessaging packets
  • an ID and a type which will be detailed later
  • a destination address with Service.Object.Function
  • a variable-size signature to describe the payload of the message
  • a buffer which contains the payload

The last two parts are the only variable part of the message, the rest is part of the fixed-size header.

Messages types and IDs

Messages have different types described in message.hpp. Message IDs are always incremented when sending a new request, for example a call (and maybe other types).

Then the response to that call may be Reply, Error or Canceled. In whichever case, the response will have the same ID as the Call message to associate them together.

Exchanging messages

At the lowest level, we use Boost.Asio to handle the TCP sockets. This is wrapped in the class TcpTransportSocket which inherits from the virtual class TransportSocket. It is possible to implement the messaging over something else by specializing that class, for example to use UNIX sockets, pipes on the file system (why not!), etc.

I don’t know the exact path of messages through the classes, but for a call, first there is a messageReady signal triggered in TransportSocket. There is a class that receives that signal (probably Server) and forwards it to the concerned object by calling its onMessage method. Then the object can handle the call and reply later by doing a send on the socket.

If the message is a Reply, I think the RemoteObject is directly connected to the TransportSocket and handles the messages. There is a MessageDispatcher class somewhere that probably does something.

Middle-level view (serialization)

StreamContext

TcpTransportSocket inherits from StreamContext which, as the name suggests, is a context associated to a stream. It is used to hold various information to know how to serialize things.

For example, when sending an object, we send the whole MetaObject with it. But when getting a service twice, the MetaObject (which is a heavy structure) is still the same and there is no need to send it again. For that, there is a MetaObject cache system and we keep a bit of information in the StreamContext which is “Have we already sent this MetaObject on this stream?”. That way we avoid sending it twice, and the remote side does not expect to receive it a second time.

The other thing it is used for is capabilities. Capabilities are there to support protocol evolutions. For example, when the MetaObject cache system was added, a new capability was added so that we can identify processes using the old protocol and not use that system with them. These capabilities are associated with a remote host, so they are stored in the StreamContext.

Serializing

Serialization is done by the class BinaryEncoder in binarycodec.cpp. That class is a visitor that has access to the StreamContext and visits the type to serialize. It fills-in a buffer with its associated signature so that they can be sent in the Message.

Deserializing

Deserialization is a done in a very similar way. When a message is received, the associated signature is used to instantiate an empty AnyValue with the type interfaces corresponding to it. This is done in Message::value.

Then the BinaryDecoder visitor visits this empty value and fills it in with the data from the buffer.

What has gone bad (and why shared memory is half broken)?

This technique is not the best one (IMHO). Values are default initialized and then updated, and this pattern seems broken to me.

When implementing shared memory, we did not change the signature protocol and we handle shared buffers the same way as normal buffers in signatures. When deserializing, because of that, we need to change the type of the AnyValue while we read the message buffer. It is only at that moment that we know that the message does not contain the buffer but only shared memory coordinates and that we must use the shared memory type interface for the AnyValue.

As it is implemented right now, the function deserialize updates the AnyReference and replaces the type of it.

There is a problem with that when the shared buffer is part of a tuple (StructTypeInterface) which has a static list of subtypes. This type interface is instantiated from the signature of the message, and thus with a normal BufferTypeInterface. When we reach that part when we need to instantiate the shared buffer type interface, it is too late to replace the one in the structure which was created at the beginning of the deserialization.

My solution to this, that I find more elegant and that solves the problem, is to rewrite that part and instantiate the whole tree only once with the correct values (not default initializing and then assigning). This implies writing another visitor to visit the signature directly and not reusing the one on AnyValue. The system would instantiate the type interfaces from the bottom-up.