Connecting NAO the robot to the Google Cloud


Connecting the NAO robot to the Google Cloud was a nice exercise we did for the A.I.B. Meetup August edition.

Vision API + Translate API + Text-to-Speech + NAO

I used the Google Vision API to label the objects in the photo taken by NAO's camera, the Google Translation API to translate the labels to Romanian, and the Google Text-to-Speech API to make NAO speak the object names. (Actually, we cheated a bit on text-to-speech: Romanian speech was not available, so I had to fake a browser POST request to https://translate.google.com/ in order to generate an MP3 file with the speech.)

Here's how it works:

Programming NAO is pretty straightforward using the Choregraphe suite from Aldebaran. It's nice that every block shows the Python code behind it, and there is also a full Python code block.

The flow is pretty simple. Touching NAO's head triggers a photo, which is posted to vision.googleapis.com as a Base64-encoded picture. The JSON response is processed on NAO: the labels are extracted and posted to translation.googleapis.com to get the Romanian translation. Finally, a browser HTTP POST call is faked against translate.google.com/translate_tts to get the MP3 file with the voice saying the object names.
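To make the request and response shapes concrete, here is a minimal sketch of the two JSON steps, assuming the public REST endpoints of the Vision API (v1 `images:annotate`) and the Translation API (v2). The helper names are mine, not taken from the repository, and the actual HTTP calls (which need an API key) are left out so that the snippet runs offline.

```python
import base64
import json

# Public REST endpoints; the API key would be passed as a "key" parameter.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_vision_request(image_bytes, max_results=5):
    """Return the JSON body of a label-detection request for one image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }
    return json.dumps(body)


def extract_labels(response_text):
    """Collect the label descriptions from a Vision API JSON response."""
    payload = json.loads(response_text)
    labels = []
    for response in payload.get("responses", []):
        for annotation in response.get("labelAnnotations", []):
            labels.append(annotation["description"])
    return labels


def build_translate_request(labels, target="ro"):
    """Return the JSON body that asks for the labels in the target language."""
    return json.dumps({"q": labels, "target": target, "format": "text"})


def extract_translations(response_text):
    """Collect the translated strings from a Translation API JSON response."""
    payload = json.loads(response_text)
    translations = payload.get("data", {}).get("translations", [])
    return [item["translatedText"] for item in translations]


if __name__ == "__main__":
    # Hand-written sample response, for illustration only (not real output).
    sample = '{"responses": [{"labelAnnotations": [{"description": "chair"}]}]}'
    print(extract_labels(sample))
```

On the robot, the bytes passed to `build_vision_request` would be the contents of the JPEG saved by the head camera, and each JSON body would be sent with an HTTP POST to the matching URL.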

NAO is instructed to take a photo using the head camera.

And some Python code, which is fairly self-explanatory. The full code is here: https://github.com/viorelspinu/nao

And finally, faking the browser POST request:
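A rough sketch of what such a faked request could look like, written for Python 3's `urllib`. The `translate_tts` endpoint is undocumented, so the form parameters used here (`ie`, `q`, `tl`, `client`) and the User-Agent string are assumptions rather than a confirmed interface, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TTS_URL = "https://translate.google.com/translate_tts"

# A browser-like User-Agent so the request does not look like a script.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Content-Type": "application/x-www-form-urlencoded",
}


def build_tts_request(text, language="ro"):
    """Build (but do not send) a POST request asking for spoken text."""
    # Assumed parameter names for the undocumented endpoint.
    form = {"ie": "UTF-8", "q": text, "tl": language, "client": "tw-ob"}
    data = urlencode(form).encode("utf-8")
    return Request(TTS_URL, data=data, headers=BROWSER_HEADERS, method="POST")


def save_speech(text, mp3_path, language="ro"):
    """Send the request and write the returned audio bytes to mp3_path."""
    request = build_tts_request(text, language)
    with urlopen(request, timeout=10) as response:
        audio = response.read()
    with open(mp3_path, "wb") as mp3_file:
        mp3_file.write(audio)
```

Since this is not an official API, it may change or start rejecting requests at any time.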

Google Assistant + NAO

Google Assistant was a bit tricky to set up, mainly due to hardware limitations on the NAO side. NAO's processor just couldn't keep up with sending the audio chunks fast enough, so I had to prerecord everything and then send the audio file as a whole. The cost was that there was no conversation state management.

The push-to-talk Google Assistant sample code was a great start. To keep everything simple, I used a Docker container on my laptop that exposes a basic Python web server. The web server receives an MP3 file, uses ffmpeg to convert it to the proper format for Google Assistant (2 channels, 11025 Hz MP3), and then sends the file to the Google Assistant server. The response is received as MP3 and returned to NAO as the response to the upload request.
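The conversion step could be sketched like this, assuming ffmpeg is installed inside the container and called through `subprocess`. The function names and file names are hypothetical; the 2-channel, 11025 Hz settings are the ones mentioned above.

```python
import subprocess


def build_ffmpeg_command(source_path, target_path, channels=2, sample_rate=11025):
    """Return the ffmpeg argument list for the channel/sample-rate conversion."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the target file if it exists
        "-i", source_path,        # input file uploaded by NAO
        "-ac", str(channels),     # number of audio channels
        "-ar", str(sample_rate),  # sample rate in Hz
        target_path,
    ]


def convert_audio(source_path, target_path):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(source_path, target_path), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so no input file is needed.
    print(" ".join(build_ffmpeg_command("upload.mp3", "assistant_input.mp3")))
```

Passing the arguments as a list rather than one shell string avoids quoting problems if an uploaded file name contains spaces.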

The NAO programming blocks.