Fresh from the oven: PoseNet-based slides remote control

Here's something I built several months ago to help me control my slide flow during a live presentation (instead of using a regular USB presenter).

You can see me using it here, right at the very beginning of the video - https://www.viorelspinu.com/2020/03/speaking-on-serverless-clouds-at.html.

I'm using the PoseNet neural network to detect my gestures (more specifically, raising my right or left hand), and the slides on my computer move forward or backward accordingly.

The GitHub repo for this project is here - https://github.com/viorelspinu/remote-gesture-control.

The architecture is below.




All the magic happens in the browser, with TensorFlow.js and PoseNet (the cyan boxes marked as "browser webcam").





The code that handles the human pose recognition and sends the websocket events to the server is here - https://github.com/viorelspinu/remote-gesture-control/tree/master/browser-posenet/browser. camera.js is just a slight adaptation of the official PoseNet example, websocket.js takes care of server communication, and process_pose.js is the main file that converts the movements into the relevant gestures.
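To give you the flavour of the gesture logic, here is a rough sketch in Python (the real implementation is in JavaScript, in process_pose.js, and may well differ). PoseNet returns a list of keypoints, each with a part name, a confidence score, and an (x, y) position; a raised hand can be detected by checking whether a wrist keypoint sits above the corresponding shoulder. The 0.5 confidence threshold is my own illustrative choice, not a value from the repo:

```python
# Illustration only: a hand counts as "raised" when the wrist keypoint
# is above the shoulder keypoint. Image y grows downward, so "above"
# means a smaller y value.

MIN_SCORE = 0.5  # hypothetical confidence threshold

def detect_gesture(keypoints):
    """Map one PoseNet pose to an event string."""
    # Keep only keypoints PoseNet is reasonably confident about.
    parts = {kp["part"]: kp for kp in keypoints if kp["score"] >= MIN_SCORE}

    def raised(wrist, shoulder):
        return (wrist in parts and shoulder in parts
                and parts[wrist]["position"]["y"] < parts[shoulder]["position"]["y"])

    if raised("rightWrist", "rightShoulder"):
        return "__EVENT__RIGHT"
    if raised("leftWrist", "leftShoulder"):
        return "__EVENT__LEFT"
    return "__EVENT__IDLE"
```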

You can see a blue/green rectangle in the top left whenever a significant event (a hand being raised) happens.

Whenever such an event happens, a message such as "__EVENT__RIGHT", "__EVENT__LEFT", or "__EVENT__IDLE" is sent to the server over the websocket.

The server (its code is here - https://github.com/viorelspinu/remote-gesture-control/tree/master/browser-posenet/server) simply broadcasts all these events to every client currently connected (and also filters the data a bit).
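The broadcast idea itself fits in a few lines. This is not the repo's actual server (that one runs under gunicorn); it's just a minimal stand-in using the `websockets` asyncio package, with a crude startswith filter standing in for the real filtering:

```python
# Minimal broadcast sketch (pip install websockets; needs a recent version
# where the handler takes a single connection argument). Every event message
# received from one client is forwarded to all the other connected clients.
import asyncio
import websockets

CLIENTS = set()

async def handler(ws):
    CLIENTS.add(ws)
    try:
        async for message in ws:
            if message.startswith("__EVENT__"):  # crude filter, for illustration
                for client in CLIENTS:
                    if client is not ws:
                        await client.send(message)
    finally:
        CLIENTS.discard(ws)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8181):
        await asyncio.Future()  # run forever

asyncio.run(main())
```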

The client that controls the slide flow is here - https://github.com/viorelspinu/remote-gesture-control/tree/master/browser-posenet/client. It's just a basic Python script which uses Lomond to listen to the server for significant events and pynput.keyboard to trigger keyboard events (it simulates pressing the LEFT or RIGHT arrow keys to move the current slide).
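Condensed, the client side amounts to something like the sketch below (the server URL and the exact event-to-key mapping are illustrative; check the repo for the real script):

```python
# Listen for events over a websocket (Lomond) and turn them into
# arrow-key presses (pynput). SERVER_IP is a placeholder.
from lomond import WebSocket
from lomond.persist import persist
from pynput.keyboard import Key, Controller

keyboard = Controller()
KEYS = {"__EVENT__RIGHT": Key.right, "__EVENT__LEFT": Key.left}

websocket = WebSocket("ws://SERVER_IP:8181/")
for event in persist(websocket):   # persist() reconnects automatically
    if event.name == "text":
        key = KEYS.get(event.text)
        if key is not None:        # __EVENT__IDLE maps to no key press
            keyboard.press(key)
            keyboard.release(key)
```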

Yes, I'm aware the current architecture only supports one user at a time :) That's perfectly fine for me so far; feel free to adapt the code if you need more.

You still need to keep the browser window running the PoseNet network in front (or it will not run). On macOS, a simple way to do that is by using this to make the Chrome window really small, and then running both windows in the same maximised space, one next to the other.

Here's a full demo:



How to use this:

A. use a web server to serve the HTML file
      check out the project (on a machine with a public IP), install npm, go to the /browser-posenet/browser folder and run './start_web_server.sh'; this will start a development web server on port 1234 (of course, you can generate the artifacts first and host just the artifacts in nginx or similar)

B. start the websocket server
      check out the project (on a machine with a public IP), go to /browser-posenet/server, install the Python requirements (pip install -r requirements.txt), then run ./start.sh; this will start a gunicorn server on port 8181

C. start the local client
      on the machine you're going to use for the slides, check out the project, go to /browser-posenet/client, install the Python requirements and run ./start.sh; this will start a websocket client which will trigger arrow key presses when required (there's a quick way to test steps B and C without the camera; see the sketch after step D)
      start your slides and make sure the slides window has focus

D. open the webpage hosting the PoseNet neural network in a browser
      open SERVER_IP:1234 in Chrome (where SERVER_IP is the IP of the server from step A)
      configure the parameters until you're happy with the detections
      the machine you're using at step D does not have to be the same machine you're going to use for the slides!
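If you want to sanity-check steps B and C before pointing a camera at yourself, you can impersonate the browser and push one fake event to the server by hand. This little helper is not part of the repo; it uses the websocket-client package, and SERVER_IP is again a placeholder for the machine from step B:

```python
# Quick test for steps B and C, without the camera or browser:
# send one fake event and watch the client advance a slide.
# pip install websocket-client
from websocket import create_connection

ws = create_connection("ws://SERVER_IP:8181/")
ws.send("__EVENT__RIGHT")  # the client should simulate a RIGHT arrow press
ws.close()
```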






