Programophone: Kinect

Tuesday, 1 March 2016

How to "iterate" over Kinect files and extract RGB video stream

If you have ever tried to extract data from a Microsoft Kinect Extended Event File (XEF), you may have had some trouble getting your hands on the RGB video stream that it contains, like some of the users in this MSDN thread. As mentioned in some of my previous blog posts, I've found it possible to play an XEF file in Kinect Studio and then record data as it is being played, as if it were a live stream. This makes it relatively easy to collect data on the bodies and their joint positions but difficult with the other types of streams due to the amount of data processing that each frame requires.

Incoming RGB video frames need to be written directly to file; otherwise, storing them in memory uses so many resources that you risk crashing your computer. However, not all machines can perform enough I/O operations to keep up with the 30 fps playback rate of the XEF file, which makes extracting this data very difficult for most users. Being able to extract the RGB video data would be really useful when trying to synchronise the data with audio I am recording from a different source; something that is integral to the data analysis in my PhD studies and hard to achieve with the skeleton data alone. It wasn't until recently that I tried pressing the "step file" button in Kinect Studio to see whether each frame could be individually sent to the Kinect Service and, consequently, my data extraction Python applications. To my surprise, it worked!
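To give a rough sense of why buffering frames in memory is risky, here is a back-of-the-envelope sketch. It assumes the Kinect V2 colour stream is 1920 x 1080 pixels delivered uncompressed at 4 bytes per pixel; the function names are made up for illustration and are not part of any Kinect library.

```python
# Assumptions (not stated in the post): 1920x1080 colour frames,
# 4 bytes per pixel, uncompressed.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 4


def frame_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size of one raw colour frame in bytes."""
    return width * height * bpp


def recording_bytes(duration_s, fps=30):
    """Raw size of every colour frame in a recording of the given length."""
    n_frames = int(round(duration_s * fps))
    return frame_bytes() * n_frames


if __name__ == "__main__":
    print("one frame:   %.1f MB" % (frame_bytes() / 1e6))
    print("one second:  %.1f MB" % (recording_bytes(1) / 1e6))
    print("five minutes: %.1f GB" % (recording_bytes(300) / 1e9))
```

Under these assumptions a 5 minute recording comes to roughly 75 GB of raw frames, which is far more than most machines can hold in memory, and the sustained write rate at 30 fps is around 250 MB per second.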

Close up of the "step file" button

It would be possible to iterate over the file manually by clicking the "step file" button, but for long files this would be very time-consuming (a 5 minute file at 30 fps contains around 9,000 frames). Using the PyAutoGUI module, I was able to set up an automated click on the "step file" button every 0.5 seconds. The button's position is specified by hovering over it and pressing the Return key; the script then iterates over the file automatically, which allowed me to extract and store the RGB video data successfully. I tried to trigger the automated click as soon as each frame was processed but got some Windows errors. I will hopefully fix this in future to make the process faster, but right now it's at least a bit easier!
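The automated clicking can be sketched roughly as follows. This is a minimal illustration rather than the code in PyKinectXEF: the function names are hypothetical, and the click itself is passed in as a callable so the loop can be tried out without touching the mouse. Only `pyautogui.position()` and `pyautogui.click()` are real PyAutoGUI calls.

```python
import time


def frames_in(duration_s, fps=30):
    """Number of frames in a recording of the given length."""
    return int(round(duration_s * fps))


def step_through(click, n_frames, interval=0.5, sleep=time.sleep):
    """Call click() once per frame, pausing `interval` seconds after each.

    Returns the total time spent waiting, in seconds.
    """
    for _ in range(n_frames):
        click()
        sleep(interval)
    return n_frames * interval


def make_pyautogui_clicker():
    """Record the mouse position when Return is pressed, then click there."""
    import pyautogui  # third-party; imported here so the loop above works without it

    input('Hover over the "step file" button and press Return... ')
    x, y = pyautogui.position()
    return lambda: pyautogui.click(x, y)


# Example use (moves the real mouse, so it is left commented out):
# step_through(make_pyautogui_clicker(), frames_in(5 * 60))
```

At one click every 0.5 seconds, a 9,000 frame file takes about 75 minutes to step through, which is why clicking as soon as each frame has been processed would be a worthwhile improvement.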

I am also hoping to find a bit of time to write up a simple README for the application, which is available at my GitHub here: https://github.com/FoxDot/PyKinectXEF

Please feel free to use it and give feedback - bad or good - and I look forward to hearing any suggestions!

- Ryan

Thursday, 11 February 2016

Synchronizing Externally Recorded Audio With Kinect Data

One of the biggest downsides to recording musical performances with the Microsoft Kinect V2 is the lack of a high quality microphone. It does contain an array of four very low quality microphones, but when I extracted and accumulated the SubAudioFrame data from the Kinect playback, the results were not pretty (but audible, surprisingly) as you can see...

It is possible to get a more accurate waveform, but it requires a hefty amount of noise removal and the result is almost as useless as the one you see above. To be able to compare Kinect data to a sound file, you are going to have to record the audio from a different source. I decided to try recording a few bars of "Eight Days a Week" by the Beatles with a friend using my smartphone, but any real recording should be performed with a much, much better piece of kit.

To synchronise the audio and the visuals, I decided to start recording using both my smartphone and the Kinect and then clap clearly, so that I could line up the onset of the clap sound in the audio file with the frame in which the hands make contact with each other. Unfortunately, doing this using just Python (which I've been writing my scripts in so far) would be a boatload of work, so I used the Python API for OpenCV together with PyGame to make a workaround. Instead of playing the frame data back to me using the PyGame package, I was able to save the pixel array as a frame of video and store that. (The code I'm working on will be on my GitHub soon - I just have to make sure there's absolutely no way any recorded data can end up there!)
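The arithmetic behind lining up the clap can be sketched like this. It is only an illustration under two assumptions: that the video runs at a constant 30 fps, and that the clap has already been located by hand as a time in the audio file and a frame number in the video. The function names are hypothetical.

```python
FPS = 30.0  # assumed constant frame rate of the saved video


def audio_offset(clap_audio_s, clap_frame, fps=FPS):
    """Seconds to add to a video timestamp to get the matching audio time."""
    return clap_audio_s - clap_frame / fps


def frame_to_audio_time(frame, offset_s, fps=FPS):
    """Position in the audio file (seconds) that corresponds to a video frame."""
    return frame / fps + offset_s


def audio_time_to_frame(audio_s, offset_s, fps=FPS):
    """Video frame closest to a given position in the audio file."""
    return int(round((audio_s - offset_s) * fps))


# Example: the clap is heard 2.5 s into the audio and seen in frame 45.
offset = audio_offset(2.5, 45)           # audio started 1.0 s before the video
print(frame_to_audio_time(90, offset))   # audio time matching frame 90
print(audio_time_to_frame(4.0, offset))  # frame matching 4.0 s of audio
```

If the frames were not captured at an even rate, a single offset like this will drift over a long recording, so it is worth checking the alignment again at a later point in the file.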

Once I had my audio track clipped, I could compare the waveform and the recordings from the start, or from any point I choose. The next step is to automate the production of a spectrogram that will run underneath (or be overlaid by) a graph that plots the performer's movements. Here is a little mock-up using 20 seconds of data from the Beatles song.

You can see just from this graph that there are some similarities between each of the lines, and also between the lines and the spectrogram (created using Sonic Visualiser) that it's on top of. I'll need to get brushing up on my statistics soon to get more detailed analysis out of these sorts of graphs, but things are looking promising.

Links:

OpenCV - http://opencv.org/
Python - https://www.python.org/
PyGame - http://pygame.org/
Sonic Visualiser - http://www.sonicvisualiser.org/
PyKinect2 - https://github.com/Kinect/PyKinect2 (Not mine, but used in my code)
PyKinectXEF - https://github.com/FoxDot/PyKinectXEF (My prototype code)