Tuesday, 13 September 2016

Quantity of Motion Analysis of Ensemble Performers

One of my interests when starting my PhD was motion capture and the analysis of three-dimensional time-series data. I was recording a group of singers performing songs and collected a large body of data. I didn't (and to some extent still don't) really know what I wanted to do with it, as there are almost too many avenues to pursue at this point, and I needed something to point me in the right direction. I had a lot of questions like "what's going to be happening in the data?" and "which relationships should I look at first?" The last question was quite daunting, as I didn't want to waste hours on something unfruitful.

So I decided to do some extremely basic computer image analysis to look at the overall amount of movement of a group of performers, and then see what stood out in this much-reduced data set. This didn't involve any fancy 3D motion capture, just a video recording of the performance, which the Microsoft Kinect collected automatically.

Quantity of motion has been used in musical gesture analysis on a number of occasions, but it has been calculated in various ways. One of the simplest uses image subtraction: literally subtracting the grey-scale values of one video frame from another. To calculate the quantity of motion at frame f you need to look at the previous N frames (motion exists through time), and here we'll set N to 30. Summing the absolute differences between each pair of consecutive frames from f-N to f gives an image of where motion has occurred, and counting the pixels in that image that exceed a threshold gives the value for frame f. The best way of understanding this might be to look at some Python code* that calculates it:

*This is not a very efficient way to program it, but it explains the logic in the easiest way to understand.

import cv2
import numpy as np

class load_file:
    def __init__(self, filename, window_size):
        # Open the video file
        self.file = cv2.VideoCapture(filename)
        self.window_size = window_size

        # Get file attributes (OpenCV returns these as floats, so cast to int)
        self.width      = int(self.file.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height     = int(self.file.get(cv2.CAP_PROP_FRAME_HEIGHT))
        self.num_frames = int(self.file.get(cv2.CAP_PROP_FRAME_COUNT))

    def get_qom(self, frame_index):

        # Create a blank image (NumPy arrays are rows x columns, i.e. height x width)

        img = np.zeros((self.height, self.width), dtype=np.uint8)

        # Iterate over each pair of consecutive frames in the window
        for i in range(frame_index - self.window_size, frame_index):

            # Set the file to the frame we are looking at

            self.file.set(cv2.CAP_PROP_POS_FRAMES, i)

            # Get FrameA (frame i) and convert to grayscale

            ret, image = self.file.read()

            A = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

            # Get FrameB (frame i + 1) and convert to grayscale

            ret, image = self.file.read()

            B = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

            # Perform image subtraction on the two frames

            dif = cv2.absdiff(B, A)

            # Add the difference to the running total (cv2.add saturates at 255)

            img = cv2.add(img, dif)

        # Threshold the final image: pixels above 200 become 255, the rest 0

        r, img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

        return img

if __name__ == "__main__":

    # Load the video file (a video format OpenCV can read, not an audio file)
    fn = "path/to/file.avi"
    N = 30
    video = load_file(fn, window_size=N)

    data = []
    # Iterate over the frames. Each window needs N earlier frames and reads
    # one frame ahead, so start at frame N and stop before the last frame
    for f in range(N, video.num_frames):
        # 'img' is a 2D array that has a visualisation of the quantity of motion
        img = video.get_qom(f)
        # NumPy can count any nonzero values
        val = np.count_nonzero(img)
        # Store the values. The list 'data' can then be plotted over time
        data.append(val)
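For reference, the value the script stores for frame f can be written as a formula, where I_i is the grey-scale image of frame i, N = 30 and 200 is the threshold used in the code:

```latex
Q(f) = \#\left\{ (x, y) \;:\; \sum_{i=f-N}^{f-1} \left| I_{i+1}(x, y) - I_i(x, y) \right| > 200 \right\}
```

Because every difference is non-negative, the saturation at 255 in cv2.add doesn't change which pixels end up above 200.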

The program uses two excellent libraries, OpenCV and NumPy, which you will need to install if you don't already have them. The output of this script, the Python list called 'data', can be used to plot the quantity of motion over time:

This gave me an idea of where the most movement was occurring. It seemed periodic, but not frequent enough to be at the bar level, so I added lines where the phrase boundaries occurred:

I found that the movement and phrasing of the piece were correlated, which gave me a good platform to investigate this further. Please feel free to use the script above, or if you are interested in a much more efficient way of doing it, please get in contact and I'm happy to discuss.
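One reason the script is slow is that it re-reads every frame N times. Here is a minimal vectorised sketch with NumPy, my own illustration rather than a definitive implementation, assuming the whole clip has already been decoded into a (T, H, W) array of grey-scale frames (which needs a lot of memory for long videos). It replaces the per-frame loop with a cumulative sum, so each window total is just the difference of two running totals. The function name qom_series is hypothetical.

```python
import numpy as np


def qom_series(frames, window_size=30, threshold=200):
    """Quantity of motion for every frame that has a full window of history.

    frames: array of shape (T, H, W) holding grey-scale frames, with T > window_size.
    Returns an array of length T - window_size whose k-th entry is the QoM of
    frame window_size + k: the number of pixels whose summed absolute
    frame-to-frame difference over the window exceeds `threshold`.
    """
    # Use a signed type so the subtraction can't wrap around like uint8 would
    frames = np.asarray(frames, dtype=np.int32)
    # |F[i+1] - F[i]| for every consecutive pair of frames
    diffs = np.abs(np.diff(frames, axis=0))
    # Running totals with a leading zero: totals[k] = sum of diffs[0..k-1]
    totals = np.concatenate([np.zeros_like(diffs[:1]), np.cumsum(diffs, axis=0)])
    n = window_size
    # Sum of diffs[f-n .. f-1] for each frame f from n to T-1
    window_sums = totals[n:] - totals[:-n]
    # Count the pixels above the threshold in each frame's motion image
    return np.count_nonzero(window_sums > threshold, axis=(1, 2))
```

With the default threshold of 200 this counts the same pixels as the loop above, because clipping at 255 (as cv2.add does) never changes whether a value is above 200.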

Monday, 2 May 2016

My first Algorave performance

So on Friday I performed at my first ever Algorave, an event where digital artists get together to perform music and create visual spectacles using computer code. The music is created using a form of composition called Live Coding, where music is programmed algorithmically. I'd been interested in Live Coding ever since my Master's in Computer Music, but found the domain-specific languages, such as SuperCollider and Tidal, a bit difficult to grasp and musical ideas slow to develop. This prompted me to start developing my own system, FoxDot.

FoxDot is a Python-based language that takes an object-oriented approach to Live Coding: it makes music by creating Player Objects that are given instructions such as the notes to play and their respective durations. The sounds themselves are synthesised in SuperCollider, for which FoxDot was originally designed as an abstraction.
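To illustrate the idea of a player object that pairs notes with durations, here is a toy sketch in plain Python. It is hypothetical and deliberately simplified: ToyPlayer is not FoxDot's actual Player class or syntax, and it only builds a schedule of (start beat, note, duration) events rather than sending anything to SuperCollider.

```python
from itertools import cycle, islice


class ToyPlayer:
    """Hypothetical stand-in for a live-coding 'player' object (not FoxDot's API).

    It cycles through a list of notes and a list of durations independently,
    so lists of different lengths produce evolving patterns.
    """

    def __init__(self, notes, durations):
        self.notes = list(notes)
        self.durations = list(durations)

    def events(self, count):
        """Return the first `count` events as (start_beat, note, duration) tuples."""
        start = 0.0
        result = []
        pairs = zip(cycle(self.notes), cycle(self.durations))
        for note, dur in islice(pairs, count):
            result.append((start, note, dur))
            # The next note starts when this one's duration has elapsed
            start += dur
        return result
```

For example, ToyPlayer([0, 2, 4], [1, 0.5]).events(4) gives [(0.0, 0, 1), (1.0, 2, 0.5), (1.5, 4, 1), (2.5, 0, 0.5)]: the three-note and two-duration cycles drift against each other.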

The music at Algoraves comes in a variety of forms, but it is mainly intended to make people dance. I was playing alongside artists whose videos I've watched countless times and whom I've even written essays about, so I was very honoured to do so. I was also very nervous, as it was the first time I'd performed with FoxDot in public, and I think it showed. I noticed that many performers would stand (I chose to sit), move rhythmically with the music and even spend some time away from the keyboard. By doing this, I think they could not only take a moment to enjoy the occasion but also think about their next 'move' in terms of their sound. I was typing almost constantly and I think that had a detrimental effect on the overall performance; the set was varied and had too many lulls, but I did see some people dancing, so I can't be completely disappointed.

Here's the whole event on YouTube (I start around 10 min)!

I'm next performing at the International Conference on Live Interfaces at the University of Sussex at the end of June. I've given myself the challenge of including a computer vision aspect in the performance, and I can't wait!

Monday, 7 March 2016

TOPLAP Leeds: First public outing of "FoxDot"

Outside of my PhD research into nonverbal communication in musicians, I also have a passion for Live Coding music and for programming my own music-making system, FoxDot. I'm quite new to the Live Coding "scene" and have been wanting to get more involved for the last year or so, but never really plucked up the courage to do something about it. Luckily for me, the Live Coding pioneer Dr. Alex McLean asked me, along with a few other students, to help set up TOPLAP Leeds, a local node of the established Live Coding group TOPLAP.

What this means, I don't really know, but Alex encouraged me to put on a small demonstration of the code I've been working on to get some feedback from my peers. It was the first time I had used FoxDot in a public setting other than YouTube, and I was really nervous. I started working on the system last year for a composition module in my Computer Music MA and it grew from there, but I had never really thought I would use it to perform. The feedback was really positive and the group gave me some great ideas to put into practice. My PhD comes first, though, so I'm trying to limit the time I spend on FoxDot to weekday evenings and Sundays - but it's just so fun!

FoxDot is a pre-processed, Python-based language that talks to a powerful sound synthesis engine called SuperCollider. It lets you create music-playing objects that can work together or independently to make music, and I'm really hoping to promote it amongst the Live Coding community over the next few years. It's generally in a working state, but I have so many ideas for it that it still feels very much in its infancy. The reason for writing this blog post is to try to make it all a bit more real. For so long I've been working on it privately, only letting the public see it in brief clips on YouTube; it's time I actually put it out there and let it loose on the world. If you're interested in Live Coding, I urge you to check out http://toplap.org and my FoxDot website https://sites.google.com/site/foxdotcode/ and see what you can do. I'm performing at the ODI Leeds on Friday 29th April and you can get tickets here.

I hope this is the first step of a long and rewarding journey.

Tuesday, 1 March 2016

How to "iterate" over Kinect files and extract RGB video stream

If you have ever tried to extract data from a Microsoft Kinect Extended Event File (XEF), you may have had trouble getting your hands on the RGB video stream it contains, like some of the users in this MSDN thread. As mentioned in some of my previous blog posts, I've found it possible to play an XEF file in Kinect Studio and then record the data as it is played back, as if it were a live stream. This makes it relatively easy to collect data on bodies and their joint positions, but difficult for the other stream types because of the amount of data processing each frame requires.
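To get a feel for why the RGB stream is so much heavier than the skeleton data, here is some back-of-the-envelope arithmetic. It assumes uncompressed 1920x1080 colour frames stored as 4-byte BGRA pixels at 30 fps; a compressed video codec would write far fewer bytes but costs CPU time per frame instead. The helper raw_stream_bytes is just an illustration.

```python
def raw_stream_bytes(width=1920, height=1080, bytes_per_pixel=4, fps=30, seconds=1):
    """Bytes needed to store `seconds` of uncompressed video (assumed BGRA pixels)."""
    return width * height * bytes_per_pixel * fps * seconds


per_second = raw_stream_bytes()
five_minutes = raw_stream_bytes(seconds=5 * 60)
print(f"{per_second / 1e6:.0f} MB per second")
print(f"{five_minutes / 1e9:.1f} GB for a 5 minute recording")
```

That works out at roughly 249 MB every second, or about 74.6 GB for five minutes, which goes some way to explaining why real-time extraction is hard on an ordinary machine.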

Incoming RGB video frames need to be written directly to file; otherwise, storing them in memory heavily increases the resources needed and risks crashing your computer. However, not all machines can perform enough I/O operations to keep up with the 30 fps playback rate of the XEF file, which makes extracting this data very difficult for most users. Being able to extract the RGB video would be really useful for synchronising the data with audio I am recording from a different source, something that is integral to the data analysis in my PhD studies and hard to achieve with the skeleton data alone. It wasn't until recently that I tried pressing the "step file" button in Kinect Studio to see whether each frame could be sent individually to the Kinect Service and, consequently, to my data extraction Python applications. To my surprise, it worked!

Close-up of the "step file" button
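Writing each frame to disk as soon as it arrives, rather than holding them all in memory, can be sketched with OpenCV's VideoWriter. This is a hedged sketch, not the code in my repository: it assumes each frame arrives as a flat buffer of 8-bit BGRA pixels (check the layout your own capture code produces) and that the XVID codec is available on your machine. The names bgra_to_bgr and FrameRecorder are hypothetical.

```python
import numpy as np


def bgra_to_bgr(flat_pixels, width, height):
    """Reshape a flat BGRA byte buffer into a (height, width, 3) BGR image."""
    frame = np.asarray(flat_pixels, dtype=np.uint8).reshape((height, width, 4))
    # Drop the alpha channel; VideoWriter expects contiguous 3-channel BGR frames
    return np.ascontiguousarray(frame[:, :, :3])


class FrameRecorder:
    """Append frames to a video file one at a time as they arrive."""

    def __init__(self, filename, width, height, fps=30.0):
        # Imported here so the pixel helper above can be used without OpenCV
        import cv2
        self.width, self.height = width, height
        fourcc = cv2.VideoWriter_fourcc(*"XVID")
        # Note that VideoWriter takes the frame size as (width, height)
        self.writer = cv2.VideoWriter(filename, fourcc, fps, (width, height))

    def write(self, flat_pixels):
        self.writer.write(bgra_to_bgr(flat_pixels, self.width, self.height))

    def close(self):
        # Flush and finalise the video file
        self.writer.release()
```

Each call to write() only ever holds one frame in memory, so memory use stays flat however long the recording is; the bottleneck moves to encoding and disk speed instead.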

It would be possible to iterate over the file manually by clicking the "step file" button, but for long files this would be very time consuming (a 5 minute file at 30 fps contains around 9,000 frames). Using the PyAutoGUI module, I set up an automated click on the "step file" button every 0.5 seconds; the button's position is specified by hovering the mouse over it and pressing the Return key. This iterated over the file automatically and let me extract and store the RGB video data successfully. I tried making the click fire as soon as each frame had been processed, but got some Windows errors. I hope to fix this in future to make the process faster, but right now it's at least a bit easier!
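The clicking loop itself is simple. Below is a minimal sketch of the approach described above, not the exact code in PyKinectXEF: step_through_file takes any zero-argument click function (so it can be tried without a GUI), and pick_and_step wires it up to PyAutoGUI's position() and click() functions. Both function names are hypothetical.

```python
import time


def step_through_file(num_frames, click, interval=0.5, wait=time.sleep):
    """Press the step button once per frame, pausing `interval` seconds between presses.

    `click` is any zero-argument callable that performs one button press;
    `wait` is injectable so the loop can be tested without actually sleeping.
    Returns the number of clicks performed.
    """
    clicks = 0
    for _ in range(num_frames):
        click()
        clicks += 1
        wait(interval)
    return clicks


def estimated_duration(num_frames, interval=0.5):
    """Rough wall-clock time in seconds to step through `num_frames`."""
    return num_frames * interval


def pick_and_step(num_frames, interval=0.5):
    """Interactive use: hover over the "step file" button, press Return, then step."""
    import pyautogui  # third-party; only needed for the real GUI automation
    input("Hover the mouse over the 'step file' button and press Return...")
    x, y = pyautogui.position()
    return step_through_file(num_frames, lambda: pyautogui.click(x, y), interval)
```

At one click every 0.5 seconds, the 9,000 frames of a 5 minute file take about 4,500 seconds (75 minutes) to step through, which is why clicking as soon as each frame is processed would be worth getting working.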

I am also hoping to find a bit of time to write up a simple README for the application, which is available at my GitHub here: https://github.com/FoxDot/PyKinectXEF

Please feel free to use it and give feedback - bad or good - and I look forward to hearing any suggestions!

- Ryan

Thursday, 11 February 2016

Synchronizing Externally Recorded Audio With Kinect Data

One of the biggest downsides to recording musical performances with the Microsoft Kinect V2 is the lack of a high-quality microphone. It does contain an array of four low-quality microphones, but when I extracted and accumulated the SubAudioFrame data from the Kinect playback, the results were not pretty (but, surprisingly, audible), as you can see...

It is possible to get a more accurate waveform, but it requires a hefty amount of noise removal and the result is almost as useless as the one above. To compare Kinect data with a sound file, you are going to have to record the audio from a different source. I decided to try recording a few bars of "Eight Days a Week" by the Beatles with a friend using my smartphone, but any real recording should be made with a much, much better piece of kit.

To synchronise the audio and the visuals, I started recording on both my smartphone and the Kinect and then clapped clearly, so that I could line up the onset of the clap sound in the audio file with the frame in which the hands make contact. Unfortunately, doing this in plain Python (what I've been writing my scripts in so far) would be a boatload of work, so I used OpenCV's Python API and PyGame to make a workaround. Instead of using PyGame to play the frame data back to me, I saved each pixel array as a frame of video. (The code I'm working on will be on my GitHub soon; I just have to make sure there's absolutely no way any recorded data can end up there!)
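Once you know which video frame shows the hands meeting, lining the audio up is a small calculation. This is a minimal sketch under stated assumptions: a crude amplitude-threshold onset detector stands in for proper onset detection, the audio is a mono NumPy array, and the video runs at 30 fps. The function names are hypothetical.

```python
import numpy as np


def clap_onset_seconds(samples, sample_rate, threshold=0.5):
    """Time (s) of the first sample whose magnitude reaches `threshold` x the peak.

    A crude onset detector that assumes the clap is the loudest event near the start.
    """
    envelope = np.abs(np.asarray(samples, dtype=float))
    peak = envelope.max()
    if peak == 0:
        raise ValueError("signal is silent")
    # argmax on a boolean array returns the index of the first True value
    first = int(np.argmax(envelope >= threshold * peak))
    return first / sample_rate


def audio_offset_seconds(clap_audio_s, clap_frame, fps=30.0):
    """Seconds of audio to cut from the start so the clap matches `clap_frame`.

    A negative result means the audio started late and needs padding instead.
    """
    return clap_audio_s - clap_frame / fps


def align_audio(samples, sample_rate, offset_s):
    """Trim (positive offset) or zero-pad (negative offset) the start of the audio."""
    samples = np.asarray(samples, dtype=float)
    shift = int(round(offset_s * sample_rate))
    if shift >= 0:
        return samples[shift:]
    return np.concatenate([np.zeros(-shift), samples])
```

For example, if the clap is heard 1.5 seconds into the phone recording and the hands meet in video frame 15 (0.5 seconds in at 30 fps), the first second of audio is trimmed off.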

Once I had clipped my audio track, I could compare the waveform and the recordings from the start, or from any point I choose. The next step is to automate the production of a spectrogram that will run underneath (or be overlaid by) a graph plotting the performer's movements. Here is a little mock-up using 20 seconds of data from the Beatles song.

You can see just from this graph that there are some similarities between each of the lines, and also between the lines and the spectrogram (created using Sonic Visualiser) underneath them. I'll need to brush up on my statistics soon to get more detailed analysis out of these sorts of graphs, but things are looking promising.
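As a starting point for automating the spectrogram, here is a minimal short-time Fourier transform sketch using only NumPy. It is not how Sonic Visualiser computes its spectrograms; the Hann window, the 1024-sample window and the 512-sample hop are my own assumed defaults, and the function name is hypothetical.

```python
import numpy as np


def spectrogram(samples, sample_rate, window_size=1024, hop=512):
    """Magnitude spectrogram of a mono signal via a short-time Fourier transform.

    Returns (times, freqs, mags): `times` holds the start time (s) of each
    analysis window, `freqs` the frequency (Hz) of each bin, and `mags` has
    shape (num_windows, window_size // 2 + 1).
    """
    samples = np.asarray(samples, dtype=float)
    # A Hann window reduces spectral leakage between neighbouring bins
    window = np.hanning(window_size)
    starts = range(0, len(samples) - window_size + 1, hop)
    mags = np.array([np.abs(np.fft.rfft(samples[s:s + window_size] * window))
                     for s in starts])
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    times = np.array([s / sample_rate for s in starts])
    return times, freqs, mags
```

Because `times` is in seconds, the windows can be mapped onto Kinect frames (30 per second) so a movement plot can sit on the same time axis as the spectrogram.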


OpenCV - http://opencv.org/
Python - https://www.python.org/
PyGame - http://pygame.org/
Sonic Visualiser - http://www.sonicvisualiser.org/
PyKinect2 - https://github.com/Kinect/PyKinect2 (Not mine, but used in my code)
PyKinectXEF - https://github.com/FoxDot/PyKinectXEF (My prototype code)