Tuesday, 13 September 2016

Quantity of Motion Analysis of Ensemble Performers

One of my interests when starting my PhD was in motion capture and the analysis of three-dimensional time-series data. I recorded a group of singers performing songs and collected a large body of data. I didn't (and to some extent still don't) really know what I'd like to do with it, as there are almost too many avenues to pursue at this point, and I needed something to help point me in the right direction. I had a lot of questions like "what's going to be happening in the data?" and "what relationships should I start looking at first?" - this last question was quite daunting, as I didn't want to waste hours on something unfruitful.

So I decided to do some extremely basic computer image analysis to look at the overall amount of movement of a group of performers, and then see what stood out in this very much reduced data set. This didn't involve any fancy 3D motion capture, just a video recording of the performance - which is something the Microsoft Kinect automatically collected.

Quantity of motion has been used in musical gesture analysis on a number of occasions, but it has been calculated in various ways. One of the simplest makes use of image subtraction - literally subtracting the grey-scale values of video frames from one another. To calculate the quantity of motion at frame f you need to look at the previous N frames (motion exists through time) - here we'll set N to 30. Summing the absolute differences between consecutive frames from f-N to f is essentially what gives you the value for frame f. The best way of understanding this might be to look at some Python code* that calculates it:

*This is not a very efficient way to program this, but it lays out the logic in the easiest way to understand
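Written out as a formula (this is just my notation for what the code below does), where I_i is the grey-scale image at frame i, T is a binary threshold, and count gives the number of nonzero pixels:

\mathrm{QoM}(f) = \mathrm{count}\left( T\left( \sum_{i=f-N}^{f-1} \left| I_{i+1} - I_{i} \right| \right) \right)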


import cv2
import numpy as np

class load_file:
    def __init__(self, filename, window_size):
        # Read file
        self.file = cv2.VideoCapture(filename)
        self.window_size = window_size

        # Get file attributes (OpenCV returns these as floats, so cast to int)
        self.width      = int(self.file.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height     = int(self.file.get(cv2.CAP_PROP_FRAME_HEIGHT))
        self.num_frames = int(self.file.get(cv2.CAP_PROP_FRAME_COUNT))

    def get_qom(self, frame):
        # Create a blank image (NumPy arrays are indexed rows x columns,
        # i.e. height x width)
        img = np.zeros((self.height, self.width), dtype=np.uint8)

        # Iterate over the window of frames leading up to 'frame'
        for i in range(frame - self.window_size, frame):

            # Seek to the frame we are looking at
            self.file.set(cv2.CAP_PROP_POS_FRAMES, i)

            # Get frame A and convert to grayscale
            ret, frame_a = self.file.read()
            A = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)

            # Get frame B (the next frame) and convert to grayscale
            ret, frame_b = self.file.read()
            B = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)

            # Perform image subtraction on the two frames
            dif = cv2.absdiff(B, A)

            # Add the difference to the accumulated image
            img = cv2.add(img, dif)

        # Threshold the final image so only substantial movement is kept
        r, img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

        return img

if __name__ == "__main__":

    # Load the video file
    fn = "path/to/file.avi"
    video = load_file(fn, window_size=30)

    data = []
    # Iterate over the frames, starting at window_size so the window
    # never reaches back before the start of the file
    for f in range(video.window_size, video.num_frames):
        # 'img' is a 2D array holding a visualisation of the quantity of motion
        img = video.get_qom(f)
        # NumPy can count the nonzero values (i.e. the moving pixels)
        val = np.count_nonzero(img)
        # Store the values. The list 'data' can then be plotted over time
        data.append(val)


The Python program uses two excellent libraries, OpenCV and NumPy, which you will need to install if you don't have them already. The output of this script, the Python list called 'data', can then be used to plot the quantity of motion over time:
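For example, a minimal plotting sketch using matplotlib (assuming you have it installed; the frame rate is read from the file to convert frame numbers into seconds) might look like this:

import matplotlib.pyplot as plt

# Frame rate of the recording, read from the file itself
fps = video.file.get(cv2.CAP_PROP_FPS)

# Convert frame numbers to seconds for the x-axis
times = [f / fps for f in range(video.window_size, video.num_frames)]

plt.plot(times, data)
plt.xlabel("Time (s)")
plt.ylabel("Quantity of motion (pixels)")
plt.show()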


This gave me an idea of where the most movement was occurring - it seemed periodic, but not frequent enough to be at the bar level, so I added lines where the phrase boundaries occurred:
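If you want to add lines like that, one way is matplotlib's axvline (the boundary times below are purely hypothetical placeholders):

# Hypothetical phrase boundary times in seconds - substitute your own
phrase_boundaries = [4.2, 9.8, 15.1, 21.0]

for t in phrase_boundaries:
    plt.axvline(t, color="red", linestyle="--")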


I found that the movement and the phrasing of the piece were correlated, which gave me a good platform to investigate further. Please feel free to use the script above, or if you are interested in a much more efficient way of doing it, please get in touch - I'm happy to discuss.