So I decided to do some extremely basic computer image analysis: measure the overall movement of a group of performers and then look at what stood out in this much-reduced data set. This didn't involve any fancy 3D motion capture, just a video recording of the performance - something the Microsoft Kinect collects automatically.
Quantity of motion has been used in musical gesture analysis on a number of occasions, but it has been calculated in various ways. One of the simplest uses image subtraction - literally subtracting the grey-scale values of video frames from one another. To calculate the quantity of motion at frame f you need to look at the previous N frames (motion exists through time); here we'll set N to 30. Summing the differences between consecutive frames from f-N to f is essentially what gives the value for frame f. The best way of understanding this might be to look at some Python code* that calculates it:
*This is not a very efficient way to program it, but it lays the logic out in the easiest way to follow
import cv2
import numpy as np


class load_file:
    def __init__(self, filename, window_size):
        # Read file
        self.file = cv2.VideoCapture(filename)
        self.window_size = window_size
        # Get file attributes
        self.width = int(self.file.get(cv2.CAP_PROP_FRAME_WIDTH))
        self.height = int(self.file.get(cv2.CAP_PROP_FRAME_HEIGHT))
        self.num_frames = int(self.file.get(cv2.CAP_PROP_FRAME_COUNT))

    def get_qom(self, frame):
        # Create a blank image (NumPy images are rows x columns, i.e. height x width)
        img = np.zeros((self.height, self.width), dtype=np.uint8)
        # Iterate over the window of frames leading up to 'frame'
        for i in range(frame - self.window_size, frame):
            # Set the file to the frame we are looking at
            self.file.set(cv2.CAP_PROP_POS_FRAMES, i)
            # Get frame A and convert to grayscale
            ret, frame_a = self.file.read()
            A = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
            # Get frame B (the next frame) and convert to grayscale
            ret, frame_b = self.file.read()
            B = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
            # Perform image subtraction on the two frames
            dif = cv2.absdiff(B, A)
            # Add the difference to the accumulated image
            img = cv2.add(img, dif)
        # Threshold the final image
        r, img = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
        return img


if __name__ == "__main__":
    # Load the video file
    fn = "path/to/file.avi"
    video = load_file(fn, window_size=30)
    data = []
    # Iterate over the frames, starting once a full window is available
    for f in range(video.window_size, video.num_frames):
        # 'img' is a 2D array that is a visualisation of the quantity of motion
        img = video.get_qom(f)
        # NumPy can count the nonzero values
        val = np.count_nonzero(img)
        # Store the values. The list 'data' can then be plotted over time
        data.append(val)
The Python program uses two excellent libraries, OpenCV and NumPy, which you will need to install if you don't already have them. The output of this script, the Python list called 'data', can be used to plot the quantity of motion over time:
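If you want to make that plot yourself, something along these lines should do it - a minimal sketch that carries on from the end of the script above, assuming matplotlib is installed and a frame rate of 30 fps (check your recording's actual rate):

    import matplotlib.pyplot as plt

    fps = 30  # assumed frame rate of the recording; adjust to match your file
    # Each entry in 'data' corresponds to frame (window_size + index)
    times = [(video.window_size + i) / fps for i in range(len(data))]

    plt.plot(times, data)
    plt.xlabel("Time (s)")
    plt.ylabel("Quantity of motion (nonzero pixels)")
    plt.show()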
This gave me an idea of where the most movement was occurring - it seemed periodic but not frequent enough to be at the bar level, so I added lines where the phrase boundaries occurred:
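Adding those lines on top of the plot sketched above is only a couple of extra statements - the boundary times below are made up for illustration and would need replacing with the actual phrase boundaries of the piece:

    phrase_boundaries = [12.0, 24.5, 37.0, 49.5]  # hypothetical times in seconds
    for t in phrase_boundaries:
        plt.axvline(x=t, color="red", linestyle="--")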
I found that the movement and the phrasing of the piece were correlated, which gave me a good platform to investigate this further. Please feel free to use the script above, or if you are interested in a much more efficient way of doing it, please get in contact - I'm happy to discuss.
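As a taste of what "more efficient" might look like, here is a sketch of a single-pass variation (my own, not the script I used above): it reads and decodes each frame exactly once, which is where most of the time goes, and keeps a rolling window of frame differences.

    import cv2
    import numpy as np
    from collections import deque

    def qom_single_pass(filename, window_size=30, threshold=200):
        cap = cv2.VideoCapture(filename)
        diffs = deque(maxlen=window_size)  # rolling window of frame differences
        prev = None
        data = []
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev is not None:
                diffs.append(cv2.absdiff(gray, prev))
            prev = gray
            if len(diffs) == window_size:
                # Accumulate the window of differences, threshold, count
                acc = np.zeros_like(gray)
                for d in diffs:
                    acc = cv2.add(acc, d)
                _, acc = cv2.threshold(acc, threshold, 255, cv2.THRESH_BINARY)
                data.append(np.count_nonzero(acc))
        cap.release()
        return data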

