One of the biggest downsides to recording musical performances with the Microsoft Kinect V2 is the lack of a high quality microphone. It does contain six very low quality microphones, though, but when I extracted and accumulated the SubAudioFrame data from the Kinect playback, the results were not pretty (but audible, surprisingly) as you can see...
It is possible to get a more accurate waveform but it requires a hefty amount of noise removal and it's almost as useless as the one you see above. To be able to compare Kinect Data to a sound file, you are going to have to record it from a different source. I decided to try recording a few bars of "Eight Days a Week" by the Beatles with a friend using my smartphone, but any real recording should be performed with a much, much better piece of kit.
To synchronise the audio and the visuals I decided to start recording using both my smartphone and the Kinect and then clearly clap, so that I can line up the onset of the clap sound in the audio file, and the frame in which the hands make contact with each other. Unfortunately, to do this just using Python (what I've been writing my scripts in so far) would be a boat load of work, so I used the Python API for OpenCV and PyGame to make a work around. Instead of playing the frame data back to me using the PyGame package, I was able to save the pixel array as a frame of video and store that. (The code I'm working on will be on my GitHub soon - I just have to make sure there's absolutely no way any recorded data can end up there!)
Once I had my audio track clipped, I can compare the waveform and the recordings from the start, or from any point I choose. Next step is to automate the production of a spectogram that will run underneath (or be overlayed by) a graph that plots the performer's movements. Here is a little mock up using 20 seconds of data from the Beatles song.
You can see just from this graph there are some similarities between each of the lines, and also between the lines and the spectogram (created using Sonic Visualiser) that it's on top of. I'll need to get brushing up on my statistics soon to get more detailed analysis out of these sorts of graphs, but things are looking promising.
OpenCV - http://opencv.org/
Python - https://www.python.org/
PyGame - http://pygame.org/
Sonic Visualiser - http://www.sonicvisualiser.org/
PyKinect2 - https://github.com/Kinect/PyKinect2 (Not mine, but used in my code)
PyKinectXEF - https://github.com/FoxDot/PyKinectXEF (My prototype code)