There are a few ways to do this. It works similarly to a lipsync-type system...
I'm guessing you'd just have animations for different types of movements, and depending on the different "sounds" (parts in the music in this case) it would play the matching animation. Now, this is probably not how they did it, just my little "theory" :P
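To make the idea a bit more concrete, here's a rough sketch in Python of how that kind of lookup could work. Everything in it is made up for illustration: the cue times, the part labels, the animation names, and the `animation_at` helper are all my own assumptions, not anything taken from how they actually did it.

```python
from bisect import bisect_right

# Hypothetical cue sheet: (start time in seconds, label for that part of the music).
# Must be sorted by start time.
CUES = [
    (0.0, "intro"),
    (4.0, "beat"),
    (12.0, "chorus"),
    (20.0, "beat"),
]

# Hypothetical mapping from a music part to an animation clip name,
# the same way a lipsync system maps a sound to a mouth shape.
ANIMATIONS = {
    "intro": "idle_sway",
    "beat": "step_side_to_side",
    "chorus": "arms_up_spin",
}


def animation_at(t, cues=CUES, animations=ANIMATIONS, default="idle_sway"):
    """Return the animation name for playback time t (in seconds)."""
    starts = [start for start, _ in cues]
    # Index of the last cue that starts at or before t.
    i = bisect_right(starts, t) - 1
    if i < 0:
        return default  # t is before the first cue
    label = cues[i][1]
    return animations.get(label, default)


for t in (1.0, 5.0, 13.5, 25.0):
    print(t, animation_at(t))
```

You'd call `animation_at` every frame with the current song position and switch clips whenever the result changes. In a real setup the cue sheet would probably come from hand-placed markers or some kind of beat detection rather than a hard-coded list.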