Hello,
In your paper, you explain that you split the inputs into chunks of 512 notes before feeding them to the model. (I'm not sure the 512-chunking part of the code has been released yet; has it?)
As I understand it, you take 512 consecutive notes from the performance MIDI (exactly simultaneous onsets are unlikely there, so no ordering ambiguity), which makes sense. However, how do you get the corresponding notes from the XML file? I understand that the performance and the XML are beat-aligned, but I'm not sure how you deal with the (likely partially filled) first and last beats of the XML chunk.
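To make sure we're talking about the same procedure, here is a rough sketch of my current understanding (all function and field names are my own invention, not from your codebase): chunk the performance notes into groups of 512, then select the XML notes whose beat positions fall inside each chunk's beat span. The boundary beats are exactly the part I'm unsure about, since the first and last beats of the span may be only partially covered by the 512 performance notes.

```python
def chunk_performance(perf_notes, chunk_size=512):
    """Split performance notes (sorted by onset) into fixed-size chunks.

    Hypothetical sketch: `perf_notes` is a list of dicts with a 'beat' key.
    """
    return [perf_notes[i:i + chunk_size]
            for i in range(0, len(perf_notes), chunk_size)]


def xml_notes_for_chunk(chunk, xml_notes):
    """Select XML notes whose beat lies within the chunk's beat range.

    The inclusive boundaries here are a guess: the first and last beats
    of the span may be only partially filled by the performance chunk,
    which is the ambiguity I'm asking about.
    """
    lo = min(n['beat'] for n in chunk)
    hi = max(n['beat'] for n in chunk)
    return [n for n in xml_notes if lo <= n['beat'] <= hi]
```

Is this roughly what the released pipeline does, or do you snap the chunk boundaries to whole beats (or handle the partial boundary beats some other way)?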