Hi, I noticed that this library uses rdtsc to read the cycle counter. According to "Pitfalls of TSC usage" and the implementation in the Linux kernel, rdtsc is not a serializing instruction, so the CPU may execute it out of order relative to the surrounding instructions, which can skew the measured interval. I'm not sure whether this has been considered before. Is this something that should be fixed?
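For reference, here is a minimal sketch of what an ordered read could look like, assuming an x86-64 target and GCC or Clang. The name tsc_ordered is hypothetical and not taken from this library; the idea of placing an lfence before rdtsc follows what I understand the kernel's ordered read to do, so please treat it as an illustration rather than a proposed patch.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc and _mm_lfence (GCC/Clang, x86 only) */

/*
 * Hypothetical helper: read the TSC with fences around it.
 * The first lfence waits until earlier instructions have completed locally,
 * so the read cannot be hoisted above the code being timed.
 * The second lfence keeps later instructions from starting before the
 * counter has been read.
 */
static inline uint64_t tsc_ordered(void)
{
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Hypothetical usage: cycles spent in a caller-supplied function. */
static inline uint64_t cycles_for(void (*fn)(void))
{
    uint64_t start = tsc_ordered();
    fn();
    uint64_t end = tsc_ordered();
    return end - start;
}
```

The fences add some overhead to every read, so whether this is worth it probably depends on how short the measured regions are. On CPUs that support it, rdtscp might be an alternative for the closing read, since it waits for earlier instructions to execute before reading the counter.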