renderBitmapWithTrans() uses a for loop to add up all the bitmap buffers instead of calling __arch64_memcopy, because this function has to handle the trans data and memcopy() would overwrite it.
The loop that adds up the bitmap buffers can still be vectorized with SIMD instructions, though.
This optimization is planned for the next version.
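
A rough sketch of what a SIMD version of the add loop could look like. It is not the actual renderBitmapWithTrans() code: the name add_buffer_u8 is hypothetical, and it assumes 8-bit samples, wrapping addition (the same as a plain scalar `+=` on uint8_t), and ARM NEON intrinsics. Handling of the trans data is left out. On targets without NEON only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] += src[i] for n bytes.
 * Assumption: 8-bit samples with wrapping (modulo 256) addition. */
void add_buffer_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;

#if defined(__ARM_NEON)
    /* Add 16 bytes per iteration with NEON. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t d = vld1q_u8(dst + i);
        uint8x16_t s = vld1q_u8(src + i);
        vst1q_u8(dst + i, vaddq_u8(d, s));
    }
#endif

    /* Scalar loop: handles the tail, or everything when NEON is unavailable. */
    for (; i < n; i++) {
        dst[i] = (uint8_t)(dst[i] + src[i]);
    }
}
```

Unlike a memory copy, this reads dst before writing it, so what is already accumulated in dst is combined with src rather than replaced.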