@kisg
Contributor

@kisg kisg commented Sep 5, 2022

The current 2D rendering algorithm assumes that the OpenGL driver
supports buffer orphaning. However, this is not always the case, e.g.
the Oculus Quest 2 does not.

This change adds an option to switch to adaptive multibuffer rendering
where a separate buffer is used for each rendering batch to avoid
implicit synchronization. The number of buffers is set adaptively based
on the number of batches required for rendering the previous frame.
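
The adaptive sizing described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `AdaptiveBufferPool`, `begin_frame` and `acquire` are hypothetical names, plain integers stand in for GL buffer names so the sketch runs without a GL context, and shrinking the pool is left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pool of vertex buffers. An int plays the role of a GL buffer
// name; real code would create buffers with glGenBuffers instead.
class AdaptiveBufferPool {
public:
    AdaptiveBufferPool() { grow_to(1); }

    // Call at the start of each frame: grow the pool to the number of
    // batches the previous frame needed, then reset the per-frame counter.
    void begin_frame() {
        if (batches_this_frame_ > buffers_.size()) {
            grow_to(batches_this_frame_);
        }
        batches_this_frame_ = 0;
    }

    // Hand out one buffer per batch. If the frame needs more batches than
    // the pool holds, wrap around (which may force a sync on a real driver)
    // and let the next begin_frame() grow the pool.
    int acquire() {
        int id = buffers_[batches_this_frame_ % buffers_.size()];
        ++batches_this_frame_;
        return id;
    }

    std::size_t size() const { return buffers_.size(); }

private:
    void grow_to(std::size_t n) {
        while (buffers_.size() < n) {
            buffers_.push_back(next_id_++);
        }
    }

    std::vector<int> buffers_;
    std::size_t batches_this_frame_ = 0;
    int next_id_ = 1;
};
```

In this sketch a frame that suddenly needs more batches reuses buffers for one frame only; from the next frame on, every batch gets its own buffer.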

Companion proposal: godotengine/godot-proposals#5348

// pre fill index buffer, the indices never need to change so can be static
glGenBuffers(1, &bdata.gl_index_buffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, bdata.gl_index_buffer);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_buffer);
Member

There's no need to do this when using multiple buffers I think. The index buffer can be reused, only the vertex buffer needs multiple versions.

}

bool multiple_buffer_batching = GLOBAL_GET("rendering/batching/options/multiple_buffer_batching");
if (multiple_buffer_batching) {
Member

GLOBAL_GET should not be called every frame. You can load the setting in the constructor / initialize and keep it in a member variable.
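
A minimal sketch of that suggestion, assuming a stand-in settings store rather than the engine's real one: `Settings` and `CanvasRenderer` are hypothetical types invented for illustration, and `Settings::get` only imitates what a lookup such as `GLOBAL_GET` does. The lookup happens once at construction and per-frame code reads the cached member.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a project-settings lookup; it counts calls so
// the difference between per-frame and one-time lookups is visible.
struct Settings {
    std::map<std::string, bool> values;
    int lookups = 0;

    bool get(const std::string &key) {
        ++lookups;
        auto it = values.find(key);
        return it != values.end() && it->second;
    }
};

// Hypothetical renderer that reads the option once and caches it.
class CanvasRenderer {
public:
    explicit CanvasRenderer(Settings &settings) :
            multiple_buffer_batching_(settings.get(
                    "rendering/batching/options/multiple_buffer_batching")) {}

    // Per-frame code consults only the cached member.
    bool uses_multiple_buffers() const { return multiple_buffer_batching_; }

private:
    const bool multiple_buffer_batching_;
};
```

The trade-off is that a change to the setting at runtime is not picked up until the renderer is recreated, which is usually acceptable for an option of this kind.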

hysteresis = (float)new_size;
}
resize(new_size);
}
Member

Why is there hysteresis in GLES3 but not in GLES2?

Contributor Author

We will add hysteresis for GLES2 as well.

@lawnjelly
Member

Overall imo it's not bad at all in terms of implementation, but it would be good to test it with some input data that won't batch.

If you feed it a pathological situation with e.g. lights / custom shaders / text it could potentially create thousands of buffers, and go horribly wrong. This would be well worth trying out, and it may be worth having an upper limit on the number of buffers that can be created, just in case. To be fair, afaik orphaning also effectively puts thousands of buffers in play, but it may not be the same implementation-wise in the driver etc.
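
One way such an upper limit could look is sketched below. The value of `kMaxBuffers` is an assumption chosen purely for illustration, and both function names are hypothetical; nothing here is taken from the PR.

```cpp
#include <algorithm>
#include <cstddef>

// Assumed cap for illustration only; a real value would need measuring.
constexpr std::size_t kMaxBuffers = 256;

// Number of buffers to keep for the next frame, given the batch count of
// the previous frame: at least 1, at most kMaxBuffers.
std::size_t target_buffer_count(std::size_t batches_last_frame) {
    return std::min(std::max(batches_last_frame, std::size_t(1)), kMaxBuffers);
}

// With a capped pool, batches beyond the cap wrap around and reuse earlier
// buffers, trading a possible driver sync for bounded memory use.
std::size_t buffer_index_for_batch(std::size_t batch, std::size_t pool_size) {
    return batch % pool_size;
}
```

With this cap, a pathological frame of 5000 batches would still use only 256 buffers, and batch 300 would reuse the buffer at index 44.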

That said once tidied up it could potentially be used as a stepping stone to a two-pass implementation as I described in the proposal. 🙂

If we can get the other version working, it would likely effectively replace this (as the fewer the buffer uploads, the better), and we could probably remove this version. That said, I wouldn't necessarily want to hold this PR up, because it's difficult to give timescales on the 2 pass approach as I have a lot of other work on currently.

Often the official policy is not to merge stuff if it is likely to be replaced, but personally I don't have a problem with potential stop gaps, especially if they are easy to untangle as this is. These are all imo of course, I'm sure @akien-mga and @clayjohn will have opinions on whether it is a good idea. 😁

@kisg
Contributor Author

kisg commented Sep 6, 2022

@lawnjelly Thank you very much for your review and your proposal in the proposal :). We will definitely add an upper bound on the number of buffers to this PR and will try to collect some hard data with different test cases. Are there a test suite or test projects that you usually use for testing 2D drawing performance?

@clayjohn
Member

clayjohn commented Sep 7, 2022

In 4.0 I have tried two different approaches (mind you we do things a little differently in 4.0 and barely use any vertex buffers, most data is passed in packed UBOs). The first approach was to give each batch its own UBO from an ever increasing pool of UBOs. I used a fence to check whether a UBO was still in use. If I came back around to the beginning of my circular buffer of UBOs and the last UBO was still in use, I would allocate a new one and insert it. In the end I never had more than a few hundred UBOs at once. This approach worked really well on newer hardware, but fell short on older hardware and mobile.
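
The fence-guarded ring described above might be sketched like this. It is a simulation under stated assumptions, not the 4.0 code: `UboRing` and `Ubo` are hypothetical, and the `in_flight` flag stands in for an unsignaled GL fence that real code would create with `glFenceSync` and poll with `glClientWaitSync`.

```cpp
#include <cstddef>
#include <vector>

struct Ubo {
    int id;
    bool in_flight; // stand-in for "this buffer's fence has not signaled yet"
};

// Hypothetical circular pool that grows instead of waiting on a busy buffer.
class UboRing {
public:
    UboRing() { slots_.push_back(Ubo{next_id_++, false}); }

    // Returns the id of a UBO the GPU is not reading from.
    int acquire() {
        std::size_t candidate = (cursor_ + 1) % slots_.size();
        if (slots_[candidate].in_flight) {
            // The next UBO in the ring is still busy: insert a fresh one in
            // front of it rather than stalling until its fence signals.
            slots_.insert(slots_.begin() + static_cast<std::ptrdiff_t>(candidate),
                    Ubo{next_id_++, false});
        }
        cursor_ = candidate;
        slots_[cursor_].in_flight = true; // real code would place a fence after the draw
        return slots_[cursor_].id;
    }

    // Simulates the GPU finishing with a UBO (its fence becoming signaled).
    void mark_done(int id) {
        for (Ubo &u : slots_) {
            if (u.id == id) {
                u.in_flight = false;
            }
        }
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::vector<Ubo> slots_;
    std::size_t cursor_ = 0;
    int next_id_ = 1;
};
```

In this sketch the ring grows only while the GPU lags behind; once fences start signaling, existing buffers are reused and the pool size settles.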

The second approach, which I have tested but still haven't merged, is to allocate a few giant UBOs (at least one per frame), then run through draw commands until I have enough data to fill the UBO (or I run out of commands), recording batch index start and end positions within the UBO. Then I fill the UBO and render the batches. If I filled that UBO, I move to another large UBO rather than orphaning and starting again. This is essentially what @lawnjelly describes as the "two-pass" approach. I found it was about 2-3 times as fast as the above approach on lower-end hardware and just as fast on newer hardware.
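
The first pass of such a scheme could be sketched as below. All names (`DrawCommand`, `BatchRange`, `FilledBuffer`, `pack_commands`) are hypothetical, a `std::vector<float>` stands in for the CPU-side copy of a large UBO or VBO, and each command is treated as its own batch, whereas a real batcher would merge compatible neighbours. The sketch also assumes each command fits within `capacity`; an oversized command simply gets a buffer to itself.

```cpp
#include <cstddef>
#include <vector>

struct DrawCommand {
    std::vector<float> data; // per-command payload to place in the buffer
};

struct BatchRange {
    std::size_t start; // offset in floats into the staging data
    std::size_t end;   // one past the last float of the batch
};

struct FilledBuffer {
    std::vector<float> staging;      // contents to upload in a single call
    std::vector<BatchRange> batches; // draw ranges within that upload
};

// Pass 1: pack commands into fixed-capacity staging buffers and record where
// each batch lives. Pass 2 (not shown) would upload each staging buffer once
// and then issue one draw per recorded range.
std::vector<FilledBuffer> pack_commands(
        const std::vector<DrawCommand> &commands, std::size_t capacity) {
    std::vector<FilledBuffer> out;
    for (const DrawCommand &cmd : commands) {
        // Move on to a new large buffer when the current one cannot hold
        // this command, instead of orphaning and refilling the same one.
        if (out.empty() ||
                (!out.back().staging.empty() &&
                        out.back().staging.size() + cmd.data.size() > capacity)) {
            out.emplace_back();
        }
        FilledBuffer &buf = out.back();
        std::size_t start = buf.staging.size();
        buf.staging.insert(buf.staging.end(), cmd.data.begin(), cmd.data.end());
        buf.batches.push_back(BatchRange{start, buf.staging.size()});
    }
    return out;
}
```

The point of the split is that the number of buffer uploads depends on the amount of data, not on the number of batches.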

For the 3.x branch, I think the gains will be similar despite the fact that we would be allocating a large VBO instead of a UBO.

@lawnjelly lawnjelly modified the milestones: 3.6, 3.7 Sep 11, 2024