pWaitDstStageMask and sync-obj impl.

Thu May 16 12:40:21 UTC 2024

Hello,

I was trying to see how Mesa on Linux (intel_hasvk specifically) manages
waiting on a semaphore at the beginning of a particular stage. The
example tried was from vkcube (Vulkan-Tools cube.c), where the
vkQueueSubmit call is requested to wait on a semaphore at the
VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT stage.

At least on my system, the VkSemaphore turns out to be implemented
on top of a drm-syncobj.

The Vulkan spec says:
"Implementations may not support synchronization at every pipeline
stage for every synchronization operation...... If a pipeline stage that
an implementation does not support synchronization for appears in a
destination stage mask, it may substitute any logically earlier stage in
its place for the second synchronization scope."
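
To make my reading of that substitution rule concrete, here is a minimal
sketch in C. It assumes a simplified, linear ordering of stages; the enum,
the supported[] table and effective_wait_stage() are hypothetical names made
up for illustration and are neither the Vulkan enum values nor anything
taken from Mesa.

```c
#include <assert.h>

/* Hypothetical, simplified linear ordering of graphics pipeline stages.
 * These are NOT the Vulkan enum values; they only model "logically earlier". */
enum stage {
    STAGE_TOP_OF_PIPE = 0,
    STAGE_VERTEX_SHADER,
    STAGE_FRAGMENT_SHADER,
    STAGE_COLOR_ATTACHMENT_OUTPUT,
    STAGE_COUNT
};

/* supported[s] is non-zero if the (hypothetical) implementation can block a
 * semaphore wait at stage s. Returns the latest supported stage that is at or
 * before the requested one; if none is supported, the wait falls back to the
 * very beginning of the pipeline. */
static enum stage effective_wait_stage(enum stage requested,
                                       const int supported[STAGE_COUNT])
{
    assert(requested >= STAGE_TOP_OF_PIPE && requested < STAGE_COUNT);

    for (int s = (int)requested; s > STAGE_TOP_OF_PIPE; s--) {
        if (supported[s])
            return (enum stage)s;
    }
    return STAGE_TOP_OF_PIPE;
}
```

With a table that only supports waiting at STAGE_TOP_OF_PIPE, a request for
STAGE_COLOR_ATTACHMENT_OUTPUT collapses to STAGE_TOP_OF_PIPE, which is how
I picture the behaviour described below.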

From the behaviour of the QueueSubmit implementation, it seems that
the common Vulkan runtime provided by Mesa, with help from the
kernel drm-syncobj framework, chooses to wait right at the beginning
before submitting any work to the GPU, even though the
dst-stage-mask allows the GPU to proceed with the commands until it
reaches the color-attach-output stage.

Although this behaviour of waiting right at the beginning is allowed
according to the spec snippet pasted above, is it correct to assume that
Mesa ignores the dst-stage-mask in such cases?

In general, do GPUs have such stages in hardware, and such wait/signal
implemented in hardware?

Do GPUs have any mechanism to prevent their fragment shaders from
running until reaching a particular state: e.g. a state where all of
the geometry has been processed and only the fragment shader and later
stages are to be processed; or a stage where some of the geometry is
processed, as limited by the sizes of internal caches that store the
intermediate result of the geometry processing?

The fragment shader might be forced to wait on some global memory
variable until it signals that it is safe for the shader to export the
pixels, though this might be a very inefficient use of the GPU, not to
mention additional sync overhead.

For GPUs that support tile-based rendering and that do not proceed
with tile-rendering (of the current frame) before the tile-binning for
*all* tiles (for the current frame) is complete, the GPU can possibly
move ahead with the tile-binning phase, and only wait after the binning
is complete.

Thank you,
Amol