[Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

Mon Jul 6 10:13:04 PDT 2015

I'm only really familiar with nouveau, but I think all GPU hardware
works in roughly the same way. Basically you have some way of
inserting "write X to address Y" into the command stream (aka a
"fence"), after which you insert "write X+1 to address Y" and so on.
If you want the CPU to wait on fence value X, you just do "while
(*address < X);". If you have multiple GPU processing queues, you can
usually also insert a "stall this queue until the value at address Y
is at least X" command into the command stream.
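
To make the fence scheme above a bit more concrete, here is a rough C
sketch. It is not actual nouveau or radeon code: fence_mem,
fake_gpu_signal, fence_signaled and fence_wait are all made-up names,
and the "GPU" is just a function that stores to a variable. It also
ignores sequence-number wraparound.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for "address Y": the memory location the GPU
 * writes fence values to. In this sketch it is an ordinary atomic. */
static _Atomic uint32_t fence_mem;

/* Simulates the GPU executing a "write X to address Y" command.
 * Fence values are assumed to only ever increase. */
static void fake_gpu_signal(uint32_t seqno)
{
    assert(seqno >= atomic_load(&fence_mem));
    atomic_store_explicit(&fence_mem, seqno, memory_order_release);
}

/* Non-blocking check: has the GPU written at least seqno yet? */
static int fence_signaled(uint32_t seqno)
{
    return atomic_load_explicit(&fence_mem, memory_order_acquire) >= seqno;
}

/* CPU-side wait: the "while (*address < X);" loop from the text.
 * This sketch simply spins until the value shows up. */
static void fence_wait(uint32_t seqno)
{
    while (!fence_signaled(seqno))
        ;
}
```

Note that in this single-threaded sketch fence_wait only returns if
fake_gpu_signal has already been called with a large enough value; on
real hardware the write comes from the GPU asynchronously.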

DRM uses implicit fences, so it knows which BOs are used for
particular commands. So the flow goes something like "submit bunch of
commands; submit fence write and attach that fence id to the BOs in
the previous bunch of commands". Then to wait for a bo to become ready,
you just wait until the GPU writes the appropriate number to memory
address Y (from above).
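
As a rough illustration of that flow (again a sketch with made-up
names, not the real DRM interfaces): struct fake_bo, struct fake_queue,
submit, gpu_retire and bo_is_idle are all hypothetical, and the
"completed" field plays the role of the value at address Y.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer object: remembers the last fence that covers it. */
struct fake_bo {
    uint32_t last_fence;
};

/* Hypothetical command queue. */
struct fake_queue {
    uint32_t emitted;   /* last fence value placed in the command stream */
    uint32_t completed; /* stands in for the value at address Y */
};

/* "Submit bunch of commands; submit fence write and attach that fence
 * id to the BOs": every bo used by the batch is tagged with the new id. */
static uint32_t submit(struct fake_queue *q, struct fake_bo **bos, size_t n)
{
    uint32_t seqno = ++q->emitted;
    for (size_t i = 0; i < n; i++)
        bos[i]->last_fence = seqno;
    return seqno;
}

/* Simulates the GPU reaching the fence write for seqno. */
static void gpu_retire(struct fake_queue *q, uint32_t seqno)
{
    assert(seqno <= q->emitted);
    if (seqno > q->completed)
        q->completed = seqno;
}

/* A bo is ready once the GPU has written its fence value (or a later one). */
static int bo_is_idle(const struct fake_queue *q, const struct fake_bo *bo)
{
    return q->completed >= bo->last_fence;
}
```

Waiting for a bo is then just the fence wait from before, applied to
bo->last_fence.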

The mesa drivers can sometimes use clever tricks that avoid this
sync'ing, because a driver knows exactly how it emits the commands and
may already have waited on something related earlier, which tells it
the other thing will be ready. No idea if that's the case here.

Hope this helps,

  -ilia

On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
<ytriffy at gmail.com> wrote:
> Ilia, thanks for the gallium link.
> Do you know any links to high-level info that describes in broad strokes how
> this sync works? Frankly, I do not know driver terminology and wanted to know
> more about how this sync is performed for my research. I'm using mesa as a
> reference because its implementation is open. Occlusion query
> functionality probably waits for the z-buffer to become ready. The problem is
> that the usual synchronization techniques do not apply here. I'm thinking that
> driver code gets notifications about state changes. What kinds of
> notifications are available? Can a query be performed in parallel with another
> frame being processed, or does it need a complete GPU pipeline flush?
>
> Thanks,
> Vyacheslav
>
> On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>
>> On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
>> <ytriffy at gmail.com> wrote:
>> > Hi, everyone.
>> >
>> > Trying to understand method radeonQueryGetResult (more broadly GPU-CPU
>> > sync).
>> >
>> > static void radeonQueryGetResult(struct gl_context *ctx,
>> >                                  struct gl_query_object *q)
>> > {
>> >     struct radeon_query_object *query = (struct radeon_query_object *)q;
>> >     uint32_t *result;
>> >     int i;
>> >
>> >     radeon_print(RADEON_STATE, RADEON_VERBOSE,
>> >             "%s: query id %d, result %d\n",
>> >             __func__, query->Base.Id, (int) query->Base.Result);
>> >
>> >     radeon_bo_map(query->bo, GL_FALSE);
>> >     result = query->bo->ptr;
>> >
>> >     query->Base.Result = 0;
>> >     for (i = 0; i < query->curr_offset/sizeof(uint32_t); ++i) {
>> >         query->Base.Result += LE32_TO_CPU(result[i]);
>> >         radeon_print(RADEON_STATE, RADEON_TRACE, "result[%d] = %d\n", i,
>> >                 LE32_TO_CPU(result[i]));
>> >     }
>> >
>> >     radeon_bo_unmap(query->bo);
>> > }
>> >
>> > I don't know which part is responsible for blocking behavior (waiting
>> > for response from GPU). I suspect that radeon_bo_map does this magic.
>> > Can someone point in the right direction?
>>
>> The radeon_bo_map defined in
>> src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
>> However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
>> references the radeon_bo_map in libdrm, which does not appear to wait.
>>
>> FWIW for nouveau, nouveau_bo_map will also implicitly do a
>> nouveau_bo_wait, but that does not appear to be the case for radeon.
>>
>> Cheers,
>>
>>   -ilia
>
>