[Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

Mon Jul 6 10:54:22 PDT 2015

That's right. Except really what might have happened was

occl query;
write X;
more drawing;
write X+1;

and then on the CPU, you see X+1. So the tests are always for >= X.
And if you have more than 2^32 submits, you cry, because I'm *sure*
that nothing implements wraparound properly :)
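
To make the ">= X" test and the wraparound problem concrete, here is a
rough sketch. All of the names below (fence_reached_naive,
fence_reached_wrapsafe, fence_wait) are made up for illustration and are
not taken from any driver; the wrap-safe variant assumes fewer than 2^31
fences are ever outstanding between the two values being compared.

```c
#include <stdint.h>

/* Plain ">= X" test as described above. Breaks once the 32-bit
 * counter wraps: a small post-wrap value looks "older" than X. */
static int fence_reached_naive(uint32_t current, uint32_t wanted)
{
    return current >= wanted;
}

/* Hypothetical wrap-tolerant variant: compare the unsigned difference
 * as a signed number. Assumes two's complement conversion and that the
 * two values are less than 2^31 apart. */
static int fence_reached_wrapsafe(uint32_t current, uint32_t wanted)
{
    return (int32_t)(current - wanted) >= 0;
}

/* CPU-side wait: spin until the value the GPU writes at `addr` has
 * reached `wanted`. `addr` stands in for "address Y". */
static void fence_wait(const volatile uint32_t *addr, uint32_t wanted)
{
    while (!fence_reached_wrapsafe(*addr, wanted))
        ; /* busy-wait; a real driver would sleep or poll with a timeout */
}
```

With current = 2 and wanted = 0xFFFFFFFE (i.e. the counter wrapped in
between), the naive test says "not done" while the signed-difference
test says "done".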

On Mon, Jul 6, 2015 at 1:45 PM, Vyacheslav Gonakhchyan
<ytriffy at gmail.com> wrote:
> Ilia, thanks a lot for the info.
>
> So basically if I submit to GPU's command stream:
> perform occlusion query,
> write X to Y.
> I know that the query has completed when reading address Y returns X.
>
> Regards,
> Vyacheslav
>
> On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>
>> I'm only really familiar with nouveau, but I think all GPU hardware
>> works in roughly the same way. Basically you have some way of
>> inserting "write X to address Y" into the command stream (aka a
>> "fence"), after which you insert "write X+1 to address Y" and so on.
>> If you want the CPU to wait on a given fence, you just do "while
>> (*address < x);". If you have multiple GPU processing queues, you can
>> usually also insert a "stall this queue until the value at address Y
>> is at least X" command into the command stream.
>>
>> DRM uses implicit fences, so it knows which BOs are used for
>> particular commands. So the flow goes something like "submit bunch of
>> commands; submit fence write and attach that fence id to the BOs in
>> the previous bunch of commands". Then to wait for a bo to become ready,
>> you just wait until the GPU writes the appropriate number to memory
>> address Y (from above).
>>
>> The mesa drivers can sometimes use clever tricks that avoid this
>> sync'ing because they know exactly how they emit the commands, and
>> perhaps they wait on something related earlier, whereby they know the
>> other thing will be ready. No idea if that's the case here.
>>
>> Hope this helps,
>>
>>   -ilia
>>
>>
>> On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
>> <ytriffy at gmail.com> wrote:
>> > Ilia, thanks for the gallium link.
>> > Do you know any links to high level info with broad strokes about
>> > how this sync works? Frankly I do not know driver terminology and
>> > wanted to know more about how this sync is performed for my
>> > research. I'm using mesa as a reference because it has open
>> > implementation code. Occlusion query functionality probably waits
>> > for z-buffer to become ready. Problem is that usual synchronization
>> > techniques do not apply here. I'm thinking that driver code gets
>> > notifications about state change. I want to know what kind of
>> > notifications are available? Can query be performed in parallel
>> > with another frame being processed or does it need complete GPU
>> > pipeline flush?
>> >
>> > Thanks,
>> > Vyacheslav
>> >
>> > On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>> >>
>> >> On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
>> >> <ytriffy at gmail.com> wrote:
>> >> > Hi, everyone.
>> >> >
>> >> > Trying to understand method radeonQueryGetResult (more broadly
>> >> > GPU-CPU sync).
>> >> >
>> >> > static void radeonQueryGetResult(struct gl_context *ctx,
>> >> >                                  struct gl_query_object *q)
>> >> > {
>> >> >     struct radeon_query_object *query = (struct radeon_query_object *)q;
>> >> >     uint32_t *result;
>> >> >     int i;
>> >> >
>> >> >     radeon_print(RADEON_STATE, RADEON_VERBOSE,
>> >> >             "%s: query id %d, result %d\n",
>> >> >             __func__, query->Base.Id, (int) query->Base.Result);
>> >> >
>> >> >     radeon_bo_map(query->bo, GL_FALSE);
>> >> >     result = query->bo->ptr;
>> >> >
>> >> >     query->Base.Result = 0;
>> >> >     for (i = 0; i < query->curr_offset/sizeof(uint32_t); ++i) {
>> >> >         query->Base.Result += LE32_TO_CPU(result[i]);
>> >> >         radeon_print(RADEON_STATE, RADEON_TRACE, "result[%d] = %d\n",
>> >> >                 i, LE32_TO_CPU(result[i]));
>> >> >     }
>> >> >
>> >> >     radeon_bo_unmap(query->bo);
>> >> > }
>> >> >
>> >> > I don't know which part is responsible for blocking behavior
>> >> > (waiting for response from GPU). I suspect that radeon_bo_map does
>> >> > this magic.
>> >> > Can someone point in the right direction?
>> >>
>> >> The radeon_bo_map defined in
>> >> src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
>> >> However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
>> >> references the radeon_bo_map in libdrm, which does not appear to wait.
>> >>
>> >> FWIW for nouveau, nouveau_bo_map will also implicitly do a
>> >> nouveau_bo_wait, but that does not appear to be the case for radeon.
>> >>
>> >> Cheers,
>> >>
>> >>   -ilia
>> >
>> >
>
>
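
For reference, the implicit-fence flow quoted above ("submit bunch of
commands; submit fence write and attach that fence id to the BOs") could
be modelled roughly like this. It is a toy, CPU-only sketch: struct
toy_gpu, struct toy_bo and the toy_* functions are hypothetical names,
and the "GPU" is just a function that stores the fence value. It is not
how radeon, nouveau or DRM actually lay out their data.

```c
#include <stdint.h>

/* Toy buffer object: remembers the last fence id that used it. */
struct toy_bo {
    uint32_t last_fence;
};

/* Toy device: next fence id to hand out, plus the memory word
 * ("address Y") that the GPU writes completed fence ids into. */
struct toy_gpu {
    uint32_t next_fence;
    volatile uint32_t fence_mem;
};

/* Submit a batch that uses `n` BOs: allocate a new fence id and
 * attach it to every BO referenced by the batch. */
static uint32_t toy_submit(struct toy_gpu *gpu, struct toy_bo **bos, int n)
{
    uint32_t id = ++gpu->next_fence;
    for (int i = 0; i < n; i++)
        bos[i]->last_fence = id;
    return id;
}

/* Stand-in for the GPU executing the "write X to address Y" command. */
static void toy_gpu_retire(struct toy_gpu *gpu, uint32_t id)
{
    gpu->fence_mem = id;
}

/* A BO is busy until the fence memory has reached its fence id.
 * The signed difference keeps the test valid across 32-bit wraparound
 * (assuming fewer than 2^31 fences in flight). */
static int toy_bo_busy(const struct toy_gpu *gpu, const struct toy_bo *bo)
{
    return (int32_t)(gpu->fence_mem - bo->last_fence) < 0;
}
```

A blocking map would then amount to looping on toy_bo_busy() before
handing out the pointer, which is the kind of wait the original question
was looking for.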