[Mesa-dev] Fwd: GPU-CPU sync during radeonQueryGetResult

Mon Jul 6 14:17:20 PDT 2015

On 06.07.2015 at 19:54, Ilia Mirkin wrote:
> That's right. Except really what might have happened was
> 
> occl query;
> write X;
> more drawing;
> write X+1;
> 
> and then on the CPU, you see X+1. So the tests are always for >= X.
> And if you have more than 2^32 submits, you cry, because I'm *sure*
> that nothing implements wraparound properly :)
That's why 64-bit counters are used, right? :-)
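
A minimal sketch of the difference, not taken from any driver. The helper names are hypothetical. It contrasts the plain ">= X" test on a 32-bit counter, a wrap-tolerant variant that looks at the signed difference (an assumption on my part, in the style of serial-number arithmetic), and the same plain test on a 64-bit counter, which in practice never wraps:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain ">= X" test on a 32-bit sequence number.
 * Gives the wrong answer once the counter wraps past 2^32 - 1. */
static bool fence_passed_naive(uint32_t current, uint32_t target)
{
    return current >= target;
}

/* Wrap-tolerant test: take the difference modulo 2^32 and read it as
 * a signed value. Correct only while the two values are less than
 * 2^31 submits apart. The unsigned-to-signed conversion is
 * implementation-defined in C, but behaves as two's complement on
 * the usual compilers. */
static bool fence_passed_wrapping(uint32_t current, uint32_t target)
{
    return (int32_t)(current - target) >= 0;
}

/* With a 64-bit counter the plain comparison is enough in practice. */
static bool fence_passed_64(uint64_t current, uint64_t target)
{
    return current >= target;
}
```

For example, with the counter at 2 just after a wrap and a target of 0xFFFFFFFE, the naive test reports "not passed" while the wrap-tolerant one reports "passed".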

Roland

> 
> On Mon, Jul 6, 2015 at 1:45 PM, Vyacheslav Gonakhchyan
> <ytriffy at gmail.com> wrote:
>> Ilia, thanks a lot for the info.
>>
>> So basically, if I submit to the GPU's command stream:
>> perform occlusion query,
>> write X to Y,
>> then I know the query has completed once reading address Y returns X.
>>
>> Regards,
>> Vyacheslav
>>
>> On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>
>>> I'm only really familiar with nouveau, but I think all GPU hardware
>>> works in roughly the same way. Basically you have some way of
>>> inserting "write X to address Y" into the command stream (aka a
>>> "fence"), after which you insert "write X+1 to address Y" and so on.
>>> If you want the CPU to wait on a given fence, you just do "while
>>> (*address < x);". If you have multiple GPU processing queues, you can
>>> usually also insert a "stall this queue until the value at address Y
>>> is at least X" command into the command stream.
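
The fence mechanism described above can be sketched as a small single-threaded simulation. Everything here (struct fake_gpu, the command types, the helper names) is hypothetical and only illustrates the idea. On real hardware the GPU performs the write asynchronously into memory the CPU has mapped, and the wait loop does not drive the GPU itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cmd_type { CMD_DRAW, CMD_FENCE_WRITE };

struct cmd {
    enum cmd_type type;
    uint32_t value;              /* only used by CMD_FENCE_WRITE */
};

struct fake_gpu {
    const struct cmd *stream;    /* submitted command stream */
    size_t count;
    size_t next;                 /* next command to execute */
    volatile uint32_t fence_mem; /* "address Y" from the description */
};

static void fake_gpu_init(struct fake_gpu *gpu,
                          const struct cmd *stream, size_t count)
{
    gpu->stream = stream;
    gpu->count = count;
    gpu->next = 0;
    gpu->fence_mem = 0;
}

/* Execute one command. A fence command is just "write X to Y". */
static void fake_gpu_step(struct fake_gpu *gpu)
{
    const struct cmd *c;

    if (gpu->next >= gpu->count)
        return;
    c = &gpu->stream[gpu->next++];
    if (c->type == CMD_FENCE_WRITE)
        gpu->fence_mem = c->value;
}

/* CPU-side wait: the "while (*address < x);" loop. Because this is a
 * simulation, each poll also advances the fake GPU by one command.
 * Returns the number of polls that were needed. */
static size_t cpu_wait_fence(struct fake_gpu *gpu, uint32_t x)
{
    size_t polls = 0;

    while (gpu->fence_mem < x) {
        /* A fence that was never submitted would spin forever. */
        assert(gpu->next < gpu->count);
        fake_gpu_step(gpu);
        polls++;
    }
    return polls;
}
```

Waiting on a fence that has already been written returns immediately, which is why the test is "at least X" rather than "equal to X".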
>>>
>>> DRM uses implicit fences, so it knows which BOs are used for
>>> particular commands. So the flow goes something like "submit bunch of
>>> commands; submit fence write and attach that fence id to the BOs in
>>> the previous bunch of commands". Then to wait for a bo to become ready,
>>> you just wait until the GPU writes the appropriate number to memory
>>> address Y (from above).
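
The bookkeeping in that flow might look roughly like the sketch below. It is not the DRM implementation: struct tracked_bo, struct fence_ctx and the three helpers are hypothetical names, and fake_gpu_retire stands in for the GPU writing the fence value to memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A buffer object that remembers the last fence covering its use. */
struct tracked_bo {
    uint32_t last_fence;
};

struct fence_ctx {
    uint32_t next_fence;         /* last fence id handed out */
    volatile uint32_t fence_mem; /* value the GPU has written so far */
};

/* "Submit a bunch of commands, submit a fence write, and attach that
 * fence id to the BOs used by those commands." Returns the fence id. */
static uint32_t submit_with_fence(struct fence_ctx *ctx,
                                  struct tracked_bo **bos, size_t n)
{
    uint32_t id = ++ctx->next_fence;
    size_t i;

    for (i = 0; i < n; i++)
        bos[i]->last_fence = id;
    return id;
}

/* Stand-in for the GPU reaching the fence command and writing its id. */
static void fake_gpu_retire(struct fence_ctx *ctx, uint32_t id)
{
    ctx->fence_mem = id;
}

/* A BO is ready once the GPU has written at least its fence id. */
static bool bo_is_idle(const struct fence_ctx *ctx,
                       const struct tracked_bo *bo)
{
    return ctx->fence_mem >= bo->last_fence;
}
```

A blocking wait on a BO would then just poll bo_is_idle until it returns true.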
>>>
>>> The Mesa drivers can sometimes use clever tricks to avoid this
>>> syncing, because a driver knows exactly how it emits its commands and
>>> may already have waited on something related earlier, which tells it
>>> that the other thing will be ready. No idea if that's the case here.
>>>
>>> Hope this helps,
>>>
>>>   -ilia
>>>
>>>
>>> On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan
>>> <ytriffy at gmail.com> wrote:
>>>> Ilia, thanks for the gallium link.
>>>> Do you know of any links to high-level, broad-strokes info about how
>>>> this sync works? Frankly, I do not know driver terminology, and I want
>>>> to know more about how this sync is performed for my research. I'm
>>>> using Mesa as a reference because its implementation is open.
>>>> Occlusion query functionality probably waits for the z-buffer to
>>>> become ready. The problem is that the usual synchronization techniques
>>>> do not apply here. I suspect that driver code gets notifications about
>>>> state changes. What kinds of notifications are available? Can a query
>>>> be performed in parallel with another frame being processed, or does
>>>> it need a complete GPU pipeline flush?
>>>>
>>>> Thanks,
>>>> Vyacheslav
>>>>
>>>> On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>>
>>>>> On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan
>>>>> <ytriffy at gmail.com> wrote:
>>>>>> Hi, everyone.
>>>>>>
>>>>>> I'm trying to understand the function radeonQueryGetResult (and,
>>>>>> more broadly, GPU-CPU sync).
>>>>>>
>>>>>> static void radeonQueryGetResult(struct gl_context *ctx,
>>>>>>                                  struct gl_query_object *q)
>>>>>> {
>>>>>>     struct radeon_query_object *query = (struct radeon_query_object *)q;
>>>>>>     uint32_t *result;
>>>>>>     int i;
>>>>>>
>>>>>>     radeon_print(RADEON_STATE, RADEON_VERBOSE,
>>>>>>             "%s: query id %d, result %d\n",
>>>>>>             __func__, query->Base.Id, (int) query->Base.Result);
>>>>>>
>>>>>>     radeon_bo_map(query->bo, GL_FALSE);
>>>>>>     result = query->bo->ptr;
>>>>>>
>>>>>>     query->Base.Result = 0;
>>>>>>     for (i = 0; i < query->curr_offset/sizeof(uint32_t); ++i) {
>>>>>>         query->Base.Result += LE32_TO_CPU(result[i]);
>>>>>>         radeon_print(RADEON_STATE, RADEON_TRACE, "result[%d] = %d\n",
>>>>>>                 i, LE32_TO_CPU(result[i]));
>>>>>>     }
>>>>>>
>>>>>>     radeon_bo_unmap(query->bo);
>>>>>> }
>>>>>>
>>>>>> I don't know which part is responsible for the blocking behavior
>>>>>> (waiting for a response from the GPU). I suspect that radeon_bo_map
>>>>>> does this magic. Can someone point me in the right direction?
>>>>>
>>>>> The radeon_bo_map defined in
>>>>> src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.
>>>>> However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c
>>>>> references the radeon_bo_map in libdrm, which does not appear to wait.
>>>>>
>>>>> FWIW for nouveau, nouveau_bo_map will also implicitly do a
>>>>> nouveau_bo_wait, but that does not appear to be the case for radeon.
>>>>>
>>>>> Cheers,
>>>>>
>>>>>   -ilia
>>>>
>>>>
>>
>>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>