<div dir="ltr"><div><div>Ilia, thanks a lot for the info.<br><br></div>So basically if I submit to the GPU's command stream:<br></div><div>perform occlusion query,<br></div><div>write X to Y.<br></div><div>I know that the query is completed when reading address Y returns X.<br></div><div><div><br></div>Regards,<br></div>Vyacheslav<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 6, 2015 at 9:13 PM, Ilia Mirkin <span dir="ltr"><<a href="mailto:imirkin@alum.mit.edu" target="_blank">imirkin@alum.mit.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I'm only really familiar with nouveau, but I think all GPU hardware<br> works in roughly the same way. Basically you have some way of<br> inserting "write X to address Y" into the command stream (aka a<br> "fence"), after which you insert "write X+1 to address Y" and so on.<br> If you want the CPU to wait on a given fence, you just do "while<br> (*address < X);". If you have multiple GPU processing queues, you can<br> usually also insert a "stall this queue until the value at address Y<br> is at least X" command into the command stream.<br> <br> DRM uses implicit fences, so it knows which BOs are used for<br> particular commands. So the flow goes something like "submit a bunch of<br> commands; submit fence write and attach that fence id to the BOs in<br> the previous bunch of commands". Then to wait for a BO to become ready,<br> you just wait until the GPU writes the appropriate number to memory<br> address Y (from above).<br> <br> The mesa drivers can sometimes use clever tricks that avoid this<br> sync'ing because they know exactly how they emit the commands and<br> perhaps they wait on something related earlier, whereby they know the<br> other thing will be ready.
No idea if that's the case here.<br> <br> Hope this helps,<br> <br> -ilia<br> <div class="HOEnZb"><div class="h5"><br> <br> On Mon, Jul 6, 2015 at 1:05 PM, Vyacheslav Gonakhchyan<br> <<a href="mailto:ytriffy@gmail.com">ytriffy@gmail.com</a>> wrote:<br> > Ilia, thanks for the gallium link.<br> > Do you know any links to high level info with broad strokes about how this<br> > sync works? Frankly I do not know driver terminology and wanted to know more<br> > about how this sync is performed for my research. I'm using mesa as a<br> > reference because it has open implementation code. Occlusion query<br> > functionality probably waits for z-buffer to become ready. Problem is that<br> > usual synchronization techniques do not apply here. I'm thinking that driver<br> > code gets notifications about state change. I want to know what kind of<br> > notifications are available? Can query be performed in parallel with another<br> > frame being processed or does it need complete GPU pipeline flush?<br> ><br> > Thanks,<br> > Vyacheslav<br> ><br> > On Mon, Jul 6, 2015 at 8:32 PM, Ilia Mirkin <<a href="mailto:imirkin@alum.mit.edu">imirkin@alum.mit.edu</a>> wrote:<br> >><br> >> On Mon, Jul 6, 2015 at 11:29 AM, Vyacheslav Gonakhchyan<br> >> <<a href="mailto:ytriffy@gmail.com">ytriffy@gmail.com</a>> wrote:<br> >> > Hi, everyone.<br> >> ><br> >> > Trying to understand method radeonQueryGetResult (more broadly GPU-CPU<br> >> > sync).<br> >> ><br> >> > static void radeonQueryGetResult(struct gl_context *ctx, struct<br> >> > gl_query_object *q)<br> >> > {<br> >> > struct radeon_query_object *query = (struct radeon_query_object *)q;<br> >> > uint32_t *result;<br> >> > int i;<br> >> ><br> >> > radeon_print(RADEON_STATE, RADEON_VERBOSE,<br> >> > "%s: query id %d, result %d\n",<br> >> > __func__, query->Base.Id, (int) query->Base.Result);<br> >> ><br> >> > radeon_bo_map(query->bo, GL_FALSE);<br> >> > result = query->bo->ptr;<br> >> ><br> >> > query->Base.Result = 0;<br> >> > for (i = 0; i < 
query->curr_offset/sizeof(uint32_t); ++i) {<br> >> > query->Base.Result += LE32_TO_CPU(result[i]);<br> >> > radeon_print(RADEON_STATE, RADEON_TRACE, "result[%d] = %d\n", i,<br> >> > LE32_TO_CPU(result[i]));<br> >> > }<br> >> ><br> >> > radeon_bo_unmap(query->bo);<br> >> > }<br> >> ><br> >> > I don't know which part is responsible for blocking behavior (waiting<br> >> > for<br> >> > response from GPU). I suspect that radeon_bo_map does this magic.<br> >> > Can someone point in the right direction?<br> >><br> >> The radeon_bo_map defined in<br> >> src/gallium/winsys/radeon/drm/radeon_drm_bo.c indeed has this magic.<br> >> However the code in src/mesa/drivers/dri/radeon/radeon_queryobj.c<br> >> references the radeon_bo_map in libdrm, which does not appear to wait.<br> >><br> >> FWIW for nouveau, nouveau_bo_map will also implicitly do a<br> >> nouveau_bo_wait, but that does not appear to be the case for radeon.<br> >><br> >> Cheers,<br> >><br> >> -ilia<br> ><br> ><br> </div></div></blockquote></div><br></div>
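The fence scheme described in the thread ("write X to address Y", then wait until the value at Y is at least X) can be sketched as a small single-threaded simulation. This is only an illustration under stated assumptions, not how radeon, nouveau, or DRM implement it: `fake_gpu`, `gpu_submit`, `gpu_step`, `submit_with_fence`, and `cpu_wait_fence` are all hypothetical names, the "GPU" is a plain command ring advanced by the waiting CPU, and sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16

/* Two hypothetical command kinds: real work, and "write X to address Y". */
enum cmd_type { CMD_WORK, CMD_FENCE_WRITE };

struct cmd {
    enum cmd_type type;
    uint32_t value;           /* work amount, or fence value X */
};

struct fake_gpu {
    struct cmd ring[RING_SIZE];
    size_t head;              /* next command to execute */
    size_t tail;              /* next free slot */
    volatile uint32_t fence_mem; /* stands in for "address Y" */
    uint32_t next_seq;        /* last fence value handed out */
    uint32_t work_done;       /* stands in for a query result */
};

/* Append one command to the stream; returns -1 if the ring is full. */
static int gpu_submit(struct fake_gpu *g, struct cmd c)
{
    if (g->tail == RING_SIZE)
        return -1;
    g->ring[g->tail++] = c;
    return 0;
}

/* Execute the next queued command, in submission order. */
static void gpu_step(struct fake_gpu *g)
{
    if (g->head == g->tail)
        return;
    struct cmd c = g->ring[g->head++];
    if (c.type == CMD_WORK)
        g->work_done += c.value;
    else
        g->fence_mem = c.value;   /* the fence write */
}

/* Submit work followed by a fence write; returns the fence id to wait on. */
static uint32_t submit_with_fence(struct fake_gpu *g, uint32_t work)
{
    uint32_t seq = ++g->next_seq;
    int rc = gpu_submit(g, (struct cmd){ CMD_WORK, work });
    assert(rc == 0);
    rc = gpu_submit(g, (struct cmd){ CMD_FENCE_WRITE, seq });
    assert(rc == 0);
    return seq;
}

/*
 * CPU-side wait: the "while (*address < X);" loop from the thread.
 * Here the loop also drives the simulated GPU forward. Returns -1 if
 * the stream drains without the fence value ever being written.
 */
static int cpu_wait_fence(struct fake_gpu *g, uint32_t x)
{
    while (g->fence_mem < x) {
        if (g->head == g->tail)
            return -1;
        gpu_step(g);
    }
    return 0;
}
```

Because commands execute in order, seeing the fence value at Y implies that everything submitted before the fence write has finished, which is why reading the result is only safe after the wait returns. On real hardware the GPU runs asynchronously and the CPU loop just polls (or sleeps on an interrupt) instead of stepping the queue itself.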