[virglrenderer-devel] renderer: fix memory corruption when using glBufferSubData

Thu Jun 14 04:26:47 UTC 2018

On 13 June 2018 at 01:57, Gert Wollny <gert.wollny at collabora.com> wrote:
> On Monday, 11.06.2018 at 11:47 -0400, Ramin Azarmehr wrote:
>> Bug: if "use_sub_data" in vrend_renderer.c is set to 1 (the default
>> is 0) so that glBufferSubData() is used for transferring IOVs, memory
>> corruption and artifacts occur.
>>
>> Reason: the second parameter of glBufferSubData() is the offset, but
>> the vrend_read_from_iovec_cb() function in iov.c passes "count" to
>> it, which can cause writes beyond the buffer boundary (or at the
>> wrong offset). The following patch fixes the problem. I've also
>> made simple changes to other functions in iov.c to make the code
>> consistent.
> It would probably have been better to separate the chunk that fixes the
> bug from the changes that make the code more consistent, so that it is
> easier to figure out what to focus on in the patch.
>
> Apart from that the patch is
> Reviewed-By: Gert Wollny <gert.wollny at collabora.com>

I actually split this, because it seemed safer for bisection purposes.
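
For anyone following along, here is a minimal sketch of the offset issue
described above. It is not the actual iov.c code: fake_buffer,
fake_buffer_sub_data() and upload_iov() are hypothetical names, and a plain
byte array stands in for the GL buffer so that it runs without a GL context.
The only point is that the second argument has to be the running offset into
the destination, not a length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical stand-in for a GL buffer object (no GL context needed). */
struct fake_buffer {
    unsigned char data[64];
};

/* Mirrors the (offset, size, data) argument order of
 * glBufferSubData(target, offset, size, data). */
void fake_buffer_sub_data(struct fake_buffer *buf, size_t offset,
                          size_t size, const void *src)
{
    /* A wrong offset would either trip this check or land the data
     * at the wrong place inside the buffer. */
    assert(offset + size <= sizeof(buf->data));
    memcpy(buf->data + offset, src, size);
}

/* Upload every iovec entry back to back. Returns the bytes written. */
size_t upload_iov(struct fake_buffer *buf, const struct iovec *iov, int iovcnt)
{
    size_t offset = 0;  /* running write position in the destination */

    for (int i = 0; i < iovcnt; i++) {
        /* Second argument is the running offset, not a byte count. */
        fake_buffer_sub_data(buf, offset, iov[i].iov_len, iov[i].iov_base);
        offset += iov[i].iov_len;
    }
    return offset;
}
```

With two entries holding "abcd" and "efg", upload_iov() places them
contiguously, so the first seven bytes of the buffer read "abcdefg".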
>
>
>> P.S.: our experiments on both Intel Atom and aarch64 platforms show
>> that glBufferSubData() performs faster in function
>> vrend_renderer_transfer_write_iov() than the combination of
>> glMapBufferRange()+memcpy(). This improvement could be attributed to
>> better cache utilization (write-combining), NEON-assisted mem-copying
>> on aarch64, etc. You may set use_sub_data=1 at your discretion.

Interesting. I'm not sure how this lands on x86; I remember experimenting
with nvidia many years ago and finding the path we use now to be better. But
I suspect that if the driver can do better than memcpy on ARM, then memcpy
itself could be made faster.

There is also a threshold and API overhead here. If you have an iov with, say,
20 entries in it, then each of those has to call BufferSubData, with any
overhead that entails, versus a single map/memcpy. So I'm not sure there is
truly one better way; it possibly depends on a number of factors, unless of
course memcpy is just a lot slower.
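
To make the call-count argument concrete, here is a rough sketch that only
counts entry points; it measures nothing about real driver cost. Everything in
it (mock_buffer, mock_sub_data(), mock_map(), mock_unmap() and the two write
functions) is a hypothetical stand-in, not virglrenderer or GL code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Hypothetical mock buffer that counts driver-style API calls. */
struct mock_buffer {
    unsigned char storage[256];
    unsigned api_calls;
};

/* Stand-in for a BufferSubData-style call: one API call per chunk. */
void mock_sub_data(struct mock_buffer *b, size_t offset, size_t size,
                   const void *src)
{
    assert(offset + size <= sizeof(b->storage));
    b->api_calls++;
    memcpy(b->storage + offset, src, size);
}

/* Stand-ins for map/unmap: one API call each, however many chunks follow. */
unsigned char *mock_map(struct mock_buffer *b) { b->api_calls++; return b->storage; }
void mock_unmap(struct mock_buffer *b) { b->api_calls++; }

/* Path A: one sub-data call per iovec entry. */
size_t write_with_sub_data(struct mock_buffer *b, const struct iovec *iov,
                           int iovcnt)
{
    size_t offset = 0;
    for (int i = 0; i < iovcnt; i++) {
        mock_sub_data(b, offset, iov[i].iov_len, iov[i].iov_base);
        offset += iov[i].iov_len;
    }
    return offset;
}

/* Path B: map once, memcpy each entry, unmap once. */
size_t write_with_map(struct mock_buffer *b, const struct iovec *iov,
                      int iovcnt)
{
    unsigned char *dst = mock_map(b);
    size_t offset = 0;
    for (int i = 0; i < iovcnt; i++) {
        assert(offset + iov[i].iov_len <= sizeof(b->storage));
        memcpy(dst + offset, iov[i].iov_base, iov[i].iov_len);
        offset += iov[i].iov_len;
    }
    mock_unmap(b);
    return offset;
}
```

For a 20-entry iov, path A makes 20 counted calls and path B makes 2, with the
same number of copies either way. Which one wins in practice depends on how
expensive each real call and each copy is on a given driver.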

Dave.