<div dir="ltr"><div><div><div>I would go farther than requiring the returned bo->size to be covered by backing pages.<br></div>There really should be backing pages for every page mapped by drm_intel_gem_bo_map.<br></div>
No mapped address should be able to touch memory outside an object's backing pages.<br><br></div>If tiling can result in access to unallocated memory, it could even corrupt data in pages<br>used by other processes. That would break the GL_ARB_robustness guarantees.<br>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Apr 15, 2014 at 10:43 AM, Courtney Goeltzenleuchter <span dir="ltr"><<a href="mailto:courtney@lunarg.com" target="_blank">courtney@lunarg.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">What timing, we were just looking at this exact issue - intermittent glxgears rendering issues when using multisample buffers.<div>
<br></div><div>What's the plan for DRM? It seems broken if writing up to the indicated buffer size (<span style="white-space:pre-wrap">bo->size</span>) causes us to clobber other BOs.</div><span class="HOEnZb"><font color="#888888">
</gr-replace>
<div><br></div><div>Courtney</div></font></span></div><div class="gmail_extra"><br><br><div class="gmail_quote"><div><div class="h5">On Mon, Apr 14, 2014 at 8:54 PM, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"><div>On 04/14/2014 05:33 PM, Eric Anholt wrote:<br>
</div><div>> This manifested as rendering failures or sometimes GPU hangs in<br>
> compositors when they accidentally got MSAA visuals due to a bug in the X<br>
> Server. Today we decided that the problem in compositors was equivalent<br>
> to a corruption bug we'd noticed recently in resizing MSAA-visual<br>
> glxgears, and debugging got a lot easier.<br>
><br>
> When we allocate our MCS MT, libdrm takes the size we request, aligns it<br>
> to Y tile size (blowing it up from 300x300=90000 bytes to 384*320=122880<br>
> bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,<br>
> 32 pages). Because it's Y tiled, we attach a 384-byte-stride fence to it.<br>
> When we memset by the BO size in Mesa, between bytes 122880 and 131072 the<br>
> data gets stored to the first 20 or so scanlines of each of the 3 tiled<br>
> pages in that row, even though only 2 of those pages were allocated by<br>
> libdrm.<br>
<br>
</div>What?<br>
<br>
I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a<br>
drm_intel_bo where bo->size is larger than what you asked for, due to<br>
the BO cache. But...what you're saying is, it doesn't actually allocate<br>
enough pages to back the whole bo->size it gives you? So, if you write<br>
bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's<br>
really difficult to detect?<br>
<br>
I never realized that. It seems pretty scary.<br>
<br>
There are other places where we memset an entire BO using bo->size. For<br>
example, your INTEL_DEBUG=shader_time code does exactly that (though it<br>
isn't tiled).<br>
<br>
Could we change libdrm to set bo->size to the actual usable size of the<br>
buffer, rather than the bucket size? That seems much safer, and would<br>
prevent mistakes like this in the future. If libdrm needs to know the<br>
bucket size internally, we could store that as a field in the private<br>
drm_intel_gem_bo structure, and not expose it to consumers of the<br>
library. I can't imagine users of libdrm want the current value.<br>
<div><div><br>
> In the glxgears case, the missing 3rd page happened to<br>
> consistently be the static VBO that got mapped right after the first MCS<br>
> allocation, so corruption only appeared once window resize made us throw<br>
> out the old MCS and then allocate the same BO to back the new MCS.<br>
><br>
> Instead, just memset the amount of data we actually asked libdrm to<br>
> allocate for, which will be smaller (more efficient) and not overrun.<br>
> Thanks go to Kenneth for doing most of the hard debugging to eliminate a<br>
> lot of the search space for the bug.<br>
><br>
> Cc: "10.0 10.1" <<a href="mailto:mesa-stable@lists.freedesktop.org" target="_blank">mesa-stable@lists.freedesktop.org</a>><br>
> Bugzilla: <a href="https://bugs.freedesktop.org/show_bug.cgi?id=77207" target="_blank">https://bugs.freedesktop.org/show_bug.cgi?id=77207</a><br>
> ---<br>
><br>
> Ken: how I eventually figured this out was I thought "well, I'm<br>
> clearing the whole surface at the start of rendering glxgears, so I<br>
> don't *really* need to memset because I'll be initializing the whole<br>
> buffer during fallback glClear() anyway, so let me just drop the whole<br>
> memset step to definitely eliminate that as a potential problem. Huh,<br>
> the problem's gone."<br>
><br>
> The worst is I remember briefly thinking about this code when it was<br>
> landing, and thought "nope, seems safe."<br>
><br>
> src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-<br>
> 1 file changed, 1 insertion(+), 1 deletion(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c<br>
> index 5996a1b..59700ed 100644<br>
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c<br>
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c<br>
> @@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,<br>
> * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.<br>
> */<br>
> void *data = intel_miptree_map_raw(brw, mt->mcs_mt);<br>
> - memset(data, 0xff, mt->mcs_mt->region->bo->size);<br>
> + memset(data, 0xff, mt->mcs_mt->region->height * mt->mcs_mt->region->pitch);<br>
> intel_miptree_unmap_raw(brw, mt->mcs_mt);<br>
> mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;<br>
<br>
</div></div>This does seem to fix the KWin problem, as well as the glxgears problem.<br>
<br>
I agree this is the correct amount of data to memset, and even if we<br>
make the libdrm change I suggested, this seems worth doing. bo->size<br>
may have been rounded up beyond what we need, and memsetting that extra<br>
space is wasteful (even if it did work).<br>
<br>
Reviewed-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>><br>
<br>
Thanks a ton for your help on this, Eric. I was really stumped.<br>
<br>
<br></div></div><div class="">_______________________________________________<br>
mesa-stable mailing list<br>
<a href="mailto:mesa-stable@lists.freedesktop.org" target="_blank">mesa-stable@lists.freedesktop.org</a><br>
<a href="http://lists.freedesktop.org/mailman/listinfo/mesa-stable" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-stable</a><br>
<br></div></blockquote></div><div class=""><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Courtney Goeltzenleuchter<br><div>LunarG</div><div><img src="http://media.lunarg.com/wp-content/themes/LunarG/images/logo.png" height="65" width="96"><br>
</div></div>
</div></div>
<br>_______________________________________________<br>
mesa-dev mailing list<br>
<a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
<a href="http://lists.freedesktop.org/mailman/listinfo/mesa-dev" target="_blank">http://lists.freedesktop.org/mailman/listinfo/mesa-dev</a><br>
<br></blockquote></div><br><br clear="all"><br>-- <br><br> Mike Stroyan - Software Architect<br> LunarG, Inc. - The Graphics Experts<br> Cell: (970) 219-7905<br> Email: Mike@LunarG.com<br> Website: <a href="http://www.lunarg.com">http://www.lunarg.com</a>
</div>