[Mesa-dev] [PATCH 2/3] i965/gen7: Set MOCS L3 cacheability for IVB/BYT

Thu Aug 15 10:04:02 PDT 2013

On 08/15/2013 09:11 AM, Ville Syrjälä wrote:
> On Thu, Aug 15, 2013 at 08:08:12AM -0700, Chad Versace wrote:

>> I would like such a cache-control ioctl, as long as the ioctl can also
>> be used to change the object's caching policy in addition to
>> setting it at object creation. This would be
>> needed when an object's usage oscillates between texture surface
>> and render target.
>
> We do have the set_caching ioctl. It's enough to flip the PTEs to UC and
> let MOCS manage things. I actually did a few experiments on my IVB. I
> made all Mesa's buffers UC via PTEs by patching libdrm to change the
> cache mode of each bo after allocation. Then I fiddled with the MOCS
> LLC bits in various ways. It definitely has an effect, sometimes making
> things slower, sometimes faster. xonotic again seemed to benefit. IIRC
> leaving everything LLC uncached was actually the fastest (w/ high quality
> at least) so we may be thrashing the LLC a bit there. But e.g. Reaction
> Quake regressed quite a lot if most things were left as UC.
>
> I should probably run through a few MOCS combinations and collect a bit
> more data. But it's looking like some sensible heuristic has to be
> involved since different benchmarks show conflicting results. Maybe
> your LLC overcommit prevention approach would be the one. Are you
> planning to continue with that work?

I do plan to continue that work. I plan to return to it the week of
Aug 26, because I need to first make more progress on Broadwell.

My simple heuristic that prevents overcommit of the LLC, in its
current form, also gives varying results: some benchmarks benefit,
others are harmed. In each experiment, I set the LLC commit threshold
to 0.80, 1.00, or 1.25. (That is, for a given draw call, Mesa stops
putting objects in the LLC once the draw call has filled that fraction
of the LLC.)

Hopefully, getting a consistent benefit across all apps only requires
choosing a significantly higher or lower threshold than the ones I've
tried so far. What I fear, though, is that because the GPU shares the
LLC with the CPU (GPU LLC = CPU L3), finding a near-globally beneficial
heuristic may require taking the CPU load into account when choosing
the LLC commit threshold.