[Intel-gfx] [RFC] drm/i915/tgl: Advanced preparser support for GPU relocs

Fri Aug 23 15:56:54 UTC 2019

On 8/23/19 8:52 AM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-08-23 16:39:14)
>>
>>
>> On 8/23/19 8:28 AM, Chris Wilson wrote:
>>> Quoting Chris Wilson (2019-08-23 16:10:48)
>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 16:05:45)
>>>>>
>>>>>
>>>>> On 8/23/19 7:26 AM, Chris Wilson wrote:
>>>>>> Quoting Chris Wilson (2019-08-23 08:27:25)
>>>>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 03:09:09)
>>>>>>>> TGL has an improved CS pre-parser that can now pre-fetch commands across
>>>>>>>> batch boundaries. This improves performances when lots of small batches
>>>>>>>> are used, but has an impact on self-modifying code. If we want to modify
>>>>>>>> the content of a batch from another ring/batch, we need to either
>>>>>>>> guarantee that the memory location is updated before the pre-parser gets
>>>>>>>> to it or we need to turn the pre-parser off around the modification.
>>>>>>>> In i915, we use self-modifying code only for GPU relocations.
>>>>>>>>
>>>>>>>> The pre-parser fetches across memory synchronization commands as well,
>>>>>>>> so the only way to guarantee that the writes land before the parser gets
>>>>>>>> to it is to have more instructions between the sync and the destination
>>>>>>>> than the parser FIFO depth, which is not an optimal solution.
>>>>>>>
>>>>>>> Well, our ABI is that memory is coherent before the breadcrumb of *each*
>>>>>>> batch. That is a fundamental requirement for our signaling to userspace.
>>>>>>> Please tell me that there is a context flag to turn this off, or we else
>>>>>>> we need to emit 32x flushes or whatever it takes.
>>>>>>
>>>>> Are you referring to the specific case where we have a request modifying
>>>>> an object that is then used as a batch in the next request? Because
>>>>> coherency of objects that are not executed as batches is not impacted.
>>>>
>>>> "Fetches across memory sync" sounds like a major ABI break. The batches
>>>> are a hard serialisation barrier, with memory coherency guaranteed prior
>>>> to the signaling at the end of one batch and clear caches guaranteed at
>>>> the start of the next.
>>>
>>> We have relocs, oa and sseu all using self-modifying code. I expect we
>>> will have PTE modifications and much more done via the GPU in the near
>>> future. All rely on the CS_STALL doing exactly what it says on the tin.
>>> -Chris
>>>
>>
>> I guess the easiest solution is then to keep the parser off outside of
>> user batches. We can default to off and then restore what the user has
>> programmed before the BBSTART. It's not a breach of contract if we say
>> that if you opt-in to the parser then you need to make sure your batches
>> are not self-modifying, right?
> 
> Is it just the MI_ARB_ONOFF bits, and is that still a privileged
> command? i.e. can userspace change mode by itself, or it is a
> context-param?

It's the ARB_CHECK, not the ARB_ONOFF, so yes, it is not privileged and 
userspace can modify it itself. It would've been easier if it was a 
context param :)

Daniele

> 
>> BTW the CS_STALL does not guarantee on pre-gen12 gens that
>> self-modifying code works within the same batch/ring because the
>> pre-parser is already pre-fetching across memory sync points, it just
>> stops at the next arb point.
> 
> Ok, we still uphold our contract if they can't execute any code in the
> window where they would see someone else's data.
> -Chris
>