[igt-dev] [PATCH i-g-t v1] Data Port Cache Coherency tests.

Lis, Tomasz tomasz.lis at intel.com
Thu Jul 19 17:34:34 UTC 2018



On 2018-07-18 15:22, Tvrtko Ursulin wrote:
>
> On 17/07/2018 14:47, Lis, Tomasz wrote:
>>
>>
>> On 2018-07-16 14:48, Joonas Lahtinen wrote:
>>> Quoting Lis, Tomasz (2018-06-21 14:54:47)
>>>> On 2018-06-21 07:53, Joonas Lahtinen wrote:
>>>>> Quoting Tomasz Lis (2018-06-20 18:14:38)
>>>>>> This adds a new test binary, containing tests for the Data Port
>>>>>> Coherency option. The tests check whether the option value is
>>>>>> stored properly on the kernel side, but also whether it is
>>>>>> correctly set in the proper GPU register.
>>>>> I'm fairly sure there already was review feedback that simply 
>>>>> checking
>>>>> the register state is not a good IGT test.
>>>>>
>>>>> IGT tests make sure that whatever uAPI the kernel has, the promised
>>>>> effects of that uAPI do not get broken. And the promise of the cache
>>>>> coherency uAPI surely isn't that some register value gets written;
>>>>> it's that cache coherency is traded for some performance. The
>>>>> chicken bits or the whole implementation of the feature might be
>>>>> turned upside down, but as long as the userspace is still working,
>>>>> userspace should not care.
>>>>>
>>>>> So the chicken bit setting should be scrapped, and actual cache
>>>>> coherency observed. Verifying that register writes stick and survive
>>>>> sleep states might be a worthy generic kernel selftest, but it does
>>>>> not belong here.
>>>>>
>>>>> If you don't address the feedback given, but keep hammering the
>>>>> mailing list regardless, don't expect further feedback.
>>>>>
>>>>> Regards, Joonas
>>>> Thank you for your feedback, both now and in the previous round.
>>>>
>>>> Developing a test which checks whether the hardware is really doing
>>>> what it should is possible, but it is also quite problematic.
>>>> On the UMD side, there are Khronos OCL 2.1 tests which verify
>>>> coherency; but they have the graphics pipeline fully configured, and
>>>> it is a lot easier to verify coherency at the user level.
>>>>
>>>> We could write a test which does the following:
>>>> - Allocates a very big buffer
>>>> - Executes a shader which fills the whole buffer over and over, with
>>>> increasing numbers
>>>> - Runs long enough for IGT to notice changes in the buffer while the
>>>> shader is being executed
>>>> - If there are increasing changes in the buffer content, the test
>>>> passes (see the sketch below)
>>>>
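(Roughly, the polling side of such a test could look like the sketch
below. This is a sketch only: create_and_map_target_bo() and
submit_incrementing_shader() are hypothetical per-gen helpers, while
igt_until_timeout() is the existing igt_aux helper.)

static void check_coherency(int fd, uint32_t ctx)
{
        /* Map a buffer which a long-running shader keeps incrementing. */
        volatile uint32_t *counter = create_and_map_target_bo(fd);
        uint32_t last = 0;
        int changes = 0;

        submit_incrementing_shader(fd, ctx);

        igt_until_timeout(5) {
                uint32_t now = *counter;

                if (now > last) {
                        changes++;
                        last = now;
                }
        }

        /* With coherency enabled, intermediate GPU writes must become
         * visible to the CPU while the shader is still running. */
        igt_assert(changes > 1);
}
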
>>>> But there are issues with this approach:
>>>> - We need a lot of memory to do the test; the faster the GPU is, the
>>>> more we need
>>>> - We need to write a shader, in assembly, for each platform
>>>> - A new shader will need to be written to test future platforms
>>>> - We need to configure the graphics pipeline within the IGT test,
>>>> which is not trivial
>>>> - The test results may be unreliable: even with coherency disabled,
>>>> there is no guarantee that the memory will not be updated
>>>> - In fact, the buffer probably will be updated while the shader is
>>>> executing even without coherency, but the details may depend on the
>>>> platform, pipeline settings and buffer size
>>>> - The test results may also be unreliable due to timing, so it would
>>>> likely produce sporadic failures
>>>> - The test would require a separate pre-silicon part
>>>> - Test execution time would be considerable
>>>>
>>>> After reading the answers in the previous review, I agree that to
>>>> achieve IGT's goal, testing the hardware should be included. But I
>>>> don't think it's a good idea here.
>>> Without testing what is actually the promised effect for userspace,
>>> we just can't succeed in maintaining the feature. There surely are
>>> use-cases where it is noticeable when the feature is not working, and
>>> those would be the prime candidates for extracting the testing logic.
>>> We should definitely be able to accumulate execution time if we run a
>>> little bit of the test in each CI run.
>>>
>>> Just looking at register values is simply not going to make it; the
>>> registers could be just sitting there, not affecting the HW operation
>>> on a future gen.
>>>
>>> Hopefully this clarifies the expected kind of test.
>>>
>>> Regards, Joonas
>> Note that it still won't be possible to test future gens, because the
>> test will have to be re-developed for each gen.
>> But I did use that argument already. I don't have any more arguments.
>>
>> Since I wasn't able to convince you, I will start estimating the
>> effort required to develop a test within I-G-T which verifies
>> coherency using a shared buffer.
>
> Sounds like the dilemma is whether we should be testing the driver or 
> hardware. We can of course test both, or even test the driver 
> indirectly via testing the hardware.
>
> It sounds reasonable to attempt to test the hardware (that it is
> really coherent when enabled, or vice versa). It is simpler in a way,
> since then you don't need to embed knowledge of what the register and
> bit are per gen. But yeah, to test coherency I do understand you also
> need some gen-dependent code. Wouldn't the existing rendercopy work
> for this?
>
> But I don't understand the low-level workings of the feature well 
> enough to know why it is so difficult to trigger non-coherency from a 
> test.
>
> We do have stress tests which hammer on something and rely only on
> occasional failures to flag up problems, so not being able to trigger
> the problem in 100% of runs is not a showstopper per se.
>
> If it is indeed very difficult, would it be easier to start with
> trying to detect the two levels of performance between non-coherent
> and coherent modes? That way we would at least know that setting the
> bit has some effect and is not completely ignored by the hardware.
>
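The timing variant could be as simple as the sketch below.
gem_context_set_param() is the existing IGT wrapper; the param name is
the one proposed in the kernel series under review, and
run_coherency_workload() is a hypothetical helper. The direction and
threshold of the delta would need tuning on real HW - the coherent mode
is the one expected to be slower.

static uint64_t time_workload(int fd, uint32_t ctx, bool coherent)
{
        struct drm_i915_gem_context_param arg = {
                .ctx_id = ctx,
                /* Param name from the kernel series under review: */
                .param = I915_CONTEXT_PARAM_DATA_PORT_COHERENCY,
                .value = coherent,
        };
        struct timespec start, end;

        gem_context_set_param(fd, &arg);

        clock_gettime(CLOCK_MONOTONIC, &start);
        run_coherency_workload(fd, ctx);        /* hypothetical */
        clock_gettime(CLOCK_MONOTONIC, &end);

        return (end.tv_sec - start.tv_sec) * 1000000000ull +
               (end.tv_nsec - start.tv_nsec);
}
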
Here is a possible inspiration for the test - comments from the Khronos
OCL tests for the feature:

test_fine_grain_sync_buffers.cpp:
// Goals: demonstrate use of SVM's atomics to do fine grain
// synchronization between the device and host.
// Concept: a device kernel is used to search an input image for
// regions that match a target pattern.
// The device immediately notifies the host when it finds a target
// (via an atomic operation that works across host and devices).
// The host is then able to spawn a task that further analyzes the
// target while the device continues searching for more targets.
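
The heart of that first test is the device publishing a flag which the
host polls; condensed into a sketch (not the actual Khronos code -
matches_pattern() is a hypothetical predicate):

// Device side (OpenCL 2.0 C): publish the result, then flag it with an
// atomic that is visible across host and devices.
kernel void notify_host(global uint *result,
                        volatile global atomic_uint *found)
{
        uint id = get_global_id(0);

        if (matches_pattern(id)) {
                result[0] = id;
                atomic_store_explicit(found, 1u, memory_order_seq_cst,
                                      memory_scope_all_svm_devices);
        }
}

/* Host side, with the flag in a fine-grain SVM allocation created with
 * CL_MEM_SVM_FINE_GRAIN_BUFFER | CL_MEM_SVM_ATOMICS: the store becomes
 * visible without any map or flush. */
while (atomic_load_explicit(found, memory_order_seq_cst) == 0)
        ;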

test_fine_grain_memory_consistency.cpp:
// This tests for memory consistency across devices and the host.
// Each device and the host simultaneously insert values into a single
// hash table.
// Each bin in the hash table is a linked list.  Each bin is protected
// against simultaneous update using a lock free technique.  The
// correctness of the list is verified on the host.
// This test requires the new OpenCL 2.0 atomic operations that
// implement the new seq_cst memory ordering.
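
The lock-free bin update in the second test is essentially the classic
compare-and-swap list push; in C11 terms, something like:

#include <stdatomic.h>
#include <stdint.h>

struct node {
        uint32_t key;
        struct node *next;
};

/* Push a node onto a bin shared by the host and the device; the CAS
 * keeps the list consistent without taking a lock. */
static void bin_push(struct node *_Atomic *head, struct node *n)
{
        struct node *old = atomic_load(head);

        do {
                n->next = old;
        } while (!atomic_compare_exchange_weak(head, &old, n));
}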

-Tomasz
> Regards,
>
> Tvrtko