[igt-dev] [PATCH i-g-t v1] Data Port Cache Coherency tests.

Joonas Lahtinen joonas.lahtinen at linux.intel.com
Mon Jul 16 12:48:26 UTC 2018


Quoting Lis, Tomasz (2018-06-21 14:54:47)
> 
> On 2018-06-21 07:53, Joonas Lahtinen wrote:
> > Quoting Tomasz Lis (2018-06-20 18:14:38)
> >> This adds a new test binary, containing tests for the Data Port Coherency
> >> option. The tests check whether the option value is stored properly on the
> >> kernel side, but also whether is is correctly set to proper GPU register.
> > I'm fairly sure there already was review feedback that simply checking
> > the register state is not a good IGT test.
> >
> > IGT tests make sure that whatever uAPI the kernel has, the promised effects
> > of that uAPI do not get broken. And the promise for cache coherency uAPI
> > for surely isn't that some register value gets written, it's that the
> > cache coherency is traded with some performance. The chicken bits or the
> > whole implementation of the feature might be turned upside down, but as
> > long as the userspace is still working, userspace should not care.
> >
> > So the chicken bit setting should be scrapped, and actual cache
> > coherency observed. It might be a worthy kernel selftest that register
> > writes stick and remain over sleep states as a generic thing, but not
> > here.
> >
> > If you don't address the feedback given, but hammer the mailing list
> > without addressing it, don't expect further feedback.
> >
> > Regards, Joonas
> Thank you for your feedback, both now and in the previous round.
> 
> Developing a test which checks whether the hardware is really doing what 
> is should is possible, but also it is quite problematic.
> On the UMD side, there are Khronos OCL 2.1 tests which verify coherency; 
> but they have graphics pipeline fully configured, and it is a lot easier 
> to verify it on the user level.
> 
> We could write a test which does the following:
> - Allocates a very big buffer
> - Executes a shader which fills the whole buffer over and over, with 
> increasing numbers
> - Does it long enough for the IGT to notice changes in the buffer while 
> the shader is being executed
> - If there were increasing changes in buffer content - test passes
> 
> But there are issues with this approach:
> - We need a lot of memory to do the test; the faster the GPU is, the 
> more we need
> - We need to write a shader, in assembly, for each platform
> - A new shader will need to be written to test future platforms
> - We need to configure the graphics pipeline within IGT test, which is 
> not trivial
> - The test results may be unreliable because even if coherency is 
> disabled, there is no guarantee that the memory will not be updated
> - Actually the buffer probably will be updated while the shader is 
> executing without coherency, but details may depend on platform, 
> pipeline settings and buffer size
> - The test results may be unreliable due to timing, so it will likely 
> produce sporadics
> - The test would require a separate pre-silicon part
> - Test execution time would be considerable
> 
> After reading the answers in previous review, I agree that to achieve 
> the IGTs goal testing hardware should be included. But I don't think 
> it's a good idea here.

Without testing what is actually the promised effect for userspace, we
just can't succeed maintaining the feature. There surely are use-cases
where it's noticed if the feature is not working, and those would be
the prime candidates for extracting the testing logic. We should
definitely be able to accumulate execution time if we run a little bit
of the test in each CI run.

Just looking at register values is simply not going to make it, the
registers could be just sitting there, not effecting the HW operation on
a future gen.

Hopefully this clarifies enough the expected kind of test.

Regards, Joonas

> 
> But maybe there is an easier way to test the functionality which I 
> didn't noticed? Or maybe some of my arguments are invalid?
> 
> We do have gem_gpgpu_fill test which does execute a simple shader. The 
> shader needs to be written for each gen to re-enable the test, but it 
> works.
> 
> -Tomasz
> 
> Adding the last mail in previous review, for reference:
> 
> Quoting Lis, Tomasz (2018-03-20 18:27:44)
>  >
>  >
>  > On 2018-03-20 16:36, Chris Wilson wrote:
>  > > Quoting Chris Wilson (2018-03-20 15:30:59)
>  > >> Quoting Tomasz Lis (2018-03-20 15:12:32)
>  > >>> From: "Lis, Tomasz" <tomasz.lis at intel.com>
>  > >>>
>  > >>> This adds a new test binary, containing tests for the Data Port
>  > >>> Coherency option. The tests check whether the option is correctly
>  > >>> set to proper GPU register.
>  > >> But where's the test that it *does* anything? I.e. what's the
>  > >> expected change in user visible behaviour? (What does the flag
>  > >> really mean, now and for years to come?)
>  >
>  > With a test that "does anything" (which I understand as - submitting 
> large workload and checking the coherency while the workload is 
> executing) we would be testing the hardware, not the kernel.
>  > I prepared a test binary designed to test the kernel change I 
> proposed. This is what I consider good practices.
>  > Do we have another approach when developing IGT?
> 
> What we try to avoid is "visible userspace breakage". So when some added 
> kernel feature introduces a behavioral change for userspace, the 
> accompanying tests should make sure that change is observed with the 
> feature bit flipped. That's pretty much the only way we can make sure 
> the written userspace applications using the feature will keep working 
> when going forward.
> 
> Writing to MMIO address X so that the bits Y and Z stick will only 
> guarantee that there is some hardware register in there. It is not very 
> valuable of a test because making sure the feature keeps working will 
> implicitly test the same thing.
> 
> Hopefully this answers the question.
> 
> Regards, Joonas
> 
>  >
>  > > To be clear; this test illustrates that it is a context register
>  > > being tweaked and not an execbuf property.
>  > > -Chris
>  > I agree the hardware flag is within hardware context. Both the kernel
>  > patch and the IGT test are accessing that bit, so they both illustrate
>  > it is a context register.
>  > But both patches also implement UMD interface to that flag via execbuf
>  > flag. What do you mean when you write this is "not an execbuf property"?
>  >
>  >
> 


More information about the igt-dev mailing list