[PATCH i-g-t v3 00/10] tests/intel/xe_svm: Add tests for Shared Virtual Memory (SVM)

Fri May 24 03:12:02 UTC 2024

Hi Matt,

> -----Original Message-----
> From: Brost, Matthew <matthew.brost at intel.com>
> Sent: Thursday, May 23, 2024 1:27 PM
> To: Zeng, Oak <oak.zeng at intel.com>
> Cc: Bommu, Krishnaiah <krishnaiah.bommu at intel.com>; igt-
> dev at lists.freedesktop.org; intel-xe at lists.freedesktop.org
> Subject: Re: [PATCH i-g-t v3 00/10] tests/intel/xe_svm: Add tests for Shared
> Virtual Memory (SVM)
> 
> On Wed, May 22, 2024 at 10:53:05AM -0600, Zeng, Oak wrote:
> > Hi Matt,
> >
> > We will investigate your tests and see whether we can achieve the same
> coverage with our approach.
> >
> > See also comments inline.
> >
> > > -----Original Message-----
> > > From: Brost, Matthew <matthew.brost at intel.com>
> > > Sent: Wednesday, May 22, 2024 7:42 AM
> > > To: Bommu, Krishnaiah <krishnaiah.bommu at intel.com>
> > > Cc: igt-dev at lists.freedesktop.org; Brost, Matthew
> > > <matthew.brost at intel.com>; intel-xe at lists.freedesktop.org; Zeng, Oak
> > > <oak.zeng at intel.com>
> > > Subject: Re: [PATCH i-g-t v3 00/10] tests/intel/xe_svm: Add tests for
> Shared
> > > Virtual Memory (SVM)
> > >
> > > On Wed, May 22, 2024 at 11:38:18AM +0000, Matthew Brost wrote:
> > > > On Fri, May 17, 2024 at 05:16:48PM +0530, Bommu Krishnaiah wrote:
> > > > > Introduce helper functions for object creation, binding, submission,
> > > > > and destruction, applicable for SVM and other tests
> > > > >
> > > > > xe-basic test is validating the helper function introduced in
> 'lib/xe/xe_util:
> > > helper function'
> > > > >
> > > > > svm test cases:
> > > > > svm-basic-malloc
> > > > > svm-basic-mmap
> > > > > svm-random-access
> > > > > svm-huge-page
> > > > > svm-atomic-access
> > > > > svm-atomic-access
> > > > > svm_invalid_va
> > > > > svm-mprotect
> > > > > svm-benchmark
> > > > > svm-sparse-access
> > > > >
> > > >
> > > > There's a lot here I dislike. Let me be direct.
> > > >
> > > > Firstly, this isn't an SVM test; it explicitly binds userptrs and BOs. So
> using
> > > > SVM prefixes in the test names doesn't make sense.
> >
> > It is SVM test. Those test use BO for command (batch) buffer. But the
> buffer GPU read/write is malloc'ed or mmap'ed. So it does test SVM/system
> allocator functionality.
> >
> 
> I did miss that read/write addresses are malloc'd or mmap'd.
> 
> However, I don't see a huge bind anywhere in the series. 

Yes, I pointed out the same in the code review. We need to fix it.

Buffer Objects
> (BOs) are only unmapped, not returned to the system allocator bindings,
> creating holes in the system allocator space. Additionally, BO addresses
> are statically assigned, which does not align with the system allocator
> design of calling malloc to reserve address space. This was the source
> of my confusion.

Good points. We need to fix the issues.

> 
> > Malloc'ed and mmap'ed memory can also be used for command buffer. We
> can add a test case.
> >
> >
> > And having RBs with
> > > those
> > > > names makes even less sense. With that; This email is directed at
> > > EVERYONE
> > > > replying to this series.
> > > >
> > >
> > > Didn't send everyone, CCing interested parties.
> > >
> > > > At a high level, I've expressed my views on generic helpers hindering
> the
> > > > ability to create truly powerful tests that provide the necessary
> coverage
> > > for a
> > > > complex driver. See my thoughts here [1] [2]. I don't see anything here
> > > that
> > > > comes close to addressing my concerns.
> >
> > I replied again to [1] with more details. Please take a look.
> >
> >
> > These tests are so simple they
> > > won't find
> > > > any bugs in the KMD beyond the implementation being completely
> broken.
> > > >
> >
> > Agree those tests in this series are simple. But this is just a start. We plan to
> do more complex tests which are documented in
> https://jira.devtools.intel.com/browse/VLK-59767
> >
> 
> So we agree that my concerns raised in [1] [2] about only being able to
> build simple tests with this design were not addressed.
> 
> Having plans is not the same as addressing concerns. If you believe you
> can build complex tests achieving good coverage using this design,
> great. Please post that. Until then, I remain unconvinced.
> 
> I briefly read the Jira; a lot of it is covered by the code shared. What
> is not can likely be added in a similar way to the HUGE_TLB example I
> shared in the previous email.
> 
> One thing I noticed in Jira is that it's written basically as unit tests
> (e.g., test feature A, test feature B). Unit tests are okay, but truly
> powerful tests do a lot of things all at the same time, in parallel.
> This is the advantage of the style of tests I write. Six lines of code
> in my example (data path code, that is; tables and documents added more)
> spawned over 100 tests doing this in the aforementioned way.

Yes, I had the same thoughts. 

They are basically two categories of tests:

unit tests. We will try to cover as much as possible in the simple unit test.

integrated test or stress test. 

I tend to think we want to have both. Of course we might have some duplications, such as some functions are tested from both categories. Hopefully this is still acceptable. 

> 
> > We will also look at your series to see whether we can have the same
> coverage.
> >
> >
> > > > This is what I think an SVM test should look like [3] (for the record, I
> have
> > > > shared an eariler baseline of this code too). While it has some complex
> > > coding
> > > > patterns, once understood by a developer, it's quite easy to extend for
> a
> > > new
> > > > test case. Let me give an example...
> > > >
> > > > Take a look at this diff [4]. It's a few lines of code that add HUGE_TLB
> > > > testing. This simple change spawns over 100 tests of various varieties
> (e.g.,
> > > a
> > > > single exec with HUGE_TLB, two execs, many execs, multi-threaded
> > > versions,
> > > > multi-process versions, etc...).
> > > >
> > > > Before change:
> > > > ./build/tests/xe_exec_system_allocator --l | wc
> > > >     938     938   32872
> > > >
> > > > After change:
> > > > ./build/tests/xe_exec_system_allocator --l | wc
> > > >    1070    1070   37852
> > > >
> > > > Being extendable like this ultimately reduces maintenance costs over
> time.
> > > >
> > > > The test I've shared creates true coverage and is just a starting point. It
> will
> > > > need to become more complex over time as the SVM uAPI grows.
> > > >
> > > > I suggest investing the time to learn how the code I've shared works
> and
> > > develop
> > > > within that framework. If the same level of coverage/extendability can
> be
> > > > achieved with helpers, great. Anything less, in either area, I don't think
> is
> > > > acceptable.
> >
> > There are two fundamental design difference b/t your approach and the
> approach we applied here:
> >
> > 1) Whether to create/use helper functions:
> >
> > Our helper functions created concept of xe-buffer, xe-command-buffer. I
> still think with helper functions, we can simply test writing.
> >
> > 2) use one big test to test many functionalities vs many small tests, each
> test one functionality
> >
> > It is obvious, our approach is much easier to maintain and understand.
> >
> 
> This is an opinion, disrespectful, and rude.

I sincerely apologize if those words hurt you. It is not intended to.

> 
> I've provided a concrete example of how the style of tests I've written
> in Xe are quite powerful in this email thread. In my opinion, it has
> been responded to with speculative arguments and opinions rather than
> code.
> 
> I have very, very valid reasons for being concerned about testing. This
> series [5] was posted in a completely non-functional, untested state. As
> reminding and to record this publicly, this was developed off a basline
> which was tested with userptr SVM faulting working. I provided the
> working baseline and a flavor of the test I have shared.
> 
> Going from a tested working KMD with an IGT to untested broken KMD
> plus a promise of 'obvious, our approach is much easier to maintain and
> understand', well I'll let everyone decide how that looks.
> 
> [5] https://patchwork.freedesktop.org/series/132229/
> 
> > So the key is, whether we can achieve the same coverage as your series. As
> said, we will take a closer look of your series and give it a try.
> >
> 
> I highly suggest you study the code that I shared.

Yes, will do. Let's see what we can come up.

Oak
> 
> Matt
> 
> > Oak
> >
> > > >
> > > > Matt
> > > >
> > > > I can't find PW links for [1][2] so instead have shared the email thread
> > > subject
> > > > lines. Everyone on this email was on those chains as well.
> > > >
> > > > [1] [PATCH RFC i-g-t 0/2] helper function
> > > > [2] [PATCH] lib/xe/xe_util: Creating the helper functions
> > > > [3]
> > >
> https://patchwork.freedesktop.org/patch/594807/?series=133846&rev=1
> > > > [4] https://pastebin.com/jUN5BwUy
> > > >
> > > > > svm kernel implimentation:
> > > > > https://gitlab.freedesktop.org/oak/xe-kernel-driver-svm.git
> > > > > branch: origin/drm-xe-next-svm-unify-userptr
> > > > >
> > > > > v2 igt patch: https://patchwork.freedesktop.org/series/133096/
> > > > >
> > > > > Note: xe-basic test is validated without SVM. Remaining tests are not
> > > validated because the SVM driver code is still under development
> > > > >
> > > > > Bommu Krishnaiah (10):
> > > > >   lib/xe/xe_util: Introduce helper functions for buffer creation and
> > > > >     command submission etc
> > > > >   tests/intel/xe_svm: basic xe-basic test
> > > > >   tests/intel/xe_svm: Add SVM basic tests using malloc and mmap
> > > > >   tests/intel/xe_svm: add random access test for SVM
> > > > >   tests/intel/xe_svm: add huge page access test for SVM
> > > > >   tests/intel/xe_svm: Add support for GPU atomic access test for svm
> > > > >   tests/intel/xe_svm: Add svm-invalid-va test to verify SVM
> > > > >     functionality with invalid address access
> > > > >   tests/intel/xe_svm: Add svm-benchmark test to measure SVM
> > > performance
> > > > >     with a simple benchmark
> > > > >   tests/intel/xe_svm: Add svm-mprotect test to verify SVM
> functionality
> > > > >     with read-only memory access
> > > > >   tests/intel/xe_svm: Add svm-sparse-access test to verify sparsely
> > > > >     accessing two memory locations with SVM
> > > > >
> > > > >  include/drm-uapi/xe_drm.h |   1 +
> > > > >  lib/xe/xe_util.c          | 214 +++++++++++++++++
> > > > >  lib/xe/xe_util.h          |  40 ++++
> > > > >  tests/intel/xe_svm.c      | 476
> > > ++++++++++++++++++++++++++++++++++++++
> > > > >  tests/meson.build         |   1 +
> > > > >  5 files changed, 732 insertions(+)
> > > > >  create mode 100644 tests/intel/xe_svm.c
> > > > >
> > > > > --
> > > > > 2.25.1
> > > > >