[PATCH v6 2/8] drm/ttm: Add ttm_bo_access
Matthew Brost
matthew.brost at intel.com
Mon Nov 25 15:29:53 UTC 2024
On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
> On Wed, Nov 13, 2024 at 12:42:35PM +0100, Christian König wrote:
> > On 13.11.24 11:44, Thomas Hellström wrote:
> > > On Wed, 2024-11-13 at 09:37 +0100, Christian König wrote:
> > > > On 12.11.24 17:33, Thomas Hellström wrote:
> > > > > [SNIP]
> > > > > > > > This has been extensively discussed already, but was expected
> > > > > > > > to really only be needed for low-on-memory scenarios. However
> > > > > > > > it now seems like the need is much earlier due to the random
> > > > > > > > userptr page joining by core mm.
> > > > > > > Just to clarify here:
> > > > > > > In Long-Running mode with recoverable pagefaults enabled we
> > > > > > > don't have any preempt-fences, but rather just zap the PTEs
> > > > > > > pointing to the affected memory and flush the TLB. So from a
> > > > > > > memory resource POV a breakpoint should be safe, and no mmu
> > > > > > > notifier nor shrinker will be blocked.
> > > > > > That sounds like an HMM based approach which would clearly work.
> > > > > >
> > > > > > But where is that? I don't see any HMM based approach anywhere in
> > > > > > the XE driver.
> > > > > This is a mode that uses recoverable pagefaults to fault either
> > > > > full userptrs or full BOs, and is used with
> > > > > DRM_XE_VM_CREATE_FLAG_FAULT_MODE (not SVM)!
> > > > >
> > > > > userptrs in xe are bo-less, and use the vm's resv, but otherwise
> > > > > use hmm similar to amdgpu: xe_hmm.c
> > > > Yeah, I have seen that one.
> > > >
> > > > > fault servicing:
> > > > > xe_gt_pagefault.c
> > > > >
> > > > > PTE zapping on eviction and notifier:
> > > > > xe_vm_invalidate_vma(), xe_vm.c
> > > > Ah, that was the stuff I was missing.
> > > >
> > > > So the implementation in xe_preempt_fence.c is just for graphics
> > > > submissions? That would make the whole thing much easier to handle.
> > > Actually it's not, it's intended for long-running mode, but as a
> > > consequence the debugger would be allowed only in fault mode.
> >
> > Makes sense, yes.
> >
> > > > The only remaining question I can then see is if long running
> > > > submissions with DRM_XE_VM_CREATE_FLAG_FAULT_MODE could potentially
> > > > block graphics submissions without this flag from accessing the
> > > > hardware?
> > > Yes and no. We have a mechanism in place that allows only either
> > > fault-mode jobs or non-faulting jobs on the same "engine group", as we
> > > call it. A pagefault on an engine group would block or hamper the
> > > progress of other jobs on that engine group.
> > >
> > > So let's say a dma-fence job is submitted to an engine group that is
> > > currently running a faulting job. We'd then need to switch mode of the
> > > engine group and, in the exec ioctl we'd (explicitly without preempt-
> > > fences) preempt the faulting job before submitting the dma-fence job
> > > and publishing its fence. This preemption will incur a delay which is
> > > typically the delay of servicing any outstanding pagefaults. It's not
> > > ideal, but the best we can do, and it doesn't affect core memory
> > > management nor does it affect migration blits.
> > >
> > > In the debugger case, this delay could be long due to breakpoints, and
> > > that's why enabling the debugger would sit behind a flag and not
> > > something default (I think this was discussed earlier in the thread).
> > > Still, core memory management would be unaffected, and also ofc the
> > > migration blits are completely independent.
> >
> > Yeah, that sounds totally sane to me.
> >
>
> Nice, glad to see this part of the thread resolved.
>
> Setting aside the peek/poke and FD PID duplication issues (which seem to
> be part of a larger discussion, with Joonas as the point of contact for
> that), we have another use case for this helper in my current series.
>
> We use this interface to read a BO marked with a dumpable flag during a
> GPU hang in our error capture code. This is an internal KMD feature, not
> directly exposed to user space. Would adding this helper be acceptable
> for this use case? I can add kernel indicating the current restrictions
> of the helper (do not directly expose to user space) too if that would
> help.
>
Christian - ping on above.
> Matt
>
> > Sorry for the noise then. I didn't realize that you have two separate
> > modes of operation.
> >
> > Going to reply on the other open questions separately.
> >
> > Regards,
> > Christian.
> >
> > >
> > > /Thomas
> > >
> > > > Thanks a lot for pointing this out,
> > > > Christian.
> > > >
> > > > > Thanks,
> > > > > Thomas
> > > > >
> > > > > > Regards,
> > > > > > Christian.
> > > > > >
> > > > > > > Nor will there be any jobs with published dma-fences depending
> > > > > > > on
> > > > > > > the
> > > > > > > job blocked either temporarily by a pagefault or long-term by a
> > > > > > > debugger breakpoint.
> > > > > > >
> > > > > > > /Thomas
> > > > > > >
> > > > > > >
> > > > > > > > If that is done and the memory pre-empt fence is serviced
> > > > > > > > even
> > > > > > > > for
> > > > > > > > debuggable contexts, do you have further concerns with the
> > > > > > > > presented
> > > > > > > > approach
> > > > > > > > from dma-buf and drm/sched perspective?
> > > > > > > >
> > > > > > > > Regards, Joonas
> > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Christian.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > This means that a breakpoint or core dump doesn't halt GPU
> > > > > > > > > threads, but rather suspends them. E.g. all running wave
> > > > > > > > > data is collected into a state bag which can be restored
> > > > > > > > > later on.
> > > > > > > > >
> > > > > > > > > I was under the impression that those long running compute
> > > > > > > > > threads do exactly that, but when the hardware can't switch
> > > > > > > > > out the GPU thread/process while in a break then that isn't
> > > > > > > > > the case.
> > > > > > > > >
> > > > > > > > > As long as you don't find a way to avoid that this patch
> > > > > > > > > set is a pretty clear NAK from my side as DMA-buf and TTM
> > > > > > > > > maintainer.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I believe this is addressed above.
> > > > > > > > >
> > > > > > > > > Matt
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What might work is to keep the submission on the hardware
> > > > > > > > > in the break state but forbid any memory access. This way
> > > > > > > > > you can signal your preemption fence even when the hardware
> > > > > > > > > isn't made available.
> > > > > > > > >
> > > > > > > > > Before you continue, XE sets up a new pre-emption fence and
> > > > > > > > > makes sure that all page tables etc... are up to date.
> > > > > > > > >
> > > > > > > > > Could be tricky to get this right if completion fence based
> > > > > > > > > submissions are mixed in as well, but that gives you at
> > > > > > > > > least a direction you could potentially go.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Christian.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regards, Joonas
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Christian.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Some wash-up thoughts from me below, but consider them
> > > > > > > > > fairly irrelevant since I think the main driver for these
> > > > > > > > > big questions here should be gdb/userspace.
> > > > > > > > >
> > > > > > > > > Quoting Christian König (2024-11-07 11:44:33)
> > > > > > > > >
> > > > > > > > > On 06.11.24 18:00, Matthew Brost wrote:
> > > > > > > > >
> > > > > > > > > [SNIP]
> > > > > > > > >
> > > > > > > > > This is not a generic interface that anyone can freely
> > > > > > > > > access. The same permissions used by ptrace are checked
> > > > > > > > > when opening such an interface. See [1] [2].
> > > > > > > > >
> > > > > > > > > [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
> > > > > > > > > [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks a lot for those pointers, that is exactly what I
> > > > > > > > > was looking for.
> > > > > > > > >
> > > > > > > > > And yeah, it is what I feared. You are re-implementing
> > > > > > > > > existing functionality, but see below.
> > > > > > > > >
> > > > > > > > > Could you elaborate on what this "existing functionality"
> > > > > > > > > exactly is? I do not think this functionality exists at
> > > > > > > > > this time.
> > > > > > > > >
> > > > > > > > > The EU debugging architecture for Xe specifically avoids
> > > > > > > > > the need for GDB to attach with ptrace to the CPU process
> > > > > > > > > or interfere with the CPU process for the debugging via
> > > > > > > > > parasitic threads or so.
> > > > > > > > >
> > > > > > > > > Debugger connection is opened to the DRM driver for a given
> > > > > > > > > PID (which uses the ptrace may-access check for now), after
> > > > > > > > > which all DRM clients of that PID are exposed to the
> > > > > > > > > debugger process.
> > > > > > > > >
> > > > > > > > > What we want to expose via that debugger connection is the
> > > > > > > > > ability for GDB to read/write the different GPU VM address
> > > > > > > > > spaces (ppGTT for Intel GPUs) just like the EU threads
> > > > > > > > > would see them. Note that the layout of the ppGTT is
> > > > > > > > > completely up to the userspace driver to set up and is
> > > > > > > > > mostly only partially equal to the CPU address space.
> > > > > > > > >
> > > > > > > > > Specifically as part of reading/writing the ppGTT for
> > > > > > > > > debugging purposes, there are deep flushes needed: for
> > > > > > > > > example flushing the instruction cache when adding/removing
> > > > > > > > > breakpoints.
> > > > > > > > >
> > > > > > > > > Maybe that will explain the background. I elaborate on this
> > > > > > > > > at the end some more.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > kmap/vmap are used everywhere in the DRM subsystem to
> > > > > > > > > access BOs, so I'm failing to see the problem with adding a
> > > > > > > > > simple helper based on existing code.
> > > > > > > > >
> > > > > > > > > What's possible and often done is to do kmap/vmap if you
> > > > > > > > > need to implement a CPU copy for scanout for example, or
> > > > > > > > > for copying/validating command buffers. But that usually
> > > > > > > > > requires accessing the whole BO and has separate security
> > > > > > > > > checks.
> > > > > > > > >
> > > > > > > > > When you want to access only a few bytes of a BO that
> > > > > > > > > sounds massively like a peek/poke like interface and we
> > > > > > > > > have already rejected that more than once. There even used
> > > > > > > > > to be standardized GEM IOCTLs for that which have been
> > > > > > > > > removed by now.
> > > > > > > > >
> > > > > > > > > Referring to the explanation at top: these IOCTLs are not
> > > > > > > > > for the debugging target process to issue. The peek/poke
> > > > > > > > > interface is specifically for GDB only, to facilitate the
> > > > > > > > > emulation of memory reads/writes on the GPU address space
> > > > > > > > > as if they were done by the EUs themselves.
> > > > > > > > > And to recap: for modifying instructions, for example
> > > > > > > > > adding/removing a breakpoint, an extra level of cache
> > > > > > > > > flushing is needed which is not available to regular
> > > > > > > > > userspace.
> > > > > > > > >
> > > > > > > > > I specifically discussed the difference with Sima before
> > > > > > > > > moving forward with this design originally. If something
> > > > > > > > > has changed since then, I'm of course happy to rediscuss.
> > > > > > > > >
> > > > > > > > > However, if this code can't be added, I'm not sure how we
> > > > > > > > > would ever be able to implement core dumps for GPU
> > > > > > > > > threads/memory?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > If you need to access BOs which are placed in non CPU
> > > > > > > > > accessible memory then implement the access callback for
> > > > > > > > > ptrace, see amdgpu_ttm_access_memory for an example of how
> > > > > > > > > to do this.
> > > > > > > > >
> > > > > > > > > As also mentioned above, we don't work via ptrace at all
> > > > > > > > > when it comes to debugging the EUs. The only thing used for
> > > > > > > > > now is ptrace_may_access, to implement similar access
> > > > > > > > > restrictions as ptrace has. This can be changed to
> > > > > > > > > something else if needed.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ptrace access works via vm_operations_struct.access →
> > > > > > > > > ttm_bo_vm_access.
> > > > > > > > >
> > > > > > > > > This series renames ttm_bo_vm_access to ttm_bo_access,
> > > > > > > > > with no code changes.
> > > > > > > > >
> > > > > > > > > The above function accesses a BO via kmap if it is in
> > > > > > > > > SYSTEM / TT, which is existing code.
> > > > > > > > >
> > > > > > > > > This function is only exposed to user space via ptrace
> > > > > > > > > permissions.
> > > > > > > > >
> > > > > > > > > Maybe this sentence is what caused the confusion.
> > > > > > > > >
> > > > > > > > > Userspace is never exposed to the peek/poke interface, only
> > > > > > > > > the debugger connection, which is its own FD.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In this series, we implement a function [3] similar to
> > > > > > > > > amdgpu_ttm_access_memory for the TTM vfunc access_memory.
> > > > > > > > > What is missing is non-CPU-visible memory access, similar
> > > > > > > > > to amdgpu_ttm_access_memory_sdma. This will be addressed in
> > > > > > > > > a follow-up and was omitted in this series given its
> > > > > > > > > complexity.
> > > > > > > > >
> > > > > > > > > So, this looks more or less identical to AMD's ptrace
> > > > > > > > > implementation, but in GPU address space. Again, I fail to
> > > > > > > > > see what the problem is here. What am I missing?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The main question is why can't you use the existing
> > > > > > > > > interfaces directly?
> > > > > > > > >
> > > > > > > > > We're not working on the CPU address space or on BOs. We're
> > > > > > > > > working strictly on the GPU address space as would be seen
> > > > > > > > > by an EU thread if it accessed address X.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In addition to the peek/poke interface of ptrace, Linux has
> > > > > > > > > the pidfd_getfd system call, see
> > > > > > > > > https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
> > > > > > > > >
> > > > > > > > > pidfd_getfd() allows you to dup() the render node file
> > > > > > > > > descriptor into your gdb process. That in turn gives you
> > > > > > > > > all the access you need from gdb, including mapping BOs and
> > > > > > > > > command submission on behalf of the application.
> > > > > > > > >
> > > > > > > > > We're not operating on the CPU address space nor are we
> > > > > > > > > operating on BOs (there is no concept of a BO in the EU
> > > > > > > > > debug interface). Each VMA in the VM could come from
> > > > > > > > > anywhere; only the start address and size matter. And
> > > > > > > > > neither do we need to interfere with the command submission
> > > > > > > > > of the process under debug.
> > > > > > > > >
> > > > > > > > > As far as I can see that allows for the same functionality
> > > > > > > > > as the eudebug interface, just without any driver specific
> > > > > > > > > code messing with ptrace permissions and peek/poke
> > > > > > > > > interfaces.
> > > > > > > > >
> > > > > > > > > So the question is still why do you need the whole eudebug
> > > > > > > > > interface in the first place? I might be missing something,
> > > > > > > > > but that seems to be superfluous from a high level view.
> > > > > > > > >
> > > > > > > > > Recapping from above: it is to allow the debugging of EU
> > > > > > > > > threads per DRM client, completely independent of the CPU
> > > > > > > > > process. If ptrace_may_access is the sore point, we could
> > > > > > > > > consider other permission checks, too. There is no other
> > > > > > > > > connection to ptrace in this architecture than the single
> > > > > > > > > permission check to know if a PID is fair game to access by
> > > > > > > > > the debugger process.
> > > > > > > > >
> > > > > > > > > Why no parasitic thread or ptrace: going forward, binding
> > > > > > > > > the EU debugging to the DRM client would also pave the way
> > > > > > > > > for being able to extend the core kernel generated core
> > > > > > > > > dump with each DRM client's EU thread/memory dump. We have
> > > > > > > > > a similar feature called "Offline core dump" enabled in the
> > > > > > > > > downstream public trees for i915, where we currently attach
> > > > > > > > > the EU thread dump to the i915 error state and then later
> > > > > > > > > combine the i915 error state with the CPU core dump file
> > > > > > > > > with a tool.
> > > > > > > > >
> > > > > > > > > This is a relatively little amount of extra code, as this
> > > > > > > > > baseline series already gives GDB the ability to perform
> > > > > > > > > the necessary actions. It's just a matter of the kernel
> > > > > > > > > driver calling "stop all threads", then copying the memory
> > > > > > > > > map and memory contents for GPU threads, just like is done
> > > > > > > > > for CPU threads.
> > > > > > > > >
> > > > > > > > > With parasitic thread injection, I'm not sure if there is
> > > > > > > > > such a way forward, as it would seem to require injecting
> > > > > > > > > quite a bit more logic into the core kernel?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > It's true that the AMD KFD part still has similar
> > > > > > > > > functionality, but that is because of the broken KFD design
> > > > > > > > > of tying driver state to the CPU process (which makes it
> > > > > > > > > inaccessible for gdb even with an imported render node fd).
> > > > > > > > >
> > > > > > > > > Both Sima and I (and partially Dave as well) have pushed
> > > > > > > > > back on the KFD approach. And the long term plan is to get
> > > > > > > > > rid of such device driver specific interfaces which
> > > > > > > > > re-implement existing functionality just differently.
> > > > > > > > >
> > > > > > > > > Recapping, this series is not adding it back. The debugger
> > > > > > > > > connection is a separate FD from the DRM one, with a
> > > > > > > > > separate IOCTL set. We don't allow the DRM FD any new
> > > > > > > > > operations based on whether ptrace is attached or not. We
> > > > > > > > > don't ever do that check even.
> > > > > > > > >
> > > > > > > > > We only restrict the opening of the debugger connection to
> > > > > > > > > a given PID with a ptrace_may_access check for now. That
> > > > > > > > > can be changed to something else, if necessary.
> > > > > > > > >
> > > > > > > > > Yeah I think unnecessarily tying gpu processes to cpu
> > > > > > > > > processes is a bad thing, not least because even today all
> > > > > > > > > the svm discussions we have still hit clear use-cases where
> > > > > > > > > a 1:1 match is not wanted (like multiple gpu svm sections
> > > > > > > > > with offsets). Not even speaking of all the gpu usecases
> > > > > > > > > where the gpu vm space is still entirely independent of the
> > > > > > > > > cpu side.
> > > > > > > > >
> > > > > > > > > So that's why I think this entirely separate approach looks
> > > > > > > > > like the right one, with ptrace_may_access as the access
> > > > > > > > > control check to make sure we match ptrace on the cpu side.
> > > > > > > > >
> > > > > > > > > But there's very obviously a bikeshed to be had on what the
> > > > > > > > > actual uapi should look like, especially how gdb opens up a
> > > > > > > > > gpu debug access fd. But I also think that's not much on
> > > > > > > > > drm to decide, but whatever gdb wants. And then we aim for
> > > > > > > > > some consistency on that lookup/access control part
> > > > > > > > > (ideally, I might be missing some reasons why this is a bad
> > > > > > > > > idea) across drm drivers.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > So you need to have a really really good explanation why
> > > > > > > > > the eudebug interface is actually necessary.
> > > > > > > > >
> > > > > > > > > TL;DR: The main point is to decouple the debugging of the
> > > > > > > > > EU workloads from the debugging of the CPU process. This
> > > > > > > > > avoids interfering with the CPU process via parasitic
> > > > > > > > > thread injection. Further, this also allows generating a
> > > > > > > > > core dump without any GDB connected. There are also many
> > > > > > > > > other smaller pros/cons which can be discussed, but for the
> > > > > > > > > context of this patch, this is the main one.
> > > > > > > > >
> > > > > > > > > So unlike parasitic thread injection, we don't unlock any
> > > > > > > > > special IOCTLs for the process under debug to be performed
> > > > > > > > > by the parasitic thread, but we allow the minimal set of
> > > > > > > > > operations to be performed by GDB as if those were done on
> > > > > > > > > the EUs themselves.
> > > > > > > > >
> > > > > > > > > One can think of it like the minimal subset of ptrace but
> > > > > > > > > for EU threads, not the CPU threads. And thus, building on
> > > > > > > > > this, it's possible to extend the core kernel generated
> > > > > > > > > core dumps with a DRM specific extension which would
> > > > > > > > > contain the EU thread/memory dump.
> > > > > > > > >
> > > > > > > > > It might be good to document (in that debugging doc patch
> > > > > > > > > probably) why thread injection is not a great option, and
> > > > > > > > > why the tradeoffs for debugging are different than for
> > > > > > > > > checkpoint/restore, where with CRIU we landed on doing most
> > > > > > > > > of this in userspace, often requiring injection threads to
> > > > > > > > > make it all work.
> > > > > > > > >
> > > > > > > > > Cheers, Sima
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regards, Joonas
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Christian.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Matt
> > > > > > > > >
> > > > > > > > > [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Christian.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Matt
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > > Christian.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >