[PATCH v6 2/8] drm/ttm: Add ttm_bo_access

Christian König christian.koenig at amd.com
Wed Nov 13 11:42:35 UTC 2024


On 13.11.24 at 11:44, Thomas Hellström wrote:
> On Wed, 2024-11-13 at 09:37 +0100, Christian König wrote:
>> On 12.11.24 at 17:33, Thomas Hellström wrote:
>>> [SNIP]
>>>>>> This has been extensively discussed already, but was expected to
>>>>>> really only be needed for low-on-memory scenarios. However it now
>>>>>> seems like the need is much earlier due to the random userptr
>>>>>> page joining by core mm.
>>>>> Just to clarify here:
>>>>>
>>>>> In Long-Running mode with recoverable pagefaults enabled we don't
>>>>> have any preempt-fences, but rather just zap the PTEs pointing to
>>>>> the affected memory and flush the TLB. So from a memory resource
>>>>> POV a breakpoint should be safe, and no mmu notifier nor shrinker
>>>>> will be blocked.
>>>> That sounds like an HMM based approach, which would clearly work.
>>>>
>>>> But where is that? I don't see any HMM based approach anywhere in
>>>> the Xe driver.
>>> This is a mode that uses recoverable pagefaults to fault in either
>>> full userptrs or full BOs, and is used with
>>> DRM_XE_VM_CREATE_FLAG_FAULT_MODE (not SVM)!
>>>
>>> userptrs in xe are BO-less and use the vm's resv, but otherwise use
>>> hmm similarly to amdgpu: xe_hmm.c
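
For readers not familiar with it, xe_hmm.c is built around the canonical
hmm_range_fault() retry loop from Documentation/mm/hmm.rst. A rough
sketch of that pattern follows; userptr_populate() is a made-up name,
not the actual Xe code, and the driver lock mentioned in the comment is
left out:

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>

static int userptr_populate(struct mmu_interval_notifier *notifier,
			    unsigned long start, unsigned long end,
			    unsigned long *pfns, bool write)
{
	struct hmm_range range = {
		.notifier	= notifier,
		.start		= start,
		.end		= end,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT |
				  (write ? HMM_PFN_REQ_WRITE : 0),
	};
	int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(notifier);
	mmap_read_lock(notifier->mm);
	ret = hmm_range_fault(&range);	/* fault in pages, fill pfns[] */
	mmap_read_unlock(notifier->mm);
	if (ret == -EBUSY)
		goto again;		/* raced with an invalidation */
	if (ret)
		return ret;

	/*
	 * The driver lock that serializes against the invalidate
	 * callback must be held around this check and the GPU PTE
	 * update (not shown here).
	 */
	if (mmu_interval_read_retry(notifier, range.notifier_seq))
		goto again;

	return 0;
}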
>> Yeah, I have seen that one.
>>
>>> fault servicing:
>>> xe_gt_pagefault.c
>>>
>>> PTE zapping on eviction and notifier:
>>> xe_vm_invalidate_vma(), xe_vm.c
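
The notifier side follows the usual mmu_interval_notifier pattern; a
sketch, with the my_* names as hypothetical stand-ins for the Xe
internals (xe_vm_invalidate_vma() and the GT TLB invalidation):

#include <linux/mmu_notifier.h>

struct my_userptr {
	struct mmu_interval_notifier notifier;
	struct mutex lock;	/* serializes against repopulation */
};

static bool my_userptr_invalidate(struct mmu_interval_notifier *mni,
				  const struct mmu_notifier_range *range,
				  unsigned long cur_seq)
{
	struct my_userptr *uptr =
		container_of(mni, struct my_userptr, notifier);

	if (!mmu_notifier_range_blockable(range))
		return false;

	mutex_lock(&uptr->lock);
	mmu_interval_set_seq(mni, cur_seq);
	/*
	 * Zap the GPU PTEs covering [start, end) and flush the TLB.
	 * Both helpers are hypothetical; note that no preempt-fence is
	 * involved on this path.
	 */
	my_zap_gpu_ptes(uptr, range->start, range->end);
	my_flush_gpu_tlb(uptr);
	mutex_unlock(&uptr->lock);

	return true;
}

static const struct mmu_interval_notifier_ops my_userptr_ops = {
	.invalidate = my_userptr_invalidate,
};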
>> Ah, that was the stuff I was missing.
>>
>> So the implementation in xe_preempt_fence.c is just for graphics
>> submissions? That would make the whole thing much easier to handle.
> Actually it's not, it's intended for long-running mode, but as a
> consequence the debugger would be allowed only in fault mode.

Makes sense, yes.

>> The only remaining question I can then see is whether long running
>> submissions with DRM_XE_VM_CREATE_FLAG_FAULT_MODE could potentially
>> block graphics submissions without this flag from accessing the
>> hardware?
> Yes and no. We have a mechanism in place that allows only either
> fault-mode jobs or non-faulting jobs on the same "engine group", as we
> call it. A pagefault on an engine group would block or hamper progress
> of other jobs on that engine group.
>
> So let's say a dma-fence job is submitted to an engine group that is
> currently running a faulting job. We'd then need to switch mode of the
> engine group and, in the exec ioctl we'd (explicitly without preempt-
> fences) preempt the faulting job before submitting the dma-fence job
> and publishing its fence. This preemption will incur a delay which is
> typically the delay of servicing any outstanding pagefaults. It's not
> ideal, but the best we can do, and it doesn't affect core memory
> management nor does it affect migration blits.
>
> In the debugger case, this delay could be long due to breakpoints, and
> that's why enabling the debugger would sit behind a flag and not
> something default (I think this was discussed earlier in the thread).
> Still, core memory management would be unaffected, and also ofc the
> migration blits are completely independent.
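
In pseudo-code, the mode switch described above could look something
like the sketch below. Every identifier here is hypothetical; it is
only meant to illustrate the ordering (preempt the faulting job first,
publish the dma-fence after):

static int submit_dma_fence_job(struct engine_group *group,
				struct job *job)
{
	int ret = 0;

	mutex_lock(&group->mode_lock);
	if (group->mode == GROUP_MODE_FAULTING) {
		/*
		 * Explicitly preempt the faulting job off the hardware,
		 * without publishing a preempt-fence. This may stall
		 * for as long as outstanding pagefaults (or, with a
		 * debugger attached, breakpoints) take to service.
		 */
		ret = group_preempt_faulting_jobs(group);
		if (!ret)
			group->mode = GROUP_MODE_DMA_FENCE;
	}
	if (!ret)
		ret = queue_job_and_publish_fence(group, job);
	mutex_unlock(&group->mode_lock);

	return ret;
}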

Yeah, that sounds totally sane to me.

Sorry for the noise then. I didn't realize that you have two separate
modes of operation.

Going to reply on the other open questions separately.

Regards,
Christian.

>
> /Thomas
>
>> Thanks a lot for pointing this out,
>> Christian.
>>
>>> Thanks,
>>> Thomas
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Nor will there be any jobs with published dma-fences depending on
>>>>> the job blocked either temporarily by a pagefault or long-term by
>>>>> a debugger breakpoint.
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>> If that is done and the memory pre-empt fence is serviced even
>>>>>> for debuggable contexts, do you have further concerns with the
>>>>>> presented approach from dma-buf and drm/sched perspective?
>>>>>>
>>>>>> Regards, Joonas
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>>            This means that a breakpoint or core dump doesn't
>>>>>>>            halt GPU threads, but rather suspends them. E.g. all
>>>>>>>            running wave data is collected into a state bag
>>>>>>>            which can be restored later on.
>>>>>>>
>>>>>>>            I was under the impression that those long running
>>>>>>>            compute threads do exactly that, but when the
>>>>>>>            hardware can't switch out the GPU thread/process
>>>>>>>            while in a break then that isn't the case.
>>>>>>>
>>>>>>>            As long as you don't find a way to avoid that, this
>>>>>>>            patch set is a pretty clear NAK from my side as
>>>>>>>            DMA-buf and TTM maintainer.
>>>>>>>
>>>>>>>
>>>>>>>        I believe this is addressed above.
>>>>>>>
>>>>>>>        Matt
>>>>>>>
>>>>>>>
>>>>>>>            What might work is to keep the submission on the
>>>>>>>            hardware in the break state but forbid any memory
>>>>>>>            access. This way you can signal your preemption
>>>>>>>            fence even when the hardware isn't made available.
>>>>>>>
>>>>>>>            Before you continue, Xe sets up a new pre-emption
>>>>>>>            fence and makes sure that all page tables etc... are
>>>>>>>            up to date.
>>>>>>>
>>>>>>>            Could be tricky to get this right if completion
>>>>>>>            fence based submissions are mixed in as well, but
>>>>>>>            that gives you at least a direction you could
>>>>>>>            potentially go.
>>>>>>>
>>>>>>>            Regards,
>>>>>>>            Christian.
>>>>>>>
>>>>>>>
>>>>>>>                Regards, Joonas
>>>>>>>
>>>>>>>
>>>>>>>                    Regards,
>>>>>>>                    Christian.
>>>>>>>
>>>>>>>
>>>>>>>                        Some wash-up thoughts from me below, but
>>>>>>>                        consider them fairly irrelevant since I
>>>>>>>                        think the main driver for these big
>>>>>>>                        questions here should be gdb/userspace.
>>>>>>>
>>>>>>>
>>>>>>>                            Quoting Christian König (2024-11-07
>>>>>>>                            11:44:33)
>>>>>>>
>>>>>>>                                On 06.11.24 at 18:00, Matthew
>>>>>>>                                Brost wrote:
>>>>>>>
>>>>>>>                                      [SNIP]
>>>>>>>
>>>>>>>                                      This is not a generic
>>>>>>>                                      interface that anyone can
>>>>>>>                                      freely access. The same
>>>>>>>                                      permissions used by ptrace
>>>>>>>                                      are checked when opening
>>>>>>>                                      such an interface. See [1]
>>>>>>>                                      [2].
>>>>>>>
>>>>>>>                                      [1] https://patchwork.freedesktop.org/patch/617470/?series=136572&rev=2
>>>>>>>                                      [2] https://patchwork.freedesktop.org/patch/617471/?series=136572&rev=2
>>>>>>>
>>>>>>>
>>>>>>>                                Thanks a lot for those pointers,
>>>>>>>                                that is exactly what I was
>>>>>>>                                looking for.
>>>>>>>
>>>>>>>                                And yeah, it is what I feared.
>>>>>>>                                You are re-implementing existing
>>>>>>>                                functionality, but see below.
>>>>>>>
>>>>>>>                            Could you elaborate on what this
>>>>>>>                            "existing functionality" exactly is?
>>>>>>>                            I do not think this functionality
>>>>>>>                            exists at this time.
>>>>>>>
>>>>>>>                            The EU debugging architecture for Xe
>>>>>>>                            specifically avoids the need for GDB
>>>>>>>                            to attach with ptrace to the CPU
>>>>>>>                            process or interfere with the CPU
>>>>>>>                            process for the debugging via
>>>>>>>                            parasitic threads or so.
>>>>>>>
>>>>>>>                            A debugger connection is opened to
>>>>>>>                            the DRM driver for a given PID (which
>>>>>>>                            uses the ptrace may-access check for
>>>>>>>                            now), after which all DRM clients of
>>>>>>>                            that PID are exposed to the debugger
>>>>>>>                            process.
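
For context, the permission check mentioned here boils down to the
kernel's ptrace_may_access(). A hypothetical sketch of what opening the
debugger connection might check; ptrace_may_access() and the
PTRACE_MODE_* flags are the real kernel API (include/linux/ptrace.h),
the surrounding helper and the choice of ATTACH vs. READ mode are
assumptions:

#include <linux/pid.h>
#include <linux/ptrace.h>
#include <linux/sched/task.h>

static int eudebug_may_attach(pid_t nr)
{
	struct pid *pid = find_get_pid(nr);
	struct task_struct *task;
	bool allowed;

	if (!pid)
		return -ESRCH;
	task = get_pid_task(pid, PIDTYPE_PID);
	put_pid(pid);
	if (!task)
		return -ESRCH;

	/* Same gate ptrace itself applies before attaching. */
	allowed = ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
	put_task_struct(task);

	return allowed ? 0 : -EACCES;
}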
>>>>>>>
>>>>>>>                            What we want to expose via that
>>>>>>>                            debugger connection is the ability
>>>>>>>                            for GDB to read/write the different
>>>>>>>                            GPU VM address spaces (ppGTT for
>>>>>>>                            Intel GPUs) just like the EU threads
>>>>>>>                            would see them. Note that the layout
>>>>>>>                            of the ppGTT is completely up to the
>>>>>>>                            userspace driver to set up and is
>>>>>>>                            mostly only partially equal to the
>>>>>>>                            CPU address space.
>>>>>>>
>>>>>>>                            Specifically, as part of
>>>>>>>                            reading/writing the ppGTT for
>>>>>>>                            debugging purposes, deep flushes are
>>>>>>>                            needed: for example flushing the
>>>>>>>                            instruction cache when
>>>>>>>                            adding/removing breakpoints.
>>>>>>>
>>>>>>>                            Maybe that will explain the
>>>>>>>                            background. I elaborate on this some
>>>>>>>                            more at the end.
>>>>>>>
>>>>>>>
>>>>>>>                                              kmap/vmap are used
>>>>>>>                                              everywhere in the
>>>>>>>                                              DRM subsystem to
>>>>>>>                                              access BOs, so I’m
>>>>>>>                                              failing to see the
>>>>>>>                                              problem with
>>>>>>>                                              adding a simple
>>>>>>>                                              helper based on
>>>>>>>                                              existing code.
>>>>>>>
>>>>>>>                                          What's possible and
>>>>>>>                                          often done is to do
>>>>>>>                                          kmap/vmap if you need
>>>>>>>                                          to implement a CPU
>>>>>>>                                          copy for scanout for
>>>>>>>                                          example or for
>>>>>>>                                          copying/validating
>>>>>>>                                          command buffers. But
>>>>>>>                                          that usually requires
>>>>>>>                                          accessing the whole BO
>>>>>>>                                          and has separate
>>>>>>>                                          security checks.
>>>>>>>
>>>>>>>                                          When you want to
>>>>>>>                                          access only a few
>>>>>>>                                          bytes of a BO, that
>>>>>>>                                          sounds massively like
>>>>>>>                                          a peek/poke like
>>>>>>>                                          interface and we have
>>>>>>>                                          already rejected that
>>>>>>>                                          more than once. There
>>>>>>>                                          even used to be
>>>>>>>                                          standardized GEM
>>>>>>>                                          IOCTLs for that which
>>>>>>>                                          have been removed by
>>>>>>>                                          now.
>>>>>>>
>>>>>>>                            Referring to the explanation at top:
>>>>>>>                            these IOCTLs are not for the
>>>>>>>                            debugging target process to issue.
>>>>>>>                            The peek/poke interface is
>>>>>>>                            specifically for GDB only, to
>>>>>>>                            facilitate the emulation of memory
>>>>>>>                            reads/writes on the GPU address
>>>>>>>                            space as if they were done by the
>>>>>>>                            EUs themselves. And to recap: for
>>>>>>>                            modifying instructions for example
>>>>>>>                            (add/remove breakpoint), an extra
>>>>>>>                            level of cache flushing is needed
>>>>>>>                            which is not available to regular
>>>>>>>                            userspace.
>>>>>>>
>>>>>>>                            I specifically discussed the
>>>>>>>                            difference with Sima before moving
>>>>>>>                            forward with this design originally.
>>>>>>>                            If something has changed since then,
>>>>>>>                            I'm of course happy to rediscuss.
>>>>>>>
>>>>>>>                            However, if this code can't be
>>>>>>>                            added, I'm not sure how we would
>>>>>>>                            ever be able to implement core dumps
>>>>>>>                            for GPU threads/memory?
>>>>>>>
>>>>>>>
>>>>>>>                                          If you need to access
>>>>>>>                                          BOs which are placed
>>>>>>>                                          in non-CPU-accessible
>>>>>>>                                          memory, then implement
>>>>>>>                                          the access callback
>>>>>>>                                          for ptrace; see
>>>>>>>                                          amdgpu_ttm_access_memory
>>>>>>>                                          for an example of how
>>>>>>>                                          to do this.
>>>>>>>
>>>>>>>                            As also mentioned above, we don't
>>>>>>>                            work via ptrace at all when it comes
>>>>>>>                            to debugging the EUs. The only thing
>>>>>>>                            used for now is ptrace_may_access,
>>>>>>>                            to implement access restrictions
>>>>>>>                            similar to those ptrace has. This
>>>>>>>                            can be changed to something else if
>>>>>>>                            needed.
>>>>>>>
>>>>>>>
>>>>>>>                                      Ptrace access via
>>>>>>>                                      vm_operations_struct.access
>>>>>>>                                      → ttm_bo_vm_access.
>>>>>>>
>>>>>>>                                      This series renames
>>>>>>>                                      ttm_bo_vm_access to
>>>>>>>                                      ttm_bo_access, with no
>>>>>>>                                      code changes.
>>>>>>>
>>>>>>>                                      The above function
>>>>>>>                                      accesses a BO via kmap if
>>>>>>>                                      it is in SYSTEM / TT,
>>>>>>>                                      which is existing code.
>>>>>>>
>>>>>>>                                      This function is only
>>>>>>>                                      exposed to user space via
>>>>>>>                                      ptrace permissions.
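
For reference, this is the existing path Matt describes: a ptrace
peek/poke on a mapped BO goes through access_process_vm() and lands in
vma->vm_ops->access, which TTM-based drivers already point at the
common helper. A typical driver's vm_ops (my_drm_ttm_vm_ops is an
illustrative name; the four callbacks are the real TTM exports,
declared in drm/ttm/ttm_bo.h on current kernels):

static const struct vm_operations_struct my_drm_ttm_vm_ops = {
	.fault	= ttm_bo_vm_fault,
	.open	= ttm_bo_vm_open,
	.close	= ttm_bo_vm_close,
	.access	= ttm_bo_vm_access,	/* used by ptrace via
					 * access_process_vm(); this
					 * series factors it out as
					 * ttm_bo_access */
};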
>>>>>>>
>>>>>>>                            Maybe this sentence is what caused
>>>>>>>                            the confusion.
>>>>>>>
>>>>>>>                            Userspace is never exposed to the
>>>>>>>                            peek/poke interface, only the
>>>>>>>                            debugger connection, which is its
>>>>>>>                            own FD.
>>>>>>>
>>>>>>>
>>>>>>>                                      In this series, we
>>>>>>>                                      implement a function [3]
>>>>>>>                                      similar to
>>>>>>>                                      amdgpu_ttm_access_memory
>>>>>>>                                      for the TTM vfunc
>>>>>>>                                      access_memory. What is
>>>>>>>                                      missing is non-visible CPU
>>>>>>>                                      memory access, similar to
>>>>>>>                                      amdgpu_ttm_access_memory_sdma.
>>>>>>>                                      This will be addressed in
>>>>>>>                                      a follow-up and was
>>>>>>>                                      omitted in this series
>>>>>>>                                      given its complexity.
>>>>>>>
>>>>>>>                                      So, this looks more or
>>>>>>>                                      less identical to AMD's
>>>>>>>                                      ptrace implementation, but
>>>>>>>                                      in GPU address space.
>>>>>>>                                      Again, I fail to see what
>>>>>>>                                      the problem is here. What
>>>>>>>                                      am I missing?
>>>>>>>
>>>>>>>
>>>>>>>                                The main question is why can't
>>>>>>>                                you use the existing interfaces
>>>>>>>                                directly?
>>>>>>>
>>>>>>>                            We're not working on the CPU address
>>>>>>>                            space or BOs. We're working strictly
>>>>>>>                            on the GPU address space as would be
>>>>>>>                            seen by an EU thread if it accessed
>>>>>>>                            address X.
>>>>>>>
>>>>>>>
>>>>>>>                                In addition to the peek/poke
>>>>>>>                                interface of ptrace, Linux has
>>>>>>>                                the pidfd_getfd system call, see
>>>>>>>                                https://man7.org/linux/man-pages/man2/pidfd_getfd.2.html.
>>>>>>>
>>>>>>>                                pidfd_getfd() allows you to
>>>>>>>                                dup() the render node file
>>>>>>>                                descriptor into your gdb
>>>>>>>                                process. That in turn gives you
>>>>>>>                                all the access you need from
>>>>>>>                                gdb, including mapping BOs and
>>>>>>>                                command submission on behalf of
>>>>>>>                                the application.
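
A userspace sketch of the flow Christian suggests, using only the
documented syscalls (pidfd_open(2), pidfd_getfd(2)); note pidfd_getfd
enforces the same PTRACE_MODE_ATTACH_REALCREDS check against the
target, and the helper name is made up:

#include <sys/syscall.h>
#include <unistd.h>

static int grab_drm_fd(pid_t pid, int remote_fd)
{
	int pidfd, fd;

	pidfd = syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;
	/* Duplicate the target's DRM render-node fd into this process. */
	fd = syscall(SYS_pidfd_getfd, pidfd, remote_fd, 0);
	close(pidfd);
	return fd;	/* local dup of the target's fd, or -1 */
}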
>>>>>>>
>>>>>>>                            We're not operating on the CPU
>>>>>>>                            address space, nor are we operating
>>>>>>>                            on BOs (there is no concept of a BO
>>>>>>>                            in the EU debug interface). Each VMA
>>>>>>>                            in the VM could come from anywhere;
>>>>>>>                            only the start address and size
>>>>>>>                            matter. And neither do we need to
>>>>>>>                            interfere with the command
>>>>>>>                            submission of the process under
>>>>>>>                            debug.
>>>>>>>
>>>>>>>
>>>>>>>                                As far as I can see that allows
>>>>>>>                                for the same functionality as
>>>>>>>                                the eudebug interface, just
>>>>>>>                                without any driver specific code
>>>>>>>                                messing with ptrace permissions
>>>>>>>                                and peek/poke interfaces.
>>>>>>>
>>>>>>>                                So the question is still: why do
>>>>>>>                                you need the whole eudebug
>>>>>>>                                interface in the first place? I
>>>>>>>                                might be missing something, but
>>>>>>>                                that seems to be superfluous
>>>>>>>                                from a high level view.
>>>>>>>
>>>>>>>                            Recapping from above: it is to allow
>>>>>>>                            the debugging of EU threads per DRM
>>>>>>>                            client, completely independent of
>>>>>>>                            the CPU process. If
>>>>>>>                            ptrace_may_access is the sore point,
>>>>>>>                            we could consider other permission
>>>>>>>                            checks, too. There is no other
>>>>>>>                            connection to ptrace in this
>>>>>>>                            architecture than the single
>>>>>>>                            permission check to know if a PID is
>>>>>>>                            fair game to be accessed by the
>>>>>>>                            debugger process.
>>>>>>>
>>>>>>>                            Why no parasitic thread or ptrace:
>>>>>>>                            going forward, binding the EU
>>>>>>>                            debugging to the DRM client would
>>>>>>>                            also pave the way for extending the
>>>>>>>                            core-kernel-generated core dump with
>>>>>>>                            each DRM client's EU thread/memory
>>>>>>>                            dump. We have a similar feature
>>>>>>>                            called "Offline core dump" enabled
>>>>>>>                            in the downstream public trees for
>>>>>>>                            i915, where we currently attach the
>>>>>>>                            EU thread dump to the i915 error
>>>>>>>                            state and then later combine the
>>>>>>>                            i915 error state with the CPU core
>>>>>>>                            dump file with a tool.
>>>>>>>
>>>>>>>                            This is a relatively small amount of
>>>>>>>                            extra code, as this baseline series
>>>>>>>                            already gives GDB the ability to
>>>>>>>                            perform the necessary actions. It's
>>>>>>>                            just a matter of the kernel driver
>>>>>>>                            calling "stop all threads", then
>>>>>>>                            copying the memory map and memory
>>>>>>>                            contents for GPU threads, just like
>>>>>>>                            is done for CPU threads.
>>>>>>>
>>>>>>>                            With parasitic thread injection, I'm
>>>>>>>                            not sure if there is such a way
>>>>>>>                            forward, as it would seem to require
>>>>>>>                            injecting quite a bit more logic
>>>>>>>                            into the core kernel?
>>>>>>>
>>>>>>>
>>>>>>>                                It's true that the AMD KFD part
>>>>>>>                                still has similar functionality,
>>>>>>>                                but that is because of the
>>>>>>>                                broken KFD design of tying
>>>>>>>                                driver state to the CPU process
>>>>>>>                                (which makes it inaccessible for
>>>>>>>                                gdb even with an imported render
>>>>>>>                                node fd).
>>>>>>>
>>>>>>>                                Both Sima and I (and partially
>>>>>>>                                Dave as well) have pushed back
>>>>>>>                                on the KFD approach. And the
>>>>>>>                                long term plan is to get rid of
>>>>>>>                                such device driver specific
>>>>>>>                                interfaces which re-implement
>>>>>>>                                existing functionality just
>>>>>>>                                differently.
>>>>>>>
>>>>>>>                            Recapping: this series is not adding
>>>>>>>                            it back. The debugger connection is
>>>>>>>                            a separate FD from the DRM one, with
>>>>>>>                            a separate IOCTL set. We don't allow
>>>>>>>                            the DRM FD any new operations based
>>>>>>>                            on whether ptrace is attached or
>>>>>>>                            not. We don't ever even do that
>>>>>>>                            check.
>>>>>>>
>>>>>>>                            We only restrict the opening of the
>>>>>>>                            debugger connection to a given PID
>>>>>>>                            with the ptrace_may_access check for
>>>>>>>                            now. That can be changed to
>>>>>>>                            something else, if necessary.
>>>>>>>
>>>>>>>                        Yeah, I think unnecessarily tying gpu
>>>>>>>                        processes to cpu processes is a bad
>>>>>>>                        thing, not least because even today all
>>>>>>>                        the svm discussions we have still hit
>>>>>>>                        clear use-cases where a 1:1 match is not
>>>>>>>                        wanted (like multiple gpu svm sections
>>>>>>>                        with offsets). Not even speaking of all
>>>>>>>                        the gpu usecases where the gpu vm space
>>>>>>>                        is still entirely independent of the cpu
>>>>>>>                        side.
>>>>>>>
>>>>>>>                        So that's why I think this entirely
>>>>>>>                        separate approach looks like the right
>>>>>>>                        one, with ptrace_may_access as the
>>>>>>>                        access control check to make sure we
>>>>>>>                        match ptrace on the cpu side.
>>>>>>>
>>>>>>>                        But there's very obviously a bikeshed to
>>>>>>>                        be had on what the actual uapi should
>>>>>>>                        look like, especially how gdb opens up a
>>>>>>>                        gpu debug access fd. But I also think
>>>>>>>                        that's not much for drm to decide, but
>>>>>>>                        whatever gdb wants. And then we aim for
>>>>>>>                        some consistency on that lookup/access
>>>>>>>                        control part (ideally; I might be
>>>>>>>                        missing some reasons why this is a bad
>>>>>>>                        idea) across drm drivers.
>>>>>>>
>>>>>>>
>>>>>>>                                So you need to have a really,
>>>>>>>                                really good explanation why the
>>>>>>>                                eudebug interface is actually
>>>>>>>                                necessary.
>>>>>>>
>>>>>>>                            TL;DR: the main point is to decouple
>>>>>>>                            the debugging of the EU workloads
>>>>>>>                            from the debugging of the CPU
>>>>>>>                            process. This avoids interfering
>>>>>>>                            with the CPU process via parasitic
>>>>>>>                            thread injection. Further, this also
>>>>>>>                            allows generating a core dump
>>>>>>>                            without any GDB connected. There are
>>>>>>>                            also many other smaller pros/cons
>>>>>>>                            which can be discussed, but for the
>>>>>>>                            context of this patch, this is the
>>>>>>>                            main one.
>>>>>>>
>>>>>>>                            So unlike parasitic thread
>>>>>>>                            injection, we don't unlock any
>>>>>>>                            special IOCTLs for the process under
>>>>>>>                            debug to be performed by the
>>>>>>>                            parasitic thread, but we allow a
>>>>>>>                            minimal set of operations to be
>>>>>>>                            performed by GDB as if those were
>>>>>>>                            done on the EUs themselves.
>>>>>>>
>>>>>>>                            One can think of it like a minimal
>>>>>>>                            subset of ptrace, but for EU
>>>>>>>                            threads, not the CPU threads. And
>>>>>>>                            thus, building on this, it's
>>>>>>>                            possible to extend the core-kernel-
>>>>>>>                            generated core dumps with a DRM
>>>>>>>                            specific extension which would
>>>>>>>                            contain the EU thread/memory dump.
>>>>>>>
>>>>>>>                        It might be good to document (in that
>>>>>>>                        debugging doc patch probably) why thread
>>>>>>>                        injection is not a great option, and why
>>>>>>>                        the tradeoffs for debugging are
>>>>>>>                        different than for checkpoint/restore,
>>>>>>>                        where with CRIU we landed on doing most
>>>>>>>                        of this in userspace, often requiring
>>>>>>>                        injection threads to make it all work.
>>>>>>>
>>>>>>>                        Cheers, Sima
>>>>>>>
>>>>>>>
>>>>>>>                            Regards, Joonas
>>>>>>>
>>>>>>>
>>>>>>>                                Regards,
>>>>>>>                                Christian.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                                      Matt
>>>>>>>
>>>>>>>                                      [3] https://patchwork.freedesktop.org/patch/622520/?series=140200&rev=6
>>>>>>>
>>>>>>>
>>>>>>>                                          Regards,
>>>>>>>                                          Christian.
>>>>>>>
>>>>>>>
>>>>>>>                                              Matt
>>>>>>>
>>>>>>>
>>>>>>>                                                  Regards,
>>>>>>>                                                  Christian.
>>>>>>>
>>>>>>>
>>>>>>>