[PATCH 1/4] drm/xe: Introduce a simple wedged state

Lucas De Marchi lucas.demarchi at intel.com
Tue Apr 16 17:03:14 UTC 2024


On Tue, Apr 09, 2024 at 06:15:04PM GMT, Rodrigo Vivi wrote:
>Introduce a very simple 'wedged' state where any attempt
>to access the GPU is entirely blocked.
>
>In some critical cases, like a gt_reset failure, we need to
>block any further attempt to use the GPU. Otherwise we risk
>reaching states that would force us to reboot the machine.
>
>So, when these cases are identified, we corner and block any GPU
>access. No IOCTL, and not even another GT reset, should be attempted.
>
>The 'wedged' state in Xe is an end state with no way back.
>Only a device "re-probe" (unbind + bind) can restore GPU access.
>
>v2: - s/wedged/busted (Lucas)
>    - use unbind+bind instead of module reload (Lucas)
>    - added more info on unbind operations and instructions for bug reports
>    - only print the message once.
>
>v3: - s/busted/wedged (Ashutosh, Tvrtko, Thomas)
>    - don't assume user has sudo and tee available (Lucas)
>
>v4: - remove unnecessary cases around ct communication or migration.
>
>Cc: Ashutosh Dixit <ashutosh.dixit at intel.com>
>Cc: Tvrtko Ursulin <tursulin at ursulin.net>
>Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>Cc: Anshuman Gupta <anshuman.gupta at intel.com>
>Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray at intel.com> #v2
>Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com> #v2

my r-b remains for this version.

thanks
Lucas De Marchi
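The wedged-state rules in the quoted commit message (every GPU access path is blocked, the message is printed only once, and there is no way back short of unbind + bind) can be illustrated with a minimal userspace C sketch. This is not the xe driver code: `struct fake_xe_device`, `device_is_wedged()`, `device_declare_wedged()`, `device_ioctl()` and `gt_reset()` are hypothetical names, C11 atomics stand in for kernel primitives, and the `-ECANCELED` return value is an arbitrary choice for illustration.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device struct; NOT the real struct xe_device. */
struct fake_xe_device {
	atomic_bool wedged;      /* once true, never cleared: an end state */
	atomic_flag msg_printed; /* ensures the wedged message prints only once */
};

static bool device_is_wedged(struct fake_xe_device *dev)
{
	return atomic_load(&dev->wedged);
}

/* Enter the wedged state; safe to call more than once. */
static void device_declare_wedged(struct fake_xe_device *dev)
{
	atomic_store(&dev->wedged, true);
	/* test_and_set returns the previous value, so only the first call prints. */
	if (!atomic_flag_test_and_set(&dev->msg_printed))
		fprintf(stderr, "device wedged: unbind and bind the device to recover\n");
}

/* Hypothetical IOCTL entry point: check the flag before touching the GPU.
 * -ECANCELED is an assumed error code, chosen only for illustration. */
static int device_ioctl(struct fake_xe_device *dev)
{
	if (device_is_wedged(dev))
		return -ECANCELED;
	return 0; /* real work would happen here */
}

/* A failed reset wedges the device; any later reset is refused outright. */
static int gt_reset(struct fake_xe_device *dev, bool reset_fails)
{
	if (device_is_wedged(dev))
		return -ECANCELED;
	if (reset_fails) {
		device_declare_wedged(dev);
		return -EIO;
	}
	return 0;
}
```

A caller would initialize the struct as `{ .wedged = false, .msg_printed = ATOMIC_FLAG_INIT }`. After `gt_reset(&dev, true)` returns `-EIO`, both `device_ioctl()` and any further `gt_reset()` return `-ECANCELED`. The sketch deliberately has no function that clears the flag: as in the commit message, only re-creating the device (the unbind + bind analogue) gets out of the wedged state.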


More information about the Intel-xe mailing list