[PATCH] drm/i915/gt: Add a delay to let engine resumes correctly
Gote, Nitin R
nitin.r.gote at intel.com
Wed Mar 5 07:45:31 UTC 2025
Hi Andi,
> Hi Nitin,
>
> On Mon, Feb 24, 2025 at 12:01:04PM +0530, Nitin Gote wrote:
> > Sometimes engine reset fails because the engine resumes from an
> > incorrect RING_HEAD. Engine head failed to set to zero even after
> > writing into it. This is a timing issue and we experimented different
> > values and found out that 20ms delay works best based on testing.
> >
> > So, add a 20ms delay to let engine resumes from correct RING_HEAD.
> >
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13650
> > Signed-off-by: Nitin Gote <nitin.r.gote at intel.com>
> > ---
> > drivers/gpu/drm/i915/gt/intel_ring_submission.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 6e9977b2d180..5576f000e965 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -365,6 +365,13 @@ static void reset_prepare(struct intel_engine_cs *engine)
> > ENGINE_READ_FW(engine, RING_HEAD),
> > ENGINE_READ_FW(engine, RING_TAIL),
> > ENGINE_READ_FW(engine, RING_START));
> > + /*
> > + * Sometimes engine head failed to set to zero even after writing into it.
> > + * Use 20ms delay to let engine resumes from correct RING_HEAD.
> > + * Experimented different values and determined that 20ms works best
> > + * based on testing.
> > + */
> > + mdelay(20);
>
> Is there any extremely strong reason for using mdelay here, rather than any other
> delay function?
>
> Andi
Yes. I first tried udelay(20000), and over 1000 runs of the test I hit
"BUG: scheduling while atomic: i915_selftest/10313/0x00000201" from the scheduler a couple of times.
I am adding the failure stack trace here in case you want to take a look.
That is why I used mdelay(20), with which I have not seen this issue. I have tested mdelay(20) thousands of times and it worked.
stack trace:
i915: Running intel_hangcheck_live_selftests/igt_reset_nop_engine
BUG: scheduling while atomic: i915_selftest/10313/0x00000201
1 lock held by i915_selftest/10313:
#0: ffff888102e011b0 (&dev->mutex){....}-{3:3}, at: __device_driver_lock+0x43/0x60
CPU: 4 UID: 0 PID: 10313 Comm: i915_selftest Tainted: G U 6.14.0-rc3-ci-drm-16154+ #1
Tainted: [U]=USER
Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
Call Trace:
<TASK>
dump_stack_lvl+0xa0/0xc0
dump_stack+0x10/0x20
__schedule_bug+0x6c/0x90
__schedule+0x1a04/0x21a0
? lock_acquire+0xc7/0x300
? find_held_lock+0x31/0x90
? lock_release+0xd1/0x2a0
schedule+0x40/0x130
schedule_timeout+0x82/0x100
? __pfx_process_timeout+0x10/0x10
? msleep+0x13/0x50
msleep+0x3b/0x50
reset_prepare+0x10b/0x1d0 [i915]
reset_prepare_engine+0x31/0x40 [i915]
__intel_engine_reset_bh+0xac/0x230 [i915]
? intel_engine_reset+0x21/0x60 [i915]
intel_engine_reset+0x34/0x60 [i915]
igt_reset_nop_engine+0x22e/0x4e0 [i915]
__i915_subtests+0xb3/0x230 [i915]
? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
intel_hangcheck_live_selftests+0xc0/0x110 [i915]
__run_selftests+0xd4/0x1d0 [i915]
? acpi_dev_found+0x68/0x80
i915_live_selftests+0x53/0x90 [i915]
i915_pci_probe+0x118/0x210 [i915]
local_pci_probe+0x4b/0xb0
pci_device_probe+0xe7/0x270
really_probe+0xfb/0x390
__driver_probe_device+0x8a/0x170
driver_probe_device+0x23/0xb0
__driver_attach+0xc7/0x190
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x7f/0xd0
driver_attach+0x1e/0x30
bus_add_driver+0x146/0x280
driver_register+0x64/0x130
__pci_register_driver+0x7d/0x90
i915_pci_register_driver+0x23/0x30 [i915]
i915_init+0x37/0x120 [i915]
? __pfx_i915_init+0x10/0x10 [i915]
do_one_initcall+0x63/0x3d0
do_init_module+0x99/0x2b0
load_module+0x2313/0x27d0
init_module_from_file+0x9c/0xe0
? init_module_from_file+0x9c/0xe0
idempotent_init_module+0x1a5/0x2b0
__x64_sys_finit_module+0x63/0xc0
x64_sys_call+0x1b6f/0x2140
do_syscall_64+0x8f/0x170
? syscall_exit_to_user_mode+0x11a/0x300
? do_syscall_64+0x9b/0x170
? __fput+0x1cb/0x2f0
? syscall_exit_to_user_mode+0x11a/0x300
? do_syscall_64+0x9b/0x170
? ksys_read+0x70/0xf0
? syscall_exit_to_user_mode+0x11a/0x300
? do_syscall_64+0x9b/0x170
? seq_read_iter+0x216/0x470
? lock_release+0xd1/0x2a0
? __mutex_unlock_slowpath+0x41/0x300
? mutex_unlock+0x12/0x20
? seq_read_iter+0x216/0x470
? vfs_read+0x139/0x360
? vfs_read+0x139/0x360
? ksys_read+0x70/0xf0
? syscall_exit_to_user_mode+0x11a/0x300
? do_syscall_64+0x9b/0x170
? sysvec_apic_timer_interrupt+0x56/0xb0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ab0b172725d
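The trace above reaches __schedule_bug through msleep() and schedule_timeout(), i.e. a delay that gives up the CPU while reset_prepare() runs in atomic context. mdelay() instead busy-waits and never enters the scheduler. A minimal userspace sketch of that difference follows. It is only an analogue, not kernel code: busy_delay_ms(), sleeping_delay_ms() and now_ns() are hypothetical names, and nanosleep() merely stands in for a sleeping kernel delay.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical stand-in for mdelay(): spin on the clock without ever
 * asking the scheduler to run another task.
 */
static void busy_delay_ms(unsigned int ms)
{
	long long deadline = now_ns() + (long long)ms * 1000000LL;

	while (now_ns() < deadline)
		; /* busy-wait */
}

/*
 * Hypothetical stand-in for msleep(): block in the scheduler until the
 * interval has passed, restarting if a signal interrupts the sleep.
 */
static void sleeping_delay_ms(unsigned int ms)
{
	struct timespec req = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	while (nanosleep(&req, &req) == -1 && errno == EINTR)
		;
}
```

Both helpers wait at least the requested time. busy_delay_ms(20) keeps the CPU occupied for the whole interval, which is the cost of mdelay(20) in the reset path. sleeping_delay_ms(20) hands the CPU to the scheduler, which is what the kernel rejects in atomic context.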
- Nitin
>
> > if (!stop_ring(engine)) {
> > drm_err(&engine->i915->drm,
> > "failed to set %s head to zero "
> > --
> > 2.25.1