[PATCH] i915/drm/gvt: initialize CSB tail value with zero

Fri Aug 31 09:30:33 UTC 2018

On 2018.08.31 16:14:52 +0800, Xinyun Liu wrote:
> When running `./drv_hangman --run-subtest hangcheck-unterminated` with
> AcrnGT, vGPU reset falls into an endless loop because the original CSB tail
> value (0xF) was not updated correctly. In fact, the value should be zero
> after a GPU reset caused by an invalid context. This loop also causes
> a kernel panic if some graphics workload is running on the vGPU.
>

Is this a guest kernel panic or a host one?

>  BUG: unable to handle kernel paging request at 00000000fffffffc
>  IP: process_csb+0x14a/0x2a0
>  PGD 0 P4D 0
>  Oops: 0002 [#1] PREEMPT SMP
>  Modules linked in: dwc3_pci dwc3 snd_usb_audio xhci_pci mei_me xhci_hcd snd_usbmidi_lib mei snd_hwdep hci_uart bluetooth ecdh_generic rfkill_gpio trusty_timer trusty_wall trusty_b
>  CPU: 0 PID: 1371 Comm: kworker/0:1H Tainted: P     U  W  O    4.14.61-quilt-2e5dc0ac-g0feae7d57171 #2
>  Hardware name:  ACRN-DM, BIOS 1.00 03/14/2014
>  Workqueue: events_highpri i915_error_reset
>  task: ffff88007cbc0040 task.stack: ffffc900010b0000
>  RIP: 0010:process_csb+0x14a/0x2a0
>  RSP: 0018:ffffc900010b3c90 EFLAGS: 00010206
>  RAX: 00000000fffffffc RBX: ffffc90001e02370 RCX: 0000000000000008
>  RDX: 0000000000000009 RSI: ffff88007c830308 RDI: 0000000000000000
>  RBP: ffffc900010b3cd8 R08: 0000000000000001 R09: 0000000000002370
>  R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007c758000
>  R13: 0000000000000007 R14: 0000000000000004 R15: ffff88007c830000
>  FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00000000fffffffc CR3: 00000000796e0000 CR4: 00000000003406f0
>  Call Trace:
>   ? wake_up_process+0x20/0x20
>   execlists_reset_prepare+0x65/0x120
>   i915_gem_reset_prepare_engine+0x28/0x40
>   i915_reset_engine+0x1e/0xe0
>   i915_handle_error+0x117/0x470
>   ? cpuacct_charge+0x81/0x90
>   ? _raw_spin_unlock_irq+0x1e/0x40
>   ? finish_task_switch+0x8d/0x1f0
>   i915_error_reset+0x32/0x40
>   process_one_work+0x186/0x3e0
>   worker_thread+0x3d/0x3b0
>   kthread+0x132/0x150
>   ? process_one_work+0x3e0/0x3e0
>   ? kthread_create_on_node+0x70/0x70
>   ret_from_fork+0x3a/0x50
>  Code: 00 00 44 89 00 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 d8 31 d2 45 31 f6 e9 57 ff ff ff 0f 1f 44 00 00 48 85 c0 74 13 <f0> ff 08 0f 88 ed 6f 5d 00 75 08 48 89 c7
>  RIP: process_csb+0x14a/0x2a0 RSP: ffffc900010b3c90
>  CR2: 00000000fffffffc
>  ---[ end trace 5751fb1d7b00b459 ]---
> 
> Link: https://lists.projectacrn.org/g/acrn-dev/message/11136
> Signed-off-by: Xinyun Liu <xinyun.liu at intel.com>
> ---
>  drivers/gpu/drm/i915/gvt/execlist.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/execlist.c b/drivers/gpu/drm/i915/gvt/execlist.c
> index 70494e394d2c..768e0b467a11 100644
> --- a/drivers/gpu/drm/i915/gvt/execlist.c
> +++ b/drivers/gpu/drm/i915/gvt/execlist.c
> @@ -523,7 +523,7 @@ static void init_vgpu_execlist(struct intel_vgpu *vgpu, int ring_id)
>  			_EL_OFFSET_STATUS_PTR);
>  	ctx_status_ptr.dw = vgpu_vreg(vgpu, ctx_status_ptr_reg);
>  	ctx_status_ptr.read_ptr = 0;
> -	ctx_status_ptr.write_ptr = 0x7;
> +	ctx_status_ptr.write_ptr = 0;
>  	vgpu_vreg(vgpu, ctx_status_ptr_reg) = ctx_status_ptr.dw;
>  }

I think we follow the HW definition for the initial values of the execlist
status registers. I haven't double-checked against the spec; that's just from
memory. After a vGPU reset, the registers should be back in that initial state.
Is there a wrong assumption somewhere in the reset handling? Or could you find
the real cause of the panic?
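
To make the question about reset assumptions concrete, here is a minimal
sketch of how a consumer might walk the context status buffer between the read
and write pointers. It is not copied from i915 or GVT: the ring size of 6
entries, the 3-bit pointer fields, their bit positions, and the helper names
make_status_ptr() and walk_csb() are all assumptions for illustration. Under
those assumptions, a write pointer of 0x7 can never be reached by a head that
wraps at 6, so an unbounded loop of this shape would not terminate.

```c
#include <stdint.h>

/* All constants below are assumptions for this sketch, not taken from the driver. */
#define CSB_ENTRIES 6u   /* assumed number of status buffer entries */
#define PTR_MASK    0x7u /* assumed 3-bit read/write pointer fields */
#define WRITE_SHIFT 0    /* assumed position of the write pointer */
#define READ_SHIFT  8    /* assumed position of the read pointer */

/* Hypothetical helper: pack read/write pointers into one status-pointer dword. */
static uint32_t make_status_ptr(uint32_t read_ptr, uint32_t write_ptr)
{
	return ((read_ptr & PTR_MASK) << READ_SHIFT) |
	       ((write_ptr & PTR_MASK) << WRITE_SHIFT);
}

/*
 * Hypothetical helper: advance head until it meets tail, wrapping at
 * CSB_ENTRIES. Returns the number of entries consumed, or -1 if tail was
 * not reached within max_steps (the bound stands in for "loops forever").
 */
static int walk_csb(uint32_t status_ptr, int max_steps)
{
	uint32_t tail = (status_ptr >> WRITE_SHIFT) & PTR_MASK;
	uint32_t head = (status_ptr >> READ_SHIFT) & PTR_MASK;
	int steps = 0;

	while (head != tail) {
		if (steps == max_steps)
			return -1;
		if (++head == CSB_ENTRIES)
			head = 0;
		steps++;
	}
	return steps;
}
```

With read_ptr = 0, walk_csb() hits the step bound for write_ptr = 0x7 and
returns immediately for write_ptr = 0. Whether the guest driver really takes
such a path after reset is exactly what needs to be confirmed.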

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827