[PATCH] drm/amdgpu: fix a possible NULL dereference when attempting reset

Christian König ckoenig.leichtzumerken at gmail.com
Wed Oct 18 07:17:39 UTC 2017


Yeah, we already stumbled over that internally as well.

The patch is incorrect. The problem is that we forgot to keep an extra 
reference on the s_fence to avoid freeing it too early.

The correct fix should be on Alex's public branch by the end of today.
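
To illustrate the idea of holding an extra reference, here is a minimal
user-space sketch. It is not the actual fix and does not use the kernel
API: the toy_* types and helpers are hypothetical stand-ins for a
refcounted fence and a scheduler job. The job takes its own reference on
the fence, so the fence stays valid for the reset path even after the
submitter has dropped its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted fence; not the kernel struct. */
struct toy_fence {
	int refcount;
	int parent_id;
};

/* Hypothetical stand-in for a scheduler job that points at a fence. */
struct toy_job {
	struct toy_fence *s_fence;
};

/* Counts frees so the demo below can be checked. */
static int toy_fences_freed;

static struct toy_fence *toy_fence_create(int parent_id)
{
	struct toy_fence *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->refcount = 1;	/* the creator holds the first reference */
	f->parent_id = parent_id;
	return f;
}

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	if (--f->refcount == 0) {
		toy_fences_freed++;
		free(f);
	}
}

/* The job keeps its own (extra) reference for as long as it exists. */
static void toy_job_init(struct toy_job *job, struct toy_fence *f)
{
	job->s_fence = toy_fence_get(f);
}

/* The reset path dereferences job->s_fence, so the fence must be alive. */
static int toy_hw_job_reset(const struct toy_job *job)
{
	return job->s_fence->parent_id;
}

/* Returns the value read during reset, or -1 on allocation failure. */
int toy_demo(void)
{
	struct toy_fence *f = toy_fence_create(42);
	struct toy_job job;
	int seen;

	if (!f)
		return -1;
	toy_job_init(&job, f);		/* refcount is now 2 */
	toy_fence_put(f);		/* submitter is done; refcount is 1 */
	seen = toy_hw_job_reset(&job);	/* safe: the job still holds a reference */
	toy_fence_put(job.s_fence);	/* job finished; fence is freed here */
	job.s_fence = NULL;
	return seen;
}
```

Without the toy_fence_get() in toy_job_init(), the submitter's put would
free the fence and the reset path would touch freed memory. A NULL check
alone does not protect against that.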

Regards,
Christian.

On 17.10.2017 at 20:41, Darren Salt wrote:
> [drm:gfx_v8_0_priv_reg_irq] *ERROR* Illegal register access in command stream
> [drm] IP block:gmc_v8_0 is hung!
> [drm] IP block:gfx_v8_0 is hung!
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000000000d8
> IP: amd_sched_hw_job_reset+0x3c/0x9a
> PGD 3aedd8067 P4D 3aedd8067 PUD 3aedd9067 PMD 0
> Oops: 0000 [#1] PREEMPT SMP
> Modules linked in: cpufreq_conservative bnep bluetooth ecdh_generic serial_ir snd_hrtimer snd_seq_dummy snd_seq_midi snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device nct6775 em28xx_rc tda18271 cxd2820r joydev em28xx_dvb usb_storage em28xx tveeprom snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_pcm_oss snd_mixer_oss sp5100_tco sg snd_pcm snd_timer
> CPU: 0 PID: 34 Comm: kworker/0:1 Not tainted 4.14.0-rc4+ #3
> Hardware name: System manufacturer System Product Name/A88X-PRO, BIOS 1602 12/04/2014
> Workqueue: events amdgpu_irq_reset_work_func
> task: ffff88041caa44c0 task.stack: ffffc90000144000
> RIP: 0010:amd_sched_hw_job_reset+0x3c/0x9a
> RSP: 0018:ffffc90000147de8 EFLAGS: 00010293
> RAX: ffff88031adee850 RBX: ffff88031adee800 RCX: 0000000000000001
> RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88040db14c08
> RBP: ffffc90000147e08 R08: ffff88040dac71d0 R09: ffff88040dac71c0
> R10: 0000000000000000 R11: ffff880409575038 R12: ffff88040db14c08
> R13: ffff88040db14bf8 R14: ffff88040db14b50 R15: ffff8803e1575580
> FS:  0000000000000000(0000) GS:ffff88041ec00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000000d8 CR3: 00000003aec3e000 CR4: 00000000000406f0
> Call Trace:
>   amdgpu_gpu_reset+0x9b/0x55b
>   ? _raw_spin_unlock_irq+0x12/0x24
>   amdgpu_irq_reset_work_func+0x16/0x18
>   process_one_work+0x124/0x1db
>   ? rescuer_thread+0x26a/0x26a
>   worker_thread+0x19d/0x250
>   ? rescuer_thread+0x26a/0x26a
>   kthread+0xf1/0xf6
> Code: 00 41 54 4c 8d a7 b8 00 00 00 53 4c 89 e7 e8 d9 a9 26 00 49 8b 86 b0 00 00 00 48 8d 58 b0 48 8d 43 50 4c 39 e8 74 51 48 8b 73 10 <48> 8b be d8 00 00 00 48 85 ff 74 37 48 81 c6 c0 00 00 00 e8 6c
> RIP: amd_sched_hw_job_reset+0x3c/0x9a RSP: ffffc90000147de8
> CR2: 00000000000000d8
>
> Signed-off-by: Darren Salt <devspam at moreofthesa.me.uk>
> ---
>   drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 08e1332d814a..10749c0c0ca0 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -427,7 +427,7 @@ void amd_sched_hw_job_reset(struct amd_gpu_scheduler *sched)
>   
>   	spin_lock(&sched->job_list_lock);
>   	list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node) {
> -		if (s_job->s_fence->parent &&
> +		if (s_job->s_fence && s_job->s_fence->parent &&
>   		    dma_fence_remove_callback(s_job->s_fence->parent,
>   					      &s_job->s_fence->cb)) {
>   			dma_fence_put(s_job->s_fence->parent);



