[PATCH] drm/amdgpu: fix the null pointer to get timeline by scheduler fence

Christian König ckoenig.leichtzumerken at gmail.com
Tue Aug 14 06:51:14 UTC 2018


That is fixed by "drm/scheduler: bind job earlier to scheduler".

Christian.

Am 13.08.2018 um 16:33 schrieb Andres Rodriguez:
> Any updates on this issue?
>
> Regards,
> Andres
>
> On 2018-08-08 03:10 AM, Christian König wrote:
>> Yeah that is a known issue, but this solution is not correct either.
>>
>> See, the scheduler the job is executed on is simply not determined 
>> yet at the point where we want to trace it.
>>
>> So using the scheduler name from the entity is wrong as well.
>>
>> We should probably move the scheduler decision from 
>> drm_sched_entity_push_job() to drm_sched_job_init() to fix that.
>>
>> I will prepare a patch for that today,
>> Christian.
>>
>> Am 08.08.2018 um 09:05 schrieb Huang Rui:
>>> We no longer initialize the fence's scheduler pointer in 
>>> drm_sched_fence_create(), so enabling the trace event dereferences a 
>>> NULL scheduler when it fetches the timeline name. The timeline name 
>>> is really the scheduler name from the entity, so add a macro that 
>>> replaces the legacy way of getting the timeline name from the job.
>>>
>>> [  212.844281] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>>> [  212.852401] PGD 8000000427c13067 P4D 8000000427c13067 PUD 4235fc067 PMD 0
>>> [  212.859419] Oops: 0000 [#1] SMP PTI
>>> [  212.862981] CPU: 4 PID: 1520 Comm: amdgpu_test Tainted: G           OE     4.18.0-rc1-custom #1
>>> [  212.872194] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016
>>> [  212.881704] RIP: 0010:drm_sched_fence_get_timeline_name+0x2b/0x30 [gpu_sched]
>>> [  212.888948] Code: 1f 44 00 00 48 8b 47 08 48 3d c0 b1 4f c0 74 13 48 83 ef 60 48 3d 60 b1 4f c0 b8 00 00 00 00 48 0f 45 f8 48 8b 87 e0 00 00 00 <48> 8b 40 18 c3 0f 1f 44 00 00 b8 01 00 00 00 c3 0f 1f 44 00 00 0f
>>> [  212.908162] RSP: 0018:ffffa3ed81f27af0 EFLAGS: 00010246
>>> [  212.913483] RAX: 0000000000000000 RBX: 0000000000070034 RCX: ffffa3ed81f27da8
>>> [  212.920735] RDX: ffff8f24ebfb5460 RSI: ffff8f24e40d3c00 RDI: ffff8f24ebfb5400
>>> [  212.928008] RBP: ffff8f24e40d3c00 R08: 0000000000000000 R09: ffffffffae4deafc
>>> [  212.935263] R10: ffffffffada000ed R11: 0000000000000001 R12: ffff8f24e891f898
>>> [  212.942558] R13: 0000000000000000 R14: ffff8f24ebc46000 R15: ffff8f24e3de97a8
>>> [  212.949796] FS:  00007ffff7fd2700(0000) GS:ffff8f24fed00000(0000) knlGS:0000000000000000
>>> [  212.958047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  212.963921] CR2: 0000000000000018 CR3: 0000000423422003 CR4: 00000000003606e0
>>> [  212.971201] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [  212.978482] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [  212.985720] Call Trace:
>>> [  212.988236]  trace_event_raw_event_amdgpu_cs_ioctl+0x4c/0x170 [amdgpu]
>>> [  212.994904]  ? amdgpu_ctx_add_fence+0xa9/0x110 [amdgpu]
>>> [  213.000246]  ? amdgpu_job_free_resources+0x4b/0x70 [amdgpu]
>>> [  213.005944]  amdgpu_cs_ioctl+0x16d1/0x1b50 [amdgpu]
>>> [  213.010920]  ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu]
>>> [  213.016354]  drm_ioctl_kernel+0x8a/0xd0 [drm]
>>> [  213.020794]  ? recalc_sigpending+0x17/0x50
>>> [  213.024965]  drm_ioctl+0x2d7/0x390 [drm]
>>> [  213.028979]  ? amdgpu_cs_find_mapping+0xf0/0xf0 [amdgpu]
>>> [  213.034366]  ? do_signal+0x36/0x700
>>> [  213.037928]  ? signal_wake_up_state+0x15/0x30
>>> [  213.042375]  amdgpu_drm_ioctl+0x46/0x80 [amdgpu]
>>>
>>> Signed-off-by: Huang Rui <ray.huang at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c    |  2 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 10 ++++++----
>>>   2 files changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> index e12871d..be01e1b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>>> @@ -1247,7 +1247,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
>>>       amdgpu_job_free_resources(job);
>>> -    trace_amdgpu_cs_ioctl(job);
>>> +    trace_amdgpu_cs_ioctl(job, entity);
>>>       amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->ticket);
>>>       priority = job->base.s_priority;
>>>       drm_sched_entity_push_job(&job->base, entity);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> index 8c2dab2..25cdcb7 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>> @@ -36,6 +36,8 @@
>>>   #define AMDGPU_JOB_GET_TIMELINE_NAME(job) \
>>>   	job->base.s_fence->finished.ops->get_timeline_name(&job->base.s_fence->finished)
>>> +#define AMDGPU_GET_SCHED_NAME(entity) \
>>> +	(entity->rq->sched->name)
>>>   TRACE_EVENT(amdgpu_mm_rreg,
>>>           TP_PROTO(unsigned did, uint32_t reg, uint32_t value),
>>> @@ -161,11 +163,11 @@ TRACE_EVENT(amdgpu_cs,
>>>   );
>>>   TRACE_EVENT(amdgpu_cs_ioctl,
>>> -        TP_PROTO(struct amdgpu_job *job),
>>> -        TP_ARGS(job),
>>> +        TP_PROTO(struct amdgpu_job *job, struct drm_sched_entity *entity),
>>> +        TP_ARGS(job, entity),
>>>           TP_STRUCT__entry(
>>>                    __field(uint64_t, sched_job_id)
>>> -                 __string(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
>>> +                 __string(timeline, AMDGPU_GET_SCHED_NAME(entity))
>>>                    __field(unsigned int, context)
>>>                    __field(unsigned int, seqno)
>>>                    __field(struct dma_fence *, fence)
>>> @@ -175,7 +177,7 @@ TRACE_EVENT(amdgpu_cs_ioctl,
>>>           TP_fast_assign(
>>>                  __entry->sched_job_id = job->base.id;
>>> -               __assign_str(timeline, AMDGPU_JOB_GET_TIMELINE_NAME(job))
>>> +               __assign_str(timeline, AMDGPU_GET_SCHED_NAME(entity))
>>>                  __entry->context = job->base.s_fence->finished.context;
>>>                  __entry->seqno = job->base.s_fence->finished.seqno;
>>>                  __entry->ring_name = to_amdgpu_ring(job->base.sched)->name;
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


