[PATCH] drm/scheduler: fix setting the priorty for entities - bisected
Dieter Nützel
Dieter at nuetzel-hh.de
Wed Aug 8 04:49:13 UTC 2018
On 06.08.2018 02:13, Dieter Nützel wrote:
> On 04.08.2018 06:18, Dieter Nützel wrote:
>> On 04.08.2018 06:12, Dieter Nützel wrote:
>>> On 04.08.2018 05:27, Dieter Nützel wrote:
>>>> On 03.08.2018 13:09, Christian König wrote:
>>>>> On 03.08.2018 at 03:08, Dieter Nützel wrote:
>>>>>> Hello Christian, AMD guys,
>>>>>>
>>>>>> this patch _together_ with this series
>>>>>> [PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs
>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-August/024802.html
>>>>>>
>>>>>> on top of
>>>>>> amd-staging-drm-next 53d5f1e4a6d9
>>>>>>
>>>>>> freezes the whole system (Intel Xeon X3470, RX580) during the _first_
>>>>>> mouse move.
>>>>>> Same for the sddm login or the first move in KDE Plasma 5.
>>>>>> NO logs so far. - Expected?
>>>>>
>>>>> Not even remotely, can you double check which patch from the
>>>>> "[PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs"
>>>>> series is causing the issue?
>>>>
>>>> Oops,
>>>>
>>>> _both_ 'series' on top of
>>>>
>>>> bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem
>>>> definitions into amdgpu_gem header
>>>>
>>>> work without a hitch.
>>>>
>>>> But I have the new (latest) µcode from openSUSE Tumbleweed.
>>>> kernel-firmware-20180730-35.1.src.rpm
>>>>
>>>> Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
>>>
>>> I take this back.
>>>
>>> It just lasted much longer.
>>> Then the mouse froze.
>>> I could grab a dmesg remotely from my phone ;-)
>>>
>>> See the attachment.
>>> Dieter
>>
>> Argh, shi...
>> wrong dmesg version.
>>
>> Should be this one. (For sure...)
>
> Phew,
>
> this took some time...
> During the 'last' git bisect run => 'first bad commit is' I got the next
> freeze.
> But I could get a new dmesg.log file via remote phone (see attachment).
>
> git bisect log shows this:
>
> SOURCE/amd-staging-drm-next> git bisect log
> git bisect start
> # good: [adebfff9c806afe1143d69a0174d4580cd27b23d] drm/scheduler: fix setting the priorty for entities
> git bisect good adebfff9c806afe1143d69a0174d4580cd27b23d
> # bad: [43202e67a4e6fcb0e6b773e8eb1ed56e1721e882] drm/amdgpu: use entity instead of ring for CS
> git bisect bad 43202e67a4e6fcb0e6b773e8eb1ed56e1721e882
> # bad: [9867b3a6ddfb73ee3105871541053f8e49949478] drm/amdgpu: use scheduler load balancing for compute CS
> git bisect bad 9867b3a6ddfb73ee3105871541053f8e49949478
> # good: [5d097a4591aa2be16b21adbaa19a8abb76e47ea1] drm/amdgpu: use scheduler load balancing for SDMA CS
> git bisect good 5d097a4591aa2be16b21adbaa19a8abb76e47ea1
> # first bad commit: [9867b3a6ddfb73ee3105871541053f8e49949478] drm/amdgpu: use scheduler load balancing for compute CS
>
> git log --oneline
> 5d097a4591aa (HEAD, refs/bisect/good-5d097a4591aa2be16b21adbaa19a8abb76e47ea1) drm/amdgpu: use scheduler load balancing for SDMA CS
> d12ae5172f1f drm/amdgpu: use new scheduler load balancing for VMs
> adebfff9c806 (refs/bisect/good-adebfff9c806afe1143d69a0174d4580cd27b23d) drm/scheduler: fix setting the priorty for entities
> bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem definitions into amdgpu_gem header
> 5031ae5f9e5c drm/amdgpu: move psp macro into amdgpu_psp header
> [-]
>
> I'm not really sure that
> drm/amdgpu: use scheduler load balancing for compute CS
> is the offender.
>
> It could also be one step earlier:
> drm/amdgpu: use scheduler load balancing for SDMA CS
>
> I'll try running with the SDMA CS patch for the next few days.
>
> If you need more, just ask!
Hello Christian,

after running for a second day _without_ the second patch
[2/7] drm/amdgpu: use scheduler load balancing for SDMA CS
my system is stable again.

To be clear:
I now have only #1 applied on top of amd-staging-drm-next.
'This one' is still in.
So we should switch threads.

Dieter
>>>>
>>>>> Thanks,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Greetings,
>>>>>> Dieter
>>>>>>
>>>>>> On 01.08.2018 16:27, Christian König wrote:
>>>>>>> Since we now deal with multiple rq we need to update all of them, not
>>>>>>> just the current one.
>>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  3 +--
>>>>>>> drivers/gpu/drm/scheduler/gpu_scheduler.c | 36 ++++++++++++++++++++-----------
>>>>>>> include/drm/gpu_scheduler.h               |  5 ++---
>>>>>>> 3 files changed, 26 insertions(+), 18 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>>> index df6965761046..9fcc14e2dfcf 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>>> @@ -407,12 +407,11 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
>>>>>>> for (i = 0; i < adev->num_rings; i++) {
>>>>>>> ring = adev->rings[i];
>>>>>>> entity = &ctx->rings[i].entity;
>>>>>>> - rq = &ring->sched.sched_rq[ctx_prio];
>>>>>>>
>>>>>>> if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
>>>>>>> continue;
>>>>>>>
>>>>>>> - drm_sched_entity_set_rq(entity, rq);
>>>>>>> + drm_sched_entity_set_priority(entity, ctx_prio);
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> index 05dc6ecd4003..85908c7f913e 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> @@ -419,29 +419,39 @@ static void drm_sched_entity_clear_dep(struct dma_fence *f, struct dma_fence_cb
>>>>>>> }
>>>>>>>
>>>>>>> /**
>>>>>>> - * drm_sched_entity_set_rq - Sets the run queue for an entity
>>>>>>> + * drm_sched_entity_set_rq_priority - helper for drm_sched_entity_set_priority
>>>>>>> + */
>>>>>>> +static void drm_sched_entity_set_rq_priority(struct drm_sched_rq **rq,
>>>>>>> + enum drm_sched_priority priority)
>>>>>>> +{
>>>>>>> + *rq = &(*rq)->sched->sched_rq[priority];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_sched_entity_set_priority - Sets priority of the entity
>>>>>>> *
>>>>>>> * @entity: scheduler entity
>>>>>>> - * @rq: scheduler run queue
>>>>>>> + * @priority: scheduler priority
>>>>>>> *
>>>>>>> - * Sets the run queue for an entity and removes the entity from the previous
>>>>>>> - * run queue in which was present.
>>>>>>> + * Update the priority of runqueus used for the entity.
>>>>>>> */
>>>>>>> -void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
>>>>>>> - struct drm_sched_rq *rq)
>>>>>>> +void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>> + enum drm_sched_priority priority)
>>>>>>> {
>>>>>>> - if (entity->rq == rq)
>>>>>>> - return;
>>>>>>> -
>>>>>>> - BUG_ON(!rq);
>>>>>>> + unsigned int i;
>>>>>>>
>>>>>>> spin_lock(&entity->rq_lock);
>>>>>>> +
>>>>>>> + for (i = 0; i < entity->num_rq_list; ++i)
>>>>>>> + drm_sched_entity_set_rq_priority(&entity->rq_list[i], priority);
>>>>>>> +
>>>>>>> drm_sched_rq_remove_entity(entity->rq, entity);
>>>>>>> - entity->rq = rq;
>>>>>>> - drm_sched_rq_add_entity(rq, entity);
>>>>>>> + drm_sched_entity_set_rq_priority(&entity->rq, priority);
>>>>>>> + drm_sched_rq_add_entity(entity->rq, entity);
>>>>>>> +
>>>>>>> spin_unlock(&entity->rq_lock);
>>>>>>> }
>>>>>>> -EXPORT_SYMBOL(drm_sched_entity_set_rq);
>>>>>>> +EXPORT_SYMBOL(drm_sched_entity_set_priority);
>>>>>>>
>>>>>>> /**
>>>>>>> * drm_sched_dependency_optimized
>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>> index 0c4cfe689d4c..22c0f88f7d8f 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -298,9 +298,8 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity);
>>>>>>> void drm_sched_entity_destroy(struct drm_sched_entity *entity);
>>>>>>> void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>>>>>>> struct drm_sched_entity *entity);
>>>>>>> -void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
>>>>>>> - struct drm_sched_rq *rq);
>>>>>>> -
>>>>>>> +void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>> + enum drm_sched_priority priority);
>>>>>>> struct drm_sched_fence *drm_sched_fence_create(
>>>>>>> struct drm_sched_entity *s_entity, void *owner);
>>>>>>> void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
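
For readers following the quoted patch: below is a minimal user-space sketch of the idea in its commit message (an entity may reference several run queues, so a priority change has to touch all of them, not just the current one). It is only an illustration under stated assumptions. All mock_* names are hypothetical stand-ins and not the kernel's drm_sched_* API; the rq_lock locking and the removal/re-adding of the entity on its run queue are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority levels; the kernel's enum drm_sched_priority differs. */
enum mock_priority {
    MOCK_PRIORITY_LOW,
    MOCK_PRIORITY_NORMAL,
    MOCK_PRIORITY_HIGH,
    MOCK_PRIORITY_MAX
};

struct mock_sched;

/* A run queue only remembers which scheduler owns it. */
struct mock_rq {
    struct mock_sched *sched;
};

/* One scheduler (think: one ring) has one run queue per priority level. */
struct mock_sched {
    struct mock_rq sched_rq[MOCK_PRIORITY_MAX];
};

/* An entity may run on any queue in rq_list; rq is the one currently used. */
struct mock_entity {
    struct mock_rq **rq_list;
    unsigned int num_rq_list;
    struct mock_rq *rq;
};

/* Point every run queue of a scheduler back at its owner. */
static void mock_sched_init(struct mock_sched *sched)
{
    size_t i;

    for (i = 0; i < MOCK_PRIORITY_MAX; ++i)
        sched->sched_rq[i].sched = sched;
}

/* Swap *rq for the queue of the same scheduler at the new priority. */
static void mock_set_rq_priority(struct mock_rq **rq,
                                 enum mock_priority priority)
{
    *rq = &(*rq)->sched->sched_rq[priority];
}

/* Update all candidate queues first, then the currently selected one. */
static void mock_entity_set_priority(struct mock_entity *entity,
                                     enum mock_priority priority)
{
    unsigned int i;

    assert(priority < MOCK_PRIORITY_MAX);

    for (i = 0; i < entity->num_rq_list; ++i)
        mock_set_rq_priority(&entity->rq_list[i], priority);

    mock_set_rq_priority(&entity->rq, priority);
}
```

To try it, initialise two mock_sched instances, build an rq_list from their MOCK_PRIORITY_NORMAL queues, call mock_entity_set_priority() with MOCK_PRIORITY_HIGH, and check that every rq_list entry and entity->rq now point at the HIGH queue of their respective scheduler.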
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel