[PATCH] drm/scheduler: fix setting the priority for entities - bisected

Dieter Nützel Dieter at nuetzel-hh.de
Wed Aug 8 04:49:13 UTC 2018


On 06.08.2018 02:13, Dieter Nützel wrote:
> On 04.08.2018 06:18, Dieter Nützel wrote:
>> On 04.08.2018 06:12, Dieter Nützel wrote:
>>> On 04.08.2018 05:27, Dieter Nützel wrote:
>>>> On 03.08.2018 13:09, Christian König wrote:
>>>>> On 03.08.2018 03:08, Dieter Nützel wrote:
>>>>>> Hello Christian, AMD guys,
>>>>>> 
>>>>>> this one _together_ with this series
>>>>>> [PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs
>>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-August/024802.html
>>>>>> 
>>>>>> on top of
>>>>>> amd-staging-drm-next 53d5f1e4a6d9
>>>>>> 
>>>>>> freezes the whole system (Intel Xeon X3470, RX580) on the _first_
>>>>>> mouse move.
>>>>>> The same happens at sddm login or on the first move in KDE Plasma 5.
>>>>>> NO logs so far. - Expected?
>>>>> 
>>>>> Not even remotely. Can you double-check which patch from the
>>>>> "[PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs"
>>>>> series is causing the issue?
>>>> 
>>>> Oops,
>>>> 
>>>> _both_ 'series' on top of
>>>> 
>>>> bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem
>>>> definitions into amdgpu_gem header
>>>> 
>>>> work without a hitch.
>>>> 
>>>> But I have the new (latest) µcode from openSUSE Tumbleweed:
>>>> kernel-firmware-20180730-35.1.src.rpm
>>>> 
>>>> Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>
>>> 
>>> I take this back.
>>> 
>>> It lasted much longer.
>>> Then the mouse froze.
>>> I could grab a dmesg via remote phone ;-)
>>> 
>>> See the attachment.
>>> Dieter
>> 
>> Argh, shi...
>> that was the wrong dmesg version.
>> 
>> This should be the right one. (For sure...)
> 
> Phew,
> 
> this took some time...
> During the 'last' git bisect run => 'first bad commit is' I got the next
> freeze.
> But I could get a new dmesg.log file via remote phone (see attachment).
> 
> git bisect log shows this:
> 
> SOURCE/amd-staging-drm-next> git bisect log
> git bisect start
> # good: [adebfff9c806afe1143d69a0174d4580cd27b23d] drm/scheduler: fix
> setting the priorty for entities
> git bisect good adebfff9c806afe1143d69a0174d4580cd27b23d
> # bad: [43202e67a4e6fcb0e6b773e8eb1ed56e1721e882] drm/amdgpu: use
> entity instead of ring for CS
> git bisect bad 43202e67a4e6fcb0e6b773e8eb1ed56e1721e882
> # bad: [9867b3a6ddfb73ee3105871541053f8e49949478] drm/amdgpu: use
> scheduler load balancing for compute CS
> git bisect bad 9867b3a6ddfb73ee3105871541053f8e49949478
> # good: [5d097a4591aa2be16b21adbaa19a8abb76e47ea1] drm/amdgpu: use
> scheduler load balancing for SDMA CS
> git bisect good 5d097a4591aa2be16b21adbaa19a8abb76e47ea1
> # first bad commit: [9867b3a6ddfb73ee3105871541053f8e49949478]
> drm/amdgpu: use scheduler load balancing for compute CS
> 
> git log --oneline
> 5d097a4591aa (HEAD,
> refs/bisect/good-5d097a4591aa2be16b21adbaa19a8abb76e47ea1) drm/amdgpu:
> use scheduler load balancing for SDMA CS
> d12ae5172f1f drm/amdgpu: use new scheduler load balancing for VMs
> adebfff9c806
> (refs/bisect/good-adebfff9c806afe1143d69a0174d4580cd27b23d)
> drm/scheduler: fix setting the priorty for entities
> bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem
> definitions into amdgpu_gem header
> 5031ae5f9e5c drm/amdgpu: move psp macro into amdgpu_psp header
> [-]
> 
> I'm not really sure that
> drm/amdgpu: use scheduler load balancing for compute CS
> is the offender.
> 
> It could also be one step earlier:
> drm/amdgpu: use scheduler load balancing for SDMA CS
> 
> I'll try running with the SDMA CS patch for the next few days.
> 
> If you need more, just ask!

Hello Christian,

running for the second day _without_ the 2nd patch
[2/7] drm/amdgpu: use scheduler load balancing for SDMA CS
my system is stable again.

To be clear:
I now have only #1 applied on top of amd-staging-drm-next.
'This one' (the priority fix) is still in.
So we should switch over to that thread.
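
For anyone skimming the thread, here is a minimal plain-C sketch of what
the reworked drm_sched_entity_set_priority() in the patch quoted below
does conceptually. It is simplified and inlined, and the function name
sketch_set_priority is made up for illustration; the real code is in the
diff:

    /* Repoint every run queue the entity can be load balanced to at the
     * rq of the new priority on the same scheduler, then requeue the
     * entity itself on the scheduler it is currently queued on. */
    void sketch_set_priority(struct drm_sched_entity *entity,
                             enum drm_sched_priority priority)
    {
            unsigned int i;

            spin_lock(&entity->rq_lock);

            /* Update _all_ candidate rqs, not just the current one. */
            for (i = 0; i < entity->num_rq_list; ++i)
                    entity->rq_list[i] =
                            &entity->rq_list[i]->sched->sched_rq[priority];

            /* Move the entity onto the rq matching the new priority. */
            drm_sched_rq_remove_entity(entity->rq, entity);
            entity->rq = &entity->rq->sched->sched_rq[priority];
            drm_sched_rq_add_entity(entity->rq, entity);

            spin_unlock(&entity->rq_lock);
    }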

Dieter

>>>> 
>>>>> Thanks,
>>>>> Christian.
>>>>> 
>>>>>> 
>>>>>> Greetings,
>>>>>> Dieter
>>>>>> 
>>>>>> On 01.08.2018 16:27, Christian König wrote:
>>>>>>> Since we now deal with multiple run queues we need to update all of
>>>>>>> them, not just the current one.
>>>>>>> 
>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>> ---
>>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  3 +--
>>>>>>>  drivers/gpu/drm/scheduler/gpu_scheduler.c | 36 ++++++++++++++++++++-----------
>>>>>>>  include/drm/gpu_scheduler.h               |  5 ++---
>>>>>>>  3 files changed, 26 insertions(+), 18 deletions(-)
>>>>>>> 
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>>> index df6965761046..9fcc14e2dfcf 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>>>>>> @@ -407,12 +407,11 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
>>>>>>>      for (i = 0; i < adev->num_rings; i++) {
>>>>>>>          ring = adev->rings[i];
>>>>>>>          entity = &ctx->rings[i].entity;
>>>>>>> -        rq = &ring->sched.sched_rq[ctx_prio];
>>>>>>> 
>>>>>>>          if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
>>>>>>>              continue;
>>>>>>> 
>>>>>>> -        drm_sched_entity_set_rq(entity, rq);
>>>>>>> +        drm_sched_entity_set_priority(entity, ctx_prio);
>>>>>>>      }
>>>>>>>  }
>>>>>>> 
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> index 05dc6ecd4003..85908c7f913e 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>>>>>>> @@ -419,29 +419,39 @@ static void drm_sched_entity_clear_dep(struct dma_fence *f, struct dma_fence_cb
>>>>>>>  }
>>>>>>> 
>>>>>>>  /**
>>>>>>> - * drm_sched_entity_set_rq - Sets the run queue for an entity
>>>>>>> + * drm_sched_entity_set_rq_priority - helper for drm_sched_entity_set_priority
>>>>>>> + */
>>>>>>> +static void drm_sched_entity_set_rq_priority(struct drm_sched_rq **rq,
>>>>>>> +                         enum drm_sched_priority priority)
>>>>>>> +{
>>>>>>> +    *rq = &(*rq)->sched->sched_rq[priority];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * drm_sched_entity_set_priority - Sets priority of the entity
>>>>>>>   *
>>>>>>>   * @entity: scheduler entity
>>>>>>> - * @rq: scheduler run queue
>>>>>>> + * @priority: scheduler priority
>>>>>>>   *
>>>>>>> - * Sets the run queue for an entity and removes the entity from the previous
>>>>>>> - * run queue in which was present.
>>>>>>> + * Update the priority of run queues used for the entity.
>>>>>>>   */
>>>>>>> -void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
>>>>>>> -                 struct drm_sched_rq *rq)
>>>>>>> +void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>> +                   enum drm_sched_priority priority)
>>>>>>>  {
>>>>>>> -    if (entity->rq == rq)
>>>>>>> -        return;
>>>>>>> -
>>>>>>> -    BUG_ON(!rq);
>>>>>>> +    unsigned int i;
>>>>>>> 
>>>>>>>      spin_lock(&entity->rq_lock);
>>>>>>> +
>>>>>>> +    for (i = 0; i < entity->num_rq_list; ++i)
>>>>>>> +        drm_sched_entity_set_rq_priority(&entity->rq_list[i], priority);
>>>>>>> +
>>>>>>>      drm_sched_rq_remove_entity(entity->rq, entity);
>>>>>>> -    entity->rq = rq;
>>>>>>> -    drm_sched_rq_add_entity(rq, entity);
>>>>>>> +    drm_sched_entity_set_rq_priority(&entity->rq, priority);
>>>>>>> +    drm_sched_rq_add_entity(entity->rq, entity);
>>>>>>> +
>>>>>>>      spin_unlock(&entity->rq_lock);
>>>>>>>  }
>>>>>>> -EXPORT_SYMBOL(drm_sched_entity_set_rq);
>>>>>>> +EXPORT_SYMBOL(drm_sched_entity_set_priority);
>>>>>>> 
>>>>>>>  /**
>>>>>>>   * drm_sched_dependency_optimized
>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>> index 0c4cfe689d4c..22c0f88f7d8f 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -298,9 +298,8 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity);
>>>>>>>  void drm_sched_entity_destroy(struct drm_sched_entity *entity);
>>>>>>>  void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
>>>>>>>                     struct drm_sched_entity *entity);
>>>>>>> -void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
>>>>>>> -                 struct drm_sched_rq *rq);
>>>>>>> -
>>>>>>> +void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
>>>>>>> +                   enum drm_sched_priority priority);
>>>>>>>  struct drm_sched_fence *drm_sched_fence_create(
>>>>>>>      struct drm_sched_entity *s_entity, void *owner);
>>>>>>>  void drm_sched_fence_scheduled(struct drm_sched_fence *fence);