    <div class="moz-cite-prefix">On 2016年06月28日 18:22, Christian König
      wrote:<br>
    </div>
    <blockquote cite="mid:57724FFD.4080909@vodafone.de" type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">Am 28.06.2016 um 12:05 schrieb
        zhoucm1:<br>
      </div>
      <blockquote cite="mid:57724BEA.9090404@amd.com" type="cite"> <br>
        <br>
        <div class="moz-cite-prefix">On 2016年06月28日 18:03, Christian
          König wrote:<br>
        </div>
        <blockquote cite="mid:57724B64.9060107@vodafone.de" type="cite">
          <div class="moz-cite-prefix">Am 28.06.2016 um 11:33 schrieb
            zhoucm1:<br>
          </div>
          <blockquote cite="mid:5772444E.20309@amd.com" type="cite"> <br>
            <br>
            On 2016年06月28日 17:36, Christian König wrote: <br>
            <blockquote type="cite">Am 28.06.2016 um 09:27 schrieb Huang
              Rui: <br>
              <blockquote type="cite">On Tue, Jun 28, 2016 at 03:04:18PM
                +0800, Chunming Zhou wrote: <br>
                <blockquote type="cite">ring_mirror_list is only used
                  kthread context, no need to spinlock. <br>
                  otherwise deadlock happens when kthread_park. <br>
                  <br>
                </blockquote>
>>>>>> Yes, in process context we prefer to use a mutex, because it avoids
>>>>>> hogging the CPU all the time.
>>>>>>
>>>>>> Reviewed-by: Huang Rui <ray.huang@amd.com>
>>>>>
>>>>> NAK, the patch won't apply because I changed the irqsave spin lock to a
>>>>> normal one quite a while ago. But I'm not sure whether Alex has picked
>>>>> up that patch yet.
>>>>>
>>>>> You shouldn't use a mutex here when you don't have a reason to do so.
>>>>> Spin locks have less overhead, and we don't expect any contention here.
>>>>>
>>>>> Apart from that, how should this cause a deadlock with kthread_park?
>>>> You can apply another patch to drop the hw ring, then run glxgears and
>>>> hang the gfx with the attached patch; then you will see the deadlock
>>>> info.
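>>>>
>>>> To make the failure mode concrete, a minimal sketch (hypothetical code,
>>>> not the actual scheduler source; only the names mirror ours):
>>>>
>>>>     /* Thread A: parks the scheduler thread with the lock held. */
>>>>     spin_lock(&sched->job_list_lock);
>>>>     kthread_park(sched->thread);  /* may sleep with the lock held! */
>>>>     spin_unlock(&sched->job_list_lock);
>>>>
>>>>     /* Thread B, e.g. the amd_sched_job_finish() worker: spins on the
>>>>      * same lock. Since thread A is asleep waiting for the thread to
>>>>      * park and never reaches the unlock, thread B's CPU spins forever
>>>>      * and the NMI watchdog reports a soft lockup. */
>>>>     spin_lock(&sched->job_list_lock);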
>>>
>>> Please provide the deadlock info. Since the spin lock only protects the
>>> linked list, we should be able to easily avoid any lock inversion which
>>> could lead to a deadlock.
>>
>> With the latest code you can easily reproduce it with the steps above;
>> after switching to a mutex, the deadlock info disappears:
>
> That looks like a known issue which should already be fixed. Don't you
> have commit 3ebd43f5696147c6a4f808ded4ce1233496c7fd6 ("drm/amdgpu: stop
> trying to schedule() with a spin held") in your branch?
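>
> The gist of that fix, as a sketch rather than the exact patch: do any
> sleeping work only after the spin lock has been dropped, e.g.
>
>     spin_lock(&sched->job_list_lock);
>     list_del_init(&s_job->node);  /* the lock only guards the list */
>     spin_unlock(&sched->job_list_lock);
>
>     /* Safe to sleep here: schedule(), kthread_park(), free_job(), ... */
>     sched->ops->free_job(s_job);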

No, I didn't find it in either the upstream or the pro branch.

Regards,
David Zhou
    <blockquote cite="mid:57724FFD.4080909@vodafone.de" type="cite"> <br>
      Regards,<br>
      Christian.<br>
      <br>
      <blockquote cite="mid:57724BEA.9090404@amd.com" type="cite"> <br>
        <br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.282892] NMI watchdog: BUG: soft lockup - CPU#7 stuck for 22s!
        [kworker/7:0:58]<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283106] Modules linked in: amdgpu(OE) ttm(OE)
        drm_kms_helper(OE) drm(E) i2c_algo_bit(E) fb_sys_fops(E)
        syscopyarea(E) sysfillrect(E) sysimgblt(E) nfsv3(E) bnep(E)
        rfcomm(E) bluetooth(E) rpcsec_gss_krb5(E) nfsv4(E) nfsd(E)
        auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E)
        fscache(E) binfmt_misc(E) snd_hda_codec_realtek(E)
        snd_hda_codec_hdmi(E) snd_hda_codec_generic(E) snd_hda_intel(E)
        snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) coretemp(E)
        snd_pcm(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E)
        snd_seq(E) joydev(E) kvm_intel(E) snd_seq_device(E) snd_timer(E)
        kvm(E) snd(E) soundcore(E) irqbypass(E) gpio_ich(E) mxm_wmi(E)
        serio_raw(E) shpchp(E) wmi(E) i7core_edac(E) edac_core(E)
        lpc_ich(E) parport_pc(E) mac_hid(E) ppdev(E) lp(E) parport(E)
        hid_generic(E) usbhid(E) hid(E) firewire_ohci(E) 8139too(E)
        pata_acpi(E) firewire_core(E) 8139cp(E) crc_itu_t(E) r8169(E)
        mii(E) pata_jmicron(E) ahci(E) libahci(E)<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283134] CPU: 7 PID: 58 Comm: kworker/7:0 Tainted: G         
        IOE   4.5.0-custom #7<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283136] Hardware name: Gigabyte Technology Co., Ltd.
        X58A-UD3R/X58A-UD3R, BIOS FE 12/23/2010<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283182] Workqueue: events amd_sched_job_finish [amdgpu]<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283183] task: ffff8801a8490000 ti: ffff8801a848c000 task.ti:
        ffff8801a848c000<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283184] RIP: 0010:[<ffffffff810c1131>] 
        [<ffffffff810c1131>]
        native_queued_spin_lock_slowpath+0x171/0x190<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283189] RSP: 0018:ffff8801a848fdc8  EFLAGS: 00000202<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283190] RAX: 0000000000000101 RBX: ffff8800c5ba1030 RCX:
        0000000000000001<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283191] RDX: 0000000000000101 RSI: 0000000000000001 RDI:
        ffff8800c42a4518<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283192] RBP: ffff8801a848fdc8 R08: 0000000000000101 R09:
        ffff8801a93d6300<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283193] R10: 0000000000000001 R11: 0000000000000000 R12:
        ffff8800c42a4468<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283194] R13: ffff8800c42a4518 R14: ffff8800c5ba1000 R15:
        ffff8800c5ba1030<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283195] FS:  0000000000000000(0000) GS:ffff8801a93c0000(0000)
        knlGS:0000000000000000<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283197] CR2: 00007efc3bc21e50 CR3: 0000000001c0c000 CR4:
        00000000000006e0<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283198] Stack:<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283199]  ffff8801a848fdd8 ffffffff81173ef2 ffff8801a848fde8
        ffffffff817a19c0<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283200]  ffff8801a848fe18 ffffffffa06937fb ffff8801a75deb40
        ffff8801a93d6300<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283202]  ffff8801a93dae00 00000000000001c0 ffff8801a848fe60
        ffffffff81090bb0<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283203] Call Trace:<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283206]  [<ffffffff81173ef2>]
        queued_spin_lock_slowpath+0xb/0xf<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283209]  [<ffffffff817a19c0>] _raw_spin_lock+0x20/0x30<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283234]  [<ffffffffa06937fb>]
        amd_sched_job_finish+0x2b/0xc0 [amdgpu]<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283237]  [<ffffffff81090bb0>]
        process_one_work+0x150/0x3f0<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283239]  [<ffffffff8109136b>] worker_thread+0x12b/0x4b0<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283241]  [<ffffffff81091240>] ?
        rescuer_thread+0x340/0x340<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283242]  [<ffffffff81096b22>] kthread+0xd2/0xf0<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283244]  [<ffffffff81096a50>] ? kthread_park+0x50/0x50<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283246]  [<ffffffff817a210f>] ret_from_fork+0x3f/0x70<br>
        Jun 28 14:34:25 zhoucm1-System-Product-Name kernel: [  
        96.283247]  [<ffffffff81096a50>] ? kthread_park+0x50/0x50<br>
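>>
>> Reading the trace: the kworker running amd_sched_job_finish() is stuck
>> in native_queued_spin_lock_slowpath() on job_list_lock. It spins because
>> the current holder apparently never releases the lock, which matches a
>> holder that went to sleep with the lock held (e.g. in kthread_park()),
>> so after 22 seconds the NMI watchdog flags the spinning CPU.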
>>
>> Regards,
>> David Zhou
        <blockquote cite="mid:57724B64.9060107@vodafone.de" type="cite">
          <br>
          BTW: The debug patch you attached is not such a good idea
          either, cause you modify the gfx ring from outside the worker
          thread.<br>
          <br>
          Regards,<br>
          Christian.<br>
          <br>
          <blockquote cite="mid:5772444E.20309@amd.com" type="cite"> <br>
            Regards, <br>
            David Zhou <br>
            <blockquote type="cite"> <br>
              Regards, <br>
              Christian. <br>
              <br>
              <blockquote type="cite"> <br>
                <blockquote type="cite">Change-Id:
                  I906022297015faf14a0ddb5f62a728af3e5f9448 <br>
                  Signed-off-by: Chunming Zhou <a
                    moz-do-not-send="true" class="moz-txt-link-rfc2396E"
                    href="mailto:David1.Zhou@amd.com"><David1.Zhou@amd.com></a>
                  <br>
                  --- <br>
                    drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 12
                  +++++------- <br>
                    drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  2
                  +- <br>
                    2 files changed, 6 insertions(+), 8 deletions(-) <br>
                  <br>
                  diff --git
                  a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
                  b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c <br>
                  index b53cf58..3373d97 100644 <br>
                  --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c <br>
                  +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c <br>
                  @@ -331,10 +331,9 @@ static void
                  amd_sched_job_finish(struct work_struct *work) <br>
                        struct amd_sched_job *s_job = container_of(work,
                  struct amd_sched_job, <br>
                                               finish_work); <br>
                        struct amd_gpu_scheduler *sched =
                  s_job->sched; <br>
                  -    unsigned long flags; <br>
                          /* remove job from ring_mirror_list */ <br>
                  -    spin_lock_irqsave(&sched->job_list_lock,
                  flags); <br>
                  +    mutex_lock(&sched->job_list_lock); <br>
                        list_del_init(&s_job->node); <br>
                        if (sched->timeout != MAX_SCHEDULE_TIMEOUT) {
                  <br>
                            struct amd_sched_job *next; <br>
                  @@ -348,7 +347,7 @@ static void
                  amd_sched_job_finish(struct work_struct *work) <br>
                            if (next) <br>
                               
                  schedule_delayed_work(&next->work_tdr,
                  sched->timeout); <br>
                        } <br>
                  -   
                  spin_unlock_irqrestore(&sched->job_list_lock,
                  flags); <br>
                  +    mutex_unlock(&sched->job_list_lock); <br>
                        sched->ops->free_job(s_job); <br>
                    } <br>
                    @@ -362,15 +361,14 @@ static void
                  amd_sched_job_finish_cb(struct fence *f, struct
                  fence_cb *cb) <br>
                    static void amd_sched_job_begin(struct amd_sched_job
                  *s_job) <br>
                    { <br>
                        struct amd_gpu_scheduler *sched =
                  s_job->sched; <br>
                  -    unsigned long flags; <br>
                    -    spin_lock_irqsave(&sched->job_list_lock,
                  flags); <br>
                  +    mutex_lock(&sched->job_list_lock); <br>
                        list_add_tail(&s_job->node,
                  &sched->ring_mirror_list); <br>
                        if (sched->timeout != MAX_SCHEDULE_TIMEOUT
                  && <br>
                  list_first_entry_or_null(&sched->ring_mirror_list,


                  <br>
                                         struct amd_sched_job, node) ==
                  s_job) <br>
                           
                  schedule_delayed_work(&s_job->work_tdr,
                  sched->timeout); <br>
                  -   
                  spin_unlock_irqrestore(&sched->job_list_lock,
                  flags); <br>
                  +    mutex_unlock(&sched->job_list_lock); <br>
                    } <br>
                      static void amd_sched_job_timedout(struct
                  work_struct *work) <br>
                  @@ -564,7 +562,7 @@ int amd_sched_init(struct
                  amd_gpu_scheduler *sched, <br>
                       
                  init_waitqueue_head(&sched->wake_up_worker); <br>
                       
                  init_waitqueue_head(&sched->job_scheduled); <br>
                        INIT_LIST_HEAD(&sched->ring_mirror_list);
                  <br>
                  -    spin_lock_init(&sched->job_list_lock); <br>
                  +    mutex_init(&sched->job_list_lock); <br>
                        atomic_set(&sched->hw_rq_count, 0); <br>
                        if (atomic_inc_return(&sched_fence_slab_ref)
                  == 1) { <br>
                            sched_fence_slab = kmem_cache_create( <br>
                  diff --git
                  a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h
                  b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h <br>
                  index 221a515..5675906 100644 <br>
                  --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h <br>
                  +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.h <br>
                  @@ -132,7 +132,7 @@ struct amd_gpu_scheduler { <br>
                        atomic_t            hw_rq_count; <br>
                        struct task_struct        *thread; <br>
                        struct list_head    ring_mirror_list; <br>
                  -    spinlock_t            job_list_lock; <br>
                  +    struct mutex            job_list_lock; <br>
                    }; <br>
                      int amd_sched_init(struct amd_gpu_scheduler
                  *sched, <br>
                  -- <br>
                  1.9.1 <br>
      <pre wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>