<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 1/14/21 12:11 AM, Chen, Xiaogang
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:6ba05f63-12f2-73c5-33c5-4b29c6166d8b@amd.com">
      
      <div class="moz-cite-prefix">On 1/12/2021 10:54 PM, Grodzovsky,
        Andrey wrote:<br>
      </div>
      <blockquote type="cite" cite="mid:51a8727c-1ad0-8e7f-8c07-ed0b4bbed7a5@amd.com"><br>
        On 1/4/21 1:01 AM, Xiaogang.Chen wrote: <br>
        <blockquote type="cite">From: Xiaogang Chen <a class="moz-txt-link-rfc2396E" href="mailto:xiaogang.chen@amd.com" moz-do-not-send="true">
            <xiaogang.chen@amd.com></a> <br>
          <br>
          amdgpu DM handles INTERRUPT_LOW_IRQ_CONTEXT interrupts (hpd,
          hpd_rx) by <br>
          using a work queue with a single work_struct. If a previous
          interrupt <br>
          has not been handled, new interrupts (of the same type) are
          discarded and <br>
          the driver just sends an "amdgpu_dm_irq_schedule_work FAILED"
          message. <br>
          If some important hpd or hpd_rx related interrupts are missed
          by the driver, <br>
          hot (un)plugged devices may hang or destabilize the system, for
          example when the <br>
          system resumes from S3 sleep with an MST device connected. <br>
          <br>
          This patch dynamically allocates a new
          amdgpu_dm_irq_handler_data for <br>
          a new interrupt if the previous INTERRUPT_LOW_IRQ_CONTEXT
          interrupt work <br>
          has not been handled yet. So the new interrupt work can be
          queued to the <br>
          same workqueue_struct instead of discarding the new interrupt. <br>
          All allocated amdgpu_dm_irq_handler_data are put into a single
          linked <br>
          list and will be reused afterwards. <br>
        </blockquote>
        <br>
        <br>
        I believe this creates possible concurrency between an already
        executing work item
        <br>
        and a new incoming one for which you allocate a new work item
        on the fly. While
        <br>
        handle_hpd_irq is serialized with aconnector->hpd_lock, I am
        seeing that for handle_hpd_rx_irq
        <br>
        it is not locked for the MST use case (which is the most frequent
        user of this interrupt).  Did you
        <br>
        verify that handle_hpd_rx_irq is reentrant? <br>
        <br>
      </blockquote>
      <p class="MsoPlainText">handle_hpd_rx_irq is put at a work queue.
        Its execution is serialized by the work queue. So there is no
        reentrant.</p>
    </blockquote>
    <p>You are using system_highpri_wq, which has the property that it
      has a pool of worker threads spread across all the<br>
      active CPUs; see all the work queue definitions here
<a class="moz-txt-link-freetext" href="https://elixir.bootlin.com/linux/v5.11-rc3/source/include/linux/workqueue.h#L358">https://elixir.bootlin.com/linux/v5.11-rc3/source/include/linux/workqueue.h#L358</a>.<br>
      I believe that what you are saying about no chance of reentrancy
      would be correct if it were the same work item dequeued for
      execution<br>
      while the previous instance is still running; see the explanation
      here -
<a class="moz-txt-link-freetext" href="https://elixir.bootlin.com/linux/v5.11-rc3/source/kernel/workqueue.c#L1435">https://elixir.bootlin.com/linux/v5.11-rc3/source/kernel/workqueue.c#L1435</a>.<br>
      Non-reentrancy is guaranteed only for the same work item. If you
      want non-reentrancy (full serialization) across different work items
      you should create<br>
      your own single-threaded workqueue using
      create_singlethread_workqueue().</p>
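    <p>For illustration only - this is not part of your patch, and the
      names dm_hpd_wq, dm_hpd_rx_work_func, dm_hpd_wq_init and
      dm_hpd_schedule below are made up - a minimal sketch of what I mean
      by a dedicated single-threaded workqueue:</p>
    <pre>
/*
 * Illustrative sketch only (not the patch code). A dedicated
 * single-threaded workqueue runs everything queued on it strictly one
 * item after another, even when the items are different work_struct
 * instances, so the deferred handlers are fully serialized.
 */
#include &lt;linux/workqueue.h&gt;

static struct workqueue_struct *dm_hpd_wq;	/* hypothetical name */

static void dm_hpd_rx_work_func(struct work_struct *work)
{
	/* deferred hpd/hpd_rx handling would go here */
}

/* a statically initialized work item bound to the function above */
static DECLARE_WORK(dm_hpd_rx_work, dm_hpd_rx_work_func);

static int dm_hpd_wq_init(void)
{
	/* one worker thread, so all queued items run strictly in order */
	dm_hpd_wq = create_singlethread_workqueue("amdgpu_dm_hpd");
	if (!dm_hpd_wq)
		return -ENOMEM;
	return 0;
}

static void dm_hpd_schedule(void)
{
	/* still returns false if dm_hpd_rx_work is already pending */
	queue_work(dm_hpd_wq, &amp;dm_hpd_rx_work);
}

static void dm_hpd_wq_fini(void)
{
	destroy_workqueue(dm_hpd_wq);
}
    </pre>
    <p>With a layout like this the work items could stay per interrupt
      source as in your patch; the single worker thread is what provides
      the full serialization.</p>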
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:6ba05f63-12f2-73c5-33c5-4b29c6166d8b@amd.com">
      amdgpu_dm_irq_schedule_work does the queuing of the work (it puts
      handle_hpd_rx_irq onto the work queue). dm_irq_work_func is called
      first, which then calls handle_hpd_rx_irq.
      <blockquote type="cite" cite="mid:51a8727c-1ad0-8e7f-8c07-ed0b4bbed7a5@amd.com"><br>
        <blockquote type="cite"><br>
          Signed-off-by: Xiaogang Chen <a class="moz-txt-link-rfc2396E" href="mailto:xiaogang.chen@amd.com" moz-do-not-send="true">
            <xiaogang.chen@amd.com></a> <br>
          --- <br>
            drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h  |  14 +--
          <br>
            .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c  | 114
          ++++++++++++++------- <br>
            2 files changed, 80 insertions(+), 48 deletions(-) <br>
          <br>
          diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
          b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h
          <br>
          index c9d82b9..730e540 100644 <br>
          --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h <br>
          +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h <br>
          @@ -69,18 +69,6 @@ struct common_irq_params { <br>
            }; <br>
              /** <br>
          - * struct irq_list_head - Linked-list for low context IRQ
          handlers. <br>
          - * <br>
          - * @head: The list_head within &struct handler_data <br>
          - * @work: A work_struct containing the deferred handler work
          <br>
          - */ <br>
          -struct irq_list_head { <br>
          -    struct list_head head; <br>
          -    /* In case this interrupt needs post-processing, 'work'
          will be queued*/ <br>
          -    struct work_struct work; <br>
          -}; <br>
          - <br>
          -/** <br>
             * struct dm_compressor_info - Buffer info used by frame
          buffer compression <br>
             * @cpu_addr: MMIO cpu addr <br>
             * @bo_ptr: Pointer to the buffer object <br>
          @@ -270,7 +258,7 @@ struct amdgpu_display_manager { <br>
                 * Note that handlers are called in the same order as
          they were <br>
                 * registered (FIFO). <br>
                 */ <br>
          -    struct irq_list_head
          irq_handler_list_low_tab[DAL_IRQ_SOURCES_NUMBER]; <br>
          +    struct list_head
          irq_handler_list_low_tab[DAL_IRQ_SOURCES_NUMBER]; <br>
                  /** <br>
                 * @irq_handler_list_high_tab: <br>
          diff --git
          a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
          b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c
          <br>
          index 3577785..ada344a 100644 <br>
          --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c <br>
          +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c <br>
          @@ -82,6 +82,7 @@ struct amdgpu_dm_irq_handler_data { <br>
                struct amdgpu_display_manager *dm; <br>
                /* DAL irq source which registered for this interrupt.
          */ <br>
                enum dc_irq_source irq_source; <br>
          +    struct work_struct work; <br>
            }; <br>
              #define DM_IRQ_TABLE_LOCK(adev, flags) \ <br>
          @@ -111,20 +112,10 @@ static void
          init_handler_common_data(struct amdgpu_dm_irq_handler_data
          *hcd,
          <br>
             */ <br>
            static void dm_irq_work_func(struct work_struct *work) <br>
            { <br>
          -    struct irq_list_head *irq_list_head = <br>
          -        container_of(work, struct irq_list_head, work); <br>
          -    struct list_head *handler_list =
          &irq_list_head->head; <br>
          -    struct amdgpu_dm_irq_handler_data *handler_data; <br>
          - <br>
          -    list_for_each_entry(handler_data, handler_list, list) { <br>
          -        DRM_DEBUG_KMS("DM_IRQ: work_func: for dal_src=%d\n",
          <br>
          -                handler_data->irq_source); <br>
          +    struct amdgpu_dm_irq_handler_data *handler_data = <br>
          +     container_of(work, struct amdgpu_dm_irq_handler_data,
          work); <br>
            -        DRM_DEBUG_KMS("DM_IRQ: schedule_work: for
          dal_src=%d\n", <br>
          -            handler_data->irq_source); <br>
          - <br>
          -       
          handler_data->handler(handler_data->handler_arg); <br>
          -    } <br>
          +    handler_data->handler(handler_data->handler_arg); <br>
                  /* Call a DAL subcomponent which registered for
          interrupt notification <br>
                 * at INTERRUPT_LOW_IRQ_CONTEXT. <br>
          @@ -156,7 +147,7 @@ static struct list_head
          *remove_irq_handler(struct amdgpu_device *adev,
          <br>
                    break; <br>
                case INTERRUPT_LOW_IRQ_CONTEXT: <br>
                default: <br>
          -        hnd_list =
          &adev->dm.irq_handler_list_low_tab[irq_source].head; <br>
          +        hnd_list =
          &adev->dm.irq_handler_list_low_tab[irq_source]; <br>
                    break; <br>
                } <br>
            @@ -287,7 +278,8 @@ void
          *amdgpu_dm_irq_register_interrupt(struct amdgpu_device *adev,
          <br>
                    break; <br>
                case INTERRUPT_LOW_IRQ_CONTEXT: <br>
                default: <br>
          -        hnd_list =
          &adev->dm.irq_handler_list_low_tab[irq_source].head; <br>
          +        hnd_list =
          &adev->dm.irq_handler_list_low_tab[irq_source]; <br>
          +        INIT_WORK(&handler_data->work,
          dm_irq_work_func); <br>
                    break; <br>
                } <br>
            @@ -369,7 +361,7 @@ void
          amdgpu_dm_irq_unregister_interrupt(struct amdgpu_device *adev,
          <br>
            int amdgpu_dm_irq_init(struct amdgpu_device *adev) <br>
            { <br>
                int src; <br>
          -    struct irq_list_head *lh; <br>
          +    struct list_head *lh; <br>
                  DRM_DEBUG_KMS("DM_IRQ\n"); <br>
            @@ -378,9 +370,7 @@ int amdgpu_dm_irq_init(struct
          amdgpu_device *adev) <br>
                for (src = 0; src < DAL_IRQ_SOURCES_NUMBER; src++) {
          <br>
                    /* low context handler list init */ <br>
                    lh = &adev->dm.irq_handler_list_low_tab[src];
          <br>
          -        INIT_LIST_HEAD(&lh->head); <br>
          -        INIT_WORK(&lh->work, dm_irq_work_func); <br>
          - <br>
          +        INIT_LIST_HEAD(lh); <br>
                    /* high context handler init */ <br>
                   
          INIT_LIST_HEAD(&adev->dm.irq_handler_list_high_tab[src]);
          <br>
                } <br>
          @@ -397,8 +387,11 @@ int amdgpu_dm_irq_init(struct
          amdgpu_device *adev) <br>
            void amdgpu_dm_irq_fini(struct amdgpu_device *adev) <br>
            { <br>
                int src; <br>
          -    struct irq_list_head *lh; <br>
          +    struct list_head *lh; <br>
          +    struct list_head *entry, *tmp; <br>
          +    struct amdgpu_dm_irq_handler_data *handler; <br>
                unsigned long irq_table_flags; <br>
          + <br>
                DRM_DEBUG_KMS("DM_IRQ: releasing resources.\n"); <br>
                for (src = 0; src < DAL_IRQ_SOURCES_NUMBER; src++) {
          <br>
                    DM_IRQ_TABLE_LOCK(adev, irq_table_flags); <br>
          @@ -407,7 +400,15 @@ void amdgpu_dm_irq_fini(struct
          amdgpu_device *adev) <br>
                     * (because no code can schedule a new one). */ <br>
                    lh = &adev->dm.irq_handler_list_low_tab[src];
          <br>
                    DM_IRQ_TABLE_UNLOCK(adev, irq_table_flags); <br>
          -        flush_work(&lh->work); <br>
          + <br>
          +        if (!list_empty(lh)) { <br>
          +            list_for_each_safe(entry, tmp, lh) { <br>
          + <br>
          +                handler = list_entry(entry, struct
          amdgpu_dm_irq_handler_data, <br>
          +                                     list); <br>
          +                flush_work(&handler->work); <br>
          +            } <br>
          +        } <br>
                } <br>
            } <br>
            @@ -417,6 +418,8 @@ int amdgpu_dm_irq_suspend(struct
          amdgpu_device *adev) <br>
                struct list_head *hnd_list_h; <br>
                struct list_head *hnd_list_l; <br>
                unsigned long irq_table_flags; <br>
          +    struct list_head *entry, *tmp; <br>
          +    struct amdgpu_dm_irq_handler_data *handler; <br>
                  DM_IRQ_TABLE_LOCK(adev, irq_table_flags); <br>
            @@ -427,14 +430,22 @@ int amdgpu_dm_irq_suspend(struct
          amdgpu_device *adev) <br>
                 * will be disabled from manage_dm_interrupts on disable
          CRTC. <br>
                 */ <br>
                for (src = DC_IRQ_SOURCE_HPD1; src <=
          DC_IRQ_SOURCE_HPD6RX; src++) { <br>
          -        hnd_list_l =
          &adev->dm.irq_handler_list_low_tab[src].head; <br>
          +        hnd_list_l =
          &adev->dm.irq_handler_list_low_tab[src]; <br>
                    hnd_list_h =
          &adev->dm.irq_handler_list_high_tab[src]; <br>
                    if (!list_empty(hnd_list_l) ||
          !list_empty(hnd_list_h)) <br>
                        dc_interrupt_set(adev->dm.dc, src, false); <br>
                      DM_IRQ_TABLE_UNLOCK(adev, irq_table_flags); <br>
          -       
          flush_work(&adev->dm.irq_handler_list_low_tab[src].work);
          <br>
            +        if (!list_empty(hnd_list_l)) { <br>
          + <br>
          +            list_for_each_safe(entry, tmp, hnd_list_l) { <br>
          + <br>
          +                handler = list_entry(entry, struct
          amdgpu_dm_irq_handler_data, <br>
          +                                     list); <br>
          +                flush_work(&handler->work); <br>
          +            } <br>
          +        } <br>
                    DM_IRQ_TABLE_LOCK(adev, irq_table_flags); <br>
                } <br>
            @@ -454,7 +465,7 @@ int amdgpu_dm_irq_resume_early(struct
          amdgpu_device *adev) <br>
                  /* re-enable short pulse interrupts HW interrupt */ <br>
                for (src = DC_IRQ_SOURCE_HPD1RX; src <=
          DC_IRQ_SOURCE_HPD6RX; src++) { <br>
          -        hnd_list_l =
          &adev->dm.irq_handler_list_low_tab[src].head; <br>
          +        hnd_list_l =
          &adev->dm.irq_handler_list_low_tab[src]; <br>
                    hnd_list_h =
          &adev->dm.irq_handler_list_high_tab[src]; <br>
                    if (!list_empty(hnd_list_l) ||
          !list_empty(hnd_list_h)) <br>
                        dc_interrupt_set(adev->dm.dc, src, true); <br>
          @@ -480,7 +491,7 @@ int amdgpu_dm_irq_resume_late(struct
          amdgpu_device *adev) <br>
                 * will be enabled from manage_dm_interrupts on enable
          CRTC. <br>
                 */ <br>
                for (src = DC_IRQ_SOURCE_HPD1; src <=
          DC_IRQ_SOURCE_HPD6; src++) { <br>
          -        hnd_list_l =
          &adev->dm.irq_handler_list_low_tab[src].head; <br>
          +        hnd_list_l =
          &adev->dm.irq_handler_list_low_tab[src]; <br>
                    hnd_list_h =
          &adev->dm.irq_handler_list_high_tab[src]; <br>
                    if (!list_empty(hnd_list_l) ||
          !list_empty(hnd_list_h)) <br>
                        dc_interrupt_set(adev->dm.dc, src, true); <br>
          @@ -497,20 +508,53 @@ int amdgpu_dm_irq_resume_late(struct
          amdgpu_device *adev) <br>
            static void amdgpu_dm_irq_schedule_work(struct amdgpu_device
          *adev, <br>
                                enum dc_irq_source irq_source) <br>
            { <br>
          -    unsigned long irq_table_flags; <br>
          -    struct work_struct *work = NULL; <br>
            -    DM_IRQ_TABLE_LOCK(adev, irq_table_flags); <br>
          +    struct  list_head *handler_list =
          &adev->dm.irq_handler_list_low_tab[irq_source];
          <br>
          +    struct  amdgpu_dm_irq_handler_data *handler_data; <br>
          +    bool    work_queued = false; <br>
            -    if
(!list_empty(&adev->dm.irq_handler_list_low_tab[irq_source].head))<br>
          -        work =
          &adev->dm.irq_handler_list_low_tab[irq_source].work; <br>
          +    if (list_empty(handler_list)) <br>
          +        return; <br>
            -    DM_IRQ_TABLE_UNLOCK(adev, irq_table_flags); <br>
          +    list_for_each_entry(handler_data, handler_list, list) { <br>
          + <br>
          +        if (!queue_work(system_highpri_wq,
          &handler_data->work)) { <br>
          +            continue; <br>
          +        } else { <br>
          +            work_queued = true; <br>
          +            break; <br>
          +        } <br>
          +    } <br>
          + <br>
          +    if (!work_queued) { <br>
          + <br>
          +        struct  amdgpu_dm_irq_handler_data *handler_data_add;
          <br>
          +        /*get the amdgpu_dm_irq_handler_data of first item
          pointed by handler_list*/
          <br>
          +        handler_data = container_of(handler_list->next,
          struct amdgpu_dm_irq_handler_data, list);
          <br>
          + <br>
          +        /*allocate a new amdgpu_dm_irq_handler_data*/ <br>
          +        handler_data_add = kzalloc(sizeof(*handler_data),
          GFP_KERNEL); <br>
          +        if (!handler_data_add) { <br>
          +            DRM_ERROR("DM_IRQ: failed to allocate irq
          handler!\n"); <br>
          +            return; <br>
          +        } <br>
          + <br>
          +        /*copy new amdgpu_dm_irq_handler_data members from
          handler_data*/ <br>
          +        handler_data_add->handler       =
          handler_data->handler; <br>
          +        handler_data_add->handler_arg   =
          handler_data->handler_arg; <br>
          +        handler_data_add->dm            =
          handler_data->dm; <br>
          +        handler_data_add->irq_source    = irq_source; <br>
          + <br>
          +        list_add_tail(&handler_data_add->list,
          handler_list); <br>
        </blockquote>
        <br>
        <br>
        At what point are you deleting completed work items from the
        handler_list? <br>
        <br>
        Andrey <br>
        <br>
      </blockquote>
      <p class="MsoPlainText">The new allocated work item
        handler_data_add is put at end of single list handler_list. It
        is not deleted, but put into this list. In the future for same
        interrupt source handling the previous allocated work items can
        be reused.  So we do not have to reallocate new work items if
        previous interrupts have not been handled by cpu. On other side
        system will keep all the allocated work items at run time. Note,
        the new work item allocation only happens when cpu has not
        handled previous interrupts yet, and new same type interrupts
        have came. <br>
      </p>
      <p class="MsoPlainText">Thanks<br>
      </p>
      <p class="MsoPlainText">Xiaogang<br>
      </p>
    </blockquote>
    <p><br>
    </p>
    <p>I am still confused about how you avoid executing a second time a
      handler you previously allocated because the first queue_work
      failed: you always start the iteration from the beginning of the
      list and you never remove handlers that have already executed
      successfully.</p>
    <p>Andrey</p>
    <p><br>
    </p>
    <blockquote type="cite" cite="mid:6ba05f63-12f2-73c5-33c5-4b29c6166d8b@amd.com">
      <p class="MsoPlainText">
      </p>
      <blockquote type="cite" cite="mid:51a8727c-1ad0-8e7f-8c07-ed0b4bbed7a5@amd.com"><br>
        <blockquote type="cite">+ <br>
          +        INIT_WORK(&handler_data_add->work,
          dm_irq_work_func); <br>
            -    if (work) { <br>
          -        if (!schedule_work(work)) <br>
          -            DRM_INFO("amdgpu_dm_irq_schedule_work FAILED src
          %d\n", <br>
          -                        irq_source); <br>
          +        if (queue_work(system_highpri_wq,
          &handler_data_add->work)) <br>
          +            DRM_DEBUG("__func__: a work_struct is allocated
          and queued, " <br>
          +                     "src %d\n", irq_source); <br>
          +        else <br>
          +            DRM_ERROR("__func__: a new work_struct cannot be
          queued, " <br>
          +                      "something is wrong, src %d\n",
          irq_source); <br>
                } <br>
              } </blockquote>
      </blockquote>
      <br>
      <p><br>
      </p>
    </blockquote>
  </body>
</html>