<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html;
      charset=windows-1252">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Checking more code and documentation
      and thinking about it over my vacation I think I have some new
      conclusions here.<br>
      <br>
      Currently we are using get_user_pages() together with an MMU
      notifier to guarantee coherent address space view, because
      get_user_pages() works by grabbing a reference to the pages and
      ignoring concurrent page table updates.<br>
      <br>
      But HMM uses a different approach by checking the address space
      for modifications using hmm_vma_range_done() and re-trying when
      the address space has changed.<br>
      <br>
      Now what you are trying to do is to change that into
      get_user_pages() and HMM callback and this is what won't work. We
      can either use get_user_pages() with MMU notifier or we can use
      HMM for the work, but we can't mix and match.<br>
      <br>
      So my initial guess was correct that we just need to change both
      sides of the implementation at the same time.<br>
      <br>
      Regards,<br>
      Christian.<br>
      <br>
      Am 28.09.2018 um 17:13 schrieb Koenig, Christian:<br>
    </div>
    <blockquote type="cite"
      cite="mid:b8686e6b-0c3e-4feb-afbd-80397aac31a0@email.android.com">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <meta content="text/html; charset=Windows-1252">
      <div dir="auto">No it definitely isn't.
        <div dir="auto"><br>
        </div>
        <div dir="auto">We have literally worked month on this with the
          core MM developers.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Making sure that we have a consistent page array
          is absolutely vital for correct operation.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Please also check Jerome's presentation from XDC
          it also perfectly explains why this approach won't work
          correctly.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Christian.</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">Am 28.09.2018 17:07 schrieb "Yang,
          Philip" <a class="moz-txt-link-rfc2396E" href="mailto:Philip.Yang@amd.com"><Philip.Yang@amd.com></a>:<br type="attribution">
        </div>
      </div>
      <div>
        <div class="moz-cite-prefix">For B path, we take
          mm->mmap_sem, then call hmm_vma_get_pfns() or
          get_user_pages(). This is obvious.<br>
          <br>
          For A path, mmu notifier
mmu_notifier_invalidate_range_start()/mmu_notifier_invalidate_range_end()
          is called in many places, and the calling path is quit
          complicated inside mm, it's not obvious. I checked many of the
          them, for example:<br>
          <br>
          do_munmap()   <br>
            down_write(&mm->mmap_sem)<br>
            arch_unmap()<br>
              mpx_notify_unmap()...<br>
                 zap_bt_entries_mapping()<br>
                   zap_page_range()<br>
           up_write(&mm->mmap_sem)<br>
          <br>
          void zap_page_range(struct vm_area_struct *vma, unsigned long
          start,<br>
                  unsigned long size)<br>
          {<br>
              struct mm_struct *mm = vma->vm_mm;<br>
              struct mmu_gather tlb;<br>
              unsigned long end = start + size;<br>
          <br>
              lru_add_drain();<br>
              tlb_gather_mmu(&tlb, mm, start, end);<br>
              update_hiwater_rss(mm);<br>
              mmu_notifier_invalidate_range_start(mm, start, end);<br>
              for ( ; vma && vma->vm_start < end; vma =
          vma->vm_next)<br>
                  unmap_single_vma(&tlb, vma, start, end, NULL);<br>
              mmu_notifier_invalidate_range_end(mm, start, end);<br>
              tlb_finish_mmu(&tlb, start, end);<br>
          }<br>
           <br>
          So AFAIK it's okay without invalidate_range_end() callback. <br>
                   <br>
          Regards,<br>
          Philip<br>
          <br>
          On 2018-09-28 01:25 AM, Koenig, Christian wrote:<br>
        </div>
        <blockquote type="cite">
          <meta content="text/html; charset=utf-8">
          <div dir="auto">No, that is incorrect as well :)
            <div dir="auto"><br>
            </div>
            <div dir="auto">The mmap_sem isn't necessary taken during
              page table updates.</div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">What you could do is replace
              get_user_pages() directly with HMM. If I'm not completely
              mistaken that should work as expected.</div>
            <div dir="auto"><br>
            </div>
            <div dir="auto">Christian.</div>
          </div>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">Am 27.09.2018 22:18 schrieb "Yang,
              Philip" <a class="moz-txt-link-rfc2396E"
                href="mailto:Philip.Yang@amd.com" moz-do-not-send="true">
                <Philip.Yang@amd.com></a>:<br type="attribution">
            </div>
          </div>
          <div>
            <div class="moz-cite-prefix">I was trying to understand the
              way how HMM handle this concurrent issue and how we handle
              it in amdgpu_ttm_tt_userptr_needs_pages() and 
              amdgpu_ttm_tt_affect_userptr(). HMM uses range->valid
              flag, we use gtt->mmu_invalidations and
              gtt->last_set_pages. Both use the same lock plus flag
              idea actually.<br>
              <br>
              Thanks for the information, now I understand fence
              ttm_eu_fence_buffer_objects() put to BOs will block CPU
              page table update. This is another side of this concurrent
              issue I didn't know.<br>
              <br>
              I had same worry that it has issue without
              invalidate_range_end() callback as the calling sequence
              Felix lists. Now I think it's fine after taking a look
              again today because of mm->mmap_sem usage, this is my
              understanding:<br>
              <br>
              A path:<br>
              <br>
              down_write(&mm->mmap_sem);<br>
              mmu_notifier_invalidate_range_start()<br>
                  take_lock()<br>
                  release_lock()<br>
              CPU page table update<br>
              mmu_notifier_invalidate_range_end()<br>
              up_write(&mm->mmap_sem);<br>
              <br>
              B path:<br>
              <br>
              again: <br>
              down_read(&mm->mmap_sem);<br>
              hmm_vma_get_pfns()<br>
              up_read(&mm->mmap_sem);<br>
              ....<br>
              ....<br>
              take_lock()<br>
              if (!hmm_vma_range_done()) {<br>
                 release_lock()<br>
                 goto again<br>
              }<br>
              submit command job...<br>
              release_lock()<br>
              <br>
              If you agree, I will submit patch v5 with some minor
              changes, and submit another patch to replace
              get_user_page() with HMM.<br>
               <br>
              Regards,<br>
              Philip  <br>
              <br>
              On 2018-09-27 11:36 AM, Christian König wrote:<br>
            </div>
            <blockquote type="cite">
              <div class="moz-cite-prefix">Yeah, I've read that as well.<br>
                <br>
                My best guess is that we just need to add a call to
                hmm_vma_range_done() after taking the lock and also
                replace get_user_pages() with hmm_vma_get_pfns().<br>
                <br>
                But I'm still not 100% sure how all of that is supposed
                to work together.<br>
                <br>
                Regards,<br>
                Christian.<br>
                <br>
                Am 27.09.2018 um 16:50 schrieb Kuehling, Felix:<br>
              </div>
              <blockquote type="cite">
                <meta name="Generator" content="Microsoft Word 15
                  (filtered medium)">
                <style>
<!--
@font-face
        {font-family:"Cambria Math"}
@font-face
        {font-family:Calibri}
@font-face
        {font-family:Consolas}
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black}
p.msonormal0, li.msonormal0, div.msonormal0
        {margin-right:0in;
        margin-left:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif}
span.PlainTextChar
        {font-family:Consolas;
        color:black}
p.msonormal00, li.msonormal00, div.msonormal00
        {margin-right:0in;
        margin-left:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif}
p.msonormal000, li.msonormal000, div.msonormal000
        {margin-right:0in;
        margin-left:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black}
p.emailquote, li.emailquote, div.emailquote
        {margin-right:0in;
        margin-left:1.0pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black}
p.msochpdefault, li.msochpdefault, div.msochpdefault
        {margin-right:0in;
        margin-left:0in;
        font-size:10.0pt;
        font-family:"Calibri",sans-serif}
span.plaintextchar0
        {font-family:Consolas;
        color:black}
span.plaintextchar00
        {font-family:"Calibri",sans-serif}
span.emailstyle22
        {font-family:"Calibri",sans-serif;
        color:windowtext}
span.emailstyle23
        {font-family:"Calibri",sans-serif;
        color:windowtext}
span.emailstyle24
        {font-family:"Calibri",sans-serif;
        color:windowtext}
span.emailstyle25
        {font-family:"Calibri",sans-serif;
        color:windowtext}
span.emailstyle30
        {font-family:"Calibri",sans-serif;
        color:windowtext}
span.EmailStyle33
        {font-family:"Calibri",sans-serif;
        color:windowtext}
.MsoChpDefault
        {font-size:10.0pt}
@page WordSection1
        {margin:1.0in 1.0in 1.0in 1.0in}
ol
        {margin-bottom:0in}
ul
        {margin-bottom:0in}
-->
</style>
                <div class="WordSection1">
                  <p class="MsoNormal"><span style="color:windowtext">I
                      think the answer is here: <a
href="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216"
                        moz-do-not-send="true">
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/hmm.rst#n216</a></span></p>
                  <p class="MsoNormal"><span style="color:windowtext"> </span></p>
                  <p class="MsoNormal"><span style="color:windowtext">Regards,</span></p>
                  <p class="MsoNormal"><span style="color:windowtext"> 
                      Felix</span></p>
                  <p class="MsoNormal"><span style="color:windowtext"> </span></p>
                  <div>
                    <div style="border:none; border-top:solid #E1E1E1
                      1.0pt; padding:3.0pt 0in 0in 0in">
                      <p class="MsoNormal"><b><span
                            style="color:windowtext">From:</span></b><span
                          style="color:windowtext"> Koenig, Christian
                          <br>
                          <b>Sent:</b> Thursday, September 27, 2018
                          10:30 AM<br>
                          <b>To:</b> Kuehling, Felix <a
                            class="moz-txt-link-rfc2396E"
                            href="mailto:Felix.Kuehling@amd.com"
                            moz-do-not-send="true">
                            <Felix.Kuehling@amd.com></a><br>
                          <b>Cc:</b> <a
                            class="moz-txt-link-abbreviated"
                            href="mailto:j.glisse@gmail.com"
                            moz-do-not-send="true">j.glisse@gmail.com</a>;
                          Yang, Philip
                          <a class="moz-txt-link-rfc2396E"
                            href="mailto:Philip.Yang@amd.com"
                            moz-do-not-send="true"><Philip.Yang@amd.com></a>;
                          <a class="moz-txt-link-abbreviated"
                            href="mailto:amd-gfx@lists.freedesktop.org"
                            moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                          <b>Subject:</b> RE: [PATCH] drm/amdgpu: use
                          HMM mirror callback to replace mmu notifier v4</span></p>
                    </div>
                  </div>
                  <p class="MsoNormal"> </p>
                  <div>
                    <p class="MsoNormal"><span style="color:windowtext">At
                        least with get_user_pages() that is perfectly
                        possible.
                      </span></p>
                    <div>
                      <p class="MsoNormal"><span
                          style="color:windowtext"> </span></p>
                    </div>
                    <div>
                      <p class="MsoNormal"><span
                          style="color:windowtext">For HMM it could be
                          that this is prevented somehow.</span></p>
                    </div>
                    <div>
                      <p class="MsoNormal"><span
                          style="color:windowtext"> </span></p>
                    </div>
                    <div>
                      <p class="MsoNormal"><span
                          style="color:windowtext">Christian.</span></p>
                    </div>
                  </div>
                  <div>
                    <p class="MsoNormal"><span style="color:windowtext"> </span></p>
                    <div>
                      <p class="MsoNormal"><span
                          style="color:windowtext">Am 27.09.2018 16:27
                          schrieb "Kuehling, Felix" <<a
                            href="mailto:Felix.Kuehling@amd.com"
                            moz-do-not-send="true">Felix.Kuehling@amd.com</a>>:</span></p>
                    </div>
                  </div>
                  <div>
                    <div>
                      <p class="MsoNormal"><span
                          style="color:windowtext">> In this case you
                          can end up accessing pages which are
                          invalidated while get_user_pages is in
                          process.</span></p>
                      <p class="MsoNormal"><span
                          style="color:windowtext"> </span></p>
                      <p class="MsoNormal"><span
                          style="color:windowtext">What’s the sequence
                          of events you have in mind? Something like
                          this?</span></p>
                      <p class="MsoNormal"><span
                          style="color:windowtext"> </span></p>
                      <ul style="margin-top:0in" type="disc">
                        <li class="MsoListParagraph"
                          style="color:windowtext; margin-left:0in">Page
                          table is updated and triggers MMU notifier
                        </li>
                        <li class="MsoListParagraph"
                          style="color:windowtext; margin-left:0in">amdgpu
                          MMU notifier runs and waits for pending CS to
                          finish while holding the read lock
                        </li>
                        <li class="MsoListParagraph"
                          style="color:windowtext; margin-left:0in">New
                          CS starts just after invalidate_range_start
                          MMU notifier finishes but before the page
                          table update is done
                        </li>
                        <li class="MsoListParagraph"
                          style="color:windowtext; margin-left:0in">get_user_pages
                          returns outdated physical addresses
                        </li>
                      </ul>
                      <p class="MsoNormal"><span
                          style="color:windowtext"> </span></p>
                      <p class="MsoNormal"><span
                          style="color:windowtext">I hope that’s not
                          actually possible and that get_user_pages or
                          hmm_vma_fault would block until the page table
                          update is done. That is,
                          invalidate_range_start marks the start of a
                          page table update, and while that update is in
                          progress, get_user_pages or hmm_vma_fault
                          block. Jerome, can you comment on that?</span></p>
                      <p class="MsoNormal"><span
                          style="color:windowtext"> </span></p>
                      <p class="MsoNormal"><span
                          style="color:windowtext">Thanks,<br>
                            Felix</span></p>
                      <p class="MsoNormal"><span
                          style="color:windowtext"> </span></p>
                      <div>
                        <div style="border:none; border-top:solid
                          #E1E1E1 1.0pt; padding:3.0pt 0in 0in 0in">
                          <p class="MsoNormal"><b><span
                                style="color:windowtext">From:</span></b><span
                              style="color:windowtext"> Koenig,
                              Christian
                              <br>
                              <b>Sent:</b> Thursday, September 27, 2018
                              9:59 AM<br>
                              <b>To:</b> Kuehling, Felix <<a
                                href="mailto:Felix.Kuehling@amd.com"
                                moz-do-not-send="true">Felix.Kuehling@amd.com</a>><br>
                              <b>Cc:</b> Yang, Philip <<a
                                href="mailto:Philip.Yang@amd.com"
                                moz-do-not-send="true">Philip.Yang@amd.com</a>>;
                              <a
                                href="mailto:amd-gfx@lists.freedesktop.org"
                                moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>;
                              Jerome Glisse <<a
                                href="mailto:j.glisse@gmail.com"
                                moz-do-not-send="true">j.glisse@gmail.com</a>><br>
                              <b>Subject:</b> RE: [PATCH] drm/amdgpu:
                              use HMM mirror callback to replace mmu
                              notifier v4</span></p>
                        </div>
                      </div>
                      <p class="MsoNormal"> </p>
                      <div>
                        <p class="MsoNormal"><span
                            style="color:windowtext">Yeah I understand
                            that, but again that won't work.
                          </span></p>
                        <div>
                          <p class="MsoNormal"><span
                              style="color:windowtext"> </span></p>
                        </div>
                        <div>
                          <p class="MsoNormal"><span
                              style="color:windowtext">In this case you
                              can end up accessing pages which are
                              invalidated while get_user_pages is in
                              process.</span></p>
                        </div>
                        <div>
                          <p class="MsoNormal"><span
                              style="color:windowtext"> </span></p>
                        </div>
                        <div>
                          <p class="MsoNormal"><span
                              style="color:windowtext">Christian.</span></p>
                        </div>
                      </div>
                      <div>
                        <p class="MsoNormal"><span
                            style="color:windowtext"> </span></p>
                        <div>
                          <p class="MsoNormal"><span
                              style="color:windowtext">Am 27.09.2018
                              15:41 schrieb "Kuehling, Felix" <<a
                                href="mailto:Felix.Kuehling@amd.com"
                                moz-do-not-send="true">Felix.Kuehling@amd.com</a>>:</span></p>
                        </div>
                      </div>
                      <div>
                        <div>
                          <p class="MsoNormal"><span
                              style="color:windowtext">> I’m not
                              planning to change that. I don’t think
                              there is any need to change it.</span></p>
                          <p class="MsoNormal">><br>
                            > Yeah, but when HMM doesn't provide both
                            the start and the end hock of the
                            invalidation this way won't work any more.<br>
                            ><br>
                            > So we need to find a solution for this,<br>
                            > Christian.</p>
                          <p class="MsoNormal"> </p>
                          <p class="MsoNormal">My whole argument is that
                            you don’t need to hold the read lock until
                            the invalidate_range_end. Just read_lock and
                            read_unlock in the invalidate_range_start
                            function.</p>
                          <p class="MsoNormal"> </p>
                          <p class="MsoNormal">Regards,</p>
                          <p class="MsoNormal"
                            style="margin-bottom:12.0pt">  Felix</p>
                          <p class="MsoNormal"><span
                              style="color:windowtext"> </span></p>
                          <div>
                            <div style="border:none; border-top:solid
                              #E1E1E1 1.0pt; padding:3.0pt 0in 0in 0in">
                              <p class="MsoNormal"><b><span
                                    style="color:windowtext">From:</span></b><span
                                  style="color:windowtext"> Koenig,
                                  Christian
                                  <br>
                                  <b>Sent:</b> Thursday, September 27,
                                  2018 9:22 AM<br>
                                  <b>To:</b> Kuehling, Felix <<a
                                    href="mailto:Felix.Kuehling@amd.com"
                                    moz-do-not-send="true">Felix.Kuehling@amd.com</a>><br>
                                  <b>Cc:</b> Yang, Philip <<a
                                    href="mailto:Philip.Yang@amd.com"
                                    moz-do-not-send="true">Philip.Yang@amd.com</a>>;
                                  <a
                                    href="mailto:amd-gfx@lists.freedesktop.org"
                                    moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>;
                                  Jerome Glisse <<a
                                    href="mailto:j.glisse@gmail.com"
                                    moz-do-not-send="true">j.glisse@gmail.com</a>><br>
                                  <b>Subject:</b> Re: [PATCH]
                                  drm/amdgpu: use HMM mirror callback to
                                  replace mmu notifier v4</span></p>
                            </div>
                          </div>
                          <p class="MsoNormal"> </p>
                          <div>
                            <p class="MsoNormal">Am 27.09.2018 um 15:18
                              schrieb Kuehling, Felix:</p>
                          </div>
                          <blockquote style="margin-top:5.0pt;
                            margin-bottom:5.0pt">
                            <p class="MsoNormal">> The problem is
                              here:<br>
                              ></p>
                            <p class="MsoNormal">>
                              ttm_eu_fence_buffer_objects(&p->ticket,
                              &p->validated, p->fence);</p>
                            <p class="MsoNormal">>
                              amdgpu_mn_unlock(p->mn); </p>
                            <p class="MsoNormal"
                              style="margin-bottom:12.0pt">><br>
                              > We need to hold the lock until the
                              fence is added to the reservation object.<br>
                              ><br>
                              > Otherwise somebody could have changed
                              the page tables just in the moment between
                              the check of
                              amdgpu_ttm_tt_userptr_needs_pages() and
                              adding the fence to the reservation
                              object.</p>
                            <p class="MsoNormal"><span
                                style="color:windowtext">I’m not
                                planning to change that. I don’t think
                                there is any need to change it.</span></p>
                          </blockquote>
                          <p class="MsoNormal"
                            style="margin-bottom:12.0pt"><br>
                            Yeah, but when HMM doesn't provide both the
                            start and the end hock of the invalidation
                            this way won't work any more.<br>
                            <br>
                            So we need to find a solution for this,<br>
                            Christian.</p>
                          <blockquote style="margin-top:5.0pt;
                            margin-bottom:5.0pt">
                            <p class="MsoNormal"><span
                                style="color:windowtext"> </span></p>
                            <p class="MsoNormal"><span
                                style="color:windowtext">Regards,</span></p>
                            <p class="MsoNormal"><span
                                style="color:windowtext">  Felix</span></p>
                            <p class="MsoNormal"><span
                                style="color:windowtext"> </span></p>
                            <div>
                              <div style="border:none; border-top:solid
                                #E1E1E1 1.0pt; padding:3.0pt 0in 0in
                                0in">
                                <p class="MsoNormal"><b><span
                                      style="color:windowtext">From:</span></b><span
                                    style="color:windowtext"> Koenig,
                                    Christian
                                    <br>
                                    <b>Sent:</b> Thursday, September 27,
                                    2018 7:24 AM<br>
                                    <b>To:</b> Kuehling, Felix <a
                                      href="mailto:Felix.Kuehling@amd.com"
                                      moz-do-not-send="true"><Felix.Kuehling@amd.com></a><br>
                                    <b>Cc:</b> Yang, Philip <a
                                      href="mailto:Philip.Yang@amd.com"
                                      moz-do-not-send="true"><Philip.Yang@amd.com></a>;
                                    <a
                                      href="mailto:amd-gfx@lists.freedesktop.org"
                                      moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>;
                                    Jerome Glisse
                                    <a href="mailto:j.glisse@gmail.com"
                                      moz-do-not-send="true"><j.glisse@gmail.com></a><br>
                                    <b>Subject:</b> Re: [PATCH]
                                    drm/amdgpu: use HMM mirror callback
                                    to replace mmu notifier v4</span></p>
                              </div>
                            </div>
                            <p class="MsoNormal"> </p>
                            <div>
                              <p class="MsoNormal">Am 27.09.2018 um
                                13:08 schrieb Kuehling, Felix:</p>
                            </div>
                            <blockquote style="margin-top:5.0pt;
                              margin-bottom:5.0pt">
                              <p class="MsoNormal">> We double check
                                that there wasn't any page table
                                modification while we prepared the
                                submission and restart the whole process
                                when there actually was some update.<br>
                                ><br>
                                > The reason why we need to do this
                                is here:<br>
                                ></p>
                              <p class="MsoNormal">>       
                                ttm_eu_fence_buffer_objects(&p->ticket,
                                &p->validated, p->fence);<br>
                                >        amdgpu_mn_unlock(p->mn);</p>
                              <p class="MsoNormal"
                                style="margin-bottom:12.0pt">><br>
                                > Only after the new fence is added
                                to the buffer object we can release the
                                lock so that any invalidation will now
                                block on our command submission to
                                finish before it modifies the page
                                table.<br>
                                <br>
                              </p>
                              <p class="MsoNormal"><span
                                  style="color:windowtext">I don’t see
                                  why this requires holding the
                                  read-lock until invalidate_range_end.
                                  amdgpu_ttm_tt_affect_userptr gets
                                  called while the mn read-lock is held
                                  in invalidate_range_start notifier.</span></p>
                            </blockquote>
                            <p class="MsoNormal"
                              style="margin-bottom:12.0pt"><br>
                              That's not related to
                              amdgpu_ttm_tt_affect_userptr(), this
                              function could actually be called outside
                              the lock.<br>
                              <br>
                              The problem is here:</p>
                            <blockquote style="margin-top:5.0pt;
                              margin-bottom:5.0pt">
                              <p class="MsoNormal">ttm_eu_fence_buffer_objects(&p->ticket,
                                &p->validated, p->fence);</p>
                            </blockquote>
                            <blockquote style="margin-top:5.0pt;
                              margin-bottom:5.0pt">
                              <p class="MsoNormal">amdgpu_mn_unlock(p->mn);
                              </p>
                            </blockquote>
                            <p class="MsoNormal"
                              style="margin-bottom:12.0pt"><br>
                              We need to hold the lock until the fence
                              is added to the reservation object.<br>
                              <br>
                              Otherwise somebody could have changed the
                              page tables just in the moment between the
                              check of
                              amdgpu_ttm_tt_userptr_needs_pages() and
                              adding the fence to the reservation
                              object.<br>
                              <br>
                              Regards,<br>
                              Christian.<br>
                              <br>
                              <br>
                            </p>
                            <blockquote style="margin-top:5.0pt;
                              margin-bottom:5.0pt">
                              <p class="MsoNormal"><span
                                  style="color:windowtext"> </span></p>
                              <p class="MsoNormal"><span
                                  style="color:windowtext">Regards,</span></p>
                              <p class="MsoNormal"><span
                                  style="color:windowtext">  Felix</span></p>
                              <p class="MsoNormal"><span
                                  style="color:windowtext"> </span></p>
                              <div>
                                <div style="border:none;
                                  border-top:solid #E1E1E1 1.0pt;
                                  padding:3.0pt 0in 0in 0in">
                                  <p class="MsoNormal"><b><span
                                        style="color:windowtext">From:</span></b><span
                                      style="color:windowtext"> Koenig,
                                      Christian
                                      <br>
                                      <b>Sent:</b> Thursday, September
                                      27, 2018 5:27 AM<br>
                                      <b>To:</b> Kuehling, Felix <a
                                        href="mailto:Felix.Kuehling@amd.com"
                                        moz-do-not-send="true"><Felix.Kuehling@amd.com></a><br>
                                      <b>Cc:</b> Yang, Philip <a
                                        href="mailto:Philip.Yang@amd.com"
                                        moz-do-not-send="true"><Philip.Yang@amd.com></a>;
                                      <a
                                        href="mailto:amd-gfx@lists.freedesktop.org"
                                        moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>;
                                      Jerome Glisse
                                      <a
                                        href="mailto:j.glisse@gmail.com"
                                        moz-do-not-send="true"><j.glisse@gmail.com></a><br>
                                      <b>Subject:</b> Re: [PATCH]
                                      drm/amdgpu: use HMM mirror
                                      callback to replace mmu notifier
                                      v4</span></p>
                                </div>
                              </div>
                              <p class="MsoNormal"> </p>
                              <div>
                                <p class="MsoNormal"
                                  style="margin-bottom:12.0pt">That is
                                  correct, but take a look what we do
                                  when after calling the
                                  amdgpu_mn_read_lock():<br>
                                  <br>
                                  <br>
                                </p>
                                <blockquote style="margin-top:5.0pt;
                                  margin-bottom:5.0pt">
                                  <p class="MsoNormal">        /* No
                                    memory allocation is allowed while
                                    holding the mn lock */<br>
                                            amdgpu_mn_lock(p->mn);<br>
                                           
                                    amdgpu_bo_list_for_each_userptr_entry(e,
                                    p->bo_list) {<br>
                                                    struct amdgpu_bo *bo
                                    = ttm_to_amdgpu_bo(e->tv.bo);<br>
                                    <br>
                                                    if
                                    (amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm))
                                    {<br>
                                                            r =
                                    -ERESTARTSYS;<br>
                                                            goto
                                    error_abort;<br>
                                                    }<br>
                                            }</p>
                                </blockquote>
                                <p class="MsoNormal"
                                  style="margin-bottom:12.0pt"><br>
                                  We double check that there wasn't any
                                  page table modification while we
                                  prepared the submission and restart
                                  the whole process when there actually
                                  was some update.<br>
                                  <br>
                                  The reason why we need to do this is
                                  here:<br>
                                  <br>
                                </p>
                                <blockquote style="margin-top:5.0pt;
                                  margin-bottom:5.0pt">
                                  <p class="MsoNormal">       
                                    ttm_eu_fence_buffer_objects(&p->ticket,
                                    &p->validated, p->fence);<br>
                                            amdgpu_mn_unlock(p->mn);</p>
                                </blockquote>
                                <p class="MsoNormal"><br>
                                  Only after the new fence is added to
                                  the buffer object we can release the
                                  lock so that any invalidation will now
                                  block on our command submission to
                                  finish before it modifies the page
                                  table.<br>
                                  <br>
                                  The only other option would be to add
                                  the fence first and then check if
                                  there was any update to the page
                                  tables.<br>
                                  <br>
                                  The issue with that approach is that
                                  adding a fence can't be made undone,
                                  so if we find that there actually was
                                  an update to the page tables we would
                                  need to somehow turn the CS into a
                                  dummy (e.g. overwrite all IBs with
                                  NOPs or something like that) and still
                                  submit it.<br>
                                  <br>
                                  Not sure if that is actually possible.<br>
                                  <br>
                                  Regards,<br>
                                  Christian.<br>
                                  <br>
                                  Am 27.09.2018 um 10:47 schrieb
                                  Kuehling, Felix:</p>
                              </div>
                              <blockquote style="margin-top:5.0pt;
                                margin-bottom:5.0pt">
                                <p class="MsoNormal">So back to my
                                  previous question:</p>
                                <p class="MsoPlainText"> </p>
                                <p class="MsoPlainText">>> But do
                                  we really need another lock for this?
                                  Wouldn't the
                                </p>
                                <p class="MsoPlainText">>>
                                  re-validation of userptr BOs
                                  (currently calling get_user_pages)
                                  force
                                </p>
                                <p class="MsoPlainText">>>
                                  synchronization with the ongoing page
                                  table invalidation through the
                                </p>
                                <p class="MsoPlainText">>>
                                  mmap_sem or other MM locks?</p>
                                <p class="MsoPlainText">> </p>
                                <p class="MsoPlainText">> No and yes.
                                  We don't hold any other locks while
                                  doing command submission, but I expect
                                  that HMM has its own mechanism to
                                  prevent that.</p>
                                <p class="MsoPlainText">> </p>
                                <p class="MsoPlainText">> Since we
                                  don't modify
                                  amdgpu_mn_lock()/amdgpu_mn_unlock() we
                                  are certainly not using this mechanism
                                  correctly.</p>
                                <p class="MsoNormal"> </p>
                                <p class="MsoNormal">The existing
                                  amdgpu_mn_lock/unlock should block the
                                  MMU notifier while a command
                                  submission is in progress. It should
                                  also block command submission while an
                                  MMU notifier is in progress.</p>
                                <p class="MsoNormal"> </p>
                                <p class="MsoNormal">What we lose with
                                  HMM is the ability to hold a read-lock
                                  for the entire duration of the
                                  invalidate_range_start until
                                  invalidate_range_end. As I understand
                                  it, that lock is meant to prevent new
                                  command submissions while the page
                                  tables are being updated by the
                                  kernel. But my point is, that
                                  get_user_pages or hmm_vma_fault should
                                  do the same kind of thing. Before the
                                  command submission can go ahead, it
                                  needs to update the userptr addresses.
                                  If the page tables are still being
                                  updated, it will block there even
                                  without holding the
                                  amdgpu_mn_read_lock.</p>
                                <p class="MsoNormal"> </p>
                                <p class="MsoNormal">Regards,</p>
                                <p class="MsoNormal">  Felix</p>
                                <p class="MsoNormal"> </p>
                                <div>
                                  <div style="border:none;
                                    border-top:solid #E1E1E1 1.0pt;
                                    padding:3.0pt 0in 0in 0in">
                                    <p class="MsoNormal"><b>From:</b>
                                      Koenig, Christian <br>
                                      <b>Sent:</b> Thursday, September
                                      27, 2018 3:00 AM<br>
                                      <b>To:</b> Kuehling, Felix <a
                                        href="mailto:Felix.Kuehling@amd.com"
                                        moz-do-not-send="true"><Felix.Kuehling@amd.com></a><br>
                                      <b>Cc:</b> Yang, Philip <a
                                        href="mailto:Philip.Yang@amd.com"
                                        moz-do-not-send="true"><Philip.Yang@amd.com></a>;
                                      <a
                                        href="mailto:amd-gfx@lists.freedesktop.org"
                                        moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>;
                                      Jerome Glisse
                                      <a
                                        href="mailto:j.glisse@gmail.com"
                                        moz-do-not-send="true"><j.glisse@gmail.com></a><br>
                                      <b>Subject:</b> RE: [PATCH]
                                      drm/amdgpu: use HMM mirror
                                      callback to replace mmu notifier
                                      v4</p>
                                  </div>
                                </div>
                                <p class="MsoNormal"> </p>
                                <div>
                                  <div>
                                    <p class="MsoNormal">No, that won't
                                      work. We would still run into lock
                                      inversion problems.</p>
                                    <div>
                                      <p class="MsoNormal"> </p>
                                    </div>
                                    <div>
                                      <p class="MsoNormal">What we could
                                        do with the scheduler is to turn
                                        submissions into dummies if we
                                        find that the page tables are
                                        now outdated.</p>
                                    </div>
                                    <div>
                                      <p class="MsoNormal"> </p>
                                    </div>
                                    <div>
                                      <p class="MsoNormal">But that
                                        would be really hacky and I'm
                                        not sure if that would really
                                        work in all cases.</p>
                                    </div>
                                    <div>
                                      <p class="MsoNormal"> </p>
                                    </div>
                                    <div>
                                      <p class="MsoNormal">Christian.</p>
                                    </div>
                                  </div>
                                  <div>
                                    <p class="MsoNormal"> </p>
                                    <div>
                                      <p class="MsoNormal">Am 27.09.2018
                                        08:53 schrieb "Kuehling, Felix"
                                        <<a
                                          href="mailto:Felix.Kuehling@amd.com"
                                          moz-do-not-send="true">Felix.Kuehling@amd.com</a>>:</p>
                                    </div>
                                  </div>
                                </div>
                                <div>
                                  <p class="MsoNormal">I had a chat with
                                    Jerome yesterday. He pointed out
                                    that the new blockable parameter can
                                    be used to infer whether the MMU
                                    notifier is being called  in a
                                    reclaim operation. So if
                                    blockable==true, it should even be
                                    safe to take the BO reservation lock
                                    without problems. I think with that
                                    we should be able to remove the
                                    read-write locking completely and go
                                    back to locking (or try-locking for
                                    blockable==false) the reservation
                                    locks in the MMU notifier?<br>
                                    <br>
                                    Regards,<br>
                                      Felix<br>
                                    <br>
                                    -----Original Message-----<br>
                                    From: amd-gfx <<a
                                      href="mailto:amd-gfx-bounces@lists.freedesktop.org"
                                      moz-do-not-send="true">amd-gfx-bounces@lists.freedesktop.org</a>>
                                    On Behalf Of Christian König<br>
                                    Sent: Saturday, September 15, 2018
                                    3:47 AM<br>
                                    To: Kuehling, Felix <<a
                                      href="mailto:Felix.Kuehling@amd.com"
                                      moz-do-not-send="true">Felix.Kuehling@amd.com</a>>;
                                    Yang, Philip <<a
                                      href="mailto:Philip.Yang@amd.com"
                                      moz-do-not-send="true">Philip.Yang@amd.com</a>>;
                                    <a
                                      href="mailto:amd-gfx@lists.freedesktop.org"
                                      moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a>;
                                    Jerome Glisse <<a
                                      href="mailto:j.glisse@gmail.com"
                                      moz-do-not-send="true">j.glisse@gmail.com</a>><br>
                                    Subject: Re: [PATCH] drm/amdgpu: use
                                    HMM mirror callback to replace mmu
                                    notifier v4<br>
                                    <br>
                                    Am 14.09.2018 um 22:21 schrieb Felix
                                    Kuehling:<br>
                                    > On 2018-09-14 01:52 PM,
                                    Christian König wrote:<br>
                                    >> Am 14.09.2018 um 19:47
                                    schrieb Philip Yang:<br>
                                    >>> On 2018-09-14 03:51 AM,
                                    Christian König wrote:<br>
                                    >>>> Am 13.09.2018 um
                                    23:51 schrieb Felix Kuehling:<br>
                                    >>>>> On 2018-09-13
                                    04:52 PM, Philip Yang wrote:<br>
                                    >>>>> [SNIP]<br>
                                    >>>>>>    +   
                                    amdgpu_mn_read_unlock(amn);<br>
                                    >>>>>> +<br>
                                    >>>>>
                                    amdgpu_mn_read_lock/unlock support
                                    recursive locking for multiple <br>
                                    >>>>> overlapping or
                                    nested invalidation ranges. But if
                                    you'r locking <br>
                                    >>>>> and unlocking
                                    in the same function. Is that still
                                    a concern?<br>
                                    >>> I don't understand the
                                    possible recursive case, but<br>
                                    >>> amdgpu_mn_read_lock()
                                    still support recursive locking.<br>
                                    >>>> Well the real
                                    problem is that unlocking them here
                                    won't work.<br>
                                    >>>><br>
                                    >>>> We need to hold the
                                    lock until we are sure that the
                                    operation which <br>
                                    >>>> updates the page
                                    tables is completed.<br>
                                    >>>><br>
                                    >>> The reason for this
                                    change is because hmm mirror has <br>
                                    >>> invalidate_start
                                    callback, no invalidate_end callback<br>
                                    >>><br>
                                    >>> Check mmu_notifier.c
                                    and hmm.c again, below is entire
                                    logic to <br>
                                    >>> update CPU page tables
                                    and callback:<br>
                                    >>><br>
                                    >>> mn lock amn->lock is
                                    used to protect interval tree access
                                    because <br>
                                    >>> user may
                                    submit/register new userptr anytime.<br>
                                    >>> This is same for old
                                    and new way.<br>
                                    >>><br>
                                    >>> step 2 guarantee the
                                    GPU operation is done before
                                    updating CPU page <br>
                                    >>> table.<br>
                                    >>><br>
                                    >>> So I think the change
                                    is safe. We don't need hold mn lock
                                    until the <br>
                                    >>> CPU page tables update
                                    is completed.<br>
                                    >> No, that isn't even
                                    remotely correct. The lock doesn't
                                    protects the <br>
                                    >> interval tree.<br>
                                    >><br>
                                    >>> Old:<br>
                                    >>>     1.
                                    down_read_non_owner(&amn->lock)<br>
                                    >>>     2. loop to handle
                                    BOs from node->bos through
                                    interval tree<br>
                                    >>> amn->object nodes<br>
                                    >>>         gfx: wait for
                                    pending BOs fence operation done,
                                    mark user <br>
                                    >>> pages dirty<br>
                                    >>>         kfd: evict user
                                    queues of the process, wait for
                                    queue <br>
                                    >>> unmap/map operation
                                    done<br>
                                    >>>     3. update CPU page
                                    tables<br>
                                    >>>     4.
                                    up_read(&amn->lock)<br>
                                    >>><br>
                                    >>> New, switch step 3 and
                                    4<br>
                                    >>>     1.
                                    down_read_non_owner(&amn->lock)<br>
                                    >>>     2. loop to handle
                                    BOs from node->bos through
                                    interval tree<br>
                                    >>> amn->object nodes<br>
                                    >>>         gfx: wait for
                                    pending BOs fence operation done,
                                    mark user <br>
                                    >>> pages dirty<br>
                                    >>>         kfd: evict user
                                    queues of the process, wait for
                                    queue <br>
                                    >>> unmap/map operation
                                    done<br>
                                    >>>     3.
                                    up_read(&amn->lock)<br>
                                    >>>     4. update CPU page
                                    tables<br>
                                    >> The lock is there to make
                                    sure that we serialize page table
                                    updates <br>
                                    >> with command submission.<br>
                                    > As I understand it, the idea is
                                    to prevent command submission
                                    (adding <br>
                                    > new fences to BOs) while a page
                                    table invalidation is in progress.<br>
                                    <br>
                                    Yes, exactly.<br>
                                    <br>
                                    > But do we really need another
                                    lock for this? Wouldn't the <br>
                                    > re-validation of userptr BOs
                                    (currently calling get_user_pages)
                                    force <br>
                                    > synchronization with the
                                    ongoing page table invalidation
                                    through the <br>
                                    > mmap_sem or other MM locks?<br>
                                    <br>
                                    No and yes. We don't hold any other
                                    locks while doing command
                                    submission, but I expect that HMM
                                    has its own mechanism to prevent
                                    that.<br>
                                    <br>
                                    Since we don't modify
                                    amdgpu_mn_lock()/amdgpu_mn_unlock()
                                    we are certainly not using this
                                    mechanism correctly.<br>
                                    <br>
                                    Regards,<br>
                                    Christian.<br>
_______________________________________________<br>
                                    amd-gfx mailing list<br>
                                    <a
                                      href="mailto:amd-gfx@lists.freedesktop.org"
                                      moz-do-not-send="true">amd-gfx@lists.freedesktop.org</a><br>
                                    <a
                                      href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx"
                                      moz-do-not-send="true">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a></p>
                                </div>
                              </blockquote>
                              <p class="MsoNormal"> </p>
                            </blockquote>
                            <p class="MsoNormal"> </p>
                          </blockquote>
                          <p class="MsoNormal"> </p>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </blockquote>
              <br>
            </blockquote>
            <br>
          </div>
        </blockquote>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
amd-gfx mailing list
<a class="moz-txt-link-abbreviated" href="mailto:amd-gfx@lists.freedesktop.org">amd-gfx@lists.freedesktop.org</a>
<a class="moz-txt-link-freetext" href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx">https://lists.freedesktop.org/mailman/listinfo/amd-gfx</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>