<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    <div class="moz-cite-prefix">On 2017-07-20 22:59, Marek Olšák
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAAxE2A6uFyqhWfJBNzojyMqFmk-AGyQmScwqScMjLutBQX5usA@mail.gmail.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="auto">
        <div><br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Jul 19, 2017 10:21 PM, "zhoucm1"
              &lt;<a moz-do-not-send="true"
                href="mailto:david1.zhou@amd.com">david1.zhou@amd.com</a>&gt;
              wrote:<br type="attribution">
              <blockquote class="quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF">
                  <div class="quoted-text"> <br>
                    <br>
                    <div class="m_-6147598457318751040moz-cite-prefix">On
                      2017-07-19 23:34, Marek Olšák wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="auto">
                        <div><br>
                          <div class="gmail_extra"><br>
                            <div class="gmail_quote">On Jul 19, 2017
                              3:36 AM, "zhoucm1" &lt;<a
                                moz-do-not-send="true"
                                href="mailto:david1.zhou@amd.com"
                                target="_blank">david1.zhou@amd.com</a>&gt;

                              wrote:<br type="attribution">
                              <blockquote
                                class="m_-6147598457318751040quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex">
                                <div
                                  class="m_-6147598457318751040quoted-text"><br>
                                  <br>
                                  On 2017-07-19 04:08, Marek Olšák
                                  wrote:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex"> From: Marek
                                    Olšák &lt;<a moz-do-not-send="true"
                                      href="mailto:marek.olsak@amd.com"
                                      target="_blank">marek.olsak@amd.com</a>&gt;<br>
                                    <br>
                                    For lower overhead in the CS ioctl.<br>
                                    Winsys allocators are not used with
                                    interprocess-sharable resources.<br>
                                  </blockquote>
                                </div>
                                Hi Marek,<br>
                                <br>
                                Could you explain how this approach
                                reduces overhead in the CS ioctl? Is it by
                                reusing BOs to shorten the BO list?<br>
                              </blockquote>
                            </div>
                          </div>
                        </div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">The kernel part of the work
                          hasn't been done yet. The idea is that
                          nonsharable buffers don't have to be
                          revalidated by TTM,</div>
                      </div>
                    </blockquote>
                  </div>
                  OK, maybe I will only see the whole picture of this
                  idea once you complete the kernel part.<br>
                  Out of curiosity, why/how can nonsharable buffers avoid
                  being revalidated by TTM without exposing something like
                  an amdgpu_bo_make_resident API?<br>
                </div>
              </blockquote>
            </div>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">I think the idea is that all nonsharable buffers
          will be backed by the same reservation object, so TTM can skip
          buffer validation if no buffer has been moved. It's just an
          optimization for the current design.</div>
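        <div dir="auto">A minimal sketch of the idea above, not the
          kernel implementation: it assumes that all nonsharable buffers
          of a process point at one shared object carrying a move counter,
          so a submission can skip per-buffer validation when nothing has
          moved since the last one. Every name here (toy_shared_resv,
          toy_bo, toy_validate and so on) is hypothetical and exists only
          for illustration.</div>
        <pre>
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one reservation object shared by all
 * nonsharable buffers of a process. This is not a kernel type. */
struct toy_shared_resv {
    unsigned long move_seq;      /* bumped whenever any buffer is moved */
    unsigned long validated_seq; /* move_seq seen by the last full validation */
};

struct toy_bo {
    struct toy_shared_resv *resv;
    bool placed; /* true while the buffer sits in a usable placement */
};

/* Set up the shared object so that the first submission validates everything. */
void toy_resv_init(struct toy_shared_resv *resv)
{
    resv->move_seq = 1;
    resv->validated_seq = 0;
}

/* Moving (evicting) any buffer invalidates the fast path for the whole group. */
void toy_bo_evict(struct toy_bo *bo)
{
    bo->placed = false;
    bo->resv->move_seq++;
}

/* Called once per command submission. Returns how many buffers were visited. */
size_t toy_validate(struct toy_shared_resv *resv, struct toy_bo *bos, size_t n)
{
    if (resv->validated_seq == resv->move_seq)
        return 0; /* nothing moved since the last validation: skip the loop */

    for (size_t i = 0; i < n; i++)
        bos[i].placed = true; /* stands in for the real per-buffer validation */

    resv->validated_seq = resv->move_seq;
    return n;
}
```
        </pre>
        <div dir="auto">In this model the per-submission cost is one
          comparison in the common case and only falls back to walking
          every buffer after a move.</div>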
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <blockquote class="quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"> <br>
                  As mentioned in another thread, if we can expose a
                  make_resident API, we can remove the bo_list, and we can
                  even remove the reservation operation in the CS ioctl.<br>
                  Right now, I think our BO list is a very bad design.<br>
                  First, the UMD must create a BO list for every command
                  submission, which is extra CPU overhead compared with
                  the traditional way.<br>
                  Second, the kernel also has to iterate over the list.
                  When the BO list is too long, as with OpenCL programs
                  that always put several thousand BOs in the BO list, the
                  reservation must keep these thousands of ww_mutexes
                  safe, and the CPU overhead is too big.<br>
                  <br>
                  So I strongly suggest that we expose a make_resident
                  API to user space. If we cannot, I would like to know
                  the specific reason, to see whether we can solve it.<br>
                </div>
              </blockquote>
            </div>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Yeah, I think the BO list idea is likely to die
          sooner or later. It made sense for GL before bindless was a
          thing. Nowadays I don't see much value in it.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">MesaGL will keep tracking the BO list because
          it's a requirement for good GL performance (it determines
          whether to flush IBs before BO synchronization, it allows
          tracking fences for each BO, which are used to determine
          dependencies between IBs, and that all allows async SDMA and
          async compute for GL, which doesn't have separate queues).</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">However, we don't need any BO list at the libdrm
          level and lower. I think a BO_CREATE flag that adds the
          buffer to a kernel-side per-fd BO list would be
          sufficient. How the kernel manages its BO list should be its
          own implementation detail. Initially we can just move the
          current BO list management into the kernel.</div>
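        <div dir="auto">A rough sketch of what such a creation flag
          could look like, under loud assumptions: the flag name, the
          context structure, and both functions are hypothetical and do
          not correspond to any existing amdgpu or libdrm interface. The
          point is only that buffers join a per-fd list at creation time,
          so a submission no longer passes a list.</div>
        <pre>
```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BO_CREATE_PER_FD_LIST (1u << 0) /* hypothetical flag, not a real uAPI bit */
#define TOY_MAX_LISTED_BOS 64

/* Hypothetical per-file-descriptor state kept on the kernel side. */
struct toy_fd_ctx {
    uint32_t listed[TOY_MAX_LISTED_BOS]; /* handles on the per-fd list */
    size_t num_listed;
    uint32_t next_handle;
};

/* Create a buffer and return its handle, or 0 if the per-fd list is full. */
uint32_t toy_bo_create(struct toy_fd_ctx *ctx, uint32_t flags)
{
    if ((flags & TOY_BO_CREATE_PER_FD_LIST) &&
        ctx->num_listed == TOY_MAX_LISTED_BOS)
        return 0;

    uint32_t handle = ++ctx->next_handle;

    /* The list is maintained at creation time, not at submission time. */
    if (flags & TOY_BO_CREATE_PER_FD_LIST)
        ctx->listed[ctx->num_listed++] = handle;

    return handle;
}

/* A submission takes no BO list; it uses whatever the context already holds.
 * Returns the number of buffers the submission would reference. */
size_t toy_cs_submit(const struct toy_fd_ctx *ctx)
{
    return ctx->num_listed;
}
```
        </pre>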
      </div>
    </blockquote>
    I think this idea will make the BO list worse: it only reduces UMD
    effort while increasing kernel driver complexity.<br>
    <br>
    First, from your and Christian's comments, we seem to agree that
    the BO list design is not a good approach.<br>
    My proposal to expose amdgpu_bo_make_resident is meant to replace
    the BO list.<br>
    If we can make all needed BOs resident, then we don't need to
    validate them again in the CS ioctl, and we no longer need their
    reservation locks. After the job is pushed to the scheduler, we can
    make the BOs non-resident again.<br>
    We can even do this for VM BOs, so that we don't need to check VM
    updates again once that is done in the VA map ioctl.<br>
    <br>
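    <div>To make the proposal concrete, here is a minimal sketch under
      stated assumptions. amdgpu_bo_make_resident is only a proposed API
      in this thread, so the toy_bo structure, the reference count, and
      both functions below are hypothetical illustrations and not an
      existing interface. The model is that a buffer is validated once
      when it first becomes resident and cannot be evicted until every
      resident reference is dropped.</div>
    <pre>
```c
#include <stdbool.h>

/* Hypothetical buffer state; not a real amdgpu structure. */
struct toy_bo {
    unsigned resident_count; /* number of outstanding make-resident calls */
    bool validated;          /* set by the first make-resident call */
};

/* Pin (resident == true) or unpin (resident == false) a buffer.
 * Returns 0 on success, -1 when unpinning a buffer that is not resident. */
int toy_bo_make_resident(struct toy_bo *bo, bool resident)
{
    if (resident) {
        if (bo->resident_count == 0)
            bo->validated = true; /* validate once, outside the CS path */
        bo->resident_count++;
        return 0;
    }

    if (bo->resident_count == 0)
        return -1;

    bo->resident_count--;
    return 0;
}

/* Eviction is only allowed while nobody holds the buffer resident. */
bool toy_bo_can_evict(const struct toy_bo *bo)
{
    return bo->resident_count == 0;
}
```
    </pre>
    <div>With this shape, the submission path would not need to lock or
      validate resident buffers; the cost moves to the make-resident and
      un-resident calls around it.</div>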
    If this gets done (with eviction improved further), I cannot see
    any obvious performance gap.<br>
    <br>
    What do you think of this proposal to expose an
    amdgpu_bo_make_resident API to user space? Or is there any other
    idea we can discuss?<br>
    <br>
    If you all agree, I can volunteer to try it with the UMD guys.<br>
    <br>
    Regards,<br>
    David Zhou<br>
    <br>
    <blockquote
cite="mid:CAAxE2A6uFyqhWfJBNzojyMqFmk-AGyQmScwqScMjLutBQX5usA@mail.gmail.com"
      type="cite">
      <div dir="auto">
        <div dir="auto"><br>
        </div>
        <div dir="auto">Marek</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <blockquote class="quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"> <br>
                  <br>
                  Regards,<br>
                  David Zhou
                  <div class="elided-text"><br>
                    <blockquote type="cite">
                      <div dir="auto">
                        <div dir="auto"> so it should remove a lot of
                          kernel overhead and the BO list remains the
                          same.</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">Marek</div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto"><br>
                        </div>
                        <div dir="auto">
                          <div class="gmail_extra">
                            <div class="gmail_quote">
                              <blockquote
                                class="m_-6147598457318751040quote"
                                style="margin:0 0 0 .8ex;border-left:1px
                                #ccc solid;padding-left:1ex"> <br>
                                Thanks,<br>
                                David Zhou
                                <div
                                  class="m_-6147598457318751040elided-text"><br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex"> <br>
                                    v2: It shouldn't crash anymore, but
                                    the kernel will reject the new flag.<br>
                                    ---<br>
                                      src/gallium/drivers/radeon/r60<wbr>0_buffer_common.c

                                    |  7 +++++<br>
                                      src/gallium/drivers/radeon/rad<wbr>eon_winsys.h 
                                        | 20 +++++++++++---<br>
                                      src/gallium/winsys/amdgpu/drm/<wbr>amdgpu_bo.c 
                                         | 36 ++++++++++++++++---------<br>
                                      src/gallium/winsys/radeon/drm/<wbr>radeon_drm_bo.c 

                                     | 27 +++++++++++--------<br>
                                      4 files changed, 62 insertions(+),
                                    28 deletions(-)<br>
                                    <br>
                                    diff --git
                                    a/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c

                                    b/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c<br>
                                    index dd1c209..2747ac4 100644<br>
                                    --- a/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c<br>
                                    +++ b/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c<br>
                                    @@ -160,20 +160,27 @@ void
                                    r600_init_resource_fields(stru<wbr>ct

                                    r600_common_screen *rscreen,<br>
                                            }<br>
                                            /* Tiled textures are
                                    unmappable. Always put them in VRAM.
                                    */<br>
                                            if ((res->b.b.target !=
                                    PIPE_BUFFER &&
                                    !rtex->surface.is_linear) ||<br>
                                                res->flags &
                                    R600_RESOURCE_FLAG_UNMAPPABLE) {<br>
                                                    res->domains =
                                    RADEON_DOMAIN_VRAM;<br>
                                                    res->flags |=
                                    RADEON_FLAG_NO_CPU_ACCESS |<br>
                                                           
                                     RADEON_FLAG_GTT_WC;<br>
                                            }<br>
                                      +     /* Only displayable
                                    single-sample textures can be shared
                                    between<br>
                                    +        * processes. */<br>
                                    +       if (res->b.b.target ==
                                    PIPE_BUFFER ||<br>
                                    +           res->b.b.nr_samples
                                    >= 2 ||<br>
                                    +         
                                     rtex->surface.micro_tile_mode !=
                                    RADEON_MICRO_MODE_DISPLAY)<br>
                                    +               res->flags |=
                                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
                                    +<br>
                                            /* If VRAM is just stolen
                                    system memory, allow both VRAM and<br>
                                             * GTT, whichever has free
                                    space. If a buffer is evicted from<br>
                                             * VRAM to GTT, it will stay
                                    there.<br>
                                             *<br>
                                             * DRM 3.6.0 has good BO
                                    move throttling, so we can allow
                                    VRAM-only<br>
                                             * placements even with a
                                    low amount of stolen VRAM.<br>
                                             */<br>
                                            if
                                    (!rscreen->info.has_dedicated_<wbr>vram

                                    &&<br>
                                               
                                    (rscreen->info.drm_major < 3
                                    || rscreen->info.drm_minor <
                                    6) &&<br>
                                                res->domains ==
                                    RADEON_DOMAIN_VRAM) {<br>
                                    diff --git
                                    a/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h

                                    b/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h<br>
                                    index 351edcd..0abcb56 100644<br>
                                    --- a/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h<br>
                                    +++ b/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h<br>
                                    @@ -47,20 +47,21 @@ enum
                                    radeon_bo_domain { /* bitfield */<br>
                                          RADEON_DOMAIN_GTT  = 2,<br>
                                          RADEON_DOMAIN_VRAM = 4,<br>
                                          RADEON_DOMAIN_VRAM_GTT =
                                    RADEON_DOMAIN_VRAM |
                                    RADEON_DOMAIN_GTT<br>
                                      };<br>
                                        enum radeon_bo_flag { /*
                                    bitfield */<br>
                                          RADEON_FLAG_GTT_WC =        (1
                                    << 0),<br>
                                          RADEON_FLAG_NO_CPU_ACCESS = (1
                                    << 1),<br>
                                          RADEON_FLAG_NO_SUBALLOC =   (1
                                    << 2),<br>
                                          RADEON_FLAG_SPARSE =        (1
                                    << 3),<br>
                                    +    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING
                                    = (1 << 4),<br>
                                      };<br>
                                        enum radeon_bo_usage { /*
                                    bitfield */<br>
                                          RADEON_USAGE_READ = 2,<br>
                                          RADEON_USAGE_WRITE = 4,<br>
                                          RADEON_USAGE_READWRITE =
                                    RADEON_USAGE_READ |
                                    RADEON_USAGE_WRITE,<br>
                                            /* The winsys ensures that
                                    the CS submission will be scheduled
                                    after<br>
                                           * previously flushed CSs
                                    referencing this BO in a conflicting
                                    way.<br>
                                           */<br>
                                    @@ -685,28 +686,33 @@ static inline
                                    enum radeon_bo_domain
                                    radeon_domain_from_heap(enum
                                    radeon_heap hea<br>
                                          default:<br>
                                              assert(0);<br>
                                              return (enum
                                    radeon_bo_domain)0;<br>
                                          }<br>
                                      }<br>
                                        static inline unsigned
                                    radeon_flags_from_heap(enum
                                    radeon_heap heap)<br>
                                      {<br>
                                          switch (heap) {<br>
                                          case
                                    RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>:<br>
                                    -        return RADEON_FLAG_GTT_WC |
                                    RADEON_FLAG_NO_CPU_ACCESS;<br>
                                    +        return RADEON_FLAG_GTT_WC |<br>
                                    +             
                                     RADEON_FLAG_NO_CPU_ACCESS |<br>
                                    +             
                                     RADEON_FLAG_NO_INTERPROCESS_S<wbr>HARING;<br>
                                    +<br>
                                          case RADEON_HEAP_VRAM:<br>
                                          case RADEON_HEAP_VRAM_GTT:<br>
                                          case RADEON_HEAP_GTT_WC:<br>
                                    -        return RADEON_FLAG_GTT_WC;<br>
                                    +        return RADEON_FLAG_GTT_WC |<br>
                                    +             
                                     RADEON_FLAG_NO_INTERPROCESS_S<wbr>HARING;<br>
                                    +<br>
                                          case RADEON_HEAP_GTT:<br>
                                          default:<br>
                                    -        return 0;<br>
                                    +        return
                                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
                                          }<br>
                                      }<br>
                                        /* The pb cache bucket is chosen
                                    to minimize pb_cache misses.<br>
                                       * It must be between 0 and 3
                                    inclusive.<br>
                                       */<br>
                                      static inline unsigned
                                    radeon_get_pb_cache_bucket_ind<wbr>ex(enum

                                    radeon_heap heap)<br>
                                      {<br>
                                          switch (heap) {<br>
                                          case
                                    RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>:<br>
                                    @@ -724,22 +730,28 @@ static inline
                                    unsigned
                                    radeon_get_pb_cache_bucket_ind<wbr>ex(enum

                                    radeon_heap heap)<br>
                                        /* Return the heap index for
                                    winsys allocators, or -1 on failure.
                                    */<br>
                                      static inline int
                                    radeon_get_heap_index(enum
                                    radeon_bo_domain domain,<br>
                                                                       
                                          enum radeon_bo_flag flags)<br>
                                      {<br>
                                          /* VRAM implies WC (write
                                    combining) */<br>
                                          assert(!(domain &
                                    RADEON_DOMAIN_VRAM) || flags &
                                    RADEON_FLAG_GTT_WC);<br>
                                          /* NO_CPU_ACCESS implies VRAM
                                    only. */<br>
                                          assert(!(flags &
                                    RADEON_FLAG_NO_CPU_ACCESS) || domain
                                    == RADEON_DOMAIN_VRAM);<br>
                                      +    /* Resources with
                                    interprocess sharing don't use any
                                    winsys allocators. */<br>
                                    +    if (!(flags &
                                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING))<br>
                                    +        return -1;<br>
                                    +<br>
                                          /* Unsupported flags:
                                    NO_SUBALLOC, SPARSE. */<br>
                                    -    if (flags &
                                    ~(RADEON_FLAG_GTT_WC |
                                    RADEON_FLAG_NO_CPU_ACCESS))<br>
                                    +    if (flags &
                                    ~(RADEON_FLAG_GTT_WC |<br>
                                    +                 
                                    RADEON_FLAG_NO_CPU_ACCESS |<br>
                                    +                 
                                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING))<br>
                                              return -1;<br>
                                            switch (domain) {<br>
                                          case RADEON_DOMAIN_VRAM:<br>
                                              if (flags &
                                    RADEON_FLAG_NO_CPU_ACCESS)<br>
                                                  return
                                    RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>;<br>
                                              else<br>
                                                  return
                                    RADEON_HEAP_VRAM;<br>
                                          case RADEON_DOMAIN_VRAM_GTT:<br>
                                              return
                                    RADEON_HEAP_VRAM_GTT;<br>
                                    diff --git
                                    a/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c

                                    b/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c<br>
                                    index 97bbe23..06b8198 100644<br>
                                    --- a/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c<br>
                                    +++ b/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c<br>
                                    @@ -31,20 +31,24 @@<br>
                                        #include "amdgpu_cs.h"<br>
                                        #include "os/os_time.h"<br>
                                      #include
                                    "state_tracker/drm_driver.h"<br>
                                      #include <amdgpu_drm.h><br>
                                      #include <xf86drm.h><br>
                                      #include <stdio.h><br>
                                      #include <inttypes.h><br>
                                      +#ifndef
                                    AMDGPU_GEM_CREATE_NO_INTERPROC<wbr>ESS_SHARING<br>
                                    +#define
                                    AMDGPU_GEM_CREATE_NO_INTERPROC<wbr>ESS_SHARING

                                    (1 << 6)<br>
                                    +#endif<br>
                                    +<br>
                                      /* Set to 1 for verbose output
                                    showing committed sparse buffer
                                    ranges. */<br>
                                      #define DEBUG_SPARSE_COMMITS 0<br>
                                        struct
                                    amdgpu_sparse_backing_chunk {<br>
                                         uint32_t begin, end;<br>
                                      };<br>
                                        static struct pb_buffer *<br>
                                      amdgpu_bo_create(struct
                                    radeon_winsys *rws,<br>
                                                       uint64_t size,<br>
                                    @@ -395,20 +399,22 @@ static struct
                                    amdgpu_winsys_bo
                                    *amdgpu_create_bo(struct
                                    amdgpu_winsys *ws,<br>
                                           if (initial_domain &
                                    RADEON_DOMAIN_VRAM)<br>
                                            request.preferred_heap |=
                                    AMDGPU_GEM_DOMAIN_VRAM;<br>
                                         if (initial_domain &
                                    RADEON_DOMAIN_GTT)<br>
                                            request.preferred_heap |=
                                    AMDGPU_GEM_DOMAIN_GTT;<br>
                                           if (flags &
                                    RADEON_FLAG_NO_CPU_ACCESS)<br>
                                            request.flags |=
                                    AMDGPU_GEM_CREATE_NO_CPU_ACCES<wbr>S;<br>
                                         if (flags &
                                    RADEON_FLAG_GTT_WC)<br>
                                            request.flags |=
                                    AMDGPU_GEM_CREATE_CPU_GTT_USWC<wbr>;<br>
                                    +   if (flags &
                                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING)<br>
                                    +      request.flags |=
                                    AMDGPU_GEM_CREATE_NO_INTERPROC<wbr>ESS_SHARING;<br>
                                           r =
                                    amdgpu_bo_alloc(ws->dev,
                                    &request, &buf_handle);<br>
                                         if (r) {<br>
                                            fprintf(stderr, "amdgpu:
                                    Failed to allocate a buffer:\n");<br>
                                            fprintf(stderr, "amdgpu:   
                                    size      : %"PRIu64" bytes\n",
                                    size);<br>
                                            fprintf(stderr, "amdgpu:   
                                    alignment : %u bytes\n", alignment);<br>
                                            fprintf(stderr, "amdgpu:   
                                    domains   : %u\n", initial_domain);<br>
                                            goto error_bo_alloc;<br>
                                         }<br>
                                      @@ -1127,21 +1133,21 @@ static
                                    void amdgpu_buffer_set_metadata(str<wbr>uct
                                    pb_buffer *_buf,<br>
                                        static struct pb_buffer *<br>
                                      amdgpu_bo_create(struct
                                    radeon_winsys *rws,<br>
                                                       uint64_t size,<br>
                                                       unsigned
                                    alignment,<br>
                                                       enum
                                    radeon_bo_domain domain,<br>
                                                       enum
                                    radeon_bo_flag flags)<br>
                                      {<br>
                                         struct amdgpu_winsys *ws =
                                    amdgpu_winsys(rws);<br>
                                         struct amdgpu_winsys_bo *bo;<br>
                                    -   unsigned usage = 0,
                                    pb_cache_bucket;<br>
                                    +   unsigned usage = 0,
                                    pb_cache_bucket = 0;<br>
                                           /* VRAM implies WC. This is
                                    not optional. */<br>
                                         assert(!(domain &
                                    RADEON_DOMAIN_VRAM) || flags &
                                    RADEON_FLAG_GTT_WC);<br>
                                           /* NO_CPU_ACCESS is valid
                                    with VRAM only. */<br>
                                         assert(domain ==
                                    RADEON_DOMAIN_VRAM || !(flags &
                                    RADEON_FLAG_NO_CPU_ACCESS));<br>
                                           /* Sub-allocate small buffers
                                    from slabs. */<br>
                                         if (!(flags &
                                    (RADEON_FLAG_NO_SUBALLOC |
                                    RADEON_FLAG_SPARSE)) &&<br>
                                             size <= (1 <<
                                    AMDGPU_SLAB_MAX_SIZE_LOG2)
                                    &&<br>
                                    @@ -1182,48 +1188,52 @@ no_slab:<br>
                                         /* This flag is irrelevant for
                                    the cache. */<br>
                                         flags &=
                                    ~RADEON_FLAG_NO_SUBALLOC;<br>
                                           /* Align size to page size.
                                    This is the minimum alignment for
                                    normal<br>
                                          * BOs. Aligning this here
                                    helps the cached bufmgr. Especially
                                    small BOs,<br>
                                          * like constant/uniform
                                    buffers, can benefit from better and
                                    more reuse.<br>
                                          */<br>
                                         size = align64(size,
                                    ws->info.gart_page_size);<br>
                                         alignment = align(alignment,
                                    ws->info.gart_page_size);<br>
                                      -   int heap =
                                    radeon_get_heap_index(domain,
                                    flags);<br>
                                    -   assert(heap >= 0 &&
                                    heap < RADEON_MAX_CACHED_HEAPS);<br>
                                    -   usage = 1 << heap; /* Only
                                    set one usage bit for each heap. */<br>
                                    +   bool use_reusable_pool = flags
                                    & RADEON_FLAG_NO_INTERPROCESS_SHARING;<br>
                                      -   pb_cache_bucket =
                                    radeon_get_pb_cache_bucket_index(heap);<br>
                                    -   assert(pb_cache_bucket <
                                    ARRAY_SIZE(ws->bo_cache.buckets));<br>
                                    +   if (use_reusable_pool) {<br>
                                    +       int heap =
                                    radeon_get_heap_index(domain,
                                    flags);<br>
                                    +       assert(heap >= 0
                                    && heap <
                                    RADEON_MAX_CACHED_HEAPS);<br>
                                    +       usage = 1 << heap; /*
                                    Only set one usage bit for each
                                    heap. */<br>
                                      -   /* Get a buffer from the
                                    cache. */<br>
                                    -   bo = (struct amdgpu_winsys_bo*)<br>
                                    -       
                                    pb_cache_reclaim_buffer(&ws->bo_cache,

                                    size, alignment, usage,<br>
                                    -                               
                                    pb_cache_bucket);<br>
                                    -   if (bo)<br>
                                    -      return &bo->base;<br>
                                    +       pb_cache_bucket =
                                    radeon_get_pb_cache_bucket_index(heap);<br>
                                    +       assert(pb_cache_bucket <
                                    ARRAY_SIZE(ws->bo_cache.buckets));<br>
                                    +<br>
                                    +       /* Get a buffer from the
                                    cache. */<br>
                                    +       bo = (struct
                                    amdgpu_winsys_bo*)<br>
                                    +           
                                    pb_cache_reclaim_buffer(&ws->bo_cache,

                                    size, alignment, usage,<br>
                                    +                                   
                                    pb_cache_bucket);<br>
                                    +       if (bo)<br>
                                    +          return &bo->base;<br>
                                    +   }<br>
                                           /* Create a new one. */<br>
                                         bo = amdgpu_create_bo(ws, size,
                                    alignment, usage, domain, flags,<br>
                                                             
                                     pb_cache_bucket);<br>
                                         if (!bo) {<br>
                                            /* Clear the cache and try
                                    again. */<br>
                                           
                                    pb_slabs_reclaim(&ws->bo_slabs);<br>
                                           
                                    pb_cache_release_all_buffers(&ws->bo_cache);<br>
                                            bo = amdgpu_create_bo(ws,
                                    size, alignment, usage, domain,
                                    flags,<br>
                                                                 
                                    pb_cache_bucket);<br>
                                            if (!bo)<br>
                                               return NULL;<br>
                                         }<br>
                                      - 
                                     bo->u.real.use_reusable_pool =
                                    true;<br>
                                    +   bo->u.real.use_reusable_pool
                                    = use_reusable_pool;<br>
                                         return &bo->base;<br>
                                      }<br>
                                        static struct pb_buffer
                                    *amdgpu_bo_from_handle(struct
                                    radeon_winsys *rws,<br>
                                                                       
                                                 struct winsys_handle
                                    *whandle,<br>
                                                                       
                                                 unsigned *stride,<br>
                                                                       
                                                 unsigned *offset)<br>
                                      {<br>
                                         struct amdgpu_winsys *ws =
                                    amdgpu_winsys(rws);<br>
                                         struct amdgpu_winsys_bo *bo;<br>
                                    diff --git
                                    a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c

                                    b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c<br>
                                    index 8027a5f..15e9d38 100644<br>
                                    --- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c<br>
                                    +++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c<br>
                                    @@ -907,21 +907,21 @@ static void
                                    radeon_bo_set_metadata(struct
                                    pb_buffer *_buf,<br>
                                        static struct pb_buffer *<br>
                                      radeon_winsys_bo_create(struct
                                    radeon_winsys *rws,<br>
                                                              uint64_t
                                    size,<br>
                                                              unsigned
                                    alignment,<br>
                                                              enum
                                    radeon_bo_domain domain,<br>
                                                              enum
                                    radeon_bo_flag flags)<br>
                                      {<br>
                                          struct radeon_drm_winsys *ws =
                                    radeon_drm_winsys(rws);<br>
                                          struct radeon_bo *bo;<br>
                                    -    unsigned usage = 0,
                                    pb_cache_bucket;<br>
                                    +    unsigned usage = 0,
                                    pb_cache_bucket = 0;<br>
                                            assert(!(flags &
                                    RADEON_FLAG_SPARSE)); /* not
                                    supported */<br>
                                            /* Only 32-bit sizes are
                                    supported. */<br>
                                          if (size > UINT_MAX)<br>
                                              return NULL;<br>
                                            /* VRAM implies WC. This is
                                    not optional. */<br>
                                          if (domain &
                                    RADEON_DOMAIN_VRAM)<br>
                                              flags |=
                                    RADEON_FLAG_GTT_WC;<br>
                                    @@ -962,46 +962,51 @@ no_slab:<br>
                                          /* This flag is irrelevant for
                                    the cache. */<br>
                                          flags &=
                                    ~RADEON_FLAG_NO_SUBALLOC;<br>
                                            /* Align size to page size.
                                    This is the minimum alignment for
                                    normal<br>
                                           * BOs. Aligning this here
                                    helps the cached bufmgr. Especially
                                    small BOs,<br>
                                           * like constant/uniform
                                    buffers, can benefit from better and
                                    more reuse.<br>
                                           */<br>
                                          size = align(size,
                                    ws->info.gart_page_size);<br>
                                          alignment = align(alignment,
                                    ws->info.gart_page_size);<br>
                                      -    int heap =
                                    radeon_get_heap_index(domain,
                                    flags);<br>
                                    -    assert(heap >= 0 &&
                                    heap < RADEON_MAX_CACHED_HEAPS);<br>
                                    -    usage = 1 << heap; /*
                                    Only set one usage bit for each
                                    heap. */<br>
                                    +    bool use_reusable_pool = flags
                                    & RADEON_FLAG_NO_INTERPROCESS_SHARING;<br>
                                      -    pb_cache_bucket =
                                    radeon_get_pb_cache_bucket_index(heap);<br>
                                    -    assert(pb_cache_bucket <
                                    ARRAY_SIZE(ws->bo_cache.buckets));<br>
                                    +    /* Shared resources don't use
                                    cached heaps. */<br>
                                    +    if (use_reusable_pool) {<br>
                                    +        int heap =
                                    radeon_get_heap_index(domain,
                                    flags);<br>
                                    +        assert(heap >= 0
                                    && heap <
                                    RADEON_MAX_CACHED_HEAPS);<br>
                                    +        usage = 1 << heap; /*
                                    Only set one usage bit for each
                                    heap. */<br>
                                      -    bo =
                                    radeon_bo(pb_cache_reclaim_buffer(&ws->bo_cache,

                                    size, alignment,<br>
                                    -                                   
                                           usage, pb_cache_bucket));<br>
                                    -    if (bo)<br>
                                    -        return &bo->base;<br>
                                    +        pb_cache_bucket =
                                    radeon_get_pb_cache_bucket_index(heap);<br>
                                    +        assert(pb_cache_bucket <
                                    ARRAY_SIZE(ws->bo_cache.buckets));<br>
                                    +<br>
                                    +        bo =
                                    radeon_bo(pb_cache_reclaim_buffer(&ws->bo_cache,

                                    size, alignment,<br>
                                    +                                   
                                               usage, pb_cache_bucket));<br>
                                    +        if (bo)<br>
                                    +            return
                                    &bo->base;<br>
                                    +    }<br>
                                            bo = radeon_create_bo(ws,
                                    size, alignment, usage, domain,
                                    flags,<br>
                                                               
                                    pb_cache_bucket);<br>
                                          if (!bo) {<br>
                                              /* Clear the cache and try
                                    again. */<br>
                                              if
                                    (ws->info.has_virtual_memory)<br>
                                                 
                                    pb_slabs_reclaim(&ws->bo_slabs);<br>
                                             
                                    pb_cache_release_all_buffers(&ws->bo_cache);<br>
                                              bo = radeon_create_bo(ws,
                                    size, alignment, usage, domain,
                                    flags,<br>
                                                                   
                                    pb_cache_bucket);<br>
                                              if (!bo)<br>
                                                  return NULL;<br>
                                          }<br>
                                      -   
                                    bo->u.real.use_reusable_pool =
                                    true;<br>
                                    +    bo->u.real.use_reusable_pool
                                    = use_reusable_pool;<br>
                                           
                                    mtx_lock(&ws->bo_handles_mutex);<br>
                                         
                                    util_hash_table_set(ws->bo_handles,

                                    (void*)(uintptr_t)bo->handle,
                                    bo);<br>
                                         
                                    mtx_unlock(&ws->bo_handles_mutex);<br>
                                            return &bo->base;<br>
                                      }<br>
                                        static struct pb_buffer
                                    *radeon_winsys_bo_from_ptr(struct
                                    radeon_winsys *rws,<br>
                                                                       
                                                     void *pointer,
                                    uint64_t size)<br>
                                  </blockquote>
                                  <br>
                                </div>
                              </blockquote>
                            </div>
                          </div>
                          <div class="gmail_extra" dir="auto"><br>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <br>
                  </div>
                </div>
              </blockquote>
            </div>
            <br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>