<div dir="auto"><div><br><div class="gmail_extra"><br><div class="gmail_quote">On Jul 19, 2017 10:21 PM, "zhoucm1" <<a href="mailto:david1.zhou@amd.com">david1.zhou@amd.com</a>> wrote:<br type="attribution"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF"><div class="quoted-text">
    <br>
    <br>
    <div class="m_-6147598457318751040moz-cite-prefix">On 2017-07-19 23:34, Marek Olšák
      wrote:<br>
    </div>
    <blockquote type="cite">
      
      <div dir="auto">
        <div><br>
          <div class="gmail_extra"><br>
            <div class="gmail_quote">On Jul 19, 2017 3:36 AM, "zhoucm1"
              <<a href="mailto:david1.zhou@amd.com" target="_blank">david1.zhou@amd.com</a>>
              wrote:<br type="attribution">
              <blockquote class="m_-6147598457318751040quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div class="m_-6147598457318751040quoted-text"><br>
                  <br>
                  On 2017-07-19 04:08, Marek Olšák wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    From: Marek Olšák <<a href="mailto:marek.olsak@amd.com" target="_blank">marek.olsak@amd.com</a>><br>
                    <br>
                    For lower overhead in the CS ioctl.<br>
                    Winsys allocators are not used with
                    interprocess-sharable resources.<br>
                  </blockquote>
                </div>
                Hi Marek,<br>
                <br>
                Could you explain how this approach reduces overhead in
                the CS ioctl? By reusing BOs to shorten the BO list?<br>
              </blockquote>
            </div>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">The kernel part of the work hasn't been done
          yet. The idea is that nonsharable buffers don't have to be
          revalidated by TTM,</div>
      </div>
    </blockquote></div>
    OK, maybe I can only see the whole picture of this idea once you
    complete the kernel part.<br>
    Out of curiosity, why/how can nonsharable buffers be revalidated by
    TTM without exposing something like an amdgpu_bo_make_resident API?<br></div></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">I think the idea is that all nonsharable buffers will be backed by the same reservation object, so TTM can skip buffer validation if no buffer has been moved. It's just an optimization for the current design.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">
    <br>
    As mentioned in another thread, if we can expose a make_resident
    API, we can remove the bo_list, and we can even remove the reservation
    operation in the CS ioctl.<br>
    Right now, I think our BO list is a very bad design.<br>
    First, the UMD must create a BO list for every command submission, which is
    extra CPU overhead compared with the traditional way.<br>
    Second, the kernel also has to iterate over the list. When the BO list is too
    long (OpenCL programs, for example, always put several thousand BOs
    in the BO list), reservation must keep those thousands of ww_mutexes safe, and the CPU
    overhead is too high.<br>
    <br>
    So I strongly suggest that we expose a make_resident API to user
    space. If we cannot, I would like to know the specific reason, to see if we
    can solve it.<br></div></blockquote></div></div></div><div dir="auto"><br></div><div dir="auto">Yeah, I think the BO list idea is likely to die sooner or later. It made sense for GL before bindless was a thing. Nowadays I don't see much value in it.</div><div dir="auto"><br></div><div dir="auto">MesaGL will keep tracking the BO list because it's a requirement for good GL performance (it determines whether to flush IBs before BO synchronization, and it allows tracking fences for each BO, which are used to determine dependencies between IBs; all of that enables async SDMA and async compute for GL, which doesn't have separate queues).</div><div dir="auto"><br></div><div dir="auto">However, we don't need any BO list at the libdrm level and lower. I think a BO_CREATE flag that causes the buffer to be added to a kernel-side per-fd BO list would be sufficient. How the kernel manages its BO list should be its own implementation detail. Initially we can just move the current BO list management into the kernel.</div><div dir="auto"><br></div><div dir="auto">Marek</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">
    <br>
    <br>
    Regards,<br>
    David Zhou<div class="elided-text"><br>
    <blockquote type="cite">
      <div dir="auto">
        <div dir="auto"> so it should remove a lot of kernel overhead
          and the BO list remains the same.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Marek</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_extra">
            <div class="gmail_quote">
              <blockquote class="m_-6147598457318751040quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <br>
                Thanks,<br>
                David Zhou
                <div class="m_-6147598457318751040elided-text"><br>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <br>
                    v2: It shouldn't crash anymore, but the kernel will
                    reject the new flag.<br>
                    ---<br>
                      src/gallium/drivers/radeon/r60<wbr>0_buffer_common.c
                    |  7 +++++<br>
                      src/gallium/drivers/radeon/rad<wbr>eon_winsys.h   
                      | 20 +++++++++++---<br>
                      src/gallium/winsys/amdgpu/drm/<wbr>amdgpu_bo.c   
                       | 36 ++++++++++++++++---------<br>
                      src/gallium/winsys/radeon/drm/<wbr>radeon_drm_bo.c 
                     | 27 +++++++++++--------<br>
                      4 files changed, 62 insertions(+), 28 deletions(-)<br>
                    <br>
                    diff --git a/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c
                    b/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c<br>
                    index dd1c209..2747ac4 100644<br>
                    --- a/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c<br>
                    +++ b/src/gallium/drivers/radeon/r<wbr>600_buffer_common.c<br>
                    @@ -160,20 +160,27 @@ void
                    r600_init_resource_fields(stru<wbr>ct
                    r600_common_screen *rscreen,<br>
                            }<br>
                            /* Tiled textures are unmappable. Always put
                    them in VRAM. */<br>
                            if ((res->b.b.target != PIPE_BUFFER
                    && !rtex->surface.is_linear) ||<br>
                                res->flags &
                    R600_RESOURCE_FLAG_UNMAPPABLE) {<br>
                                    res->domains =
                    RADEON_DOMAIN_VRAM;<br>
                                    res->flags |=
                    RADEON_FLAG_NO_CPU_ACCESS |<br>
                                             RADEON_FLAG_GTT_WC;<br>
                            }<br>
                      +     /* Only displayable single-sample textures
                    can be shared between<br>
                    +        * processes. */<br>
                    +       if (res->b.b.target == PIPE_BUFFER ||<br>
                    +           res->b.b.nr_samples >= 2 ||<br>
                    +           rtex->surface.micro_tile_mode !=
                    RADEON_MICRO_MODE_DISPLAY)<br>
                    +               res->flags |=
                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
                    +<br>
                            /* If VRAM is just stolen system memory,
                    allow both VRAM and<br>
                             * GTT, whichever has free space. If a
                    buffer is evicted from<br>
                             * VRAM to GTT, it will stay there.<br>
                             *<br>
                             * DRM 3.6.0 has good BO move throttling, so
                    we can allow VRAM-only<br>
                             * placements even with a low amount of
                    stolen VRAM.<br>
                             */<br>
                            if (!rscreen->info.has_dedicated_<wbr>vram
                    &&<br>
                                (rscreen->info.drm_major < 3 ||
                    rscreen->info.drm_minor < 6) &&<br>
                                res->domains == RADEON_DOMAIN_VRAM) {<br>
                    diff --git a/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h
                    b/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h<br>
                    index 351edcd..0abcb56 100644<br>
                    --- a/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h<br>
                    +++ b/src/gallium/drivers/radeon/r<wbr>adeon_winsys.h<br>
                    @@ -47,20 +47,21 @@ enum radeon_bo_domain { /*
                    bitfield */<br>
                          RADEON_DOMAIN_GTT  = 2,<br>
                          RADEON_DOMAIN_VRAM = 4,<br>
                          RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM |
                    RADEON_DOMAIN_GTT<br>
                      };<br>
                        enum radeon_bo_flag { /* bitfield */<br>
                          RADEON_FLAG_GTT_WC =        (1 << 0),<br>
                          RADEON_FLAG_NO_CPU_ACCESS = (1 << 1),<br>
                          RADEON_FLAG_NO_SUBALLOC =   (1 << 2),<br>
                          RADEON_FLAG_SPARSE =        (1 << 3),<br>
                    +    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING = (1
                    << 4),<br>
                      };<br>
                        enum radeon_bo_usage { /* bitfield */<br>
                          RADEON_USAGE_READ = 2,<br>
                          RADEON_USAGE_WRITE = 4,<br>
                          RADEON_USAGE_READWRITE = RADEON_USAGE_READ |
                    RADEON_USAGE_WRITE,<br>
                            /* The winsys ensures that the CS submission
                    will be scheduled after<br>
                           * previously flushed CSs referencing this BO
                    in a conflicting way.<br>
                           */<br>
                    @@ -685,28 +686,33 @@ static inline enum
                    radeon_bo_domain radeon_domain_from_heap(enum
                    radeon_heap hea<br>
                          default:<br>
                              assert(0);<br>
                              return (enum radeon_bo_domain)0;<br>
                          }<br>
                      }<br>
                        static inline unsigned
                    radeon_flags_from_heap(enum radeon_heap heap)<br>
                      {<br>
                          switch (heap) {<br>
                          case RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>:<br>
                    -        return RADEON_FLAG_GTT_WC |
                    RADEON_FLAG_NO_CPU_ACCESS;<br>
                    +        return RADEON_FLAG_GTT_WC |<br>
                    +               RADEON_FLAG_NO_CPU_ACCESS |<br>
                    +               RADEON_FLAG_NO_INTERPROCESS_S<wbr>HARING;<br>
                    +<br>
                          case RADEON_HEAP_VRAM:<br>
                          case RADEON_HEAP_VRAM_GTT:<br>
                          case RADEON_HEAP_GTT_WC:<br>
                    -        return RADEON_FLAG_GTT_WC;<br>
                    +        return RADEON_FLAG_GTT_WC |<br>
                    +               RADEON_FLAG_NO_INTERPROCESS_S<wbr>HARING;<br>
                    +<br>
                          case RADEON_HEAP_GTT:<br>
                          default:<br>
                    -        return 0;<br>
                    +        return RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
                          }<br>
                      }<br>
                        /* The pb cache bucket is chosen to minimize
                    pb_cache misses.<br>
                       * It must be between 0 and 3 inclusive.<br>
                       */<br>
                      static inline unsigned
                    radeon_get_pb_cache_bucket_ind<wbr>ex(enum
                    radeon_heap heap)<br>
                      {<br>
                          switch (heap) {<br>
                          case RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>:<br>
                    @@ -724,22 +730,28 @@ static inline unsigned
                    radeon_get_pb_cache_bucket_ind<wbr>ex(enum
                    radeon_heap heap)<br>
                        /* Return the heap index for winsys allocators,
                    or -1 on failure. */<br>
                      static inline int radeon_get_heap_index(enum
                    radeon_bo_domain domain,<br>
                                                              enum
                    radeon_bo_flag flags)<br>
                      {<br>
                          /* VRAM implies WC (write combining) */<br>
                          assert(!(domain & RADEON_DOMAIN_VRAM) ||
                    flags & RADEON_FLAG_GTT_WC);<br>
                          /* NO_CPU_ACCESS implies VRAM only. */<br>
                          assert(!(flags &
                    RADEON_FLAG_NO_CPU_ACCESS) || domain ==
                    RADEON_DOMAIN_VRAM);<br>
                      +    /* Resources with interprocess sharing don't
                    use any winsys allocators. */<br>
                    +    if (!(flags &
                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING))<br>
                    +        return -1;<br>
                    +<br>
                          /* Unsupported flags: NO_SUBALLOC, SPARSE. */<br>
                    -    if (flags & ~(RADEON_FLAG_GTT_WC |
                    RADEON_FLAG_NO_CPU_ACCESS))<br>
                    +    if (flags & ~(RADEON_FLAG_GTT_WC |<br>
                    +                  RADEON_FLAG_NO_CPU_ACCESS |<br>
                    +                  RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING))<br>
                              return -1;<br>
                            switch (domain) {<br>
                          case RADEON_DOMAIN_VRAM:<br>
                              if (flags & RADEON_FLAG_NO_CPU_ACCESS)<br>
                                  return RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>;<br>
                              else<br>
                                  return RADEON_HEAP_VRAM;<br>
                          case RADEON_DOMAIN_VRAM_GTT:<br>
                              return RADEON_HEAP_VRAM_GTT;<br>
                    diff --git a/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c
                    b/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c<br>
                    index 97bbe23..06b8198 100644<br>
                    --- a/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c<br>
                    +++ b/src/gallium/winsys/amdgpu/dr<wbr>m/amdgpu_bo.c<br>
                    @@ -31,20 +31,24 @@<br>
                        #include "amdgpu_cs.h"<br>
                        #include "os/os_time.h"<br>
                      #include "state_tracker/drm_driver.h"<br>
                      #include <amdgpu_drm.h><br>
                      #include <xf86drm.h><br>
                      #include <stdio.h><br>
                      #include <inttypes.h><br>
                      +#ifndef AMDGPU_GEM_CREATE_NO_INTERPROC<wbr>ESS_SHARING<br>
                    +#define AMDGPU_GEM_CREATE_NO_INTERPROC<wbr>ESS_SHARING
                    (1 << 6)<br>
                    +#endif<br>
                    +<br>
                      /* Set to 1 for verbose output showing committed
                    sparse buffer ranges. */<br>
                      #define DEBUG_SPARSE_COMMITS 0<br>
                        struct amdgpu_sparse_backing_chunk {<br>
                         uint32_t begin, end;<br>
                      };<br>
                        static struct pb_buffer *<br>
                      amdgpu_bo_create(struct radeon_winsys *rws,<br>
                                       uint64_t size,<br>
                    @@ -395,20 +399,22 @@ static struct amdgpu_winsys_bo
                    *amdgpu_create_bo(struct amdgpu_winsys *ws,<br>
                           if (initial_domain & RADEON_DOMAIN_VRAM)<br>
                            request.preferred_heap |=
                    AMDGPU_GEM_DOMAIN_VRAM;<br>
                         if (initial_domain & RADEON_DOMAIN_GTT)<br>
                            request.preferred_heap |=
                    AMDGPU_GEM_DOMAIN_GTT;<br>
                           if (flags & RADEON_FLAG_NO_CPU_ACCESS)<br>
                            request.flags |=
                    AMDGPU_GEM_CREATE_NO_CPU_ACCES<wbr>S;<br>
                         if (flags & RADEON_FLAG_GTT_WC)<br>
                            request.flags |=
                    AMDGPU_GEM_CREATE_CPU_GTT_USWC<wbr>;<br>
                    +   if (flags & RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING)<br>
                    +      request.flags |=
                    AMDGPU_GEM_CREATE_NO_INTERPROC<wbr>ESS_SHARING;<br>
                           r = amdgpu_bo_alloc(ws->dev, &request,
                    &buf_handle);<br>
                         if (r) {<br>
                            fprintf(stderr, "amdgpu: Failed to allocate
                    a buffer:\n");<br>
                            fprintf(stderr, "amdgpu:    size      :
                    %"PRIu64" bytes\n", size);<br>
                            fprintf(stderr, "amdgpu:    alignment : %u
                    bytes\n", alignment);<br>
                            fprintf(stderr, "amdgpu:    domains   :
                    %u\n", initial_domain);<br>
                            goto error_bo_alloc;<br>
                         }<br>
                      @@ -1127,21 +1133,21 @@ static void
                    amdgpu_buffer_set_metadata(str<wbr>uct pb_buffer
                    *_buf,<br>
                        static struct pb_buffer *<br>
                      amdgpu_bo_create(struct radeon_winsys *rws,<br>
                                       uint64_t size,<br>
                                       unsigned alignment,<br>
                                       enum radeon_bo_domain domain,<br>
                                       enum radeon_bo_flag flags)<br>
                      {<br>
                         struct amdgpu_winsys *ws = amdgpu_winsys(rws);<br>
                         struct amdgpu_winsys_bo *bo;<br>
                    -   unsigned usage = 0, pb_cache_bucket;<br>
                    +   unsigned usage = 0, pb_cache_bucket = 0;<br>
                           /* VRAM implies WC. This is not optional. */<br>
                         assert(!(domain & RADEON_DOMAIN_VRAM) ||
                    flags & RADEON_FLAG_GTT_WC);<br>
                           /* NO_CPU_ACCESS is valid with VRAM only. */<br>
                         assert(domain == RADEON_DOMAIN_VRAM || !(flags
                    & RADEON_FLAG_NO_CPU_ACCESS));<br>
                           /* Sub-allocate small buffers from slabs. */<br>
                         if (!(flags & (RADEON_FLAG_NO_SUBALLOC |
                    RADEON_FLAG_SPARSE)) &&<br>
                             size <= (1 <<
                    AMDGPU_SLAB_MAX_SIZE_LOG2) &&<br>
                    @@ -1182,48 +1188,52 @@ no_slab:<br>
                         /* This flag is irrelevant for the cache. */<br>
                         flags &= ~RADEON_FLAG_NO_SUBALLOC;<br>
                           /* Align size to page size. This is the
                    minimum alignment for normal<br>
                          * BOs. Aligning this here helps the cached
                    bufmgr. Especially small BOs,<br>
                          * like constant/uniform buffers, can benefit
                    from better and more reuse.<br>
                          */<br>
                         size = align64(size,
                    ws->info.gart_page_size);<br>
                         alignment = align(alignment,
                    ws->info.gart_page_size);<br>
                      -   int heap = radeon_get_heap_index(domain,
                    flags);<br>
                    -   assert(heap >= 0 && heap <
                    RADEON_MAX_CACHED_HEAPS);<br>
                    -   usage = 1 << heap; /* Only set one usage
                    bit for each heap. */<br>
                    +   bool use_reusable_pool = flags &
                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
                      -   pb_cache_bucket =
                    radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
                    -   assert(pb_cache_bucket <
                    ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
                    +   if (use_reusable_pool) {<br>
                    +       int heap = radeon_get_heap_index(domain,
                    flags);<br>
                    +       assert(heap >= 0 && heap <
                    RADEON_MAX_CACHED_HEAPS);<br>
                    +       usage = 1 << heap; /* Only set one
                    usage bit for each heap. */<br>
                      -   /* Get a buffer from the cache. */<br>
                    -   bo = (struct amdgpu_winsys_bo*)<br>
                    -        pb_cache_reclaim_buffer(&ws->b<wbr>o_cache,
                    size, alignment, usage,<br>
                    -                                pb_cache_bucket);<br>
                    -   if (bo)<br>
                    -      return &bo->base;<br>
                    +       pb_cache_bucket =
                    radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
                    +       assert(pb_cache_bucket <
                    ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
                    +<br>
                    +       /* Get a buffer from the cache. */<br>
                    +       bo = (struct amdgpu_winsys_bo*)<br>
                    +            pb_cache_reclaim_buffer(&ws->b<wbr>o_cache,
                    size, alignment, usage,<br>
                    +                                   
                    pb_cache_bucket);<br>
                    +       if (bo)<br>
                    +          return &bo->base;<br>
                    +   }<br>
                           /* Create a new one. */<br>
                         bo = amdgpu_create_bo(ws, size, alignment,
                    usage, domain, flags,<br>
                                               pb_cache_bucket);<br>
                         if (!bo) {<br>
                            /* Clear the cache and try again. */<br>
                            pb_slabs_reclaim(&ws->bo_slabs<wbr>);<br>
                            pb_cache_release_all_buffers(&<wbr>ws->bo_cache);<br>
                            bo = amdgpu_create_bo(ws, size, alignment,
                    usage, domain, flags,<br>
                                                  pb_cache_bucket);<br>
                            if (!bo)<br>
                               return NULL;<br>
                         }<br>
                      -   bo->u.real.use_reusable_pool = true;<br>
                    +   bo->u.real.use_reusable_pool =
                    use_reusable_pool;<br>
                         return &bo->base;<br>
                      }<br>
                        static struct pb_buffer
                    *amdgpu_bo_from_handle(struct radeon_winsys *rws,<br>
                                                                   
                     struct winsys_handle *whandle,<br>
                                                                   
                     unsigned *stride,<br>
                                                                   
                     unsigned *offset)<br>
                      {<br>
                         struct amdgpu_winsys *ws = amdgpu_winsys(rws);<br>
                         struct amdgpu_winsys_bo *bo;<br>
                    diff --git a/src/gallium/winsys/radeon/dr<wbr>m/radeon_drm_bo.c
                    b/src/gallium/winsys/radeon/dr<wbr>m/radeon_drm_bo.c<br>
                    index 8027a5f..15e9d38 100644<br>
                    --- a/src/gallium/winsys/radeon/dr<wbr>m/radeon_drm_bo.c<br>
                    +++ b/src/gallium/winsys/radeon/dr<wbr>m/radeon_drm_bo.c<br>
                    @@ -907,21 +907,21 @@ static void
                    radeon_bo_set_metadata(struct pb_buffer *_buf,<br>
                        static struct pb_buffer *<br>
                      radeon_winsys_bo_create(struct radeon_winsys *rws,<br>
                                              uint64_t size,<br>
                                              unsigned alignment,<br>
                                              enum radeon_bo_domain
                    domain,<br>
                                              enum radeon_bo_flag flags)<br>
                      {<br>
                          struct radeon_drm_winsys *ws =
                    radeon_drm_winsys(rws);<br>
                          struct radeon_bo *bo;<br>
                    -    unsigned usage = 0, pb_cache_bucket;<br>
                    +    unsigned usage = 0, pb_cache_bucket = 0;<br>
                            assert(!(flags & RADEON_FLAG_SPARSE));
                    /* not supported */<br>
                            /* Only 32-bit sizes are supported. */<br>
                          if (size > UINT_MAX)<br>
                              return NULL;<br>
                            /* VRAM implies WC. This is not optional. */<br>
                          if (domain & RADEON_DOMAIN_VRAM)<br>
                              flags |= RADEON_FLAG_GTT_WC;<br>
                    @@ -962,46 +962,51 @@ no_slab:<br>
                          /* This flag is irrelevant for the cache. */<br>
                          flags &= ~RADEON_FLAG_NO_SUBALLOC;<br>
                            /* Align size to page size. This is the
                    minimum alignment for normal<br>
                           * BOs. Aligning this here helps the cached
                    bufmgr. Especially small BOs,<br>
                           * like constant/uniform buffers, can benefit
                    from better and more reuse.<br>
                           */<br>
                          size = align(size,
                    ws->info.gart_page_size);<br>
                          alignment = align(alignment,
                    ws->info.gart_page_size);<br>
                      -    int heap = radeon_get_heap_index(domain,
                    flags);<br>
                    -    assert(heap >= 0 && heap <
                    RADEON_MAX_CACHED_HEAPS);<br>
                    -    usage = 1 << heap; /* Only set one usage
                    bit for each heap. */<br>
                    +    bool use_reusable_pool = flags &
                    RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
                      -    pb_cache_bucket =
                    radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
                    -    assert(pb_cache_bucket <
                    ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
                    +    /* Shared resources don't use cached heaps. */<br>
                    +    if (use_reusable_pool) {<br>
                    +        int heap = radeon_get_heap_index(domain,
                    flags);<br>
                    +        assert(heap >= 0 && heap <
                    RADEON_MAX_CACHED_HEAPS);<br>
                    +        usage = 1 << heap; /* Only set one
                    usage bit for each heap. */<br>
                      -    bo = radeon_bo(pb_cache_reclaim_buf<wbr>fer(&ws->bo_cache,
                    size, alignment,<br>
                    -                                           usage,
                    pb_cache_bucket));<br>
                    -    if (bo)<br>
                    -        return &bo->base;<br>
                    +        pb_cache_bucket =
                    radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
                    +        assert(pb_cache_bucket <
                    ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
                    +<br>
                    +        bo = radeon_bo(pb_cache_reclaim_buf<wbr>fer(&ws->bo_cache,
                    size, alignment,<br>
                    +                                             
                     usage, pb_cache_bucket));<br>
                    +        if (bo)<br>
                    +            return &bo->base;<br>
                    +    }<br>
                            bo = radeon_create_bo(ws, size, alignment,
                    usage, domain, flags,<br>
                                                pb_cache_bucket);<br>
                          if (!bo) {<br>
                              /* Clear the cache and try again. */<br>
                              if (ws->info.has_virtual_memory)<br>
                                  pb_slabs_reclaim(&ws->bo_slabs<wbr>);<br>
                              pb_cache_release_all_buffers(&<wbr>ws->bo_cache);<br>
                              bo = radeon_create_bo(ws, size, alignment,
                    usage, domain, flags,<br>
                                                    pb_cache_bucket);<br>
                              if (!bo)<br>
                                  return NULL;<br>
                          }<br>
                      -    bo->u.real.use_reusable_pool = true;<br>
                    +    bo->u.real.use_reusable_pool =
                    use_reusable_pool;<br>
                            mtx_lock(&ws->bo_handles_mutex<wbr>);<br>
                          util_hash_table_set(ws->bo_han<wbr>dles,
                    (void*)(uintptr_t)bo->handle, bo);<br>
                          mtx_unlock(&ws->bo_handles_mut<wbr>ex);<br>
                            return &bo->base;<br>
                      }<br>
                        static struct pb_buffer
                    *radeon_winsys_bo_from_ptr(str<wbr>uct radeon_winsys
                    *rws,<br>
                                                                       
                     void *pointer, uint64_t size)<br>
                  </blockquote>
                  <br>
                </div>
              </blockquote>
            </div>
          </div>
          <div class="gmail_extra" dir="auto"><br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </div></div>

</blockquote></div><br></div></div></div>
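For readers skimming the quoted patch, the rule it adds to radeon_get_heap_index() can be restated as a small standalone C sketch: only buffers flagged RADEON_FLAG_NO_INTERPROCESS_SHARING are eligible for the winsys cache, and everything else gets -1. The domain and flag values are copied from the quoted radeon_winsys.h hunk. The sketch_heap enum, the sketch_get_heap_index() name, and the RADEON_DOMAIN_GTT branch are assumptions made for illustration, since the quoted diff does not show them. This is not the Mesa implementation.

```c
#include <assert.h>

/* Values copied from the quoted radeon_winsys.h hunk. */
enum radeon_bo_domain {
    RADEON_DOMAIN_GTT      = 2,
    RADEON_DOMAIN_VRAM     = 4,
    RADEON_DOMAIN_VRAM_GTT = RADEON_DOMAIN_VRAM | RADEON_DOMAIN_GTT
};

enum radeon_bo_flag {
    RADEON_FLAG_GTT_WC                  = (1 << 0),
    RADEON_FLAG_NO_CPU_ACCESS           = (1 << 1),
    RADEON_FLAG_NO_SUBALLOC             = (1 << 2),
    RADEON_FLAG_SPARSE                  = (1 << 3),
    RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4)
};

/* ASSUMPTION: hypothetical heap enum; the real numbering is not
 * visible in the quoted diff. */
enum sketch_heap {
    SKETCH_HEAP_VRAM_NO_CPU_ACCESS,
    SKETCH_HEAP_VRAM,
    SKETCH_HEAP_VRAM_GTT,
    SKETCH_HEAP_GTT_WC,
    SKETCH_HEAP_GTT
};

/* Return a heap index for cacheable buffers, or -1 if the buffer
 * must bypass the winsys allocators. */
static int sketch_get_heap_index(unsigned domain, unsigned flags)
{
    /* VRAM implies WC (write combining), as asserted in the patch. */
    assert(!(domain & RADEON_DOMAIN_VRAM) || (flags & RADEON_FLAG_GTT_WC));

    /* Resources that may be shared between processes never use
     * the winsys allocators. */
    if (!(flags & RADEON_FLAG_NO_INTERPROCESS_SHARING))
        return -1;

    /* Unsupported flags: NO_SUBALLOC, SPARSE. */
    if (flags & ~(RADEON_FLAG_GTT_WC |
                  RADEON_FLAG_NO_CPU_ACCESS |
                  RADEON_FLAG_NO_INTERPROCESS_SHARING))
        return -1;

    switch (domain) {
    case RADEON_DOMAIN_VRAM:
        return (flags & RADEON_FLAG_NO_CPU_ACCESS)
                   ? SKETCH_HEAP_VRAM_NO_CPU_ACCESS
                   : SKETCH_HEAP_VRAM;
    case RADEON_DOMAIN_VRAM_GTT:
        return SKETCH_HEAP_VRAM_GTT;
    case RADEON_DOMAIN_GTT:
        /* ASSUMPTION: this branch is not shown in the quoted hunk. */
        return (flags & RADEON_FLAG_GTT_WC) ? SKETCH_HEAP_GTT_WC
                                            : SKETCH_HEAP_GTT;
    default:
        return -1;
    }
}
```

In the patch, a caller such as amdgpu_bo_create() only tries pb_cache_reclaim_buffer() when the flag is set, so a potentially shared buffer always goes straight to a fresh allocation.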