<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
<div class="moz-cite-prefix">On July 20, 2017 at 22:59, Marek Olšák
wrote:<br>
</div>
<blockquote
cite="mid:CAAxE2A6uFyqhWfJBNzojyMqFmk-AGyQmScwqScMjLutBQX5usA@mail.gmail.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<div dir="auto">
<div><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Jul 19, 2017 10:21 PM, "zhoucm1"
<<a moz-do-not-send="true"
href="mailto:david1.zhou@amd.com">david1.zhou@amd.com</a>>
wrote:<br type="attribution">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="quoted-text"> <br>
<br>
<div class="m_-6147598457318751040moz-cite-prefix">On
July 19, 2017 at 23:34, Marek Olšák wrote:<br>
</div>
<blockquote type="cite">
<div dir="auto">
<div><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Jul 19, 2017
3:36 AM, "zhoucm1" <<a
moz-do-not-send="true"
href="mailto:david1.zhou@amd.com"
target="_blank">david1.zhou@amd.com</a>>
wrote:<br type="attribution">
<blockquote
class="m_-6147598457318751040quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div
class="m_-6147598457318751040quoted-text"><br>
<br>
On July 19, 2017 at 04:08, Marek Olšák
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> From: Marek
Olšák <<a moz-do-not-send="true"
href="mailto:marek.olsak@amd.com"
target="_blank">marek.olsak@amd.com</a>><br>
<br>
For lower overhead in the CS ioctl.<br>
Winsys allocators are not used with
interprocess-sharable resources.<br>
</blockquote>
</div>
Hi Marek,<br>
<br>
Could you explain how this approach
reduces overhead in the CS ioctl? By
reusing BOs to shorten the BO list?<br>
</blockquote>
</div>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">The kernel part of the work
hasn't been done yet. The idea is that
nonsharable buffers don't have to be
revalidated by TTM,</div>
</div>
</blockquote>
</div>
OK, maybe I can only see the whole picture of this
idea once you complete the kernel part.<br>
Out of curiosity, why/how can nonsharable buffers
avoid revalidation by TTM without exposing
something like an amdgpu_bo_make_resident API?<br>
</div>
</blockquote>
</div>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">I think the idea is that all nonsharable buffers
will be backed by the same reservation object, so TTM can skip
buffer validation if no buffer has been moved. It's just an
optimization for the current design.</div>
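To make the shared-reservation-object idea concrete, here is a minimal user-space model (all names here are invented for illustration; this is not the real TTM/amdgpu code): every nonsharable BO references one common reservation object, and the CS path can skip per-BO validation whenever nothing backed by that object has moved since the last submission.

```c
#include <assert.h>
#include <stdbool.h>

/* One reservation object shared by all nonsharable BOs of a process. */
struct reservation_obj {
    unsigned move_seq;          /* bumped whenever any backed BO moves */
};

struct bo {
    struct reservation_obj *resv;
    unsigned validated_seq;     /* move_seq observed at last validation */
};

/* Returns true if a full (TTM-style) validation had to run. */
static bool cs_validate(struct bo *bo)
{
    if (bo->validated_seq == bo->resv->move_seq)
        return false;           /* nothing moved since last CS: skip */
    bo->validated_seq = bo->resv->move_seq;
    return true;                /* real code would revalidate here */
}
```

The point of the sketch is only the fast path: one sequence-number compare per BO instead of a reservation lock and validation per BO.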
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> <br>
As mentioned in another thread, if we can expose a
make_resident API, we can remove the BO list, and we
can even remove the reservation operation from the CS
ioctl.<br>
I now think our BO list is a very bad design:<br>
first, the UMD must create a BO list for every command
submission, which is extra CPU overhead compared with
the traditional way;<br>
second, the kernel also has to iterate the list. When
the BO list is very long, as with OpenCL programs that
routinely put several thousand BOs on it, reservation
must take thousands of ww_mutex locks, and the CPU
overhead is too big.<br>
<br>
So I strongly suggest we expose a make_resident API to
user space. If we cannot, I would like to know the
specific reason, to see whether we can solve it.<br>
</div>
</blockquote>
</div>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Yeah, I think the BO list idea is likely to die
sooner or later. It made sense for GL before bindless was a
thing. Nowadays I don't see much value in it.</div>
<div dir="auto"><br>
</div>
<div dir="auto">MesaGL will keep tracking the BO list because
it's a requirement for good GL performance (it determines
whether to flush IBs before BO synchronization, it allows
tracking fences for each BO, which are used to determine
dependencies between IBs, and that all allows async SDMA and
async compute for GL, which doesn't have separate queues).</div>
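The per-BO fence tracking mentioned above can be sketched like this (invented names, a simplified model rather than Mesa's actual data structures): each BO remembers the fence of its last use per queue, so a new IB on another queue (e.g. async SDMA) can pick up a precise dependency instead of forcing a flush.

```c
#include <assert.h>

enum { QUEUE_GFX, QUEUE_SDMA, NUM_QUEUES };

struct tracked_bo {
    int last_fence[NUM_QUEUES]; /* fence id of last use, -1 if none */
};

/* Record that an IB on 'queue' with fence id 'fence' used this BO. */
static void bo_used(struct tracked_bo *bo, int queue, int fence)
{
    bo->last_fence[queue] = fence;
}

/* Fence a new IB on 'queue' must wait for: the newest use of this BO
 * on any other queue, or -1 when there is no cross-queue dependency. */
static int bo_dependency(const struct tracked_bo *bo, int queue)
{
    int dep = -1;
    for (int q = 0; q < NUM_QUEUES; q++)
        if (q != queue && bo->last_fence[q] > dep)
            dep = bo->last_fence[q];
    return dep;
}
```

Same-queue uses need no fence because IBs on one queue execute in order; only cross-queue uses produce a dependency.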
<div dir="auto"><br>
</div>
<div dir="auto">However, we don't need any BO list at the libdrm
level and lower. I think a BO_CREATE flag that causes the
buffer to be added to a kernel-side per-fd BO list would be
sufficient. How the kernel manages its BO list should be its
own implementation detail. Initially we can just move the
current BO list management into the kernel.</div>
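A toy model of this suggestion (the flag name, structs, and functions are invented for illustration; no such uAPI exists yet): a create flag puts the BO on a kernel-side per-fd list at creation time, so the CS ioctl walks the fd's own list and user space never submits one.

```c
#include <assert.h>
#include <stddef.h>

#define BO_CREATE_ADD_TO_FD_LIST (1u << 0)   /* hypothetical flag */

struct bo_entry {
    struct bo_entry *next;
};

struct fd_bo_list {               /* lives in the kernel, per open fd */
    struct bo_entry *head;
    unsigned count;
};

/* On creation, the kernel optionally links the BO into the fd's list. */
static void bo_create(struct fd_bo_list *fd, struct bo_entry *bo,
                      unsigned flags)
{
    bo->next = NULL;
    if (flags & BO_CREATE_ADD_TO_FD_LIST) {
        bo->next = fd->head;      /* kernel tracks it from now on */
        fd->head = bo;
        fd->count++;
    }
}

/* The CS ioctl uses the fd's own list; no list comes from user space. */
static unsigned cs_ioctl_bo_count(const struct fd_bo_list *fd)
{
    return fd->count;
}
```

How the kernel actually stores and walks the list stays an implementation detail behind the flag, which is the point of the proposal.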
</div>
</blockquote>
I'm afraid this idea makes the BO list situation worse: it only
decreases the UMD effort while increasing kernel driver complexity.<br>
<br>
First, from your and Christian's comments we can agree
that the BO list design is not a good approach.<br>
My proposal of exposing amdgpu_bo_make_resident is to replace the BO
list.<br>
If we can make all needed BOs resident, then we don't need to
validate them again in the CS ioctl, and we no longer need their
reservation locks. After the job is pushed to the scheduler, we can
make the BOs non-resident again.<br>
We could even do this for VM BOs, so we wouldn't need to check for VM
updates again, since that is done in the VA map ioctl.<br>
<br>
If this gets done (and eviction is improved further), I cannot see
any obvious performance gap.<br>
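The residency proposal above can be sketched as follows (a hypothetical model of the proposed amdgpu_bo_make_resident, with invented names; the real API and semantics are exactly what is under discussion): a resident BO carries a residency refcount, and the CS path only has to reserve and validate BOs whose refcount is zero.

```c
#include <assert.h>
#include <stdbool.h>

struct resident_bo {
    unsigned resident_refs;     /* > 0 means pinned as resident */
};

/* The proposed user-space call: pin or unpin the BO as resident. */
static void bo_make_resident(struct resident_bo *bo, bool resident)
{
    if (resident)
        bo->resident_refs++;
    else if (bo->resident_refs)
        bo->resident_refs--;
}

/* Would the CS ioctl need to take this BO's reservation lock? */
static bool cs_needs_reservation(const struct resident_bo *bo)
{
    return bo->resident_refs == 0;
}
```

Under this model the CS ioctl touches no ww_mutex for resident BOs; the cost moves to the explicit make-resident/unresident calls around job submission.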
<br>
What do you think of this proposal to expose an
amdgpu_bo_make_resident API to user space? Or is there another idea
we can discuss?<br>
<br>
If you all agree, I can volunteer to try it with the UMD guys.<br>
<br>
Regards,<br>
David Zhou<br>
<br>
<blockquote
cite="mid:CAAxE2A6uFyqhWfJBNzojyMqFmk-AGyQmScwqScMjLutBQX5usA@mail.gmail.com"
type="cite">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">Marek</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> <br>
<br>
Regards,<br>
David Zhou
<div class="elided-text"><br>
<blockquote type="cite">
<div dir="auto">
<div dir="auto"> so it should remove a lot of
kernel overhead and the BO list remains the
same.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Marek</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote
class="m_-6147598457318751040quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> <br>
Thanks,<br>
David Zhou
<div
class="m_-6147598457318751040elided-text"><br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex"> <br>
v2: It shouldn't crash anymore, but
the kernel will reject the new flag.<br>
---<br>
src/gallium/drivers/radeon/r600_buffer_common.c | 7 +++++<br>
src/gallium/drivers/radeon/radeon_winsys.h | 20 +++++++++++---<br>
src/gallium/winsys/amdgpu/drm/amdgpu_bo.c | 36 ++++++++++++++++---------<br>
src/gallium/winsys/radeon/drm/radeon_drm_bo.c | 27 +++++++++++--------<br>
4 files changed, 62 insertions(+), 28 deletions(-)<br>
<br>
diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c b/src/gallium/drivers/radeon/r600_buffer_common.c<br>
index dd1c209..2747ac4 100644<br>
--- a/src/gallium/drivers/radeon/r600_buffer_common.c<br>
+++ b/src/gallium/drivers/radeon/r600_buffer_common.c<br>
@@ -160,20 +160,27 @@ void
r600_init_resource_fields(stru<wbr>ct
r600_common_screen *rscreen,<br>
}<br>
/* Tiled textures are
unmappable. Always put them in VRAM.
*/<br>
if ((res->b.b.target !=
PIPE_BUFFER &&
!rtex->surface.is_linear) ||<br>
res->flags &
R600_RESOURCE_FLAG_UNMAPPABLE) {<br>
res->domains =
RADEON_DOMAIN_VRAM;<br>
res->flags |=
RADEON_FLAG_NO_CPU_ACCESS |<br>
RADEON_FLAG_GTT_WC;<br>
}<br>
+ /* Only displayable
single-sample textures can be shared
between<br>
+ * processes. */<br>
+ if (res->b.b.target ==
PIPE_BUFFER ||<br>
+ res->b.b.nr_samples
>= 2 ||<br>
+
rtex->surface.micro_tile_mode !=
RADEON_MICRO_MODE_DISPLAY)<br>
+ res->flags |=
RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
+<br>
/* If VRAM is just stolen
system memory, allow both VRAM and<br>
* GTT, whichever has free
space. If a buffer is evicted from<br>
* VRAM to GTT, it will stay
there.<br>
*<br>
* DRM 3.6.0 has good BO
move throttling, so we can allow
VRAM-only<br>
* placements even with a
low amount of stolen VRAM.<br>
*/<br>
if
(!rscreen->info.has_dedicated_<wbr>vram
&&<br>
(rscreen->info.drm_major < 3
|| rscreen->info.drm_minor <
6) &&<br>
res->domains ==
RADEON_DOMAIN_VRAM) {<br>
diff --git a/src/gallium/drivers/radeon/radeon_winsys.h b/src/gallium/drivers/radeon/radeon_winsys.h<br>
index 351edcd..0abcb56 100644<br>
--- a/src/gallium/drivers/radeon/radeon_winsys.h<br>
+++ b/src/gallium/drivers/radeon/radeon_winsys.h<br>
@@ -47,20 +47,21 @@ enum
radeon_bo_domain { /* bitfield */<br>
RADEON_DOMAIN_GTT = 2,<br>
RADEON_DOMAIN_VRAM = 4,<br>
RADEON_DOMAIN_VRAM_GTT =
RADEON_DOMAIN_VRAM |
RADEON_DOMAIN_GTT<br>
};<br>
enum radeon_bo_flag { /*
bitfield */<br>
RADEON_FLAG_GTT_WC = (1
<< 0),<br>
RADEON_FLAG_NO_CPU_ACCESS = (1
<< 1),<br>
RADEON_FLAG_NO_SUBALLOC = (1
<< 2),<br>
RADEON_FLAG_SPARSE = (1
<< 3),<br>
+ RADEON_FLAG_NO_INTERPROCESS_SHARING = (1 << 4),<br>
};<br>
enum radeon_bo_usage { /*
bitfield */<br>
RADEON_USAGE_READ = 2,<br>
RADEON_USAGE_WRITE = 4,<br>
RADEON_USAGE_READWRITE =
RADEON_USAGE_READ |
RADEON_USAGE_WRITE,<br>
/* The winsys ensures that
the CS submission will be scheduled
after<br>
* previously flushed CSs
referencing this BO in a conflicting
way.<br>
*/<br>
@@ -685,28 +686,33 @@ static inline
enum radeon_bo_domain
radeon_domain_from_heap(enum
radeon_heap hea<br>
default:<br>
assert(0);<br>
return (enum
radeon_bo_domain)0;<br>
}<br>
}<br>
static inline unsigned
radeon_flags_from_heap(enum
radeon_heap heap)<br>
{<br>
switch (heap) {<br>
case
RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>:<br>
- return RADEON_FLAG_GTT_WC |
RADEON_FLAG_NO_CPU_ACCESS;<br>
+ return RADEON_FLAG_GTT_WC |<br>
+
RADEON_FLAG_NO_CPU_ACCESS |<br>
+
RADEON_FLAG_NO_INTERPROCESS_S<wbr>HARING;<br>
+<br>
case RADEON_HEAP_VRAM:<br>
case RADEON_HEAP_VRAM_GTT:<br>
case RADEON_HEAP_GTT_WC:<br>
- return RADEON_FLAG_GTT_WC;<br>
+ return RADEON_FLAG_GTT_WC |<br>
+
RADEON_FLAG_NO_INTERPROCESS_S<wbr>HARING;<br>
+<br>
case RADEON_HEAP_GTT:<br>
default:<br>
- return 0;<br>
+ return
RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
}<br>
}<br>
/* The pb cache bucket is chosen
to minimize pb_cache misses.<br>
* It must be between 0 and 3
inclusive.<br>
*/<br>
static inline unsigned
radeon_get_pb_cache_bucket_ind<wbr>ex(enum
radeon_heap heap)<br>
{<br>
switch (heap) {<br>
case
RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>:<br>
@@ -724,22 +730,28 @@ static inline
unsigned
radeon_get_pb_cache_bucket_ind<wbr>ex(enum
radeon_heap heap)<br>
/* Return the heap index for
winsys allocators, or -1 on failure.
*/<br>
static inline int
radeon_get_heap_index(enum
radeon_bo_domain domain,<br>
enum radeon_bo_flag flags)<br>
{<br>
/* VRAM implies WC (write
combining) */<br>
assert(!(domain &
RADEON_DOMAIN_VRAM) || flags &
RADEON_FLAG_GTT_WC);<br>
/* NO_CPU_ACCESS implies VRAM
only. */<br>
assert(!(flags &
RADEON_FLAG_NO_CPU_ACCESS) || domain
== RADEON_DOMAIN_VRAM);<br>
+ /* Resources with
interprocess sharing don't use any
winsys allocators. */<br>
+ if (!(flags &
RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING))<br>
+ return -1;<br>
+<br>
/* Unsupported flags:
NO_SUBALLOC, SPARSE. */<br>
- if (flags &
~(RADEON_FLAG_GTT_WC |
RADEON_FLAG_NO_CPU_ACCESS))<br>
+ if (flags &
~(RADEON_FLAG_GTT_WC |<br>
+
RADEON_FLAG_NO_CPU_ACCESS |<br>
+
RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING))<br>
return -1;<br>
switch (domain) {<br>
case RADEON_DOMAIN_VRAM:<br>
if (flags &
RADEON_FLAG_NO_CPU_ACCESS)<br>
return
RADEON_HEAP_VRAM_NO_CPU_ACCESS<wbr>;<br>
else<br>
return
RADEON_HEAP_VRAM;<br>
case RADEON_DOMAIN_VRAM_GTT:<br>
return
RADEON_HEAP_VRAM_GTT;<br>
diff --git a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c<br>
index 97bbe23..06b8198 100644<br>
--- a/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c<br>
+++ b/src/gallium/winsys/amdgpu/drm/amdgpu_bo.c<br>
@@ -31,20 +31,24 @@<br>
#include "amdgpu_cs.h"<br>
#include "os/os_time.h"<br>
#include
"state_tracker/drm_driver.h"<br>
#include <amdgpu_drm.h><br>
#include <xf86drm.h><br>
#include <stdio.h><br>
#include <inttypes.h><br>
+#ifndef AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING<br>
+#define AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING (1 << 6)<br>
+#endif<br>
+<br>
/* Set to 1 for verbose output
showing committed sparse buffer
ranges. */<br>
#define DEBUG_SPARSE_COMMITS 0<br>
struct
amdgpu_sparse_backing_chunk {<br>
uint32_t begin, end;<br>
};<br>
static struct pb_buffer *<br>
amdgpu_bo_create(struct
radeon_winsys *rws,<br>
uint64_t size,<br>
@@ -395,20 +399,22 @@ static struct
amdgpu_winsys_bo
*amdgpu_create_bo(struct
amdgpu_winsys *ws,<br>
if (initial_domain &
RADEON_DOMAIN_VRAM)<br>
request.preferred_heap |=
AMDGPU_GEM_DOMAIN_VRAM;<br>
if (initial_domain &
RADEON_DOMAIN_GTT)<br>
request.preferred_heap |=
AMDGPU_GEM_DOMAIN_GTT;<br>
if (flags &
RADEON_FLAG_NO_CPU_ACCESS)<br>
request.flags |=
AMDGPU_GEM_CREATE_NO_CPU_ACCES<wbr>S;<br>
if (flags &
RADEON_FLAG_GTT_WC)<br>
request.flags |=
AMDGPU_GEM_CREATE_CPU_GTT_USWC<wbr>;<br>
+ if (flags & RADEON_FLAG_NO_INTERPROCESS_SHARING)<br>
+ request.flags |= AMDGPU_GEM_CREATE_NO_INTERPROCESS_SHARING;<br>
r =
amdgpu_bo_alloc(ws->dev,
&request, &buf_handle);<br>
if (r) {<br>
fprintf(stderr, "amdgpu:
Failed to allocate a buffer:\n");<br>
fprintf(stderr, "amdgpu:
size : %"PRIu64" bytes\n",
size);<br>
fprintf(stderr, "amdgpu:
alignment : %u bytes\n", alignment);<br>
fprintf(stderr, "amdgpu:
domains : %u\n", initial_domain);<br>
goto error_bo_alloc;<br>
}<br>
@@ -1127,21 +1133,21 @@ static
void amdgpu_buffer_set_metadata(str<wbr>uct
pb_buffer *_buf,<br>
static struct pb_buffer *<br>
amdgpu_bo_create(struct
radeon_winsys *rws,<br>
uint64_t size,<br>
unsigned
alignment,<br>
enum
radeon_bo_domain domain,<br>
enum
radeon_bo_flag flags)<br>
{<br>
struct amdgpu_winsys *ws =
amdgpu_winsys(rws);<br>
struct amdgpu_winsys_bo *bo;<br>
- unsigned usage = 0,
pb_cache_bucket;<br>
+ unsigned usage = 0,
pb_cache_bucket = 0;<br>
/* VRAM implies WC. This is
not optional. */<br>
assert(!(domain &
RADEON_DOMAIN_VRAM) || flags &
RADEON_FLAG_GTT_WC);<br>
/* NO_CPU_ACCESS is valid
with VRAM only. */<br>
assert(domain ==
RADEON_DOMAIN_VRAM || !(flags &
RADEON_FLAG_NO_CPU_ACCESS));<br>
/* Sub-allocate small buffers
from slabs. */<br>
if (!(flags &
(RADEON_FLAG_NO_SUBALLOC |
RADEON_FLAG_SPARSE)) &&<br>
size <= (1 <<
AMDGPU_SLAB_MAX_SIZE_LOG2)
&&<br>
@@ -1182,48 +1188,52 @@ no_slab:<br>
/* This flag is irrelevant for
the cache. */<br>
flags &=
~RADEON_FLAG_NO_SUBALLOC;<br>
/* Align size to page size.
This is the minimum alignment for
normal<br>
* BOs. Aligning this here
helps the cached bufmgr. Especially
small BOs,<br>
* like constant/uniform
buffers, can benefit from better and
more reuse.<br>
*/<br>
size = align64(size,
ws->info.gart_page_size);<br>
alignment = align(alignment,
ws->info.gart_page_size);<br>
- int heap =
radeon_get_heap_index(domain,
flags);<br>
- assert(heap >= 0 &&
heap < RADEON_MAX_CACHED_HEAPS);<br>
- usage = 1 << heap; /* Only
set one usage bit for each heap. */<br>
+ bool use_reusable_pool = flags
& RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
- pb_cache_bucket =
radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
- assert(pb_cache_bucket <
ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
+ if (use_reusable_pool) {<br>
+ int heap =
radeon_get_heap_index(domain,
flags);<br>
+ assert(heap >= 0
&& heap <
RADEON_MAX_CACHED_HEAPS);<br>
+ usage = 1 << heap; /*
Only set one usage bit for each
heap. */<br>
- /* Get a buffer from the
cache. */<br>
- bo = (struct amdgpu_winsys_bo*)<br>
-
pb_cache_reclaim_buffer(&ws->b<wbr>o_cache,
size, alignment, usage,<br>
-
pb_cache_bucket);<br>
- if (bo)<br>
- return &bo->base;<br>
+ pb_cache_bucket =
radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
+ assert(pb_cache_bucket <
ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
+<br>
+ /* Get a buffer from the
cache. */<br>
+ bo = (struct
amdgpu_winsys_bo*)<br>
+
pb_cache_reclaim_buffer(&ws->b<wbr>o_cache,
size, alignment, usage,<br>
+
pb_cache_bucket);<br>
+ if (bo)<br>
+ return &bo->base;<br>
+ }<br>
/* Create a new one. */<br>
bo = amdgpu_create_bo(ws, size,
alignment, usage, domain, flags,<br>
pb_cache_bucket);<br>
if (!bo) {<br>
/* Clear the cache and try
again. */<br>
pb_slabs_reclaim(&ws->bo_slabs<wbr>);<br>
pb_cache_release_all_buffers(&<wbr>ws->bo_cache);<br>
bo = amdgpu_create_bo(ws,
size, alignment, usage, domain,
flags,<br>
pb_cache_bucket);<br>
if (!bo)<br>
return NULL;<br>
}<br>
-
bo->u.real.use_reusable_pool =
true;<br>
+ bo->u.real.use_reusable_pool
= use_reusable_pool;<br>
return &bo->base;<br>
}<br>
static struct pb_buffer
*amdgpu_bo_from_handle(struct
radeon_winsys *rws,<br>
struct winsys_handle
*whandle,<br>
unsigned *stride,<br>
unsigned *offset)<br>
{<br>
struct amdgpu_winsys *ws =
amdgpu_winsys(rws);<br>
struct amdgpu_winsys_bo *bo;<br>
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c<br>
index 8027a5f..15e9d38 100644<br>
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c<br>
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c<br>
@@ -907,21 +907,21 @@ static void
radeon_bo_set_metadata(struct
pb_buffer *_buf,<br>
static struct pb_buffer *<br>
radeon_winsys_bo_create(struct
radeon_winsys *rws,<br>
uint64_t
size,<br>
unsigned
alignment,<br>
enum
radeon_bo_domain domain,<br>
enum
radeon_bo_flag flags)<br>
{<br>
struct radeon_drm_winsys *ws =
radeon_drm_winsys(rws);<br>
struct radeon_bo *bo;<br>
- unsigned usage = 0,
pb_cache_bucket;<br>
+ unsigned usage = 0,
pb_cache_bucket = 0;<br>
assert(!(flags &
RADEON_FLAG_SPARSE)); /* not
supported */<br>
/* Only 32-bit sizes are
supported. */<br>
if (size > UINT_MAX)<br>
return NULL;<br>
/* VRAM implies WC. This is
not optional. */<br>
if (domain &
RADEON_DOMAIN_VRAM)<br>
flags |=
RADEON_FLAG_GTT_WC;<br>
@@ -962,46 +962,51 @@ no_slab:<br>
/* This flag is irrelevant for
the cache. */<br>
flags &=
~RADEON_FLAG_NO_SUBALLOC;<br>
/* Align size to page size.
This is the minimum alignment for
normal<br>
* BOs. Aligning this here
helps the cached bufmgr. Especially
small BOs,<br>
* like constant/uniform
buffers, can benefit from better and
more reuse.<br>
*/<br>
size = align(size,
ws->info.gart_page_size);<br>
alignment = align(alignment,
ws->info.gart_page_size);<br>
- int heap =
radeon_get_heap_index(domain,
flags);<br>
- assert(heap >= 0 &&
heap < RADEON_MAX_CACHED_HEAPS);<br>
- usage = 1 << heap; /*
Only set one usage bit for each
heap. */<br>
+ bool use_reusable_pool = flags
& RADEON_FLAG_NO_INTERPROCESS_SH<wbr>ARING;<br>
- pb_cache_bucket =
radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
- assert(pb_cache_bucket <
ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
+ /* Shared resources don't use
cached heaps. */<br>
+ if (use_reusable_pool) {<br>
+ int heap =
radeon_get_heap_index(domain,
flags);<br>
+ assert(heap >= 0
&& heap <
RADEON_MAX_CACHED_HEAPS);<br>
+ usage = 1 << heap; /*
Only set one usage bit for each
heap. */<br>
- bo =
radeon_bo(pb_cache_reclaim_buf<wbr>fer(&ws->bo_cache,
size, alignment,<br>
-
usage, pb_cache_bucket));<br>
- if (bo)<br>
- return &bo->base;<br>
+ pb_cache_bucket =
radeon_get_pb_cache_bucket_ind<wbr>ex(heap);<br>
+ assert(pb_cache_bucket <
ARRAY_SIZE(ws->bo_cache.bucket<wbr>s));<br>
+<br>
+ bo =
radeon_bo(pb_cache_reclaim_buf<wbr>fer(&ws->bo_cache,
size, alignment,<br>
+
usage, pb_cache_bucket));<br>
+ if (bo)<br>
+ return
&bo->base;<br>
+ }<br>
bo = radeon_create_bo(ws,
size, alignment, usage, domain,
flags,<br>
pb_cache_bucket);<br>
if (!bo) {<br>
/* Clear the cache and try
again. */<br>
if
(ws->info.has_virtual_memory)<br>
pb_slabs_reclaim(&ws->bo_slabs<wbr>);<br>
pb_cache_release_all_buffers(&<wbr>ws->bo_cache);<br>
bo = radeon_create_bo(ws,
size, alignment, usage, domain,
flags,<br>
pb_cache_bucket);<br>
if (!bo)<br>
return NULL;<br>
}<br>
-
bo->u.real.use_reusable_pool =
true;<br>
+ bo->u.real.use_reusable_pool
= use_reusable_pool;<br>
mtx_lock(&ws->bo_handles_mutex<wbr>);<br>
util_hash_table_set(ws->bo_han<wbr>dles,
(void*)(uintptr_t)bo->handle,
bo);<br>
mtx_unlock(&ws->bo_handles_mut<wbr>ex);<br>
return &bo->base;<br>
}<br>
static struct pb_buffer
*radeon_winsys_bo_from_ptr(str<wbr>uct
radeon_winsys *rws,<br>
void *pointer,
uint64_t size)<br>
</blockquote>
<br>
</div>
</blockquote>
</div>
</div>
<div class="gmail_extra" dir="auto"><br>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</div>
</blockquote>
<br>
</body>
</html>