[Mesa-dev] [PATCH 00/11] [RFC v2] Solve the mapping bug

Francisco Jerez currojerez at riseup.net
Sat Jun 21 08:39:04 PDT 2014


Bruno Jiménez <brunojimen at gmail.com> writes:

> Hi,
>
> This is my second attempt to fix the mapping bug adding all the
> suggestions that Tom Stellard sent, and, so far, it seems that
> it is resolved.
>
> This series changes completely how OpenCL buffers are handled
> by the r600g driver. Before this, we would add them directly to
> a pool, and this pool would grow whenever we needed more space.
> But this process implied destroying the pool and creating a new
> one. There could be cases where a buffer would be mapped and
> the pool would grow, leaving one side of the mapping pointing
> to where the item used to be. This is the 'mapping bug'.
>
> Now, items will have an intermediate resource, where all mappings
> are done, and when a buffer is going to be used with a kernel it
> is promoted to the pool. When a promoted item is going to be
> mapped, it is demoted first, so even if the pool changes its
> location due to growing, the map remains valid. In the case of a
> buffer that is mapped for reading and also read by a kernel, we
> duplicate it: the user keeps its map of the intermediate buffer,
> and the kernel uses the item in the pool.
>
> As a summary for v2:
> Patches 1-8: These are the main part of the series, and solve
>     the mapping bug.
>     Patches 1 and 7 now use fewer explicit casts
>     Patch 2 is new and introduces the 'is_item_in_pool'
>         function, which is used in patches 3 and 8
>
> Patch 9: Is a complete rewrite of v1 patch 8 using gallium
>     utils for double lists
>
> Patches 10 and 11: These are just a proof of concept for avoiding
>     GPU <-> GPU transfers when using the CL Read/Write functions.
>     They are v1 patch 9 split in two to separate the r600g changes
>     from the clover changes.
>     Now, on clover's side it introduces and uses
>     'CLOVER_TRANSFER_MAP_DIRECTLY' so that it doesn't collide with
>     any other OpenCL flag.
>
> Please review, and thanks :)
>
> Bruno Jiménez (11):
>   r600g/compute: Add an intermediate resource for OpenCL buffers
>   r600g/compute: Add an util function to know if an item is in the pool
>   r600g/compute: Add statuses to the compute_memory_items
>   r600g/compute: divide the item list in two
>   r600g/compute: Only move to the pool the buffers marked for promoting
>   r600g/compute: Avoid problems when promoting items mapped for reading
>   r600g/compute: Implement compute_memory_demote_item
>   r600g/compute: Map only against intermediate buffers
>   r600g/compute: Use gallium util functions for double lists
>   r600g/compute: Map directly the pool in some cases
>   clover: Use PIPE_TRANSFER_MAP_DIRECTLY when writing/reading buffers
>
>  src/gallium/drivers/r600/compute_memory_pool.c     | 294 ++++++++++++---------
>  src/gallium/drivers/r600/compute_memory_pool.h     |  31 ++-
>  src/gallium/drivers/r600/evergreen_compute.c       |  38 ++-
>  src/gallium/state_trackers/clover/api/transfer.cpp |   4 +-
>  src/gallium/state_trackers/clover/core/object.hpp  |   4 +
>  .../state_trackers/clover/core/resource.cpp        |   2 +
>  6 files changed, 233 insertions(+), 140 deletions(-)
>
> -- 
> 2.0.0
>

Hi Bruno, 

I'm sorry for having taken so long to comment on this; I didn't have
time to read through your patch series carefully until last night.

I'm not convinced that this is going to be a satisfactory solution: it
punishes well-behaved applications by duplicating memory usage under
some circumstances, and by slowing down buffer mapping with (in most
cases) unnecessary shadow copies.  Even though using a buffer while it
is mapped is something to keep in mind, because it's allowed by the
spec with some restrictions, I think it's reasonable to assume that
we're only going to see a comparatively small number of active buffer
mappings during the execution of any compute kernel, so it seems
unfortunate to me that the rest of the mappings have to pay for this
one corner case.
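
(For reference, the scheme described in the cover letter amounts to
roughly the following per-buffer bookkeeping.  All names below are
hypothetical; this is only a sketch of the flow as I read it, not the
actual code in the series.)

  #include <stdbool.h>
  #include <stdint.h>

  struct resource;                   /* stand-in for the driver's buffer type */

  enum item_location {
     ITEM_IN_INTERMEDIATE,           /* only the per-item resource exists */
     ITEM_IN_POOL,                   /* resident in the pool, usable by kernels */
  };

  struct memory_item {
     enum item_location location;
     bool mapped_for_reading;        /* the user holds a read-only mapping */
     struct resource *intermediate;  /* all CPU mappings go through this */
     int64_t start_in_dw;            /* offset inside the pool, if resident */
  };

  /* Hypothetical helpers implemented elsewhere. */
  int64_t pool_alloc(struct memory_item *item);      /* may grow the pool */
  void pool_free(struct memory_item *item);
  void copy_to_pool(struct memory_item *item);
  void copy_from_pool(struct memory_item *item);

  /* Before a kernel launch: give the item space in the pool and copy its
   * contents there.  A buffer that is mapped for reading keeps its
   * intermediate copy as well, so the user's mapping stays valid even if
   * the pool is relocated while growing. */
  static void promote_item(struct memory_item *item)
  {
     if (item->location == ITEM_IN_POOL)
        return;
     item->start_in_dw = pool_alloc(item);
     copy_to_pool(item);
     item->location = ITEM_IN_POOL;
  }

  /* Before any other CPU mapping: copy the data back to the intermediate
   * resource and map that instead, so the mapping is unaffected by later
   * pool relocations. */
  static void demote_item(struct memory_item *item)
  {
     if (item->location != ITEM_IN_POOL)
        return;
     copy_from_pool(item);
     pool_free(item);
     item->location = ITEM_IN_INTERMEDIATE;
  }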

The implementation of PIPE_TRANSFER_MAP_DIRECTLY introduced in PATCH 10
has somewhat worrying semantics: A mapping with this flag might become
stale unpredictably if a kernel is run, maybe from a different command
queue.  Clover's transfer functions don't hit that path right now in
single-threaded applications, but they might in the future as we start
accelerating the APIs we currently implement with soft_copy_op().  This
is a bug IMHO: even direct mappings should last until the corresponding
unmap call.
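
(To make the failure mode concrete, here is a minimal sketch of the
kind of interleaving I'm worried about.  It's written against the
OpenCL API purely for illustration; the exact entry point that would
end up taking a direct mapping depends on how clover uses the flag.
queue, buf, other_buf and kernel are assumed to have been created
elsewhere.)

  #include <CL/cl.h>

  /* buf is assumed to be at least 4096 bytes. */
  static void stale_mapping_example(cl_command_queue queue, cl_mem buf,
                                    cl_mem other_buf, cl_kernel kernel)
  {
     cl_int err;
     size_t gws = 1024;

     /* 1) Map 'buf' for reading.  With a direct mapping, this pointer
      *    refers to the item's current location inside the pool. */
     void *ptr = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_READ,
                                    0, 4096, 0, NULL, NULL, &err);
     if (err != CL_SUCCESS)
        return;

     /* 2) Run a kernel that uses 'other_buf'.  Promoting 'other_buf' may
      *    grow and relocate the pool, at which point 'ptr' silently goes
      *    stale even though 'buf' is still mapped. */
     clSetKernelArg(kernel, 0, sizeof(other_buf), &other_buf);
     clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &gws, NULL,
                            0, NULL, NULL);
     clFinish(queue);

     /* 3) Reading through 'ptr' here is no longer guaranteed to return
      *    the contents of 'buf'. */

     clEnqueueUnmapMemObject(queue, buf, ptr, 0, NULL, NULL);
  }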

I'm not advocating a revert of the series, because it fixes a serious
bug, but please don't push patches 10-11; we should probably start
looking for a different solution.  Some suggestions:

 - Why do you even need a pool?  Wouldn't it be possible to create a
   huge RAT, e.g. covering a 4GB portion of the GPU memory and then use
   a special memory domain or some sort of flag to tell the kernel to
   allocate a buffer from that region (or relocate if it's already been
   allocated elsewhere)?  This is especially easy on hardware with
   virtual memory, as you could simply reserve an arbitrarily large
   block of virtual memory, bind it as e.g. RAT0, and then map other
   buffer objects into the block on demand as they're bound to the
   compute pipeline -- there would be no need to move the actual bits
   around.  This is similar to the approach I used in my original
   proof-of-concept implementation of the compute API on nv50.

 - If you insist on using a pool, you could (mostly) avoid the storage
   duplication and the mapping copies by allocating buffer objects
   directly from the pool as it was before this series, and then keep
   some sort of reference count specific to the pool storage that would
   be incremented on map and decremented on unmap.  Once you need to
   grow the pool you'd keep the old storage around temporarily and
   migrate buffers to the new storage lazily as they are required or
   unmapped.  Once the reference count drops to zero you'd be free to
   release the backing BO to the system.  The fact that you'd keep both
   storage buffers around for a bit means that you'd be able to use DMA
   to migrate the pool contents instead of the CPU copies you're doing
   now, which is likely to be substantially more efficient (see the
   rough sketch after this list).

 - On hardware with virtual memory, I guess you could also devise a
   copy-less mechanism for growing a given buffer object (used as pool
   storage) by mapping additional pages at the end of the same region
   without changing its virtual address, assuming you know the maximum
   size of the pool beforehand.
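
(A very rough sketch of the bookkeeping I have in mind for the second
suggestion.  All of the types and helpers below are hypothetical; it's
only meant to illustrate the reference counting and the lazy, DMA-based
migration, not any existing r600g code.)

  #include <assert.h>
  #include <stdint.h>

  struct resource;                   /* stand-in for a buffer object */

  struct pool_storage {
     struct resource *bo;            /* backing buffer object */
     unsigned map_count;             /* live mappings into this storage */
     unsigned live_items;            /* items still resident here */
  };

  struct pool {
     struct pool_storage *current;   /* new allocations go here */
     struct pool_storage *previous;  /* old storage, kept while referenced */
  };

  struct pool_item {
     struct pool_storage *storage;   /* which storage backs this item */
     int64_t start_in_dw;
     int64_t size_in_dw;
  };

  /* Hypothetical helpers implemented elsewhere. */
  struct pool_storage *alloc_storage(int64_t size_in_dw);
  void release_storage(struct pool_storage *storage);
  int64_t find_free_space(struct pool_storage *storage, int64_t size_in_dw);
  void dma_copy_range(struct resource *dst, int64_t dst_dw,
                      struct resource *src, int64_t src_dw, int64_t size_dw);

  /* Growing the pool no longer invalidates anything: the old storage
   * stays alive until nothing references it any more. */
  static void pool_grow(struct pool *pool, int64_t new_size_in_dw)
  {
     assert(!pool->previous);
     pool->previous = pool->current;
     pool->current = alloc_storage(new_size_in_dw);
  }

  static void maybe_release_previous(struct pool *pool)
  {
     if (pool->previous && pool->previous->map_count == 0 &&
         pool->previous->live_items == 0) {
        release_storage(pool->previous);
        pool->previous = NULL;
     }
  }

  /* Move an item into the current storage with a GPU DMA copy.  Called
   * lazily, when the item is bound to a kernel or unmapped. */
  static void pool_migrate_item(struct pool *pool, struct pool_item *item)
  {
     if (item->storage == pool->current)
        return;
     int64_t offset = find_free_space(pool->current, item->size_in_dw);
     dma_copy_range(pool->current->bo, offset,
                    item->storage->bo, item->start_in_dw, item->size_in_dw);
     item->storage->live_items--;
     item->storage = pool->current;
     item->start_in_dw = offset;
     pool->current->live_items++;
     maybe_release_previous(pool);
  }

  /* Mapping pins the storage the item currently lives in; unmapping
   * unpins it and is a convenient point to migrate the item. */
  static void pool_item_mapped(struct pool_item *item)
  {
     item->storage->map_count++;
  }

  static void pool_item_unmapped(struct pool *pool, struct pool_item *item)
  {
     item->storage->map_count--;
     pool_migrate_item(pool, item);
  }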

Heh, sorry for the long rant.  Are you on IRC by the way?
