[Intel-gfx] Losing write domains and breaking flushing list
Keith Packard
keithp at keithp.com
Sat Oct 25 07:25:30 CEST 2008
Ok, first the obvious bugs.
i915_gem_object_set_domain computes new read and write domains for each
object. It does all of the necessary operations to transfer from/to CPU
domains, but all intra-GPU domain transfers are handled by pending flush
and invalidate data in the device structure. This allows a single flush
operation to handle all of the necessary activity for a single exec.
However, if the exec is aborted, the objects' domain information will
already have been rewritten even though no flushes or invalidates were
pushed into the ring. The old domains are now lost, so future
operations will do the wrong thing.
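To make that concrete, here's the shape of the failing path (a sketch
only; the loop scaffolding is mine, not the driver's):

    for (i = 0; i < args->buffer_count; i++) {
            /* Rewrites obj->read_domains and obj->write_domain
             * immediately, and accumulates the required flush and
             * invalidate bits in dev->flush_domains and
             * dev->invalidate_domains.
             */
            ret = i915_gem_object_set_domain(object_list[i],
                                             read_domains, write_domain);
            if (ret)
                    goto err;       /* objects 0..i-1 already claim
                                     * their new domains, but the
                                     * accumulated flush never
                                     * reaches the ring */
    }
    flush_domains = i915_gem_dev_set_domain(dev);   /* success path only */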
i915_gem_object_set_domain is also used from the set_domain ioctl. In
this case, the domains will be mashed and a flush operation pushed into
the ring, but the object will not be placed on the active list when that
flush is posted. This means that no one knows to wait for the operation
to complete before accessing the object.
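Concretely, the ioctl path currently does roughly this (see the lines
removed from i915_gem_set_domain in the patch below):

    ret = i915_gem_object_set_domain(obj, read_domains, write_domain);
    if (ret)
            return ret;
    flush_domains = i915_gem_dev_set_domain(obj->dev);
    if (flush_domains & ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT))
            (void) i915_add_request(dev, flush_domains);
    /* Bug: obj never records the request's seqno and never moves to
     * the active list, so nothing will wait for the flush to finish.
     */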
Now for the more subtle bug.
When you call i915_gem_object_set_domain with write_domain == 0, then
sometimes (depending on the previous domain set), it will set
obj->write_domain to zero. If this object happened to be sitting on the
flushing list, it will now never get pulled off of that list -- the only
way off that list is to have some flush command which affects your
write_domain, and with a write_domain of zero, that's not going to
happen.
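For example (a sketch; assume obj starts on the flushing list with a
GPU write domain):

    /* obj->write_domain == I915_GEM_DOMAIN_RENDER; obj sits on the
     * flushing list waiting for a flush of that domain.
     */
    i915_gem_object_set_domain(obj, I915_GEM_DOMAIN_CPU, 0);
    /* obj->write_domain is now 0.  Retiring moves an object off the
     * flushing list only when the flushed domains include
     * obj->write_domain, and no flush ever covers a domain of 0, so
     * obj is stuck there forever.
     */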
This will cause a BUG_ON to fire when you try to close the driver, as
the flushing list will never empty, even once all of the pending
operations are complete and every domain has been flushed.
A proposed solution (this patch is untested, although it does
compile).
I've split out the i915_gem_object_set_domain function into two pieces:
i915_gem_object_set_pending_domain and
i915_gem_object_set_current_domain. set_pending_domain does precisely
what set_domain used to do, but instead of modifying obj->write_domain
and obj->read_domains, it leaves the new values for those fields in
obj->pending_write_domain and obj->pending_read_domains. Yeah, this
conflates those fields between two different operations, but it's fine
in practice as the two users are obviously temporally separate.
I've also regularized the sequence of operations used for domain
migration (a caller sketch following these steps appears after the
list):
 1. i915_gem_dev_prepare_set_domain. This sets up the device to
    queue flushing and invalidation domains.

 2. i915_gem_object_set_pending_domain. This marks an object for
    domain transition and pushes pending flush and invalidate bits
    into the device.

 3. i915_gem_dev_set_domain. This dumps a flush command into the
    ring to flush and invalidate any pending device bits. This
    returns the domains flushed.

 4. <any other ring commands inserted here>. After this point, there
    are no more error returns possible as the operation is
    functionally complete.

 5. i915_gem_object_set_current_domain. Do this for each object
    which was given to i915_gem_object_set_pending_domain above.

 6. If any domains are being flushed, or a batch buffer is queued to
    the ring:

    1. i915_add_request. If any domains are flushed, or any
       batch buffer is added to the ring, this queues a marker
       to let users wait for that operation to complete. Mark
       any buffers involved with the new sequence number.

    2. i915_gem_object_move_to_active. Each obj which has
       flushing or batchbuffer involvement should end up on the
       active list.
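Put together, a caller following those steps looks roughly like this
(a sketch only -- locking, relocation, and error paths are elided, and
nbuffers/batch_queued are placeholders, not driver names):

    i915_gem_dev_prepare_set_domain(dev);                   /* step 1 */

    for (i = 0; i < nbuffers; i++) {                        /* step 2 */
            ret = i915_gem_object_set_pending_domain(object_list[i],
                                                     read_domains,
                                                     write_domain);
            if (ret)
                    goto err;   /* safe: nothing committed yet */
    }

    flush_domains = i915_gem_dev_set_domain(dev);           /* step 3 */

    /* step 4: batch buffer or other ring commands go here; no
     * error returns are allowed past this point.
     */

    for (i = 0; i < nbuffers; i++)                          /* step 5 */
            i915_gem_object_set_current_domain(object_list[i]);

    if (flush_domains || batch_queued) {                    /* step 6 */
            seqno = i915_add_request(dev, flush_domains);   /* 6.1 */
            for (i = 0; i < nbuffers; i++) {                /* 6.2 */
                    i915_gem_object_move_to_active(object_list[i]);
                    /* record seqno in obj_priv->last_rendering_seqno */
            }
    }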
Ok, so the actual modification of obj->read_domains and
obj->write_domain is now delayed until after the flush commands (and
possibly the batch buffer) have been put into the ring, at which point
the operation is complete and we're just cleaning up and marking what
happened. That happens without sleeping and without any error returns.
Any obj with a write_domain must either end up on the flushing list (if
no flush or batch buffer stuff was dumped into the ring), or it must end
up on the active list.
Similarly, any obj without a write_domain must not end up on the
flushing list, and any obj with some involvement in the ring must end
up on the active list.
Those invariants seem easy to check for at least.
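A hypothetical check, where on_flushing_list and on_active_list stand
in for inspecting which list obj_priv->list is actually on:

    static void
    i915_gem_check_domain_invariants(struct drm_gem_object *obj,
                                     int on_flushing_list,
                                     int on_active_list)
    {
            uint32_t gpu_write = obj->write_domain &
                    ~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT);

            /* a GPU write domain implies flushing or active */
            BUG_ON(gpu_write && !on_flushing_list && !on_active_list);

            /* no write domain implies not on the flushing list */
            BUG_ON(obj->write_domain == 0 && on_flushing_list);
    }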
Here's a patch which purports to do all of this. Review appreciated.
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d8f57b2..c3c7639 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -32,9 +32,12 @@
#include <linux/swap.h>
static int
-i915_gem_object_set_domain(struct drm_gem_object *obj,
- uint32_t read_domains,
- uint32_t write_domain);
+i915_gem_object_set_pending_domain(struct drm_gem_object *obj,
+ uint32_t read_domains,
+ uint32_t write_domain);
+static void
+i915_gem_object_set_current_domain(struct drm_gem_object *obj);
+
static int
i915_gem_object_set_domain_range(struct drm_gem_object *obj,
uint64_t offset,
@@ -46,6 +49,7 @@ i915_gem_set_domain(struct drm_gem_object *obj,
struct drm_file *file_priv,
uint32_t read_domains,
uint32_t write_domain);
+
static int i915_gem_object_get_page_list(struct drm_gem_object *obj);
static void i915_gem_object_free_page_list(struct drm_gem_object *obj);
static int i915_gem_object_wait_rendering(struct drm_gem_object *obj);
@@ -946,13 +950,18 @@ i915_gem_object_unbind(struct drm_gem_object *obj)
* also ensure that all pending GPU writes are finished
* before we unbind.
*/
- ret = i915_gem_object_set_domain(obj, I915_GEM_DOMAIN_CPU,
- I915_GEM_DOMAIN_CPU);
+ ret = i915_gem_object_set_pending_domain(obj, I915_GEM_DOMAIN_CPU,
+ I915_GEM_DOMAIN_CPU);
if (ret) {
DRM_ERROR("set_domain failed: %d\n", ret);
return ret;
}
+ /* Getting to the CPU domain is synchronous, so
+ * there won't be any pending flushes to queue
+ */
+ i915_gem_object_set_current_domain(obj);
+
if (obj_priv->agp_mem != NULL) {
drm_unbind_agp(obj_priv->agp_mem);
drm_free_agp(obj_priv->agp_mem, obj->size / PAGE_SIZE);
@@ -1322,9 +1331,9 @@ i915_gem_clflush_object(struct drm_gem_object *obj)
* drm_agp_chipset_flush
*/
static int
-i915_gem_object_set_domain(struct drm_gem_object *obj,
- uint32_t read_domains,
- uint32_t write_domain)
+i915_gem_object_set_pending_domain(struct drm_gem_object *obj,
+ uint32_t read_domains,
+ uint32_t write_domain)
{
struct drm_device *dev = obj->dev;
struct drm_i915_gem_object *obj_priv = obj->driver_private;
@@ -1339,6 +1348,12 @@ i915_gem_object_set_domain(struct drm_gem_object *obj,
obj->write_domain, write_domain);
#endif
/*
+ * Start out assuming nothing will change
+ */
+ obj->pending_read_domains = obj->read_domains;
+ obj->pending_write_domain = obj->write_domain;
+
+ /*
* If the object isn't moving to a new write domain,
* let the object stay in multiple read domains
*/
@@ -1383,7 +1398,7 @@ i915_gem_object_set_domain(struct drm_gem_object *obj,
}
if ((write_domain | flush_domains) != 0)
- obj->write_domain = write_domain;
+ obj->pending_write_domain = write_domain;
/* If we're invalidating the CPU domain, clear the per-page CPU
* domain list as well.
@@ -1395,24 +1410,46 @@ i915_gem_object_set_domain(struct drm_gem_object *obj,
DRM_MEM_DRIVER);
obj_priv->page_cpu_valid = NULL;
}
- obj->read_domains = read_domains;
+ obj->pending_read_domains = read_domains;
dev->invalidate_domains |= invalidate_domains;
dev->flush_domains |= flush_domains;
#if WATCH_BUF
DRM_INFO("%s: read %08x write %08x invalidate %08x flush %08x\n",
__func__,
- obj->read_domains, obj->write_domain,
+ obj->pending_read_domains, obj->pending_write_domain,
dev->invalidate_domains, dev->flush_domains);
#endif
return 0;
}
/**
- * Set the read/write domain on a range of the object.
+ * Mark pending domains as current
+ *
+ * Once the flush commands to perform domain transitions have been
+ * posted to the ring, call this function to mark the new effective
+ * read and write domains in an object
+ */
+static void
+i915_gem_object_set_current_domain(struct drm_gem_object *obj)
+{
+ /*
+ * Check to make sure that if we're losing a GPU write domain that
+ * we're flushing it
+ */
+ BUG_ON((obj->write_domain & ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT))
+ && obj->write_domain != obj->pending_write_domain
+ && (obj->dev->flush_domains & obj->write_domain) == 0);
+
+ obj->read_domains = obj->pending_read_domains;
+ obj->write_domain = obj->pending_write_domain;
+}
+
+/**
+ * Set the pending read/write domain on a range of the object.
*
* Currently only implemented for CPU reads, otherwise drops to normal
- * i915_gem_object_set_domain().
+ * i915_gem_object_set_pending_domain().
*/
static int
i915_gem_object_set_domain_range(struct drm_gem_object *obj,
@@ -1424,14 +1461,12 @@ i915_gem_object_set_domain_range(struct drm_gem_object *obj,
struct drm_i915_gem_object *obj_priv = obj->driver_private;
int ret, i;
+ BUG_ON(read_domains != I915_GEM_DOMAIN_CPU);
+ BUG_ON(write_domain != 0);
+
if (obj->read_domains & I915_GEM_DOMAIN_CPU)
return 0;
- if (read_domains != I915_GEM_DOMAIN_CPU ||
- write_domain != 0)
- return i915_gem_object_set_domain(obj,
- read_domains, write_domain);
-
/* Wait on any GPU rendering to the object to be flushed. */
if (obj->write_domain & ~(I915_GEM_DOMAIN_CPU | I915_GEM_DOMAIN_GTT)) {
ret = i915_gem_object_wait_rendering(obj);
@@ -1459,6 +1494,17 @@ i915_gem_object_set_domain_range(struct drm_gem_object *obj,
return 0;
}
+/* Zero the global flush/invalidate flags. These
+ * will be modified as each object is bound to the
+ * gtt
+ */
+static void i915_gem_dev_prepare_set_domain(struct drm_device *dev)
+{
+ dev->invalidate_domains = 0;
+ dev->flush_domains = 0;
+}
+
/**
* Once all of the objects have been set in the proper domain,
* perform the necessary flush and invalidate operations.
@@ -1484,8 +1530,6 @@ i915_gem_dev_set_domain(struct drm_device *dev)
i915_gem_flush(dev,
dev->invalidate_domains,
dev->flush_domains);
- dev->invalidate_domains = 0;
- dev->flush_domains = 0;
}
return flush_domains;
@@ -1811,12 +1855,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
return -EBUSY;
}
- /* Zero the gloabl flush/invalidate flags. These
- * will be modified as each object is bound to the
- * gtt
- */
- dev->invalidate_domains = 0;
- dev->flush_domains = 0;
+ i915_gem_dev_prepare_set_domain(dev);
/* Look up object handles and perform the relocations */
for (i = 0; i < args->buffer_count; i++) {
@@ -1863,9 +1902,9 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
}
/* make sure all previous memory operations have passed */
- ret = i915_gem_object_set_domain(obj,
- obj->pending_read_domains,
- obj->pending_write_domain);
+ ret = i915_gem_object_set_pending_domain(obj,
+ obj->pending_read_domains,
+ obj->pending_write_domain);
if (ret)
goto err;
}
@@ -1893,8 +1932,6 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
~0);
#endif
- (void)i915_add_request(dev, flush_domains);
-
/* Exec the batchbuffer */
ret = i915_dispatch_gem_execbuffer(dev, args, exec_offset);
if (ret) {
@@ -1906,7 +1943,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
* Ensure that the commands in the batch buffer are
* finished before the interrupt fires
*/
- flush_domains = i915_retire_commands(dev);
+ flush_domains |= i915_retire_commands(dev);
i915_verify_inactive(dev, __FILE__, __LINE__);
@@ -1924,6 +1961,7 @@ i915_gem_execbuffer(struct drm_device *dev, void *data,
struct drm_gem_object *obj = object_list[i];
struct drm_i915_gem_object *obj_priv = obj->driver_private;
+ i915_gem_object_set_current_domain(obj);
i915_gem_object_move_to_active(obj);
obj_priv->last_rendering_seqno = seqno;
#if WATCH_LRU
@@ -2171,16 +2209,28 @@ i915_gem_set_domain(struct drm_gem_object *obj,
struct drm_device *dev = obj->dev;
int ret;
uint32_t flush_domains;
+ uint32_t seqno;
BUG_ON(!mutex_is_locked(&dev->struct_mutex));
- ret = i915_gem_object_set_domain(obj, read_domains, write_domain);
+ i915_gem_dev_prepare_set_domain(dev);
+
+ ret = i915_gem_object_set_pending_domain(obj, read_domains,
+ write_domain);
if (ret)
return ret;
+
flush_domains = i915_gem_dev_set_domain(obj->dev);
- if (flush_domains & ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT))
- (void) i915_add_request(dev, flush_domains);
+ i915_gem_object_set_current_domain(obj);
+
+ if (flush_domains & ~(I915_GEM_DOMAIN_CPU|I915_GEM_DOMAIN_GTT)) {
+ struct drm_i915_gem_object *obj_priv = obj->driver_private;
+
+ seqno = i915_add_request(dev, flush_domains);
+ i915_gem_object_move_to_active(obj);
+ obj_priv->last_rendering_seqno = seqno;
+ }
return 0;
}
--
keith.packard at intel.com