[Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

Thu Jul 2 04:00:59 PDT 2015

On Thu, Jul 02, 2015 at 11:42:44AM +0100, Tvrtko Ursulin wrote:
> I am not super familiar with low level mapping business. But it
> looks correct to me. Just one question would be if there are any
> downsides to WC mapping? If in the read case it would be any
> advantage not to ask for WC?

Read-side WC behaves as UC. It's bad, but since we have no other
coherent access we have no choice. We rely on userspace not to do this,
and our powers of persuasion that copying to a coherent buffer first
using the GPU and then using the CPU for readback is about 8x faster
(incl. the sync overhead) then having to use UC.

> >+static int
> >+i915_gem_gtt_pread_pwrite(struct drm_device *dev,
> >+			  struct drm_i915_gem_object *obj, uint64_t size,
> >+			  uint64_t data_offset, uint64_t data_ptr, bool write)
> >+{
> >+	struct drm_i915_private *dev_priv = dev->dev_private;
> >+	char __user *user_data;
> >+	ssize_t remain;
> >+	loff_t offset, page_base;
> >+	int page_offset, page_length, ret = 0;
> >+
> >+	ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
> >+	if (ret)
> >+		goto out;
> >+
> >+	ret = i915_gem_object_set_to_gtt_domain(obj, write);
> >+	if (ret)
> >+		goto out_unpin;
> >+
> >+	ret = i915_gem_object_put_fence(obj);
> >+	if (ret)
> >+		goto out_unpin;
> >+
> >+	user_data = to_user_ptr(data_ptr);
> >+	remain = size;
> 
> Strictly speaking uint64_t can overflow ssize_t, compiler does not
> care in this case?

compiler is too quiet at times. Killing ssize_t and even loff_t here
makes sense.

> >+	offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
> >+
> >+	if (write)
> >+		intel_fb_obj_invalidate(obj, ORIGIN_GTT);

Smells wrong, better off combining the invalidate with the right flags
in the caller.

> >+	mutex_unlock(&dev->struct_mutex);
> >+	if (!write && likely(!i915.prefault_disable))
> >+		ret = fault_in_multipages_writeable(user_data, remain);
> 
> A bit confusing read/write inversion. :) But correct. Just wondering
> if it would make sense to invert the boolean at least in
> slow_user_access. Or in fact just call it pwrite instead of write to
> reflect pwrite == true is _reading_ from user memory.
> 
> >+	while (remain > 0) {
> >+		/* Operation in this page
> >+		 *
> >+		 * page_base = page offset within aperture
> >+		 * page_offset = offset within page
> >+		 * page_length = bytes to copy for this page
> >+		 */
> >+		page_base = offset & PAGE_MASK;
> >+		page_offset = offset_in_page(offset);
> >+		page_length = remain;
> >+		if ((page_offset + remain) > PAGE_SIZE)
> >+			page_length = PAGE_SIZE - page_offset;
> 
> It would save some arithmetics and branching to pull out the first
> potentially unaligned copy out of the loop and then page_offset
> would be zero for 2nd page onwards.

Honestly, keeping the loop simple is preferable :) What I do locally if

page_offset = offset_in_page();
offset &= PAGE_MASK;
while (remain > 0) {
	len = min(remain, PAGE_SIZE - page_offset);
	....
	user_data += len;
	remain -= len;
	offset += PAGE_SIZE;
	page_offset = 0;
}

> 
> >+		/* This is a slow read/write as it tries to read from
> >+		 * and write to user memory which may result into page
> >+		 * faults
> >+		 */
> >+		ret = slow_user_access(dev_priv->gtt.mappable, page_base,
> >+				       page_offset, user_data,
> >+				       page_length, write);
> >+
> >+		if (ret) {
> >+			ret = -EINVAL;

Would be better to return the right errno from slow_user_access or if it
is a bool, treat is a bool!

> >+			break;
> >+		}
> >+
> >+		remain -= page_length;
> >+		user_data += page_length;
> >+		offset += page_length;
> >+	}
> >+
> >+	mutex_lock(&dev->struct_mutex);
> 
> Caller had it interruptible.

On second access, it is preferrable not to risk interruption -
especially when catching up with state to return to userspace.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre