[Intel-gfx] [PATCH 2/2] drm/i915: Soften error messages in i915_gem_object_create_from_data()

Chris Wilson chris at chris-wilson.co.uk
Tue Jul 28 10:18:56 PDT 2015


On Tue, Jul 28, 2015 at 06:09:16PM +0100, Dave Gordon wrote:
> On 28/07/15 17:34, Chris Wilson wrote:
> >On Tue, Jul 28, 2015 at 05:29:09PM +0100, Dave Gordon wrote:
> >>On 28/07/15 14:27, Chris Wilson wrote:
> >>>Since we already return -EFAULT to the user, emitting an error message
> >>>*and* WARN is overkill. If the caller is upset, they can do so, but for
> >>>the purposes of debugging we need only log the erroneous values.
> >>>
> >>>Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >>>Cc:: Alex Dai <yu.dai at intel.com>
> >>>Cc: Dave Gordon <david.s.gordon at intel.com>
> >>>Cc: Tom O'Rourke <Tom.O'Rourke at intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_gem.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>>index c1ded76a6eb4..2039798f4403 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>>@@ -5243,8 +5243,8 @@ i915_gem_object_create_from_data(struct drm_device *dev,
> >>>
> >>>  	i915_gem_object_unpin_pages(obj);
> >>>
> >>>-	if (WARN_ON(bytes != size)) {
> >>>-		DRM_ERROR("Incomplete copy, wrote %zu of %zu", bytes, size);
> >>>+	if (bytes != size) {
> >>>+		DRM_DEBUG_GEM("Incomplete copy, wrote %zu of %zu", bytes, size);
> >>>  		ret = -EFAULT;
> >>>  		goto fail;
> >>>  	}
> >>
> >>I agree the WARN_ON() is overkill, but I think maybe the DRM_ERROR()
> >>is useful. The (one current) caller will report an error, but at
> >>that level it's just:
> >>
> >>         DRM_ERROR("Failed to fetch GuC firmware from %s\n", ...)
> >>
> >>with no more detail as to whether that was due to file-not-found,
> >>bad-file-size, out-of-memory, failed-to-get-pages, or any of the
> >>other errors that might arise.
> >>
> >>At present this code is only called once, and I think this copy
> >>failure "shouldn't ever happen", so it won't be filling the logfile.
> >>But emitting the error here for the truly unexpected case (as
> >>opposed to the commonplace "out-of-memory" and suchlike) helps
> >>current and future callers avoid doing the detailed failure
> >>analysis.
> >
> >The message is still there though, it's just a debug message for the
> >developer. And all it could mean is that the caller passed in a bad
> >value for the size.
> >-Chris
> 
> I'm not sure you could trigger it with a bad size. The copy-target
> has just been allocated (in this function) and so is known to be the
> right size. If the size specified is larger than the data source
> buffer, the copy will likely have OOPSed anyway - or just continued
> with whatever happens to be next in memory.
> 
> So AFAICT the bytes vs. size mismatch can only happen if something
> is broken in the GEM (or SG-list) framework, hence worth announcing.
> If you reload the driver with debug enabled, it probably won't
> happen again (and you would have to reload, not just change the
> debug level, because at present this is only called during driver
> load).

Hmm, I was thinking that the access of data[size] would be checked and
the likely cause of the failure (thinking this was basically just
another copy_from_user check). I disagree that this one failure is any
more significant that any of the other internal failure cases then.

> If not an actual ERROR, then maybe a NOTICE is the right level:

No.
 
>     DRM_NOTICE_IF(bytes != size,
>         "Incomplete copy, wrote %zu of %zu", bytes, size);
> 
> so the the support engineer who sees the "failed to fetch" message
> in the log at least gets some hint about what particular weirdness
> has occurred on this occasion. (Other cases are easier: missing
> firmware file is obvious, wrong version gets a specific ERROR in the
> log; and these errors would be persistent. But a copy-fail would be
> a puzzle.)

That equally applies to a debug message. And even more so since EFAULT
here is unique. Under my assumption bytes would be recognisable in the
context of the caller, however if it ever happens it will be an equally
opaque value.

Given your arguments here, I think we should just remove it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


More information about the Intel-gfx mailing list