[Intel-gfx] [PATCH 09/15] drm/i915: GuC submission setup, phase 1

Fri Jun 19 10:02:33 PDT 2015

On 15/06/15 22:32, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:27PM +0100, Dave Gordon wrote:
>> +static struct drm_i915_gem_object *gem_allocate_guc_obj(struct drm_device *dev,
>> +							u32 size)
>> +{
>> +	struct drm_i915_gem_object *obj;
>> +
>> +	obj = i915_gem_alloc_object(dev, size);
>> +	if (!obj)
>> +		return NULL;
> 
> Does it need to be a shmemfs object?

It needs to be permanently in RAM, so probably not. But I don't see an
allocator that gives you a GEM memory object /without/ backing store. Do
we have one?

>> +	if (i915_gem_object_get_pages(obj)) {
>> +		drm_gem_object_unreference(&obj->base);
>> +		return NULL;
>> +	}
> 
> This is a random function call.

Which is? Unreferencing the newly-allocated object before returning?
Otherwise it will leak :(

Presumably if the object didn't have backing store, the get_pages()
would be unnecessary as they would already be resident? Or would they
not exist until the first get_pages call instantiated them?

>> +	if (i915_gem_obj_ggtt_pin(obj, PAGE_SIZE,
>> +			PIN_OFFSET_BIAS | GUC_WOPCM_SIZE_VALUE)) {
>> +		drm_gem_object_unreference(&obj->base);
>> +		return NULL;
> 
> How about reporting the right error code?

Doesn't add anything. Allocation failure is going to be fatal anyway.
And i915_gem_alloc_object() just returns NULL for failure, so we'd have
to *make up* an error code for that case :(

Oh, and ERR_PTR/PTR_ERR are ugly.

>> +	}
>> +
>> +	return obj;
>> +}
>> +
>> +/**
>> + * gem_release_guc_obj() - Release gem object allocated for GuC usage
>> + * @obj:	gem obj to be released
>> +  */
>> +static void gem_release_guc_obj(struct drm_i915_gem_object *obj)
>> +{
>> +	if (!obj)
>> +		return;
>> +
>> +	if (i915_gem_obj_is_pinned(obj))
>> +		i915_gem_object_ggtt_unpin(obj);
> 
> What?

The object /should/ be pinned when we arrive here, thanks to the
i915_gem_obj_ggtt_pin() call discussed above. We could just always
unpin, and see whether we trigger this:

        WARN_ON(vma->pin_count == 0);

inside i915_gem_object_ggtt_unpin(). The test is just in case there's an
error path where the object being released wasn't in fact pinned.

>> +	drm_gem_object_unreference(&obj->base);
>> +}
>> +
>> +/*
>> + * Set up the memory resources to be shared with the GuC.  At this point,
>> + * we require just one object that can be mapped through the GGTT.
>> + */
>> +int i915_guc_submission_init(struct drm_device *dev)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
> 
> Bleh.

Cross-file interface, nonstatic, hence passes 'dev'; also it needs 'dev'
anyway, so there's no gain in passing dev_priv. And dev_priv
(i.e. struct drm_i915_private) isn't even in scope when this function is
declared in the header file.

>> +	const size_t ctxsize = sizeof(struct guc_context_desc);
>> +	const size_t poolsize = MAX_GUC_GPU_CONTEXTS * ctxsize;
>> +	const size_t gemsize = round_up(poolsize, PAGE_SIZE);
>> +	struct intel_guc *guc = &dev_priv->guc;
>> +
>> +	if (!i915.enable_guc_submission)
>> +		return 0; /* not enabled  */
>> +
>> +	if (guc->ctx_pool_obj)
>> +		return 0; /* already allocated */
> 
> Eh? Where have you hooked into... So looking at that, it looks like you
> want to move this into the device initialisation rather than guc
> firmware load. To me at least they are conceptually separate stages, and
> judging by the above combining them has resulted in very clumsy code.

So ... we don't want to allocate any GuC-related structures unless we're
going at least try to use the GuC, so this has to come /after/ firmware
fetch and validation. OTOH we can't actually fire up the GuC until after
these structures are allocated, because the GGTT address of the
ctx_pool_obj has to be passed to the GuC firmware as one of its startup
parameters. Thus, the only place to do this allocation is in between
deciding to transfer the f/w to the GuC and actually doing so.

Hence, the GuC loading code calls this each time we're about to squirt
the f/w into the GuC; but, we don't need to allocate this more than once
(OTOH it would be a violation of modularity for the loader to know that;
only the submission code needs to know that little detail). So we end up
with the above do-this-only-once code.

>> +	guc->ctx_pool_obj = gem_allocate_guc_obj(dev_priv->dev, gemsize);
>> +	if (!guc->ctx_pool_obj)
>> +		return -ENOMEM;
>> +
>> +	spin_lock_init(&dev_priv->guc.host2guc_lock);
>> +
>> +	ida_init(&guc->ctx_ids);
>> +
>> +	memset(guc->doorbell_bitmap, 0, sizeof(guc->doorbell_bitmap));
>> +	guc->db_cacheline = 0;
> 
> Before you relied on guc being zeroed, and now you memset it again.

Hmm ... perhaps we should rezero some of these things /every/ time instead?

/me examines code ...

Nope; it looks like everything is once again zero at the point when this
function takes the early exit.

>> +	return 0;
>> +}
>> +
>> +void i915_guc_submission_fini(struct drm_device *dev)
>> +{
>> +	struct drm_i915_private *dev_priv = dev->dev_private;
>> +	struct intel_guc *guc = &dev_priv->guc;
>> +
>> +	gem_release_guc_obj(dev_priv->guc.log_obj);
>> +	guc->log_obj = NULL;
>> +
>> +	if (guc->ctx_pool_obj)
>> +		ida_destroy(&guc->ctx_ids);
> 
> Interesting guard. Maybe just make the GuC controller a pointer from
> i915 and then you can do a more natural if (i915->guc == NULL) return;

That test was because there's no way to tell from ctx_ids itself whether
it was initialised (in any case, it's a black box as far as I'm
concerned). But the init code above guarantees that iff the pool was
allocated, then the rest of the initialisation was also done, so we
should call ida_destroy().

I wouldn't object to changing the intel_guc to a separate allocation
rather than embedding it. We'd need to add a backpointer though as we
currently use container_of() inside the guc_to_i915 macro. But you'd
still end up with something like the above, because the allocation of
the ctx_pool is still a separate step that can fail after the intel_guc
structure has been allocated; and you need the f/w-loading-related data
very early. The only saving is a small amount of memory, for an cost of
extra memory dereference at various points. So probably not worth it.

>> +	gem_release_guc_obj(guc->ctx_pool_obj);
>> +	guc->ctx_pool_obj = NULL;
>> +}
>> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
>> index 0b44265..06b68c2 100644
>> --- a/drivers/gpu/drm/i915/intel_guc.h
>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>> @@ -171,4 +171,8 @@ extern void intel_guc_ucode_init(struct drm_device *dev);
>>  extern int intel_guc_ucode_load(struct drm_device *dev, bool wait);
>>  extern void intel_guc_ucode_fini(struct drm_device *dev);
>>  
>> +/* i915_guc_submission.c */
>> +int i915_guc_submission_init(struct drm_device *dev);
>> +void i915_guc_submission_fini(struct drm_device *dev);
>> +
>>  #endif
>> diff --git a/drivers/gpu/drm/i915/intel_guc_loader.c b/drivers/gpu/drm/i915/intel_guc_loader.c
>> index 16eef4c..0f74876 100644
>> --- a/drivers/gpu/drm/i915/intel_guc_loader.c
>> +++ b/drivers/gpu/drm/i915/intel_guc_loader.c
>> @@ -111,6 +111,21 @@ static void set_guc_init_params(struct drm_i915_private *dev_priv)
>>  			i915.guc_log_level << GUC_LOG_VERBOSITY_SHIFT;
>>  	}
>>  
>> +	/* If GuC scheduling is enabled, setup params here. */
>> +	if (i915.enable_guc_submission) {
>> +		u32 pgs = i915_gem_obj_ggtt_offset(dev_priv->guc.ctx_pool_obj);
>> +		u32 ctx_in_16 = MAX_GUC_GPU_CONTEXTS / 16;
> 
> So really you didn't need to pin the ctx_pool_obj until this point?

Possibly. But that's not long after the allocation above (it's called
from the next function that the caller of i915_guc_submission_init()
calls after a successful return from that function).

intel_guc_ucode_load()
  i915_guc_submission_init()
    gem_allocate_guc_obj() -- returns pinned object
  guc_ucode_xfer()
    set_guc_init_params() -- needs GGTT address of pinned object

And we really don't want any extra failure paths at this depth. Better
to pin the object early and bail out if there's a problem. Its going to
be pinned for its entire lifetime anyway, so I don't think there's a
problem with pinning it a few microseconds early, especially /during
first open/ when there's no contention for GGTT space.

>> +		pgs >>= PAGE_SHIFT;
>> +		params[GUC_CTL_CTXINFO] = (pgs << GUC_CTL_BASE_ADDR_SHIFT) |
>> +			(ctx_in_16 << GUC_CTL_CTXNUM_IN16_SHIFT);
>> +
>> +		params[GUC_CTL_FEATURE] |= GUC_CTL_KERNEL_SUBMISSIONS;
>> +
>> +		/* Unmask this bit to enable GuC scheduler */
>> +		params[GUC_CTL_FEATURE] &= ~GUC_CTL_DISABLE_SCHEDULER;

This line deserves the comment firstly because we explicitly set this
bit earlier in the function, but have now decided to clear it again (it
was tidier than having unbalanced if-else legs); and secondly to help
people not get confused by the number of negations (&= ~x means clear
something, but what we're clearing has negative semantics "disable"). So
it does convey our intent ("why? to enable the GuC scheduler") as well
as how ("*un*mask this bit").

[aside]
At least the GuC parameter semantics are not as ugly as some workarounds
in the BSpec, where I regularly find things such as "this workaround
disables feature A when using option B but need not be applied if
condition C holds unless condition D is false or feature E is disabled.
The workaround must not be applied in mode F." *Bleeuurgh*
[/aside]

> /* Enable multiple context submission through the GuC */
> params[GUC_CTL_FEATURE] &= ~GUC_CTL_DISABLE_SCHEDULER;
> params[GUC_CTL_FEATURE] |= GUC_CTL_KERNEL_SUBMISSIONS;
> 
> Try to keep comments to explain why rather than what. Most of the
> comments here fall into the "i++; // postincrement i" category.
> -Chris

Most of the "what" comments in this file are associated with accesses to
specific h/w registers, which therefore have semantic effect beyond what
is explicit in the code. For example this comment:

/* tell all command streamers to forward interrupts and vblank to GuC */
irqs = _MASKED_FIELD(GFX_FORWARD_VBLANK_MASK,GFX_FORWARD_VBLANK_ALWAYS);
irqs |= _MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING);
for_each_ring(ring, dev_priv, i)
    I915_WRITE(RING_MODE_GEN7(ring), irqs);

helps the reader what the /effect/ of the writes is intended to be. It's
quite different from:

/* write bitmask to GEN7 ring mode register */
I915_WRITE(RING_MODE_GEN7(ring),MASKED_BIT_ENABLE(GFX_INTERRUPT_STEERING));

and means you may not have to dig through the BSpec to find out what the
less helpfully-named bits actually do. And this:

        I915_WRITE(DE_GUCRMR, ~0);

would be incomprehensible without reading the BSpec ... or the comment

	/* tell DE to send nothing to GuC */

.Dave.