[Intel-gfx] [PATCH 2/4] drm/i915: mark the global default (intel_)context as such

Dave Gordon david.s.gordon at intel.com
Thu Dec 17 11:00:02 PST 2015


On 17/12/15 12:27, Chris Wilson wrote:
> On Thu, Dec 17, 2015 at 11:09:54AM +0000, Nick Hoath wrote:
>> On 16/12/2015 19:30, Chris Wilson wrote:
>>> On Wed, Dec 16, 2015 at 07:22:52PM +0000, Dave Gordon wrote:
>>>> On 16/12/15 18:57, Chris Wilson wrote:
>>>>> On Wed, Dec 16, 2015 at 06:36:49PM +0000, Dave Gordon wrote:
>>>>>> Some of the LRC-specific context-destruction code has to special-case
>>>>>> the global default context, because the HWSP is part of that context. At
>>>>>> present it deduces it indirectly by checking for the backpointer from
>>>>>> the engine to the context, but that's an unsafe assumption if the setup
>>>>>> and teardown code is reorganised. (It could also test !ctx->file_priv,
>>>>>> but again that's a detail that might be subject to change).
>>>>>>
>>>>>> So here we explicitly flag the default context at the point of creation,
>>>>>> and then reorganise the code in intel_lr_context_free() not to rely on
>>>>>> the ring->default_pointer (still) being set up; to iterate over engines
>>>>>> in reverse (as this is teardown code); and to reduce the nesting level
>>>>>> so it's easier to read.
>>>>>>
>>>>>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
>>>>>
>>>>> #define intel_context_is_global(ctx) ((ctx)->file_priv == NULL)

Why not
#define intel_context_is_global(ctx)	((ctx)->is_global) /* ! */

>>>> The last sentence of the first paragraph of the commit message above
>>>> notes that we *could* use that as a test, but I don't regard it as a
>>>> safe test, in either direction. That is, it could give a false
>>>> negative if we someday associate some (internal) fd with the default
>>>> context, or (more likely) a false positive if the file association
>>>> were broken and the pointer nulled in an earlier stage of the
>>>> teardown of a non-global (user-created) context.
>>>>
>>>> int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>>>>                                     struct drm_file *file)
>>>> {
>>>>          struct drm_i915_gem_context_destroy *args = data;
>>>>          struct drm_i915_file_private *file_priv = file->driver_priv;
>>>>          struct intel_context *ctx;
>>>>          int ret;
>>>>
>>>>          if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
>>>>                  return -ENOENT;
>>>>
>>>>          ret = i915_mutex_lock_interruptible(dev);
>>>>          if (ret)
>>>>                  return ret;
>>>>
>>>>          ctx = i915_gem_context_get(file_priv, args->ctx_id);
>>>>          if (IS_ERR(ctx)) {
>>>>                  mutex_unlock(&dev->struct_mutex);
>>>>                  return PTR_ERR(ctx);
>>>>          }
>>>>
>>>>          idr_remove(&ctx->file_priv->context_idr, ctx->user_handle);
>>>>          i915_gem_context_unreference(ctx);
>>>>          mutex_unlock(&dev->struct_mutex);
>>>>
>>>>          DRM_DEBUG_DRIVER("HW context %d destroyed\n", args->ctx_id);
>>>>          return 0;
>>>> }
>>>>
>>>> At present, i915_gem_context_destroy_ioctl() above removes the
>>>> context from the file's list-of-contexts but DOESN'T clear the
>>>> ctx->file_priv, which means there's a somewhat inconsistent (but
>>>> transient) state during which a soon-to-be-destroyed context links
>>>> to a file, but the file doesn't have a link back. It probably
>>>> doesn't matter, because the code holds the mutex across the two
>>>> operations ...
>>>
>>> And that the ctx was created to belong to the file still holds true.
>>>
>>>> ... unless of course the context's refcount isn't 1 at this point,
>>>> in which case I suppose someone else *might* go from the context to
>>>> the file and then be mystified as to why the context isn't on the
>>>> list ...
>>>>
>>>> ... and if we changed the code above, then file_priv would *always*
>>>> be NULL by the time the destructor was called!
>>>>
>>>> So it's surely safer to have a flag that explicitly says "I'm the
>>>> global default context" than to guess based on some other contingent
>>>> property.
>>>
>>> No, we have a flag that says this context was created belonging to a
>>> file, with the corollary that only one context doesn't belong to any
>>> file.
>> Using pointers like this to provide 'magic' secondary state information
>> just adds to the fragility of the driver.
>
> It's primary. Ownership information is fundamental to the driver.
> The change is irrelevant to the bugfix.
>
> If you want to make such a big change, eliminate the default_ctx from
> execlists.
> -Chris

No, we need the default (or global) context for idling the engines, as 
well as for sending initialisation commands during startup. We can't 
make the GPU stop using any given (user) context within a known bounded 
time except by telling it to switch to another context. Ergo, to stop 
using ANY user context, there must exist a non-user context that we can 
switch to.

And this is not a BIG change, it's a trivial change, to identify THE 
GLOBAL CONTEXT by means of *specific* information contained within that 
context, rather than by relying on information stored in a different 
structure.

Your suggested #define also does that much, but in a less obvious way 
that I consider lays a trap for a future maintainer who doesn't realise 
that the code relies on a subtle distinction between "a context created 
internally by the driver, without an owning fd" and "a context that has 
become detached from the fd that originally created it".

Eliminating such obscure and undocumented dependencies enables us to 
more safely restructure and simplify areas that currently have 
excessively high coupling, in terms of the nonlocal assumptions that 
each part makes about what other pieces of code have done/will do.
The delicate and fragile load/unload dance is a prime example.

.Dave.


More information about the Intel-gfx mailing list