[Intel-gfx] [PATCH v10] drm/i915: Extend LRC pinning to cover GPU context writeback
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Jan 18 07:02:25 PST 2016
Hi guys,
On 15/01/16 10:59, Nick Hoath wrote:
> On 14/01/2016 12:37, Nick Hoath wrote:
>> On 14/01/2016 12:31, Chris Wilson wrote:
>>> On Thu, Jan 14, 2016 at 11:56:07AM +0000, Nick Hoath wrote:
>>>> On 14/01/2016 11:36, Chris Wilson wrote:
>>>>> On Wed, Jan 13, 2016 at 04:19:45PM +0000, Nick Hoath wrote:
>>>>>> +	if (ctx->engine[ring->id].dirty) {
>>>>>> +		struct drm_i915_gem_request *req = NULL;
>>>>>> +
>>>>>> +		/**
>>>>>> +		 * If there is already a request pending on
>>>>>> +		 * this ring, wait for that to complete,
>>>>>> +		 * otherwise create a switch to idle request
>>>>>> +		 */
>>>>>> +		if (list_empty(&ring->request_list)) {
>>>>>> +			int ret;
>>>>>> +
>>>>>> +			ret = i915_gem_request_alloc(
>>>>>> +				ring,
>>>>>> +				ring->default_context,
>>>>>> +				&req);
>>>>>> +			if (!ret)
>>>>>> +				i915_add_request(req);
>>>>>> +			else
>>>>>> +				DRM_DEBUG("Failed to ensure context saved");
>>>>>> +		} else {
>>>>>> +			req = list_first_entry(
>>>>>> +				&ring->request_list,
>>>>>> +				typeof(*req), list);
>>>>>> +		}
>>>>>> +		if (req) {
>>>>>> +			ret = i915_wait_request(req);
>>>>>> +			if (ret != 0) {
>>>>>> +				/**
>>>>>> +				 * If we get here, there's probably been a ring
>>>>>> +				 * reset, so we just clean up the dirty flag &
>>>>>> +				 * pin count.
>>>>>> +				 */
>>>>>> +				ctx->engine[ring->id].dirty = false;
>>>>>> +				__intel_lr_context_unpin(
>>>>>> +					ring,
>>>>>> +					ctx);
>>>>>> +			}
>>>>>> +		}
>>>>>
>>>>> If you were to take a lr_context_pin on the last_context, and only
>>>>> release that pin when you change to a new context, you do not need to
>>>>
>>>> That what this patch does.
>>>>
>>>>> introduce a blocking context-close, nor do you need to introduce the
>>>>> usage of default_context.
>>>>
>>>> The use of default_context here is to stop a context hanging around
>>>> after it is no longer needed.
>>>
>>> By blocking, which is not acceptable. Also, we can eliminate the
>>> default_context, and so pinning that as opposed to the last_context
>>> serves no purpose other than by chance having a more preferable
>>> position when it comes to defragmentation. But you don't enable that
>>> anyway and we
>>
>> Enabling the shrinker on execlists is something I'm working on, and it
>> is predicated on this patch. Also, why is blocking on closing a context
>> not acceptable?
>>
>
> As a clarification: without rewriting the execlist code so that it does
> not submit or clean up from an interrupt handler, we can't use
> refcounting to allow non-blocking closing.
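If I read Chris's suggestion correctly, the last_context scheme would look
roughly like the below. This is only an illustrative sketch: the pin/unpin
helper names and their (ring, ctx) signature are assumed from the
__intel_lr_context_unpin() call in the quoted patch, the per-ring
last_context pointer is likewise assumed, and the question of exactly when
the hardware has finished writing the old context back is glossed over.

/*
 * Sketch only: keep each submitted context pinned until a different one
 * is submitted on the same ring, so the GPU can still write it back.
 */
static int sketch_switch_context(struct intel_engine_cs *ring,
				 struct intel_context *to)
{
	struct intel_context *from = ring->last_context;
	int ret;

	if (from == to)
		return 0;

	/* Keep the incoming context pinned while the GPU can touch it. */
	ret = intel_lr_context_pin(ring, to);
	if (ret)
		return ret;

	/* ... emit the request / context switch to 'to' here ... */

	/*
	 * The pin taken when 'from' was submitted is only dropped now
	 * that we are switching away from it, so closing a context never
	 * has to wait for the GPU; its final unpin simply happens on the
	 * next switch (or at ring cleanup for the very last context).
	 */
	if (from)
		intel_lr_context_unpin(ring, from);

	ring->last_context = to;
	return 0;
}

The attraction is that the unpin is driven purely by the next submission,
so context close itself never has to block.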
I am trying to understand this issue, so please bear with me if I got it
wrong. It is also untested since I don't have a suitable system. But
would something like the below be an interesting step towards an
acceptable solution?
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 2cfcf9401971..63bb251edffd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2927,6 +2927,8 @@ void i915_gem_reset(struct drm_device *dev)
 void
 i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
+	struct drm_i915_gem_request *prev_req, *next, *req;
+
 	WARN_ON(i915_verify_lists(ring->dev));
 
 	/* Retire requests first as we use it above for the early return.
@@ -2934,17 +2936,17 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	 * the requests lists without clearing the active list, leading to
 	 * confusion.
 	 */
-	while (!list_empty(&ring->request_list)) {
-		struct drm_i915_gem_request *request;
-
-		request = list_first_entry(&ring->request_list,
-					   struct drm_i915_gem_request,
-					   list);
-
-		if (!i915_gem_request_completed(request, true))
+	list_for_each_entry_safe(req, next, &ring->request_list, list) {
+		if (!i915_gem_request_completed(req, true))
 			break;
 
-		i915_gem_request_retire(request);
+		if (!i915.enable_execlists || !i915.enable_guc_submission) {
+			i915_gem_request_retire(req);
+		} else {
+			prev_req = list_prev_entry(req, list);
+			if (&prev_req->list != &ring->request_list)
+				i915_gem_request_retire(prev_req);
+		}
 	}
To explain: this attempts to ensure that in GuC mode a request is only
unreferenced if there is a *following*, *completed* request.
This way, regardless of whether the two requests use the same or
different contexts, we can be sure that the GPU has either completed
the context writeback, or that the unreference will not cause the
final unpin of the context.
With the above snippet the last context would be left pinned, but that
could be improved by appending a switch to the default (empty) context
when we get to the end of the list, or perhaps doing so only
periodically from the retire worker, to lessen the context switch
traffic.
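In code, the retire worker variant I have in mind would be roughly the
below. Again only an untested sketch; the helper name is made up, but it
reuses the request helpers from the quoted patch (i915_gem_request_alloc
on the default context plus i915_add_request).

/*
 * Untested sketch: once the ring has gone idle, queue a nop request on
 * the default context so the last user context is switched away from
 * (and hence written back), allowing its final unpin on a later retire.
 */
static void sketch_flush_last_context(struct intel_engine_cs *ring)
{
	struct drm_i915_gem_request *req;
	int ret;

	if (!list_empty(&ring->request_list))
		return;

	ret = i915_gem_request_alloc(ring, ring->default_context, &req);
	if (ret) {
		DRM_DEBUG("Failed to allocate idle switch request\n");
		return;
	}

	/*
	 * When this request completes and a later retire pass sees it,
	 * the previous context's writeback is known to be done, so its
	 * last reference can be dropped safely.
	 */
	i915_add_request(req);
}

Called only every few invocations of the retire worker, that should keep
the extra switch traffic down.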
Thoughts, comments?
Regards,
Tvrtko