[Intel-gfx] [PATCH] drm/i915: Do not invalidate obj->pages under mempressure

Tue Feb 10 16:55:34 PST 2015

On 02/09/2015 08:46 AM, Chris Wilson wrote:
> On Sun, Feb 08, 2015 at 03:27:13PM -0800, Sean V Kelley wrote:
>>
>>
>> On 01/16/2015 08:05 PM, Daniel Vetter wrote:
>>> On Thu, Jan 15, 2015 at 08:44:00PM +0000, Chris Wilson wrote:
>>>> On Thu, Jan 15, 2015 at 08:36:15PM +0100, Daniel Vetter wrote:
>>>>> On Wed, Jan 14, 2015 at 9:34 PM, Chris Wilson
>>>>> <chris at chris-wilson.co.uk> wrote:
>>>>>> This (partially) reverts
>>>>>>
>>>>>> commit 5537252b6b6d71fb1a8ed7395a8e5babf91953fd Author: Chris
>>>>>> Wilson <chris at chris-wilson.co.uk> Date:   Tue Mar 25 13:23:06
>>>>>> 2014 +0000
>>>>>>
>>>>>> drm/i915: Invalidate our pages under memory pressure
>>>>>
>>>>> Shouldn't we also revert the hunk in i915_gem_free_objects?
>>>>> Without the truncate vs. invalidate disdinction it seems to
>>>>> have lost it's reason for existence ...
>>>>
>>>> No, setting MADV_DONTNEED has other nice properties during
>>>> put_pages() - I think it is useful in its own right, for example
>>>> that is where my page stealing code goes...
>>>
>>> Well right now I can't make sense of this bit any more (tbh I
>>> didn't with the other code either, but overlooked that while
>>> reviewing). When it's just there for future work but atm dead code
>>> I prefer for it to get removed. -Daniel
>>
>>
>> So can we also revert the hunk in i915_gem_free_objects?  I would like
>> to get this patch merged, it looks like that is the primary concern.
> 
> A problem I have is that the test written to hit the exact condition
> considered in the changelog does not ellict the bug. 
> 
> Can you test whether
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 39e032615b31..6269204ba16f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -1030,6 +1030,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
>                         /* update for the implicit flush after a batch */
>                         obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
>                 }
> +               obj->dirty = 1;
>                 if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
>                         i915_gem_request_assign(&obj->last_fenced_req, req);
>                         if (entry->flags & __EXEC_OBJECT_HAS_FENCE) {
> 
> makes the bug go away. If so, I think the bug is in the caller not
> setting reloc domains correctly.

I think you may be right.  This implies a caller issue, because
essentially you are forcing a write back here as if it were in the
write domain.

No corruption seen.  I will add reloc domains to my growing audit list.

via

drm-intel-nightly: 2014y-12m-08d-22h-24m-34s UTC integration manifest

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 0c25f62..4cb2755 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -970,6 +970,8 @@ i915_gem_execbuffer_move_to_active(struct list_head
*vmas,
                        /* update for the implicit flush after a batch */
                        obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS;
                }
+
+               obj->dirty = 1;
                if (entry->flags & EXEC_OBJECT_NEEDS_FENCE) {
                        i915_gem_request_assign(&obj->last_fenced_req, req);
                        if (entry->flags & __EXEC_OBJECT_HAS_FENCE) {


Sean


> -Chris
>