[Intel-gfx] [PATCH 2/3] drm/i915: Infrastructure for supporting different GGTT views per object

Thu Dec 4 04:17:51 PST 2014

On 12/04/2014 10:59 AM, Daniel Vetter wrote:
> On Thu, Dec 04, 2014 at 10:26:14AM +0000, Chris Wilson wrote:
>> On Thu, Dec 04, 2014 at 10:19:09AM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 12/04/2014 09:53 AM, Chris Wilson wrote:
>>>> On Wed, Dec 03, 2014 at 02:59:25PM +0000, Tvrtko Ursulin wrote:
>>>>> +void i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>>>>> +		   u32 flags)
>>>>> +{
>>>>> +	struct sg_table *pages = i915_ggtt_view_pages(vma);
>>>>> +
>>>>> +	if (pages && !IS_ERR(pages)) {
>>>>> +		vma->bind_vma(vma, pages, cache_level, flags);
>>>>> +
>>>>> +		if (vma->ggtt_view.type != I915_GGTT_VIEW_NORMAL) {
>>>>> +			sg_free_table(pages);
>>>>> +			kfree(pages);
>>>>> +		}
>>>>> +	}
>>>>> +}
>>>>
>>>> Stop. Even if the failure path is impossible with the present
>>>> implementation, here you are masking the error only to go and pretend
>>>> the binding succeeded.
>>>>
>>>> Don't be lazy, this is a very nasty bug that should be hit during igt -
>>>> or else you are not testing well enough.
>>>
>>> Fair comment, even if a bit too assuming. I actually had this as
>>> TODO but somehow lost it.
>>>
>>> I don't have any ideas on how to provoke this to fail from an IGT?
>>> Even with future implementations it boils down to a couple of small
>>> allocations which would have to fail reliably.
>>
>> We have quite a few thrash tests now that are fairly good at getting
>> even the small allocations to fail.
>>
>> What we don't have is a single-fd, multi-ctx thrash test (well except
>> for some GL tests...)
>
> But none of these tests result in permanent memory failures (only the
> occasional ioctl restart when waiting for gpu rendering). And sg table
> alloc only recurses through the shrinker so that can't happen. So I think
> we just have to get by with review.
>
> We did have issues with sg table allocations in stress tests though,
> before we've added the recursive shrinker locking, hence sg table alloc
> can indeed go south.

I looked at propagating errors from i915_vma_bind() out to callers and 
it is mostly all fine apart from the i915_gem_restore_gtt_mappings 
during i915_drm_resume.

I don't see how this is fixable apart by going back and having sgls stay 
around for the lifetime of their VMAs. It shouldn't be such a big deal - 
they are not so big even with non-coalesced entries.

Thoughts?

Regards,

Tvrtko