[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: ttm for stolen (rev5)

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jun 22 09:05:28 UTC 2022


On 21/06/2022 20:11, Robert Beckett wrote:
> 
> 
> On 21/06/2022 18:37, Patchwork wrote:
>> *Patch Details*
>> *Series:*    drm/i915: ttm for stolen (rev5)
>> *URL:*    https://patchwork.freedesktop.org/series/101396/
>> *State:*    failure
>> *Details:* https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>
>>
>>   CI Bug Log - changes from CI_DRM_11790 -> Patchwork_101396v5
>>
>>
>>     Summary
>>
>> *FAILURE*
>>
>> Serious unknown changes coming with Patchwork_101396v5 absolutely need to be
>> verified manually.
>>
>> If you think the reported changes have nothing to do with the changes
>> introduced in Patchwork_101396v5, please notify your bug team to allow them
>> to document this new failure mode, which will reduce false positives in CI.
>>
>> External URL: 
>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/index.html
>>
>>
>>     Participating hosts (40 -> 41)
>>
>> Additional (2): fi-icl-u2 bat-dg2-9
>> Missing (1): fi-bdw-samus
>>
>>
>>     Possible new issues
>>
>> Here are the unknown changes that may have been introduced in 
>> Patchwork_101396v5:
>>
>>
>>       IGT changes
>>
>>
>>         Possible regressions
>>
>>   * igt@i915_selftest@live@reset:
>>       o bat-adlp-4: PASS
>>         <https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_11790/bat-adlp-4/igt@i915_selftest@live@reset.html>
>>         -> DMESG-FAIL
>>         <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_101396v5/bat-adlp-4/igt@i915_selftest@live@reset.html>
>>
>>
> 
> I keep hitting clobbered pages during engine resets on bat-adlp-4.
> It seems to happen most of the time on that machine and occasionally on 
> bat-adlp-6.
> 
> Should bat-adlp-4 be considered an unreliable machine like bat-adlp-6 is 
> for now?
> 
> Alternatively, seeing the history of this in
> 
> commit 3da3c5c1c9825c24168f27b021339e90af37e969 "drm/i915: Exclude low 
> pages (128KiB) of stolen from use"
> 
> could this be an indication that maybe the original issue is worse on 
> adlp machines?
> I have only ever seen page 135 or 136 clobbered across many runs 
> via trybot, so it looks fairly consistent.
> Though excluding the use of over 540K of stolen might be too severe.

I don't know, but I see that with the latest version you even hit pages 165/166.
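
For scale, the page numbers can be turned into offsets with a quick
sketch. It assumes 4 KiB pages and that the reported page numbers are
zero-based indices from the start of stolen; neither is stated in this
thread, and the helper names are made up for illustration.

```python
# Assumption: 4 KiB pages, page indices counted from the start of stolen.
PAGE_SIZE_KIB = 4

# From the subject of commit 3da3c5c1c982: low 128KiB of stolen is excluded.
CURRENT_EXCLUSION_KIB = 128


def page_offset_kib(page_index):
    """Offset in KiB at which the given page starts."""
    return page_index * PAGE_SIZE_KIB


def exclusion_needed_kib(highest_clobbered_page):
    """Size in KiB of a low exclusion zone covering the page inclusively."""
    return (highest_clobbered_page + 1) * PAGE_SIZE_KIB


if __name__ == "__main__":
    for page in (135, 136, 165, 166):
        print(
            f"page {page}: starts at {page_offset_kib(page)} KiB, "
            f"exclusion needed {exclusion_needed_kib(page)} KiB "
            f"(currently {CURRENT_EXCLUSION_KIB} KiB)"
        )
```

Under those assumptions, covering page 166 would mean excluding 668 KiB,
versus the 128 KiB excluded by the commit mentioned above.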

Any history of hitting this in CI without your series? If not, are there 
some other changes which could explain it? Are you touching the selftest 
itself?

The hexdump of the clobbered page looks quite complex, especially the 
POISON_FREE pattern. Any idea how that ends up there?

Btw, what is the benefit of converting stolen to TTM in the first place? 
It is not much of a backend, since it just uses the drm range manager, so 
it is quite thin and uneventful. The diffstat for the series also does not 
look like you end up with much code reduction.

Regards,

Tvrtko

