[Intel-gfx] [Fwd: Re: Bug scrub status]

Imre Deak imre.deak at intel.com
Tue Nov 27 16:33:29 CET 2012


Hi,

at Intel we have a weekly bug scrub effort, where a dedicated group is
responsible for tracking regressions, triaging new bugs, fixing bugs
and keeping bugs up to date. At the end of each week we'll post a
summary of this effort; the first one follows:

In general:
-----------

- Bug#55984:
We haven't found the root cause for this; we spent most of the time
assisting Chris and Daniel in trying different candidate fixes and
bisecting. How easy it was to reproduce depended a lot on the
environment: SNA vs. UXA and compiz/metacity vs. no compositing seemed
to affect it. A further complication is that we hit GPU hangs for
different reasons.

From Chris' comments I understand the search still continues, though one
set of the reports should be RC6 related and thus fixed by disabling RC6
on ILK.

One observation is that we are fighting bugs somewhat opportunistically,
not aiming at finding the root cause of the problem. The reasons and
proposed solutions, as we see them:
- Lack of time
  1 week is a short time and the general expectation is to solve
symptoms ASAP w/o "wasting" a lot of time to understand the problem
better. Proposed solution: better appreciation of finding the underlying
issues, with more time allocated for this.

- Lack of tools
  With the existing tools (apitrace, drm error status, libdrm aub dumps)
we can't fight certain bugs related to the kernel driver/HW like
Bug#55984. Proposed solution: a new tool tracing bo contents, exec buf
and other relevant IOCTLs from the kernel driver, to produce a
replayable trace. Initial investigation has started on this and it looks
doable; it would be great to get some input about its feasibility /
usefulness from Chris and Daniel or other people on the list.
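
As a rough illustration of the replayable trace idea above, a trace
could be stored as a stream of length-prefixed records, one per captured
event. This is only a sketch under stated assumptions: the header
layout, the event type values and the function names are all
hypothetical and don't correspond to any existing tool or kernel
interface.

```python
import io
import struct

# Hypothetical record header: event type, payload length (little endian).
HEADER = struct.Struct("<II")

# Hypothetical event types, for illustration only.
EV_BO_WRITE = 1   # payload: contents written to a buffer object
EV_EXECBUF = 2    # payload: serialized exec buffer arguments


def write_event(stream, ev_type, payload):
    """Append one record (header followed by payload bytes) to the trace."""
    stream.write(HEADER.pack(ev_type, len(payload)))
    stream.write(payload)


def read_events(stream):
    """Yield (ev_type, payload) tuples in the order they were recorded."""
    while True:
        hdr = stream.read(HEADER.size)
        if not hdr:
            return  # clean end of trace
        if len(hdr) < HEADER.size:
            raise ValueError("truncated record header")
        ev_type, length = HEADER.unpack(hdr)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        yield ev_type, payload


# Usage: record two events, then read them back in order.
trace = io.BytesIO()
write_event(trace, EV_BO_WRITE, b"\x00\x01\x02\x03")
write_event(trace, EV_EXECBUF, b"batch-args")
trace.seek(0)
events = list(read_events(trace))
```

A replayer would walk the same records and reissue them in order; how
the payloads map back to real IOCTLs is left open here.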

On the positive side this week was a great opportunity for us to learn a
lot about the inner workings of the driver/HW.


In detail:
----------

Ville:
- Bug#54911:
Tracked this down to invalid EDID handling; planning to revisit it once
Egbert Eich's EDID patchset settles down.

- Found a new bug where the GPU ring tail pointer wraps around and gets
within a cacheline of the head. According to the spec this results in
undefined behavior; a patch will be sent to fix this.

- New bug(s) will be opened for GPU hangs that are most probably not
related to Bug#55984.
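
To illustrate the ring tail issue mentioned above, here is a minimal
sketch of a free space check that keeps the tail at least one cacheline
away from the head. It is not the driver code or the planned patch; the
function names and the 64 byte cacheline size are assumptions, and
offsets are plain byte offsets into a ring of the given size.

```python
CACHELINE = 64  # assumed cacheline size in bytes


def ring_free_space(head, tail, size, reserve=CACHELINE):
    """Bytes that can be written at tail while staying at least
    `reserve` bytes short of head. head == tail means an empty ring."""
    gap = (head - tail) % size  # distance from tail forward to head
    if gap == 0:
        gap = size  # empty ring: the whole ring lies ahead of tail
    return max(gap - reserve, 0)


def can_emit(head, tail, size, nbytes):
    """True if nbytes can be emitted without the tail getting within
    a cacheline of the head."""
    return nbytes <= ring_free_space(head, tail, size)
```

With a 4096 byte ring, head at 0 and tail at 4000, only 32 bytes may be
emitted even though 96 bytes separate tail from head after wrapping.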

Mika:
- In the internal bugzilla, bugs older than 3 months are set to wontfix,
most of which were eventually closed by the reporter. There are still a
number of these left to go through.

- Started to work on a script to search and retrieve i915_error_state
attachments from each bug. This will allow us, for example, to find
similarities between bugs and mark them as duplicates.
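
The duplicate finding step above could look roughly like the sketch
below. It assumes the error state attachments have already been
downloaded as text, and that registers appear as "NAME: 0xVALUE" lines;
the choice of IPEHR and INSTDONE as the signature, the function names
and the bug ids in the usage example are all assumptions, not what the
actual script does.

```python
import re
from collections import defaultdict

# Assumed line format in the dump: "NAME: 0xVALUE"
FIELD_RE = re.compile(r"^\s*(\w+):\s*(0x[0-9a-fA-F]+)\s*$")

# Assumed set of registers that characterize a hang.
SIGNATURE_FIELDS = ("IPEHR", "INSTDONE")


def signature(dump_text, fields=SIGNATURE_FIELDS):
    """Return the first value seen for each field, None if absent."""
    seen = {}
    for line in dump_text.splitlines():
        m = FIELD_RE.match(line)
        if m and m.group(1) in fields and m.group(1) not in seen:
            seen[m.group(1)] = m.group(2).lower()
    return tuple(seen.get(f) for f in fields)


def group_by_signature(dumps):
    """Map signature -> sorted bug ids, keeping only signatures shared
    by more than one bug (duplicate candidates)."""
    groups = defaultdict(list)
    for bug_id, text in dumps.items():
        groups[signature(text)].append(bug_id)
    return {sig: sorted(ids) for sig, ids in groups.items() if len(ids) > 1}


# Usage with made-up bug ids and register values.
dumps = {
    "bug-A": "IPEHR: 0x7a000002\nINSTDONE: 0xfffffffe\n",
    "bug-B": "  IPEHR: 0x7A000002\nINSTDONE: 0xfffffffe",
    "bug-C": "IPEHR: 0x00000000\nINSTDONE: 0xffffffff",
}
candidates = group_by_signature(dumps)
```

Matching signatures are only a hint; the bugs would still need a manual
look before being marked as duplicates.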

Imre:
- Nothing else besides assisting Chris and Daniel with reproducing
#55984 and bisecting it to a particular commit. However, this commit is
probably not the root cause: simply disabling RC6 got rid of the problem
for me.

--Imre