[Intel-gfx] [PATCH 1/4] drm/i915: Teach hangcheck about long operations on rings

Mon Nov 30 10:04:54 PST 2015

On 30/11/15 17:11, Chris Wilson wrote:
> On Mon, Nov 30, 2015 at 06:53:06PM +0200, Mika Kuoppala wrote:
>> Some operations that happen in the ringbuffer, like flushing,
>> can take significant amounts of time. After some intense
>> shader tests, the PIPE_CONTROL with flush can apparently take
>> longer than our hangcheck tick (1500ms). If
>> this happens twice in a row, even with subsequent batches,
>> the hangcheck score decaying mechanism can't cope and
>> a hang is declared.
>>
>> Strip out the actual head checking into a separate function, and if
>> the actual head has not moved, check whether it is lingering inside the
>> ringbuffer as opposed to a batch. If so, treat it as if it were
>> inside a loop and only slightly increment the hangcheck score.
>
> The PIPE_CONTROL in the ring after the batch, is equivalent to the batch
> performing its own PIPE_CONTROL as the last instruction. It does not
> make sense to distinguish the two.
> -Chris

It's equivalent in terms of outcome, but not when checking what's 
happening. The driver controls insertion of PIPE_CONTROLs in the ring, 
but not in batches. If execution is at the ring level, we know it's 
running instructions that the driver put there, and we know that it 
*will* then progress to the next batch (assuming the hardware's not 
stuck). OTOH if execution is inside a batch then we don't know what 
sequence of instructions it's running, and we can't guarantee that the 
batch will ever terminate. So, a reduced penalty if executing 
driver-supplied code makes sense.
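
As a rough illustration of that distinction (a sketch only: the struct, the
function name and the score values below are hypothetical and are not taken
from the patch or from the actual i915 code), the scoring could look
something like this:

```c
#include <stdbool.h>

/* Hypothetical score increments; the values are illustrative only. */
enum {
	SCORE_NONE = 0,   /* head moved, nothing to penalise */
	SCORE_BUSY = 1,   /* stalled in driver-written ring commands */
	SCORE_HUNG = 20   /* stalled inside an untrusted batch */
};

/* Hypothetical snapshot taken at each hangcheck tick. */
struct head_sample {
	unsigned int head;       /* head offset seen on this tick */
	unsigned int last_head;  /* head offset seen on the previous tick */
	bool in_ring;            /* true: executing the ring itself,
	                          * false: executing inside a batch */
};

/* Return how much to add to the hangcheck score for this tick. */
static int hangcheck_penalty(const struct head_sample *s)
{
	if (s->head != s->last_head)
		return SCORE_NONE;   /* forward progress was made */

	if (s->in_ring)
		return SCORE_BUSY;   /* driver-supplied commands: expected
		                      * to complete, so only a small penalty */

	return SCORE_HUNG;           /* unknown batch contents: no guarantee
	                              * of termination, so full penalty */
}
```

The only point is that a stalled head gets a small increment when it is
sitting in commands the driver emitted, and the full increment when it is
inside a batch whose contents we can't vouch for.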

.Dave.