GPU lockup dumping

Christian König deathsimple at vodafone.de
Thu May 24 00:58:11 PDT 2012


On 23.05.2012 19:02, Jerome Glisse wrote:
> On Wed, May 23, 2012 at 12:41 PM, Dave Airlie <airlied at gmail.com> wrote:
>> On Wed, May 23, 2012 at 5:26 PM, Jerome Glisse <j.glisse at gmail.com> wrote:
>>> On Wed, May 23, 2012 at 12:08 PM, Dave Airlie <airlied at gmail.com> wrote:
>>>> On Wed, May 23, 2012 at 3:48 PM, Jerome Glisse <j.glisse at gmail.com> wrote:
>>>>> On Wed, May 23, 2012 at 8:34 AM, Christian König
>>>>> <deathsimple at vodafone.de> wrote:
>>>>>> On 23.05.2012 11:27, Dave Airlie wrote:
>>>>>>> On Thu, May 17, 2012 at 7:28 PM, <j.glisse at gmail.com> wrote:
>>>>>>>> So here is the improved patchset, where I split the groundwork needed
>>>>>>>> for the dumping into its own patch. The debugfs improvements could
>>>>>>>> probably be useful to Intel as well, instead of i915 having its own
>>>>>>>> debugfs file handling.
>>>>>>>>
>>>>>>>> The lockup dumping public API has been moved into radeon_drm.h.
>>>>>>>>
>>>>>>>> Stressing the fact again that the dumps are self-contained, i.e. they
>>>>>>>> have all the data needed to be replayed (vertices, indices, shaders,
>>>>>>>> textures, ...).
>>>>>>>>
>>>>>>>> I would really like to get this into 3.5; the new API is pretty much
>>>>>>>> straightforward, and userspace tools can easily be made to convert
>>>>>>>> it to other formats. The changes to the driver are self-contained.
>>>>>>> I really don't like introducing this at this stage into 3.5.
>>>>>>>
>>>>>>> I'd really like a good review of the API and what information we provide
>>>>>>> along with how extensible it is.
>>>>>>>
>>>>>>> I'm still not convinced replay is what we want in the field. I know it's
>>>>>>> what *you* want, but I think the apitrace stuff in userspace pretty much
>>>>>>> covers the replaying situation. So I'd have to look at this and see how
>>>>>>> easy it makes dissecting command streams etc.
>>>>>>>
>>>>>>> Dave.
>>>>>>
>>>>>> I agree that it might not be a good idea to push that into 3.5, since at
>>>>>> least I (and I think also Alex) haven't had time to look into it yet. On the
>>>>>> other hand, the patches look quite reasonable.
>>>>>>
>>>>>> But I still wanted to throw in a requirement from my day-to-day work; maybe
>>>>>> that helps in finding a more general solution:
>>>>>> When we start to use more parts of the chip it might become necessary to
>>>>>> dump everything that is currently in flight. For example, I had a whole
>>>>>> bunch of problems where copying data around with a 3D blit and then missing
>>>>>> a sync between that job and a job on another ring caused a "hiccup" in the
>>>>>> hardware.
>>>>>>
>>>>>> I know that this isn't your focus and that is absolutely OK with me, because
>>>>>> the format you are introducing is just used in debugfs and so not part of
>>>>>> any stable API (at least not in my understanding), but you should still keep
>>>>>> in mind that we might need to extend it in that direction in the future.
>>>>>>
>>>>>> Christian.
>>>>> Note that my format is also designed with that in mind; it can capture IBs
>>>>> from all rings. The only thing I don't think is worth capturing is the
>>>>> rings themselves, because there would be no way to replay them without
>>>>> adding some new special API.
>>>> I'd like to dump the rings as well. As I said, I'd rather we didn't
>>>> limit this to replay, but make it useful for getting as much information
>>>> out as possible.
>>>>
>>>> Dave.
>>> The rings will contain very little, just IB scheduling and fences; I don't
>>> see how useful that can be.
>>>
>> In case we have a bug in our IB scheduling or fencing :-0
>>
>> Dave.
> Well, I think we have several kinds of lockup. The most basic one is
> userspace sending a broken shader, broken vertex data, or something along
> those lines. The more complex one is timing related, like a BO move or some
> cache invalidation that didn't happen properly, so the GPU ends up reading
> either wrong data or stale cached data. I don't see how to capture useful
> information for this second case, besides doing a snapshot of memory.
>
> For multi-ring, I agree that dumping the rings might prove useful to spot an
> inter-ring semaphore deadlock, or possibly a missing inter-ring
> synchronization (but that would be a bad kernel bug).

I don't think we need the actual data from the rings either (at least 
as long as we keep the radeon_ring_* debugfs files). But it would still 
be nice to know whether or not there was a sync between the rings. See 
the patches I just sent you (sorry, I actually sent more patches than I 
wanted to); storing the new sync_seq array within the debug output 
should enable us to figure out the dependencies and ordering between 
different IBs.
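
As a rough sketch of what I mean (the names fence_drv, sync_seq and
RADEON_NUM_RINGS follow the patches mentioned above, but take this as an
illustration under those assumptions, not final code), the dump could
simply walk the per-ring sync_seq values:

#include <linux/seq_file.h>
#include "radeon.h"

/* Illustrative only: print which sequence number each ring has synced
 * to on every other ring, so inter-ring dependencies become visible in
 * the lockup dump. */
static void radeon_debugfs_dump_sync_seq(struct seq_file *m,
					 struct radeon_device *rdev)
{
	int i, j;

	for (i = 0; i < RADEON_NUM_RINGS; i++) {
		if (!rdev->fence_drv[i].initialized)
			continue;

		/* sync_seq[i] on ring i is the last seq emitted on that ring */
		seq_printf(m, "ring %d: last emitted seq 0x%016llx\n", i,
			   (unsigned long long)rdev->fence_drv[i].sync_seq[i]);

		for (j = 0; j < RADEON_NUM_RINGS; j++) {
			if (j == i || !rdev->fence_drv[j].initialized)
				continue;
			/* last seq on ring j that ring i has synced against */
			seq_printf(m, "  synced to ring %d up to seq 0x%016llx\n",
				   j, (unsigned long long)rdev->fence_drv[i].sync_seq[j]);
		}
	}
}

With something like that in the dump (or in the existing fence debugfs
file), an IB on ring i can be ordered against IBs on ring j by comparing
its fence seq with the sync_seq values captured at dump time.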

Cheers,
Christian.


