[PATCH] drm/radeon: make 64bit fences more robust

Jerome Glisse j.glisse at gmail.com
Mon Sep 10 09:07:53 PDT 2012


On Mon, Sep 10, 2012 at 11:52 AM, Jerome Glisse <j.glisse at gmail.com> wrote:
> On Mon, Sep 10, 2012 at 11:38 AM, Michel Dänzer <michel at daenzer.net> wrote:
>> On Mon, 2012-09-10 at 14:02 +0200, Christian König wrote:
>>> On 10.09.2012 13:12, Michel Dänzer wrote:
>>> > On Mon, 2012-09-10 at 11:13 +0200, Christian König wrote:
>>> >> Only increase the higher 32bits if we really detect a wrap around.
>>> >>
>>> >> Fixes:
>>> >> https://bugs.freedesktop.org/show_bug.cgi?id=54129
>>> >> https://bugs.freedesktop.org/show_bug.cgi?id=54662
>>> >>
>>> >> Possible fixes:
>>> >> https://bugzilla.redhat.com/show_bug.cgi?id=846505
>>> >> https://bugzilla.redhat.com/show_bug.cgi?id=845639
>>> >>
>>> >> Signed-off-by: Christian König <deathsimple at vodafone.de>
>>> >> Cc: stable at vger.kernel.org
>>> >> ---
>>> >>   drivers/gpu/drm/radeon/radeon_fence.c |    6 +++---
>>> >>   1 file changed, 3 insertions(+), 3 deletions(-)
>>> >>
>>> >> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>>> >> index 7b737b9..4781e13 100644
>>> >> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>> >> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>> >> @@ -160,7 +160,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>> >>    do {
>>> >>            seq = radeon_fence_read(rdev, ring);
>>> >>            seq |= last_seq & 0xffffffff00000000LL;
>>> >> -          if (seq < last_seq) {
>>> >> +          if (seq < (last_seq - 0x80000000LL)) {
>>> >>                    seq += 0x100000000LL;
>>> >>            }
>>> > Can you provide a bit more explanation for this change? In particular,
>>> > how could the code previously detect a wraparound when there was none,
>>> > and why is this the proper fix?
>>>
>>> Honestly I also don't really understand how this bug happened in the
>>> first place.
>>>
>>> We extend the 32bit fences supported by hardware by testing if a
>>> previously read fence value is smaller than the value we read now:
>>>
>>> >             if (seq < last_seq) {
>>>
>>> But the problem seems to be that on some systems we do get fence values
>>> that are decreasing, e.g. instead of 5, 6, 7, 8 we get 5, 7, 6, 8 (or
>>> maybe 5, 6, 0, 7, 8 because somebody accidentally overwrites the fence
>>> value).
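A small userspace model can compare the old and patched checks on the two patterns described above. This is a sketch under stated assumptions, not the driver code. extend_old(), extend_patched() and replay() are hypothetical helpers. last_seq is assumed to be a uint64_t, and each widened read is assumed to simply become the new last_seq (no atomics, no wakeups). The upper word is arbitrarily set to 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the check in radeon_fence_process(); not driver code. */
typedef uint64_t (*extend_fn)(uint64_t last_seq, uint32_t hw);

/* Old check: any widened value below last_seq is taken as a 32-bit wraparound. */
static uint64_t extend_old(uint64_t last_seq, uint32_t hw)
{
	uint64_t seq = hw | (last_seq & 0xffffffff00000000ULL);

	if (seq < last_seq)
		seq += 0x100000000ULL;
	return seq;
}

/* Patched check from the diff, in unsigned 64-bit arithmetic: once
 * last_seq >= 0x80000000, a read only counts as a wraparound if it lands
 * more than 0x80000000 below last_seq. */
static uint64_t extend_patched(uint64_t last_seq, uint32_t hw)
{
	uint64_t seq = hw | (last_seq & 0xffffffff00000000ULL);

	if (seq < last_seq - 0x80000000ULL)
		seq += 0x100000000ULL;
	return seq;
}

/* Feed 32-bit hardware reads through one check. Simplification: each
 * widened value becomes the new last_seq (no atomics, no wakeups). */
static uint64_t replay(extend_fn extend, uint64_t last_seq,
		       const uint32_t *reads, size_t n)
{
	assert(reads != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		last_seq = extend(last_seq, reads[i]);
	return last_seq;
}
```

Starting from last_seq = 0x1 0000 0005, the reads 6, 0, 7, 8 end at 0x2 0000 0008 with extend_old() because the spurious 0 is taken as a wraparound. The same reads end at 0x1 0000 0008 with extend_patched(). The out-of-order reads 7, 6, 8 give the same two results. The sketch keeps the subtraction unsigned, so last_seq - 0x80000000 wraps when last_seq is below 0x80000000. That is why the example does not start with the upper word at 0.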
>>
>> Maybe some kind of race involving radeon_fence_write()?
>>
>>
>>> It might be related to a hardware bug, or the algorithm is flawed in a
>>> way I currently don't see. Anyway the old code we had wasn't so picky
>>> about such problems and the patch just tries to make the current code as
>>> robust as the old code was, which indeed seems to solve the problems we see.
>>>
>>> The wrap around detection still works (tested by setting the initial
>>> fence value to 0xfffffff0 and letting it wrap around shortly after
>>> start), so I think we can safely commit this.
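The wraparound test described above can be mimicked with a hypothetical model. This is not driver code. extend_patched() and replay() are illustrative names, and last_seq is assumed to be updated on every read. Starting from 0xfffffff0 and reading across the 32-bit boundary should bump the upper word exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the patched check from the diff, in unsigned 64-bit
 * arithmetic: once last_seq >= 0x80000000, a read only counts as a
 * wraparound if it lands more than 0x80000000 below last_seq. */
static uint64_t extend_patched(uint64_t last_seq, uint32_t hw)
{
	uint64_t seq = hw | (last_seq & 0xffffffff00000000ULL);

	if (seq < last_seq - 0x80000000ULL)
		seq += 0x100000000ULL;
	return seq;
}

/* Feed hardware reads in order; each widened value becomes the new last_seq. */
static uint64_t replay(uint64_t last_seq, const uint32_t *reads, size_t n)
{
	assert(reads != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		last_seq = extend_patched(last_seq, reads[i]);
	return last_seq;
}
```

For example, replay(0xfffffff0, (uint32_t[]){0xfffffff8, 0xffffffff, 0x0, 0x5}, 4) returns 0x1 0000 0005. The read of 0 after 0xffffffff is the only one treated as a wraparound.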
>>
>> Without knowing exactly what kind of hardware fence value pattern caused
>> the problem, we can't be sure that the wraparound handling will work
>> reliably, or that the values going backwards won't cause other problems.
>> I think it would be good to get more real-world data on that.
>>
>
> As I said in my email, this patch just postpones the issue until last_fence >=
> 0x1 8000 0001 if the fence value we read back is sometimes randomly 0.
> If we receive fence values out of order (which I highly doubt, as the old
> code would have had the same issue, though on a smaller scale), then if
> fence value 0x1 8000 0001 is received before fence value 0x1 8000 0000,
> we are right back to all future fences being considered signalled (again,
> this will take months of uptime).
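The spurious-0 threshold can be checked with a similar hypothetical model. extend_patched() and replay() are illustrative, not driver functions. last_seq is assumed to be a uint64_t updated on every read. The sketch replays one bogus 0 read followed by the next real value, at last_seq values of 0x1 8000 0000 and 0x1 8000 0001.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the patched check from the diff, in unsigned 64-bit
 * arithmetic: once last_seq >= 0x80000000, a read only counts as a
 * wraparound if it lands more than 0x80000000 below last_seq. */
static uint64_t extend_patched(uint64_t last_seq, uint32_t hw)
{
	uint64_t seq = hw | (last_seq & 0xffffffff00000000ULL);

	if (seq < last_seq - 0x80000000ULL)
		seq += 0x100000000ULL;
	return seq;
}

/* Feed hardware reads in order; each widened value becomes the new last_seq. */
static uint64_t replay(uint64_t last_seq, const uint32_t *reads, size_t n)
{
	assert(reads != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		last_seq = extend_patched(last_seq, reads[i]);
	return last_seq;
}
```

At 0x1 8000 0000, the reads 0, 0x8000 0001 end at 0x1 8000 0001 as expected. At 0x1 8000 0001, the reads 0, 0x8000 0002 end at 0x2 8000 0002, so last_seq jumps ahead by 0x1 0000 0000. Fences emitted afterwards continue from 0x1 8000 0003. Assuming a fence counts as signalled once its sequence number is <= last_seq, those fences would compare as already signalled.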

Actually, thinking back about it: if fences are just received out of
order, then the corner case for this patch is receiving 0x1 ffff ffff
after 0x1 0000 0000. What happens is that 0x1 0000 0000 is the
wraparound that triggers the upper 32 bits to be incremented, so the
fence becomes 0x2 0000 0000. Then we get 0xffff ffff, which after the |
becomes 0x2 ffff ffff. Then we get the next fence value 0x0000 0001 and
again increment the upper 32 bits, so last_seq becomes 0x3 0000 0001.
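The out-of-order corner case can be traced with the same kind of hypothetical model. The helpers are illustrative and not driver code. The starting last_seq of 0x1 ffff fffe is an assumption, chosen just below the wrap with the upper word at 1 as in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the patched check from the diff, in unsigned 64-bit
 * arithmetic: once last_seq >= 0x80000000, a read only counts as a
 * wraparound if it lands more than 0x80000000 below last_seq. */
static uint64_t extend_patched(uint64_t last_seq, uint32_t hw)
{
	uint64_t seq = hw | (last_seq & 0xffffffff00000000ULL);

	if (seq < last_seq - 0x80000000ULL)
		seq += 0x100000000ULL;
	return seq;
}

/* Feed hardware reads in order; each widened value becomes the new last_seq. */
static uint64_t replay(uint64_t last_seq, const uint32_t *reads, size_t n)
{
	assert(reads != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		last_seq = extend_patched(last_seq, reads[i]);
	return last_seq;
}
```

Reading 0xffff ffff, 0x0000 0000, 0x0000 0001 in order ends at 0x2 0000 0001. Reading 0x0000 0000 before 0xffff ffff ends at 0x3 0000 0001, one extra 0x1 0000 0000 on that wraparound.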

Again, this will only happen after months of uptime, and all it does is
reduce the amount of uptime for which 64bit fences are fine, i.e. at
worst we increment by 0x2 0000 0000 instead of 0x1 0000 0000 on
wraparound.

Cheers,
Jerome

>
> All this probably leads to questioning the usefulness of 64bit fences.
>
> Cheers,
> Jerome


More information about the dri-devel mailing list