[PATCH 2/3] drm/radeon: handle lockup in delayed work, v3

Christian König deathsimple at vodafone.de
Mon Aug 18 09:20:22 PDT 2014


Am 18.08.2014 um 17:28 schrieb Maarten Lankhorst:
> Op 18-08-14 om 17:12 schreef Christian König:
>> Am 18.08.2014 um 16:45 schrieb Maarten Lankhorst:
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>
>>> ---
>>> V1 had a nasty bug breaking gpu lockup recovery. The fix is not
>>> allowing radeon_fence_driver_check_lockup to take exclusive_lock,
>>> and kill it during lockup recovery instead.
>>> V2 used delayed work that ran during lockup recovery, but required
>>> read lock. I've fixed this by downgrading the write, and retrying if
>>> recovery fails.
>>> ---
>>>    drivers/gpu/drm/radeon/radeon.h       |   2 +
>>>    drivers/gpu/drm/radeon/radeon_fence.c | 115 +++++++++++++++++-----------------
>>>    2 files changed, 61 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>>> index 1d806983ec7b..29504efe8971 100644
>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>> @@ -355,6 +355,8 @@ struct radeon_fence_driver {
>>>        uint64_t            sync_seq[RADEON_NUM_RINGS];
>>>        atomic64_t            last_seq;
>>>        bool                initialized;
>>> +    struct delayed_work        fence_check_work;
>>> +    struct radeon_device        *rdev;
>> Put the reference to the device as the first field in the structure.
>>
>>>    };
>>>      struct radeon_fence {
>>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>>> index 913787085dfa..94eca53d99f8 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>> @@ -125,16 +125,7 @@ int radeon_fence_emit(struct radeon_device *rdev,
>>>        return 0;
>>>    }
>>>    -/**
>>> - * radeon_fence_process - process a fence
>>> - *
>>> - * @rdev: radeon_device pointer
>>> - * @ring: ring index the fence is associated with
>>> - *
>>> - * Checks the current fence value and wakes the fence queue
>>> - * if the sequence number has increased (all asics).
>>> - */
>>> -void radeon_fence_process(struct radeon_device *rdev, int ring)
>>> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>> Don't use "__" for internal radeon function names and especially don't remove the function documentation.
> I've moved the documentation to the new radeon_fence_process function that calls the __radeon_fence_process one,
> which is an internal function that is only used by the driver. I guess I can rename it to __radeon_fence_process_nowake or something instead,
> but documentation doesn't make sense since it's not used outside of radeon_fence.c

We try to document even static functions in doxygen style even if you 
can't see them outside the C file.

>>>    {
>>>        uint64_t seq, last_seq, last_emitted;
>>>        unsigned count_loop = 0;
>>> @@ -190,7 +181,53 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>>            }
>>>        } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>>>    -    if (wake)
>>> +    if (seq < last_emitted && !rdev->in_reset)
>>> +        mod_delayed_work(system_power_efficient_wq,
>>> +                 &rdev->fence_drv[ring].fence_check_work,
>>> +                 RADEON_FENCE_JIFFIES_TIMEOUT);
>> Am I wrong or do you queue the work only after radeon_fence_process is called for the first time?
>>
>> Might be a good idea to have an explicit queue_delayed_work in radeon_fence_emit as well.
> Yeah might as well, with these changes it makes sense to run it as soon as possible.
>
> But if there are no waiters, there's no real need. It's a tree falling in a forest with no-one around to hear it. ;-)

That's why I suggested to schedule the work item only when the IRQ is 
enabled for waiting, otherwise it indeed doesn't make much sense.

Christian.


More information about the dri-devel mailing list