[PATCH 2/3] drm/radeon: handle lockup in delayed work, v3
Christian König
deathsimple at vodafone.de
Mon Aug 18 09:20:22 PDT 2014
Am 18.08.2014 um 17:28 schrieb Maarten Lankhorst:
> Op 18-08-14 om 17:12 schreef Christian König:
>> Am 18.08.2014 um 16:45 schrieb Maarten Lankhorst:
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>
>>> ---
>>> V1 had a nasty bug breaking gpu lockup recovery. The fix is not
>>> allowing radeon_fence_driver_check_lockup to take exclusive_lock,
>>> and kill it during lockup recovery instead.
>>> V2 used delayed work that ran during lockup recovery, but required
>>> read lock. I've fixed this by downgrading the write, and retrying if
>>> recovery fails.
>>> ---
>>> drivers/gpu/drm/radeon/radeon.h | 2 +
>>> drivers/gpu/drm/radeon/radeon_fence.c | 115 +++++++++++++++++-----------------
>>> 2 files changed, 61 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>>> index 1d806983ec7b..29504efe8971 100644
>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>> @@ -355,6 +355,8 @@ struct radeon_fence_driver {
>>> uint64_t sync_seq[RADEON_NUM_RINGS];
>>> atomic64_t last_seq;
>>> bool initialized;
>>> + struct delayed_work fence_check_work;
>>> + struct radeon_device *rdev;
>> Put the reference to the device as the first field in the structure.
>>
>>> };
>>> struct radeon_fence {
>>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>>> index 913787085dfa..94eca53d99f8 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>> @@ -125,16 +125,7 @@ int radeon_fence_emit(struct radeon_device *rdev,
>>> return 0;
>>> }
>>> -/**
>>> - * radeon_fence_process - process a fence
>>> - *
>>> - * @rdev: radeon_device pointer
>>> - * @ring: ring index the fence is associated with
>>> - *
>>> - * Checks the current fence value and wakes the fence queue
>>> - * if the sequence number has increased (all asics).
>>> - */
>>> -void radeon_fence_process(struct radeon_device *rdev, int ring)
>>> +static bool __radeon_fence_process(struct radeon_device *rdev, int ring)
>> Don't use "__" for internal radeon function names and especially don't remove the function documentation.
> I've moved the documentation to the new radeon_fence_process function that calls the __radeon_fence_process one,
> which is an internal function that is only used by the driver. I guess I can rename it to __radeon_fence_process_nowake or something instead,
> but documentation doesn't make sense since it's not used outside of radeon_fence.c
We try to document even static functions in doxygen style even if you
can't see them outside the C file.
>>> {
>>> uint64_t seq, last_seq, last_emitted;
>>> unsigned count_loop = 0;
>>> @@ -190,7 +181,53 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
>>> }
>>> } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
>>> - if (wake)
>>> + if (seq < last_emitted && !rdev->in_reset)
>>> + mod_delayed_work(system_power_efficient_wq,
>>> + &rdev->fence_drv[ring].fence_check_work,
>>> + RADEON_FENCE_JIFFIES_TIMEOUT);
>> Am I wrong or do you queue the work only after radeon_fence_process is called for the first time?
>>
>> Might be a good idea to have an explicit queue_delayed_work in radeon_fence_emit as well.
> Yeah might as well, with these changes it makes sense to run it as soon as possible.
>
> But if there are no waiters, there's no real need. It's a tree falling in a forest with no-one around to hear it. ;-)
That's why I suggested to schedule the work item only when the IRQ is
enabled for waiting, otherwise it indeed doesn't make much sense.
Christian.
More information about the dri-devel
mailing list