[PATCH 0/3] drm/radeon kexec fixes
alexdeucher at gmail.com
Wed Sep 11 06:30:40 PDT 2013
On Wed, Sep 11, 2013 at 5:01 AM, Markus Trippelsdorf
<markus at trippelsdorf.de> wrote:
> On 2013.09.09 at 11:38 +0200, Christian König wrote:
>> Am 09.09.2013 11:21, schrieb Markus Trippelsdorf:
>> > On 2013.09.08 at 17:32 -0700, Eric W. Biederman wrote:
>> >> Markus Trippelsdorf <markus at trippelsdorf.de> writes:
>> >>> Here are a couple of patches that get kexec working with radeon devices.
>> >>> I've tested this on my RS780.
>> >>> Comments or flames are welcome.
>> >>> Thanks.
>> >> A couple of high level comments.
>> >> This looks promising for the usual case.
>> >> Removing the printk at the end of the kexec path seems a little dubious,
>> >> what of other cpus, interrupt handlers, etc. Basically establishing a
>> >> new rule on when printk is allowed seems a little dubious at this point,
>> >> even if it is a useful debugging trick.
>> > OK. I will drop this patch. It doesn't seem to be necessary, because I
>> > cannot reproduce the printk related hang anymore.
>> >> Having a clean shutdown of the radeon definitely seems worth doing,
>> >> because the cases where we care about video are when a person is in
>> >> front of the system.
>> > Yes. But please note that even with radeon_pci_shutdown implemented, I
>> > still get ring test failures on roughly every eighth kexec boot:
>> > [drm:r600_dma_ring_test] *ERROR* radeon: ring 3 test failed (0xCAFEDEAD)
>> > radeon 0000:01:05.0: disabling GPU acceleration
>> > That's definitely better than the current state of affairs, with ring
>> > test failures on every second boot. But I haven't figured out the reason
>> > for these failures yet. It's curious that once a ring test failure
>> > occurs, it will reliably fail after each kexec invocation, no matter how
>> > often repeated. Only a reboot brings the machine back to normal.
>> The main problem here is that the AMD gfx hardware doesn't really
>> support being reinitialized once booted (for various reasons). It's a
>> (intended) limitation of the hardware design that you can only
>> initialize certain blocks once every power cycle, so the whole approach
>> actually will never work 100% reliably.
>> All you can hope for is that stopping the hardware while shutting down
>> the old kernel and starting it again results in exactly the same
>> hardware parameters (offsets, clock etc...) otherwise starting the
>> blocks will just fail and you end up with disabled acceleration like above.
>> Sorry, but there isn't much we can do about this.
> I've tested this further and it turned out that if I revert commit
> f5d9b7f0f9 on top of my "drm/radeon: Implement radeon_pci_shutdown"
> patch, the initialization failures seem to go away completely.
> Any idea what's going on?
You are disabling dynamic power management with that patch reverted.
The patch fixed a copy-paste typo in the register name. Bit 0
(SCLK_PWRMGT_OFF) of register SCLK_PWRMGT_CNTL controls whether
dynamic engine clock control is enabled. Bit 0 (GLOBAL_PWRMGT_EN) of
register GENERAL_PWRMGT controls whether dynamic power management
(dynamic engine/memory/voltage controls, etc.) is enabled at all.
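
To make the bit aliasing concrete, here is a minimal stand-alone sketch.
It is not driver code: the register file is a plain array, the helper
names (wreg32_p, enable_sclk_control_fixed, enable_sclk_control_reverted)
are hypothetical, and wreg32_p only models the masked read-modify-write
that I understand WREG32_P to perform (bits set in the mask are kept,
the remaining bits are taken from the value).

```c
#include <stdint.h>

/* Hypothetical indices into a simulated register file (not real offsets). */
enum { GENERAL_PWRMGT, SCLK_PWRMGT_CNTL, NUM_REGS };

/* Both flags sit at bit 0 of their respective registers. */
#define GLOBAL_PWRMGT_EN (1u << 0) /* GENERAL_PWRMGT: DPM enabled at all */
#define SCLK_PWRMGT_OFF  (1u << 0) /* SCLK_PWRMGT_CNTL: sclk control off */

static uint32_t regs[NUM_REGS];

/* Assumed model of WREG32_P: keep bits in mask, replace the rest from val. */
static void wreg32_p(int reg, uint32_t val, uint32_t mask)
{
	uint32_t tmp = regs[reg];

	tmp &= mask;
	tmp |= val & ~mask;
	regs[reg] = tmp;
}

/* Behaviour with the fix applied: only SCLK_PWRMGT_CNTL is touched. */
static void enable_sclk_control_fixed(int enable)
{
	if (enable)
		wreg32_p(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF);
	else
		wreg32_p(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
}

/* Behaviour with the fix reverted: bit 0 of GENERAL_PWRMGT is written. */
static void enable_sclk_control_reverted(int enable)
{
	if (enable)
		wreg32_p(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF);
	else
		wreg32_p(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
}
```

In this model, calling enable_sclk_control_reverted(1) with
GLOBAL_PWRMGT_EN set clears that bit and leaves SCLK_PWRMGT_CNTL
untouched, which is the "DPM disabled" effect described above.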
> Here's the patch:
> diff --git a/drivers/gpu/drm/radeon/r600_dpm.c b/drivers/gpu/drm/radeon/r600_dpm.c
> index fa0de46..4e8c1988 100644
> --- a/drivers/gpu/drm/radeon/r600_dpm.c
> +++ b/drivers/gpu/drm/radeon/r600_dpm.c
> @@ -296,9 +296,9 @@ bool r600_dynamicpm_enabled(struct radeon_device *rdev)
>  void r600_enable_sclk_control(struct radeon_device *rdev, bool enable)
>  {
>  	if (enable)
> -		WREG32_P(SCLK_PWRMGT_CNTL, 0, ~SCLK_PWRMGT_OFF);
> +		WREG32_P(GENERAL_PWRMGT, 0, ~SCLK_PWRMGT_OFF);
>  	else
> -		WREG32_P(SCLK_PWRMGT_CNTL, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
> +		WREG32_P(GENERAL_PWRMGT, SCLK_PWRMGT_OFF, ~SCLK_PWRMGT_OFF);
>  }
>  
>  void r600_enable_mclk_control(struct radeon_device *rdev, bool enable)