[PATCH 3/4] drm/radeon: consolidate uvd/vce initialization, resume and suspend.

Christian König christian.koenig at amd.com
Wed Mar 16 17:06:36 UTC 2016


On 16.03.2016 at 16:56, Jerome Glisse wrote:
> On Wed, Mar 16, 2016 at 04:19:16PM +0100, Christian König wrote:
>> On 16.03.2016 at 15:59, Jerome Glisse wrote:
>>> On Wed, Mar 16, 2016 at 2:03 PM, Christian König
>>> <deathsimple at vodafone.de> wrote:
>>>> On 16.03.2016 at 13:48, Jérôme Glisse wrote:
>>>>> From: Jérome Glisse <jglisse at redhat.com>
>>>>>
>>>>> This consolidates uvd/vce into a common shape for all generations. It
>>>>> also leverages the rdev->has_uvd flag to know when it is useless to
>>>>> try to resume/suspend the uvd/vce blocks.
>>>>>
>>>>> There are no functional changes when there is no error. On error, the
>>>>> device driver will behave differently than before this patch.
>>>>> It should now safely ignore uvd/vce errors and keep normal operation
>>>>> of the other engines. This is an improvement over the current situation,
>>>>> where we have different behavior depending on GPU generation and on
>>>>> what fails.
>>>>>
>>>>> Finally, this is a preparatory step for a patch which allows disabling
>>>>> uvd/vce as a driver option.
>>>>>
>>>>> This has only been tested on Southern Islands, so please test it on
>>>>> other generations (I do not have hardware handy for now).
>>>>>
>>>>> Signed-off-by: Jérôme Glisse <jglisse at redhat.com>
>>>>> Cc: Alex Deucher <alexander.deucher at amd.com>
>>>>> Cc: Christian König <christian.koenig at amd.com>
>>>> NAK, skipping UVD and VCE suspend/resume when initialization fails should
>>>> already be implemented.
>>>>
>>>> There might be some (quite some) bugs in there, but that doesn't justify
>>>> reworking the initialization across all the different generations.
>>>> Especially since you don't have hardware to test all of them.
>>>>
>>>> Just make sure that radeon_uvd/vce_fini() is called when something goes
>>>> wrong and/or that the UVD/VCE BO is properly released.
>>>>
>>>> Regards,
>>>> Christian.
>>> The current code is a mess when it comes to handling errors related to
>>> uvd/vce. This patch consolidates the control flow into something easy to
>>> follow. You can check that there is absolutely no control flow change
>>> for the case where uvd/vce works, and thus it does not break anything
>>> that works. It will only fail gracefully and clean up if things go
>>> wrong. So while I have not tested on other hw, I am confident that this
>>> does not introduce regressions.
>>>
>>> I tried to do it without consolidation, but it ended up adding even
>>> more if() levels, so that lines began past 80 columns. So please
>>> reconsider, because this is an improvement over the existing code.
>> Well then, please point out, using the SI or CIK code as an example, what
>> exactly is missing here.
> Going from:
> 	if (rdev->has_uvd) {
> 		r = uvd_v2_2_resume(rdev);
> 		if (!r) {
> 			r = radeon_fence_driver_start_ring(rdev,
> 							   R600_RING_TYPE_UVD_INDEX);
> 			if (r)
> 				dev_err(rdev->dev, "UVD fences init error (%d).\n", r);
> 		}
> 		if (r)
> 			rdev->ring[R600_RING_TYPE_UVD_INDEX].ring_size = 0;
> 	}
> 	r = radeon_vce_resume(rdev);
> 	if (!r) {
> 		r = vce_v1_0_resume(rdev);
> 		if (!r)
> 			r = radeon_fence_driver_start_ring(rdev,
> 							   TN_RING_TYPE_VCE1_INDEX);
> 		if (!r)
> 			r = radeon_fence_driver_start_ring(rdev,
> 							   TN_RING_TYPE_VCE2_INDEX);
> 	}
> 	if (r) {
> 		dev_err(rdev->dev, "VCE init error (%d).\n", r);
> 		rdev->ring[TN_RING_TYPE_VCE1_INDEX].ring_size = 0;
> 		rdev->ring[TN_RING_TYPE_VCE2_INDEX].ring_size = 0;
> 	}
>
>
> To:
> 	r = uvd_v2_2_resume(rdev);
> 	if (r)
> 		goto error;
> 	r = radeon_fence_driver_start_ring(rdev, R600_RING_TYPE_UVD_INDEX);
> 	if (r)
> 		goto error_uvd;
> 	r = radeon_vce_resume(rdev);
> 	if (r)
> 		goto error_uvd;
> 	r = vce_v1_0_resume(rdev);
> 	if (r)
> 		goto error_vce;
> 	r = radeon_fence_driver_start_ring(rdev, TN_RING_TYPE_VCE1_INDEX);
> 	if (r)
> 		goto error_vce;
> 	r = radeon_fence_driver_start_ring(rdev, TN_RING_TYPE_VCE2_INDEX);
> 	if (r)
> 		goto error_vce;
> 	return;
> error_vce:
> 	radeon_vce_suspend(rdev);
> error_uvd:
> 	radeon_uvd_suspend(rdev);
> error:
> 	dev_err(rdev->dev, "UVD/VCE startup error (%d).\n", r);
> 	/* On error just disable everything. */
> 	radeon_vce_fini(rdev);
> 	radeon_uvd_fini(rdev);
> 	rdev->ring[R600_RING_TYPE_UVD_INDEX].ring_size = 0;
> 	rdev->ring[TN_RING_TYPE_VCE1_INDEX].ring_size = 0;
> 	rdev->ring[TN_RING_TYPE_VCE2_INDEX].ring_size = 0;

And as I said, that is exactly what you should NOT be doing here. Once
the firmware is loaded, the block should be kept in that state.

Freeing the memory allocated for the firmware is also not a good idea at 
all because we don't know who exactly is accessing it.

>
>
> This is a lot clearer to me than a bunch of intertwined if/else. There is a
> clear error path, and you do not have to jump through if levels to see what
> gets executed on error. The only difference is that it ties uvd and vce
> together. I did that on purpose because, on the hw I am playing with, vce
> seems to be useless when the uvd block fails (the opposite seems to be true
> too). If you think we should still try to init vce when uvd fails, or uvd
> when vce fails, I can split uvd and vce.

UVD and VCE are two completely separate blocks; they shouldn't be
related to each other in any way.

When you see failures of both at the same time, it's rather unlikely that
the cause actually lies in those blocks.
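For comparison, here is a minimal, self-contained sketch that combines the goto-based unwinding style from the patch with the point made above: each block is brought up on its own, so a UVD failure leaves VCE untouched. Everything in it is hypothetical (`struct engine`, `engine_start()`, `start_media_engines()` and the numbered steps are not radeon APIs), and which steps should actually be undone on failure, for example whether loaded firmware may be released, is exactly what this thread disputes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one media engine (UVD or VCE); not a radeon structure. */
struct engine {
	const char *name;
	int fail_step;     /* step number that should fail, 0 for none (test hook) */
	bool fw_loaded;
	bool ring_started;
	bool ready;        /* only true once every startup step succeeded */
};

/* Stand-in for a hardware step such as a firmware upload or ring start. */
static int engine_step(const struct engine *e, int n)
{
	return e->fail_step == n ? -EIO : 0;
}

/* Start one engine; on error, undo only this engine's completed steps, newest first. */
static int engine_start(struct engine *e)
{
	int r;

	r = engine_step(e, 1);        /* e.g. firmware upload */
	if (r)
		goto error;
	e->fw_loaded = true;

	r = engine_step(e, 2);        /* e.g. fence/ring start */
	if (r)
		goto error_fw;
	e->ring_started = true;

	r = engine_step(e, 3);        /* e.g. engine-specific init */
	if (r)
		goto error_ring;

	e->ready = true;
	return 0;

error_ring:
	e->ring_started = false;
error_fw:
	e->fw_loaded = false;         /* whether to undo this is the point under debate */
error:
	fprintf(stderr, "%s startup error (%d)\n", e->name, r);
	e->ready = false;
	return r;
}

/* UVD and VCE are brought up independently; one failing does not stop the other. */
static void start_media_engines(struct engine *uvd, struct engine *vce)
{
	assert(uvd != NULL && vce != NULL);
	(void)engine_start(uvd);
	(void)engine_start(vce);
}
```

The labels unwind in reverse order of setup, which keeps the error path flat, while calling `engine_start()` once per block avoids tying the two engines together.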

>
> The other difference from the existing code is that I free resources normally
> used by uvd/vce on error (I free the fw buffer). This is just me trying to free
> resources early, and it has no impact since the blocks are not working.
>
> -----------------------------------------------------------------------------------
>
> Second part, we go from:
> 	if (rdev->has_uvd) {
> 		ring = &rdev->ring[R600_RING_TYPE_UVD_INDEX];
> 		if (ring->ring_size) {
> 			r = radeon_ring_init(rdev, ring, ring->ring_size, 0,
> 					     RADEON_CP_PACKET2);
> 			if (!r)
> 				r = uvd_v1_0_init(rdev);
> 			if (r)
> 				DRM_ERROR("radeon: failed initializing UVD (%d).\n", r);
> 		}
> 	}
> 	r = -ENOENT;
> 	ring = &rdev->ring[TN_RING_TYPE_VCE1_INDEX];
> 	if (ring->ring_size)
> 		r = radeon_ring_init(rdev, ring, ring->ring_size, 0,
> 				     VCE_CMD_NO_OP);
> 	ring = &rdev->ring[TN_RING_TYPE_VCE2_INDEX];
> 	if (ring->ring_size)
> 		r = radeon_ring_init(rdev, ring, ring->ring_size, 0,
> 				     VCE_CMD_NO_OP);
> 	if (!r)
> 		r = vce_v1_0_init(rdev);
> 	else if (r != -ENOENT)
> 		DRM_ERROR("radeon: failed initializing VCE (%d).\n", r);
>
>
> To:
> 	ring = &rdev->ring[R600_RING_TYPE_UVD_INDEX];
> 	r = radeon_ring_init(rdev, ring, ring->ring_size, 0, RADEON_CP_PACKET2);
> 	if (r)
> 		goto error;
> 	r = uvd_v1_0_init(rdev);
> 	if (r)
> 		goto error_uvd;
> 	ring = &rdev->ring[TN_RING_TYPE_VCE1_INDEX];
> 	r = radeon_ring_init(rdev, ring, ring->ring_size, 0, VCE_CMD_NO_OP);
> 	if (r)
> 		goto error_vce1;
> 	ring = &rdev->ring[TN_RING_TYPE_VCE2_INDEX];
> 	r = radeon_ring_init(rdev, ring, ring->ring_size, 0, VCE_CMD_NO_OP);
> 	if (r)
> 		goto error_vce2;
> 	r = vce_v1_0_init(rdev);
> 	if (r)
> 		goto error_vce;
> 	return;
> error_vce:
> 	radeon_ring_fini(rdev, &rdev->ring[TN_RING_TYPE_VCE2_INDEX]);
> error_vce2:
> 	radeon_ring_fini(rdev, &rdev->ring[TN_RING_TYPE_VCE1_INDEX]);
> error_vce1:
> 	uvd_v1_0_fini(rdev);
> error_uvd:
> 	radeon_ring_fini(rdev, &rdev->ring[R600_RING_TYPE_UVD_INDEX]);
> error:
> 	dev_err(rdev->dev, "UVD/VCE resume error (%d).\n", r);
> 	/* On error just disable everything. */
> 	radeon_uvd_suspend(rdev);
> 	radeon_vce_suspend(rdev);
> 	radeon_uvd_fini(rdev);
> 	radeon_vce_fini(rdev);
> 	rdev->ring[R600_RING_TYPE_UVD_INDEX].ring_size = 0;
> 	rdev->ring[TN_RING_TYPE_VCE1_INDEX].ring_size = 0;
> 	rdev->ring[TN_RING_TYPE_VCE2_INDEX].ring_size = 0;
>
> Again, the control flow is a lot simpler to follow than jumping through various
> levels of if/else. Again, uvd and vce are tied together (and again, I can untie
> them if you think it is better).
>
> But this time the extra thing is that I properly disable the rings if any error
> happens, while the existing code does not.

And again that is exactly what we should NOT do.

When initialization fails, we don't know what state the ring buffer
and micro engines are in, so freeing them and giving the space back to be
reused is clearly not a good idea.

All we should do is clear the ready flag when something fails, to
prevent userspace from making command submissions to the failed engine.
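A minimal sketch of this approach, under simplified assumptions: on an init failure the ring buffer stays allocated and only a ready flag is cleared, and the submission path refuses work for a ring that is not ready. `struct ring`, `ring_mark_init_result()` and `ring_submit()` are hypothetical stand-ins, not the actual radeon ring structure or command submission code, and the `-ENODEV` return value is an illustrative choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ring; not the real radeon ring structure. */
struct ring {
	unsigned int ring_size;  /* buffer is kept even if init fails */
	bool ready;              /* cleared on failure to block submissions */
};

/*
 * Record the outcome of ring/engine init. On failure nothing is freed,
 * since the engine may still be accessing the buffer; only ready is cleared.
 */
static void ring_mark_init_result(struct ring *ring, int r)
{
	assert(ring != NULL);
	ring->ready = (r == 0);
}

/* Submission path: refuse work for an engine whose init failed. */
static int ring_submit(const struct ring *ring)
{
	assert(ring != NULL);
	if (!ring->ready)
		return -ENODEV;  /* illustrative error code */
	/* ... write commands into the ring buffer here ... */
	return 0;
}
```

The design choice is that a failed engine costs only its (still allocated) buffer, while userspace gets an error instead of hanging on an engine in an unknown state.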

>
>
> I am not pasting the init path, but it is the same logic: tying uvd and vce
> together and simplifying the error code path.
>
>
>
>> Please also note that VCE/UVD have dependencies on power management, so
>> once they are initialized they should NOT be turned off again.
>>
>> I only briefly skimmed over your patch, but it actually looks to me
>> like you broke that by trying to clean up the initialization routine.
> I have seen that, but assuming Heisenberg does not get involved, given
> that the block is not responding to register writes, it is unlikely that things
> will get worse if we try to disable the block. And from my testing it does
> not impact power management. My guess is that the block keeps reporting it is
> busy and that power gating and clock gating are inhibited by that.

It's more complicated than just a simple busy signal. The engines
actively communicate with the power management controller to tell it
their needs and limits for the clocks based on the workload they have.

Once UVD and VCE are initialized, the power management controller expects
their micro-controllers to answer such requests.

Failing to do so can get you stuck at a specific power level.

>
> The other thing I am doing beyond the existing code is freeing the memory for
> the fw buffer. I do not think it is a big deal. I am doing that because I then
> just flag the uvd as dead (rdev->has_uvd = 0) and avoid trying to restore it
> on the next suspend/resume or hibernation cycle.
>
> So again, the only thing I am changing is the case where things do not work.
> With this patch I can actually hibernate the laptop and get back a working
> desktop, modulo video decoding/encoding no longer working. I call that an
> improvement.
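The has_uvd gating described above can be sketched as follows, assuming a much-simplified device model: once UVD resume fails, the block is flagged as absent and later suspend/resume cycles skip it while the rest of the GPU carries on. Only the `has_uvd` name comes from the thread; `struct dev`, `uvd_suspend()`, `uvd_resume()` and the call counters are hypothetical. Christian's objection about power management dependencies applies to exactly this kind of gating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; only the has_uvd name comes from the thread. */
struct dev {
	bool has_uvd;
	int uvd_suspend_calls;  /* counters stand in for real hardware work */
	int uvd_resume_calls;
};

static void uvd_suspend(struct dev *d)
{
	assert(d != NULL);
	if (!d->has_uvd)
		return;                 /* block absent or flagged dead: skip */
	d->uvd_suspend_calls++;
}

/* hw_result simulates the outcome of the real UVD resume sequence. */
static int uvd_resume(struct dev *d, int hw_result)
{
	assert(d != NULL);
	if (!d->has_uvd)
		return 0;               /* skip; the rest of the GPU resumes normally */
	d->uvd_resume_calls++;
	if (hw_result != 0) {
		d->has_uvd = false;     /* do not retry on later suspend/resume cycles */
		return hw_result;       /* caller logs this and carries on */
	}
	return 0;
}
```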

It's nice that it works for you now, but my laptop is working fine with 
UVD and VCE as well and I would like to keep it that way.

As far as I can see you're actually messing the error handling up quite 
a bit here instead of improving it.

So please describe in detail what problems you are seeing and why
disabling both UVD and VCE helps with them.

A kernel log from a failed suspend/resume cycle would help quite a bit here.

Regards,
Christian.

>
> Jérôme


