[PATCH] drm/amdgpu: fix UBSAN: Undefined behaviour for amdgpu_fence.c
Alex Deucher
alexdeucher at gmail.com
Tue Jun 26 17:46:09 UTC 2018
On Tue, Jun 26, 2018 at 8:29 AM, Leo Liu <leo.liu at amd.com> wrote:
>
>
> On 06/25/2018 04:42 PM, Alex Deucher wrote:
>>
>> On Mon, Jun 25, 2018 at 4:37 PM, James Zhu <jamesz at amd.com> wrote:
>>>
>>>
>>> On 2018-06-25 04:32 PM, Alex Deucher wrote:
>>>>
>>>> On Mon, Jun 25, 2018 at 4:26 PM, James Zhu <jamesz at amd.com> wrote:
>>>>>
>>>>>
>>>>> On 2018-06-25 04:02 PM, Alex Deucher wrote:
>>>>>>
>>>>>> On Mon, Jun 25, 2018 at 3:17 PM, Leo Liu <leo.liu at amd.com> wrote:
>>>>>>>
>>>>>>> [ 3.866656] index 2 is out of range for type 'amdgpu_uvd_inst [2]'
>>>>>>> [ 3.866667] CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted
>>>>>>> 4.16.0-rc7+
>>>>>>> #3
>>>>>>> [ 3.866677] Hardware name: Gigabyte Technology Co., Ltd.
>>>>>>> GA-990FXA-UD7/GA-990FXA-UD7, BIOS F9 06/08/2012
>>>>>>> [ 3.866693] Workqueue: events work_for_cpu_fn
>>>>>>> [ 3.866702] Call Trace:
>>>>>>> [ 3.866710] dump_stack+0x85/0xc5
>>>>>>> [ 3.866719] ubsan_epilogue+0x9/0x40
>>>>>>> [ 3.866727] __ubsan_handle_out_of_bounds+0x89/0x90
>>>>>>> [ 3.866737] ? rcu_read_lock_sched_held+0x58/0x60
>>>>>>> [ 3.866746] ? __kmalloc+0x26c/0x2d0
>>>>>>> [ 3.866846] amdgpu_fence_driver_start_ring+0x259/0x280 [amdgpu]
>>>>>>> [ 3.866896] amdgpu_ring_init+0x12c/0x710 [amdgpu]
>>>>>>> [ 3.866906] ? sprintf+0x42/0x50
>>>>>>> [ 3.866956] amdgpu_gfx_kiq_init_ring+0x1bc/0x3a0 [amdgpu]
>>>>>>> [ 3.867009] gfx_v8_0_sw_init+0x1ad3/0x2360 [amdgpu]
>>>>>>> [ 3.867062] ? smu7_init+0xec/0x160 [amdgpu]
>>>>>>> [ 3.867109] amdgpu_device_init+0x112c/0x1dc0 [amdgpu]
>>>>>>>
>>>>>>> ring->me might be set as 2 with amdgpu_gfx_kiq_init_ring.
>>>>>>>
>>>>>>> Signed-off-by: Leo Liu <leo.liu at amd.com>
>>>>>>> ---
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 11 ++++++++++-
>>>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> index 39ec6b8890a1..6067dae1552f 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> @@ -375,8 +375,17 @@ int amdgpu_fence_driver_start_ring(struct
>>>>>>> amdgpu_ring *ring,
>>>>>>> {
>>>>>>> struct amdgpu_device *adev = ring->adev;
>>>>>>> uint64_t index;
>>>>>>> + bool uvd_ring = false;
>>>>>>> + int i;
>>>>>>> +
>>>>>>> + for (i = 0; i < AMDGPU_MAX_UVD_INSTANCES; ++i) {
>>>>>
>>>>>
>>>>>>> + if (ring == &adev->uvd.inst[i].ring) {
>>>>>>> + uvd_ring = true;
>>>>>>> + break;
>>>>>>> + }
>>>>>>> + }
>>>>>>>
>>>>>>> - if (ring != &adev->uvd.inst[ring->me].ring) {
>>>>>>> + if (!uvd_ring) {
>>>>>>
>>>>>> I think we can just simplify this to:
>>>>>> if (ring->funcs->type != AMDGPU_RING_TYPE_UVD)
>>>>>>
>>>>>> Alex
>>>>>
>>>>> Need add UVD_ENC also:
>>>>> if ( ring->funcs->type != AMDGPU_RING_TYPE_UVD &&
>>>>> ring->funcs->type != AMDGPU_RING_TYPE_UVD_ENC ) {
>>>>>
>>>> The old code never checked against the enc rings. Are you sure they
>>>> have the same limitation?
>>>
>>> I am wrong. thanks! James
>>
>> Actually does the uvd decode ring still have this limitation or is
>> this just some leftover from a hw limitation on an older asic?
>
> The code here inherited from old ASICs, and not be revisited since there is
> no problem in functional wise.
>
>
>> If
>> it's still required, the current code in
>> amdgpu_fence_driver_start_ring() won't work for multiple instances.
>> The fences for the rings will overwrite each other.
>
> IIRC Vega20 with 2 instances have separated runtime environment, that should
> have fence separated as well.
Ok. For some reason I was thinking the instances shared the same copy
of the firmware. Anyway, the patch as is is:
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
We can optimize for newer asics as a later improvement.
Alex
More information about the amd-gfx
mailing list