[PATCH] drm/amdgpu: fix UBSAN: Undefined behaviour for amdgpu_fence.c

Alex Deucher alexdeucher at gmail.com
Tue Jun 26 17:46:09 UTC 2018


On Tue, Jun 26, 2018 at 8:29 AM, Leo Liu <leo.liu at amd.com> wrote:
>
>
> On 06/25/2018 04:42 PM, Alex Deucher wrote:
>>
>> On Mon, Jun 25, 2018 at 4:37 PM, James Zhu <jamesz at amd.com> wrote:
>>>
>>>
>>> On 2018-06-25 04:32 PM, Alex Deucher wrote:
>>>>
>>>> On Mon, Jun 25, 2018 at 4:26 PM, James Zhu <jamesz at amd.com> wrote:
>>>>>
>>>>>
>>>>> On 2018-06-25 04:02 PM, Alex Deucher wrote:
>>>>>>
>>>>>> On Mon, Jun 25, 2018 at 3:17 PM, Leo Liu <leo.liu at amd.com> wrote:
>>>>>>>
>>>>>>> [    3.866656] index 2 is out of range for type 'amdgpu_uvd_inst [2]'
>>>>>>> [    3.866667] CPU: 0 PID: 59 Comm: kworker/0:1 Not tainted
>>>>>>> 4.16.0-rc7+
>>>>>>> #3
>>>>>>> [    3.866677] Hardware name: Gigabyte Technology Co., Ltd.
>>>>>>> GA-990FXA-UD7/GA-990FXA-UD7, BIOS F9 06/08/2012
>>>>>>> [    3.866693] Workqueue: events work_for_cpu_fn
>>>>>>> [    3.866702] Call Trace:
>>>>>>> [    3.866710]  dump_stack+0x85/0xc5
>>>>>>> [    3.866719]  ubsan_epilogue+0x9/0x40
>>>>>>> [    3.866727]  __ubsan_handle_out_of_bounds+0x89/0x90
>>>>>>> [    3.866737]  ? rcu_read_lock_sched_held+0x58/0x60
>>>>>>> [    3.866746]  ? __kmalloc+0x26c/0x2d0
>>>>>>> [    3.866846]  amdgpu_fence_driver_start_ring+0x259/0x280 [amdgpu]
>>>>>>> [    3.866896]  amdgpu_ring_init+0x12c/0x710 [amdgpu]
>>>>>>> [    3.866906]  ? sprintf+0x42/0x50
>>>>>>> [    3.866956]  amdgpu_gfx_kiq_init_ring+0x1bc/0x3a0 [amdgpu]
>>>>>>> [    3.867009]  gfx_v8_0_sw_init+0x1ad3/0x2360 [amdgpu]
>>>>>>> [    3.867062]  ? smu7_init+0xec/0x160 [amdgpu]
>>>>>>> [    3.867109]  amdgpu_device_init+0x112c/0x1dc0 [amdgpu]
>>>>>>>
>>>>>>> ring->me might be set as 2 with amdgpu_gfx_kiq_init_ring.
>>>>>>>
>>>>>>> Signed-off-by: Leo Liu <leo.liu at amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 11 ++++++++++-
>>>>>>>     1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> index 39ec6b8890a1..6067dae1552f 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> @@ -375,8 +375,17 @@ int amdgpu_fence_driver_start_ring(struct
>>>>>>> amdgpu_ring *ring,
>>>>>>>     {
>>>>>>>            struct amdgpu_device *adev = ring->adev;
>>>>>>>            uint64_t index;
>>>>>>> +       bool uvd_ring = false;
>>>>>>> +       int i;
>>>>>>> +
>>>>>>> +       for (i = 0; i < AMDGPU_MAX_UVD_INSTANCES; ++i) {
>>>>>
>>>>>
>>>>>>> +               if (ring == &adev->uvd.inst[i].ring) {
>>>>>>> +                       uvd_ring = true;
>>>>>>> +                       break;
>>>>>>> +               }
>>>>>>> +       }
>>>>>>>
>>>>>>> -       if (ring != &adev->uvd.inst[ring->me].ring) {
>>>>>>> +       if (!uvd_ring) {
>>>>>>
>>>>>> I think we can just simplify this to:
>>>>>> if (ring->funcs->type != AMDGPU_RING_TYPE_UVD)
>>>>>>
>>>>>> Alex
>>>>>
>>>>> Need add UVD_ENC also:
>>>>>         if ( ring->funcs->type != AMDGPU_RING_TYPE_UVD &&
>>>>>               ring->funcs->type != AMDGPU_RING_TYPE_UVD_ENC ) {
>>>>>
>>>> The old code never checked against the enc rings.  Are you sure they
>>>> have the same limitation?
>>>
>>> I am wrong. thanks! James
>>
>> Actually does the uvd decode ring still have this limitation or is
>> this just some leftover from a hw limitation on an older asic?
>
> The code here inherited from old ASICs, and not be revisited since there is
> no problem in functional wise.
>
>
>>   If
>> it's still required, the current code in
>> amdgpu_fence_driver_start_ring() won't work for multiple instances.
>> The fences for the rings will overwrite each other.
>
> IIRC Vega20 with 2 instances have separated runtime environment, that should
> have fence separated as well.

Ok.  For some reason I was thinking the instances shared the same copy
of the firmware.  Anyway, the patch as is is:
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
We can optimize for newer asics as a later improvement.

Alex


More information about the amd-gfx mailing list