[Intel-gfx] [PATCH 03/15] dma-buf & drm/amdgpu: remove dma_resv workaround

Wed Apr 20 18:56:38 UTC 2022

On 20.04.22 at 20:49, Christian König wrote:
> On 20.04.22 at 20:41, Zack Rusin wrote:
>> On Wed, 2022-04-20 at 19:40 +0200, Christian König wrote:
>>> On 20.04.22 at 19:38, Zack Rusin wrote:
>>>> On Wed, 2022-04-20 at 09:37 +0200, Christian König wrote:
>>>>> Hi Zack,
>>>>>
>>>>> On 20.04.22 at 05:56, Zack Rusin wrote:
>>>>>> On Thu, 2022-04-07 at 10:59 +0200, Christian König wrote:
>>>>>>> Rework the internals of the dma_resv object to allow adding
>>>>>>> more than one write fence and remember for each fence what
>>>>>>> purpose it had.
>>>>>>>
>>>>>>> This allows removing the workaround from amdgpu which used a
>>>>>>> container for this instead.
>>>>>>>
>>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>>>> Cc: amd-gfx at lists.freedesktop.org
>>>>>> afaict this change broke vmwgfx, which now kernel oopses right
>>>>>> after boot. I haven't had the time to look into it yet, so I'm
>>>>>> not sure what the problem is. I'll look at this tomorrow, but
>>>>>> just in case you have some clues, the backtrace follows:
>>>>> that's a known issue and should already be fixed with:
>>>>>
>>>>> commit d72dcbe9fce505228dae43bef9da8f2b707d1b3d
>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>> Date:   Mon Apr 11 15:21:59 2022 +0200
>>>> Unfortunately that doesn't seem to be it. The backtrace is from the
>>>> current (as of sending this email) drm-misc-next, which has this
>>>> change, so it's something else.
>>> Ok, that's strange. In this case I need to investigate further.
>>>
>>> Maybe VMWGFX is adding more than one fence and we actually need to
>>> reserve multiple slots.
>> This might be a helper code issue with CONFIG_DEBUG_MUTEXES set. With
>> that config, dma_resv_reset_max_fences() does:
>>     fences->max_fences = fences->num_fences;
>> For some objects num_fences is 0, so afterwards max_fences and
>> num_fences are both 0. Then BUG_ON(num_fences >= max_fences) is
>> triggered.
>
> Yeah, but that's expected behavior.
>
> What's not expected is that max_fences is still 0 (or equal to old 
> num_fences) when VMWGFX tries to add a new fence. The function 
> ttm_eu_reserve_buffers() should have reserved at least one fence slot.
>
> So the underlying problem is that either ttm_eu_reserve_buffers() was 
> never called or VMWGFX tried to add more than one fence.
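A minimal userspace sketch of the slot accounting discussed above may make the two possibilities concrete. It is a simplified model under stated assumptions, not the kernel code: struct model_resv and the model_* helpers are hypothetical, reserving simply raises the capacity to num_fences + n (the real dma_resv_reserve_fences() may allocate more), replacement of a fence from the same context is ignored, and the BUG_ON() is modeled as a false return so the behavior can be checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a dma_resv fence list (not kernel code). */
struct model_resv {
	unsigned int num_fences; /* fences currently stored */
	unsigned int max_fences; /* slots available, i.e. reserved capacity */
};

/* Assumption: reserving n slots guarantees room for n more fences. */
static void model_reserve_fences(struct model_resv *r, unsigned int n)
{
	if (r->max_fences < r->num_fences + n)
		r->max_fences = r->num_fences + n;
}

/*
 * Adding a fence consumes one reserved slot. Where the kernel would hit
 * BUG_ON(num_fences >= max_fences), the model returns false instead.
 */
static bool model_add_fence(struct model_resv *r)
{
	assert(r->num_fences <= r->max_fences); /* invariant of the model */
	if (r->num_fences >= r->max_fences)
		return false; /* BUG_ON() would fire here */
	r->num_fences++;
	return true;
}

/*
 * Mirrors the single line quoted from dma_resv_reset_max_fences() in
 * CONFIG_DEBUG_MUTEXES builds: unused reserved slots are forgotten.
 */
static void model_reset_max_fences(struct model_resv *r)
{
	r->max_fences = r->num_fences;
}
```

Reserving one slot covers exactly one add. A second add without a new reservation, or any add after the debug reset without reserving again, reproduces the BUG_ON condition, which matches the two explanations above: the reservation never happened, or more than one fence was added.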


To figure out which it is, could you try the following code fragment?

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
index f46891012be3..a36f89d3f36d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.c
@@ -288,7 +288,7 @@ int vmw_validation_add_bo(struct vmw_validation_context *ctx,
                 val_buf->bo = ttm_bo_get_unless_zero(&vbo->base);
                 if (!val_buf->bo)
                         return -ESRCH;
-               val_buf->num_shared = 0;
+               val_buf->num_shared = 16;
                 list_add_tail(&val_buf->head, &ctx->bo_list);
                 bo_node->as_mob = as_mob;
                 bo_node->cpu_blit = cpu_blit;
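The reasoning behind the num_shared = 16 experiment can be sketched with a similar model. This is hypothetical, not TTM code: model_eu_reserve() and model_submit() are invented names, and the model assumes the reservation step reserves max(num_shared, 1) slots per buffer, which is consistent with the "at least one fence slot" remark above but is not verified against ttm_eu_reserve_buffers() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified fence-slot bookkeeping (not kernel code). */
struct model_resv {
	unsigned int num_fences;
	unsigned int max_fences;
};

/* Assumption: the reservation step asks for max(num_shared, 1) slots. */
static void model_eu_reserve(struct model_resv *r, unsigned int num_shared)
{
	unsigned int n = num_shared > 1 ? num_shared : 1;

	if (r->max_fences < r->num_fences + n)
		r->max_fences = r->num_fences + n;
}

/* Returns false where the kernel would hit the BUG_ON(). */
static bool model_add_fence(struct model_resv *r)
{
	if (r->num_fences >= r->max_fences)
		return false;
	r->num_fences++;
	return true;
}

/* Reserve once with the given num_shared, then add `adds` fences. */
static bool model_submit(unsigned int num_shared, unsigned int adds)
{
	struct model_resv r = { 0, 0 };
	unsigned int i;

	model_eu_reserve(&r, num_shared);
	for (i = 0; i < adds; i++)
		if (!model_add_fence(&r))
			return false;
	return true;
}
```

Under these assumptions, if the oops disappears with the patch applied, VMWGFX is most likely adding more than one fence per buffer. If it persists, the reservation step is probably being skipped, since no num_shared value helps when nothing is reserved.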

Thanks,
Christian.

>
> Regards,
> Christian.
>
>>
>> z
>>
>