[PATCH v3] drm/fbdev-generic: prohibit potential out-of-bounds access

Wed Apr 19 18:30:17 UTC 2023

Hi,

On 2023/4/19 23:46, Thomas Zimmermann wrote:
> Hi
>
> Am 19.04.23 um 17:09 schrieb Daniel Vetter:
>> On Tue, 18 Apr 2023 at 20:16, Sui Jingfeng <15330273260 at 189.cn> wrote:
>>>
>>> Hi,
>>>
>>> On 2023/4/19 01:52, Sui Jingfeng wrote:
>>>> Hi,
>>>>
>>>> On 2023/4/18 16:32, Daniel Vetter wrote:
>>>>> On Mon, Apr 17, 2023 at 07:32:19PM +0800, Sui Jingfeng wrote:
>>>>>> The fbdev test of IGT may write after EOF, which lead to 
>>>>>> out-of-bound
>>>>>> access for the drm drivers using fbdev-generic. For example, on a 
>>>>>> x86
>>>>>> + aspeed bmc card platform, with a 1680x1050 resolution display,
>>>>>> running
>>>>>> fbdev test if IGT will cause the linux kernel hang with the 
>>>>>> following
>>>>>> call trace:
>>>>>>
>>>>>>     Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>     [IGT] fbdev: starting subtest eof
>>>>>>     Workqueue: events drm_fb_helper_damage_work [drm_kms_helper]
>>>>>>     [IGT] fbdev: starting subtest nullptr
>>>>>>
>>>>>>     RIP: 0010:memcpy_erms+0xa/0x20
>>>>>>     RSP: 0018:ffffa17d40167d98 EFLAGS: 00010246
>>>>>>     RAX: ffffa17d4eb7fa80 RBX: ffffa17d40e0aa80 RCX: 
>>>>>> 00000000000014c0
>>>>>>     RDX: 0000000000001a40 RSI: ffffa17d40e0b000 RDI: 
>>>>>> ffffa17d4eb80000
>>>>>>     RBP: ffffa17d40167e20 R08: 0000000000000000 R09: 
>>>>>> ffff89522ecff8c0
>>>>>>     R10: ffffa17d4e4c5000 R11: 0000000000000000 R12: 
>>>>>> ffffa17d4eb7fa80
>>>>>>     R13: 0000000000001a40 R14: 000000000000041a R15: 
>>>>>> ffffa17d40167e30
>>>>>>     FS:  0000000000000000(0000) GS:ffff895257380000(0000)
>>>>>> knlGS:0000000000000000
>>>>>>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>     CR2: ffffa17d40e0b000 CR3: 00000001eaeca006 CR4: 
>>>>>> 00000000001706e0
>>>>>>     Call Trace:
>>>>>>      <TASK>
>>>>>>      ? drm_fbdev_generic_helper_fb_dirty+0x207/0x330 
>>>>>> [drm_kms_helper]
>>>>>>      drm_fb_helper_damage_work+0x8f/0x170 [drm_kms_helper]
>>>>>>      process_one_work+0x21f/0x430
>>>>>>      worker_thread+0x4e/0x3c0
>>>>>>      ? __pfx_worker_thread+0x10/0x10
>>>>>>      kthread+0xf4/0x120
>>>>>>      ? __pfx_kthread+0x10/0x10
>>>>>>      ret_from_fork+0x2c/0x50
>>>>>>      </TASK>
>>>>>>     CR2: ffffa17d40e0b000
>>>>>>     ---[ end trace 0000000000000000 ]---
>>>>>>
>>>>>> The direct reason is that damage rectange computed by
>>>>>> drm_fb_helper_memory_range_to_clip() does not guaranteed to be
>>>>>> in-bound.
>>>>>> It is already results in workaround code populate to elsewhere. 
>>>>>> Another
>>>>>> reason is that exposing a larger buffer size than the actual needed
>>>>>> help
>>>>>> to trigger this bug intrinsic in 
>>>>>> drm_fb_helper_memory_range_to_clip().
>>>>>>
>>>>>> Others fbdev emulation solutions write to the GEM buffer 
>>>>>> directly, they
>>>>>> won't reproduce this bug because the .fb_dirty function callback 
>>>>>> do not
>>>>>> being hooked, so no chance is given to
>>>>>> drm_fb_helper_memory_range_to_clip()
>>>>>> to generate a out-of-bound when drm_fb_helper_sys_write() is called.
>>>>>>
>>>>>> This patch break the trigger condition of this bug by shrinking the
>>>>>> shadow
>>>>>> buffer size to sizes->surface_height * buffer->fb->pitches[0].
>>>>>>
>>>>>> Fixes: '8fbc9af55de0 ("drm/fbdev-generic: Set screen size to size of
>>>>>> GEM
>>>>>> buffer")'
>>>>>>
>>>>>> Signed-off-by: Sui Jingfeng <suijingfeng at loongson.cn>
>>>>>> ---
>>>>>>    drivers/gpu/drm/drm_fbdev_generic.c | 2 +-
>>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_fbdev_generic.c
>>>>>> b/drivers/gpu/drm/drm_fbdev_generic.c
>>>>>> index 8e5148bf40bb..b057cfbba938 100644
>>>>>> --- a/drivers/gpu/drm/drm_fbdev_generic.c
>>>>>> +++ b/drivers/gpu/drm/drm_fbdev_generic.c
>>>>>> @@ -94,7 +94,7 @@ static int
>>>>>> drm_fbdev_generic_helper_fb_probe(struct drm_fb_helper *fb_helper,
>>>>>>        fb_helper->buffer = buffer;
>>>>>>        fb_helper->fb = buffer->fb;
>>>>>>    -    screen_size = buffer->gem->size;
>>>>>> +    screen_size = sizes->surface_height * buffer->fb->pitches[0];
>>>>> So I read core some more and stumbled over 
>>>>> drm_fb_helper_deferred_io().
>>>>> Which has all the code and comments about this, including limiting.
>>>>>
>>>>> I think it would be clearer if we fix the issue there, instead of
>>>>> passing
>>>>> limits around in obscure places that then again get broken?
>>>>
>>>> No, it is more obscure doing that way...
>>>>
>>>>
>>>> As the size of the shadow screen buffer will be exposed to userspace.
>>>>
>>>> The size 'helper->fb->height * helper->fb->pitches[0]' is a
>>>> exactly(best) fit,
>>>>
>>>> You are guaranteed to waste at lease one byte by increasing one byte,
>>>>
>>>> and can not store all pixels by decreasing one byte (In the case where
>>>> `helper->fb->pitches[0] = helper->fb->width * 4`).
>>>>
>>>> It implicitly tell the userspace do not go beyond that boundary.
>>>>
>>>> although userspace program can still choose to write  after EOF,
>>>>
>>>> But it is for test purpose, to test the kernel if it can return a
>>>> -EFBIG or not.
>>>>
>>>>> The thing is,
>>>>> Thomas both authored the limit checks in 
>>>>> drm_fb_helper_deferred_io() and
>>>>> the patch which broken them again, so clearly this isn't very
>>>>> obvious. I'm
>>>>> thinking of something like this:
>>>>>
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c
>>>>> b/drivers/gpu/drm/drm_fb_helper.c
>>>>> index ef4eb8b12766..726dab67c359 100644
>>>>> --- a/drivers/gpu/drm/drm_fb_helper.c
>>>>> +++ b/drivers/gpu/drm/drm_fb_helper.c
>>>>> @@ -697,10 +697,7 @@ void drm_fb_helper_deferred_io(struct fb_info
>>>>> *info, struct list_head *pagerefli
>>>>>         * of the screen and account for non-existing scanlines. 
>>>>> Hence,
>>>>>         * keep the covered memory area within the screen buffer.
>>>>>         */
>>>>> -    if (info->screen_size)
>>>>> -        total_size = info->screen_size;
>>>>> -    else
>>>>> -        total_size = info->fix.smem_len;
>>>>> +    total_size = helper->fb->height * helper->fb->pitches[0];
>>>>
>>>> This is just to mitigate the mistakes already has been made,
>>>>
>>>> because it  do not do a good splitting between the *clip* part and the
>>>> *damage update* part.
>>>>
>>>> An ideal clipping do not obscure its updating backend with a
>>>> out-of-bound damage rectangle.
>>>>
>>>> Why did the drm_fb_helper_memory_range_to_clip() can not do a good job
>>>> in all case
>>>>
>>>> to pass its backend a always meaningful damage rect ?
>>>>
>>>>>        max_off = min(max_off, total_size);
>>>>>          if (min_off < max_off) {
>>>>>
>>>>>
>>>>> I think that would make it utmost clear on what we're doing and why.
>>>>> Otherwise we're just going to re-create the same bug again, like 
>>>>> we've
>>>>> done already :-)
>>>>
>>>> No, we create no bugs, we fix one.
>>>>
>>>> Thanks.
>>>>
>>> But honestly I do not have strong feel toward this, I just type what 
>>> I'm
>>> understand without seeing you resend a V3.
>>>
>>> It's OK in overall,  I will help to test this tomorrow.  :-)
>>
>> Apologies for making you jump around all the time and doing different
>> versions of the same bugfix :-/
>>
>> I think this one here is ok to merge, I just thought when looking at
>> the history that we revert the exact patch without any other changes
>> or comments, and usually that means someone will come up with the same
>> cleanup idea again, and then we'll have a bug again. So maybe a
>> comment or a WARN_ON or something else would be good.
>>
>> I guess we could also do your patch, but put a WARN_ON that the
>> computed total_size is never bigger than the drm_fb size into
>> drm_fb_helper_deferred_io()? That would also make sure that this bug
>> doesn't get resurrected again.
>
> We'd have to put this test into drm_fbdev_generic.c. Otherwise we'll 
> break i915, which also uses deferred I/O, but without shadow 
> buffering.. Maybe test in drm_fbdev_generic_helper_fb_dirty() if the 
> clip rectangle extends the framebuffer size.
>
Yeah, i915 carve out part of system ram as video memory,  it is also 
called stolen memory.

I just learned it recently from i915 related document.

But from what I'm understanding, It's still RAM in its nature, just 
reserved by firmware.

Its bandwidth is extremely high, why not write to the GEM buffer directly?

why deferred I/O pay off?

> Best regards
> Thomas
>
>> -Daniel
>