[PATCH 4/8] drm/fb-helper: Add fb_deferred_io support

Fri Apr 22 17:28:17 UTC 2016

Den 22.04.2016 19:05, skrev Daniel Vetter:
> On Fri, Apr 22, 2016 at 04:17:14PM +0200, Noralf Trønnes wrote:
>> Den 22.04.2016 10:27, skrev Daniel Vetter:
>>> On Thu, Apr 21, 2016 at 08:54:45PM +0200, Noralf Trønnes wrote:
>>>> Den 20.04.2016 17:25, skrev Noralf Trønnes:
>>>>> This adds deferred io support if CONFIG_FB_DEFERRED_IO is enabled.
>>>>> Accumulated fbdev framebuffer changes are signaled using the callback
>>>>> (struct drm_framebuffer_funcs *)->dirty()
>>>>>
>>>>> The drm_fb_helper_sys_*() functions will accumulate changes and
>>>>> schedule fb_info.deferred_work _if_ fb_info.fbdefio is set.
>>>>> This worker is used by the deferred io mmap code to signal that it
>>>>> has been collecting page faults. The page faults and/or other changes
>>>>> are then merged into a drm_clip_rect and passed to the framebuffer
>>>>> dirty() function.
>>>>>
>>>>> The driver is responsible for setting up the fb_info.fbdefio structure
>>>>> and calling fb_deferred_io_init() using the provided callback:
>>>>> (struct fb_deferred_io).deferred_io = drm_fb_helper_deferred_io;
>>>>>
>>>>> Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
>>>>> ---
>>>>>   drivers/gpu/drm/drm_fb_helper.c | 119 +++++++++++++++++++++++++++++++++++++++-
>>>>>   include/drm/drm_fb_helper.h     |  15 +++++
>>>>>   2 files changed, 133 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
>>>> [...]
>>>>
>>>>> +#ifdef CONFIG_FB_DEFERRED_IO
>>>>> +/**
>>>>> + * drm_fb_helper_deferred_io() - (struct fb_deferred_io *)->deferred_io callback
>>>>> + *                               function
>>>>> + *
>>>>> + * This function always runs in process context (struct delayed_work) and calls
>>>>> + * the (struct drm_framebuffer_funcs)->dirty function with the collected
>>>>> + * damage. There's no need to worry about the possibility that the fb_sys_*()
>>>>> + * functions could be running in atomic context. They only schedule the
>>>>> + * delayed worker which then calls this deferred_io callback.
>>>>> + */
>>>>> +void drm_fb_helper_deferred_io(struct fb_info *info,
>>>>> +			       struct list_head *pagelist)
>>>>> +{
>>>>> +	struct drm_fb_helper *helper = info->par;
>>>>> +	unsigned long start, end, min, max;
>>>>> +	struct drm_clip_rect clip;
>>>>> +	unsigned long flags;
>>>>> +	struct page *page;
>>>>> +
>>>>> +	if (!helper->fb->funcs->dirty)
>>>>> +		return;
>>>>> +
>>>>> +	spin_lock_irqsave(&helper->dirty_lock, flags);
>>>>> +	clip = helper->dirty_clip;
>>>>> +	drm_clip_rect_reset(&helper->dirty_clip);
>>>>> +	spin_unlock_irqrestore(&helper->dirty_lock, flags);
>>>>> +
>>>>> +	min = ULONG_MAX;
>>>>> +	max = 0;
>>>>> +	list_for_each_entry(page, pagelist, lru) {
>>>>> +		start = page->index << PAGE_SHIFT;
>>>>> +		end = start + PAGE_SIZE - 1;
>>>>> +		min = min(min, start);
>>>>> +		max = max(max, end);
>>>>> +	}
>>>>> +
>>>>> +	if (min < max) {
>>>>> +		struct drm_clip_rect mmap_clip;
>>>>> +
>>>>> +		mmap_clip.x1 = 0;
>>>>> +		mmap_clip.x2 = info->var.xres;
>>>>> +		mmap_clip.y1 = min / info->fix.line_length;
>>>>> +		mmap_clip.y2 = min_t(u32, max / info->fix.line_length,
>>>>> +				     info->var.yres);
>>>>> +		drm_clip_rect_merge(&clip, &mmap_clip, 1, 0, 0, 0);
>>>>> +	}
>>>>> +
>>>>> +	if (!drm_clip_rect_is_empty(&clip))
>>>>> +		helper->fb->funcs->dirty(helper->fb, NULL, 0, 0, &clip, 1);
>>>>> +}
>>>>> +EXPORT_SYMBOL(drm_fb_helper_deferred_io);
>>>> There is one thing I have wondered about when it comes to deferred io and
>>>> long run times for the defio worker with my displays:
>>>>
>>>> Userspace writes to some pages then the deferred io worker kicks off and
>>>> runs for 100ms holding the page list mutex. While this is happening,
>>>> userspace writes to a new page triggering a page_mkwrite. Now this
>>>> function has to wait for the mutex to be released.
>>>>
>>>> Who is actually waiting here and is there a problem that this can last
>>>> for 100ms?
>>> No idea at all - I haven't looked that closely at  fbdev defio. But one
>>> reason we have an explicit ioctl in drm to flush out frontbuffer rendering
>>> is exactly that flushing could take some time, and should only be done
>>> once userspace has completed some rendering. Not right in the middle of an
>>> op.
>>>
>>> I guess fix up your userspace to use dumb drm fb + drm dirtyfb ioctl?
>>> Otherwise you'll get to improve fbdev defio, and fbdev is deprecated
>>> subsystem and a real horror show. I highly recommend against touching it
>>> ;-)
>> I have tried to track the call chain and it seems to be part of the
>> page fault handler. Which means it's userspace wanting to write to the
>> page that has to wait. And it has to wait at some random point in
>> whatever rendering it's doing.
>>
>> Unless someone has any objections, I will make a change and add a worker
>> like qxl does. This decouples the deferred_io worker holding the mutex
>> from the framebuffer flushing job. However I intend to differ from qxl in
>> that I will use a delayed worker (run immediately from the mmap side which
>> has already been deferred). Since I don't see any point in flushing the
>> framebuffer immediately when fbcon has put out only one glyph, might as
>> well defer it a couple of jiffies to be able to capture some more glyphs.
>>
>> Adding a worker also means that udl doesn't have to initialize deferred_io
>> because we won't be using the deferred_work worker for flushing fb_*().
> I'm confused ... I thought we already have enough workers? One in the
> fbdev deferred_io implementation used by default. The other in case we get
> a draw call from an atomic context.

This patch extend the use of the fbdev deferred_io worker to also handle
the fbdev drawing operations, which can happen in atomic context.
The qxl driver adds an extra worker (struct qxl_device).fb_work which is
used to flush the framebuffer. Both the mmap deferred_io (qxl_deferred_io())
code which is run by the deferred io worker and the fbdev drawing operations
(qxl_fb_fillrect() etc.) schedule this fb_work worker.

So this patch uses 1 worker, qxl uses 2 workers.

It's no big deal for me, fbtft has used 1 worker since 2013 without anyone
pointing out that this has been a problem. And and extra worker can be
added later without changing the drivers.
But since qxl used an extra worker I thought maybe there's a reason for
that and it would remove my worry about those page faults being held up.

Noralf.

>
> Why do we need even more workers? Have you measured that you actually hit
> this delay, or just conjecture from reading the code? Because my reading
> says that the defio mmap support in fbdev already does what you want, and
> should sufficiently coalesce mmap access. There's a delayed work/timer in
> there to make sure it doesn't flush on the very first faulted page.
> -Daniel
>
>> And yes, using drm from userspace is "The solution" here :-), however
>> I want to make the best out of fbdev since some of the tinydrm users
>> coming from drivers/staging/fbtft will probably continue with fbdev.
>>
>>
>> Noralf.
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel