[RFC 1/5] drm: Add DRM support for tiny LCD displays

Tue Mar 29 07:27:08 UTC 2016

On Fri, Mar 25, 2016 at 11:41:44AM +0100, Noralf Trønnes wrote:
> 
> Den 23.03.2016 18:28, skrev Daniel Vetter:
> >On Wed, Mar 23, 2016 at 06:07:56PM +0100, Noralf Trønnes wrote:
> >>Den 18.03.2016 18:47, skrev Daniel Vetter:
> >>>On Thu, Mar 17, 2016 at 10:51:55PM +0100, Noralf Trønnes wrote:
> >>>>Den 16.03.2016 16:11, skrev Daniel Vetter:
> >>>>>On Wed, Mar 16, 2016 at 02:34:15PM +0100, Noralf Trønnes wrote:
> >>>>>>tinydrm provides a very simplified view of DRM for displays that has
> >>>>>>onboard video memory and is connected through a slow bus like SPI/I2C.
> >>>>>>
> >>>>>>Signed-off-by: Noralf Trønnes <noralf at tronnes.org>
> >>>>>Yay, it finally happens! I already made a comment on the cover letter
> >>>>>about the fbdev stuff, I think that's the biggest part to split out from
> >>>>>tinydrm here. I'm not entirely sure a detailed code review makes sense
> >>>>>before that part is done (and hey we can start merging already), so just a
> >>>>>high level review for now:
> >>[...]
> >>>
> >>>>>In the case of tinydrm I think that means we should have a bunch of new
> >>>>>drm helpers, or extensions for existing ones:
> >>>>>- fbdev deferred io support using ->dirtyfb in drm_fb_helper.c.
> >>>>Are you thinking something like this?
> >>>>
> >>>>struct drm_fb_helper_funcs {
> >>>>     int (*dirtyfb)(struct drm_fb_helper *fb_helper,
> >>>>                struct drm_clip_rect *clip);
> >>>We already have a dirty_fb function in
> >>>dev->mode_config->funcs->dirty_fb(). This is the official interface native
> >>>drm/kms userspace is supposed to use to flush frontbuffer rendering. The
> >>>xfree86-video-modesetting driver uses it.
> >>I couldn't find this dirty_fb() function, but I assume you mean
> >>drm_framebuffer_funcs.dirty().
> >Yup.
> >
> >>>>};
> >>>>
> >>>>struct drm_fb_helper {
> >>>>     spinlock_t dirty_lock;
> >>>>     struct drm_clip_rect *dirty_clip;
> >>>>};
> >>>Yeah, this part is needed for the delayed work for the fbdev helper.
> >>>struct work dirty_fb_work; is missing.
> >>This confuses me.
> >>If we have this then there's no need for a fb->funcs->dirty() call,
> >>the driver can just add a work function here instead.
> >>
> >>Possible fb dirty() call chain:
> >>Calls to drm_fb_helper_sys_* or mmap page writes will schedule
> >>fb_info->deferred_work. The worker func fb_deferred_io_work() calls
> >>fb_info->fbdefio->deferred_io().
> >>Then deferred_io() can call fb_helper->fb->funcs->dirty().
> >>
> >>In my use-case this dirty() function would schedule a delayed_work to run
> >>immediately since it has already been deferred collecting changes.
> >>The regular drm side framebuffer dirty() collects damage and schedules
> >>the same worker to run deferred.
> >>
> >>I don't see an easy way for a driver to set the dirty() function in
> >>drm_fb_cma_helper apart from doing this:
> >>
> >>  struct drm_fbdev_cma {
> >>          struct drm_fb_helper    fb_helper;
> >>          struct drm_fb_cma       *fb;
> >>+        int (*dirty)(struct drm_framebuffer *framebuffer,
> >>+                     struct drm_file *file_priv, unsigned flags,
> >>+                     unsigned color, struct drm_clip_rect *clips,
> >>+                     unsigned num_clips);
> >>  };
> >Well my point is that drm core already has a canonical interface
> >(drm_framebuffer_funcs.dirty) to flush out rendering. And it's supposed to
> >be called from process context, and userspace is supposed to batch up
> >dirty updates.
> 
> Batched up into one ioctl call?
> If that's the case, then I don't have to use delayed_work like I do now.
> I can just queue a work_struct to run immediately.
> 
> This comment in include/uapi/drm/drm_mode.h made me think that I might
> receive multiple calls:
> 
>  * The kernel or hardware is free to update more then just the
>  * region specified by the clip rects. The kernel or hardware
>  * may also delay and/or coalesce several calls to dirty into a
>  * single update.

This just means that userspace must ensure that the entire buffer is still
valid, specifically that regions outside of the cliprects are not garabge.
This way drivers can increase the area that's uploaded (e.g. upload the
entire screen or only 1 rect spanning all cliprects in case that's the
only thing the hw can do).

Userspace otoh is supposed to batch up updates into one ioctl call (and
not one per drawn glyph or something silly like that).

> I have assumed that I can't run the whole display update from the ioctl
> call since one full display update on the "worst" display takes ~200ms.
> But maybe it's fine to run all this in the ioctl call?

Hm, thus far all drivers do their updating synchronously. But then none
yet took 200ms to do that - most have hw dma for the upload itself.

Short term I'd just do the update synchronously, async dirty with this
slow display has all kinds of fun problems I guess with X rendering
getting ahead and reducing interactivity even more. Long term there's talk
about adding cliprects to atomic, and then we could do a helper to
implement dirtyfb as an async atomic update.

> >What I'd like is that the fbdev emulation uses exactly that interface,
> >without requiring drivers to write any additional fbdev code (like qxl and
> >udl currently have). Since the drm_framebuffer_funcs.dirty is already
> >expected to run in process context I think the only bit we need is the
> >deferred_work you already added in fbdev, so that we can schedule the
> >driver's ->dirty() function.
> >
> >There shouldn't be any need to have another ->dirty() function anywhere
> >else.
> 
> I'll try and see if I can explain better with some code.
> First the drm_fb_helper part:
> 
> void drm_fb_helper_deferred_io(struct fb_info *info,
>                                struct list_head *pagelist)
> {
>         struct drm_fb_helper *helper = info->par;
>         unsigned long start, end, min, max;
>         struct drm_clip_rect clip;
>         struct page *page;
> 
>         if (!helper->fb->funcs->dirty)
>                 return;
> 
>         spin_lock(&helper->dirty_lock);
>         clip = *helper->dirty_clip;
>         /* reset dirty_clip */
>         spin_unlock(&helper->dirty_lock);
> 
>         min = ULONG_MAX;
>         max = 0;
>         list_for_each_entry(page, pagelist, lru) {
>                 start = page->index << PAGE_SHIFT;
>                 end = start + PAGE_SIZE - 1;
>                 min = min(min, start);
>                 max = max(max, end);
>         }
> 
>         if (min < max) {
>                 /* TODO merge with clip */
>         }
> 
>         helper->fb->funcs->dirty(helper->fb, NULL, 0, 0, &clip, 1);
> }
> EXPORT_SYMBOL(drm_fb_helper_deferred_io);
> 
> /* Called by the drm_fb_helper_sys_*() functions */
> static void drm_fb_helper_sys_deferred(struct fb_info *info, u32 x, u32 y,
>                                        u32 width, u32 height)
> {
>         struct drm_fb_helper *helper = info->par;
>         struct drm_clip_rect clip;
> 
>         if (!info->fbdefio)
>                 return;
> 
>         clip.x1 = x;
>         clip.x2 = x + width -1;
>         cliy.y1 = y;
>         clip.y2 = y + height - 1;
> 
>         spin_lock(&helper->dirty_lock);
>         /* TODO merge clip with helper->dirty_clip */
>         spin_unlock(&helper->dirty_lock);
> 
>         schedule_delayed_work(&info->deferred_work, info->fbdefio->delay);
> }
> 
> 
> So the question I have asked is this: How can the driver set the
> dirty() function within drm_fb_cma_helper?
> 
> Having looked at the code over and over again, I have a suggestion,
> but it assumes that it's allowed to change fb->funcs.
> 
> First the necessary drm_fb_cma_helper changes:
> 
> EXPORT_SYMBOL(drm_fb_cma_destroy);
> EXPORT_SYMBOL(drm_fb_cma_create_handle);
> EXPORT_SYMBOL(drm_fbdev_cma_create);
> 
> /* This is the drm_fbdev_cma_init() code with one change */
> struct drm_fbdev_cma *drm_fbdev_cma_init_with_funcs(struct drm_device *dev,
>         unsigned int preferred_bpp, unsigned int num_crtc,
>         unsigned int max_conn_count, struct drm_framebuffer_funcs *funcs)
> {
> [...]
>         drm_fb_helper_prepare(dev, helper, funcs);
> [...]
> }
> 
> struct drm_fbdev_cma *drm_fbdev_cma_init(struct drm_device *dev,
>         unsigned int preferred_bpp, unsigned int num_crtc,
>         unsigned int max_conn_count)
> {
>         return drm_fbdev_cma_init_with_funcs(dev, preferred_bpp, num_crtc,
>                                              max_conn_count,
> &drm_fb_cma_helper_funcs);
> }
> 
> 
> Now tinydrm should be able to do this:
> 
> static int tinydrm_fbdev_dirty(struct drm_framebuffer *fb,
>                                struct drm_file *file_priv,
>                                unsigned flags, unsigned color,
>                                struct drm_clip_rect *clips,
>                                unsigned num_clips)
> {
>         struct drm_fb_helper *helper = info->par;
>         struct tinydrm_device *tdev = helper->dev->dev_private;
>         struct drm_framebuffer *fb = helper->fb;
>         struct drm_gem_cma_object *cma_obj;
> 
>         if (tdev->plane.fb != fb)
>                 return 0;
> 
>         cma_obj = drm_fb_cma_get_gem_obj(fb, 0);
>         if (!cma_obj) {
>                 dev_err_once(info->dev, "Can't get cma_obj\n");
>                 return -EINVAL;
>         }
> 
>         return tdev->dirtyfb(fb, cma_obj->vaddr, flags, color, clips,
> num_clips);
> }
> 
> static struct drm_framebuffer_funcs tinydrm_fb_cma_funcs = {
>         .destroy        = drm_fb_cma_destroy,
>         .create_handle  = drm_fb_cma_create_handle,
>         .dirty          = tinydrm_fbdev_dirty,
> };

Ah, cma midlayer strikes again. We've had a similar discussion around the
gem side when vc4 landed for similar reasons iirc - it's hard to overwrite
specific helper functions to customize the behaviour.

I guess the simplest solution would be to export
drm_fb_cma_destroy/create_handle functions, and then add a
drm_fb_cma_create_with_funcs helper which you can use like this:

static int tinydrm_fbdev_create(struct drm_fb_helper *helper,
                                struct drm_fb_helper_surface_size *sizes)
{
	return drm_fb_cma_create_with_funcs(helper, sizes, tinydrm_fb_cma_funcs);
}

Changing fb->funcs after drm_framebuffer_init is called is a no-go, since
that function also registers the framebuffer and makes it userspace
accessible. So after that point other threads can go&look at fb->funcs.

> 
> static int tinydrm_fbdev_create(struct drm_fb_helper *helper,
>                                 struct drm_fb_helper_surface_size *sizes)
> {
>         struct tinydrm_device *tdev = helper->dev->dev_private;
>         struct fb_deferred_io *fbdefio;
>         struct fb_ops *fbops;
>         struct fb_info *info;
> 
>         /*
>          * A per device structure is needed for:
>          * - fbops because fb_deferred_io_cleanup() clears fbops.fb_mmap
>          * - fbdefio to get per device delay
>          */
>         fbops = devm_kzalloc(helper->dev->dev, sizeof(*fbops), GFP_KERNEL);
>         fbdefio = devm_kzalloc(helper->dev->dev, sizeof(*fbdefio),
> GFP_KERNEL);
>         if (!fbops || !fbdefio)
>                 return -ENOMEM;
> 
>         ret = drm_fbdev_cma_create(helper, sizes);
>         if (ret)
>                 return ret;
> 
>         helper->fb->funcs = &tinydrm_fb_cma_funcs;

Ok, first thing I've forgotten is that the fbdev cma helper doesn't call
mode_config->funcs->fb_create and so gets the wrong fb->funcs. One thing
I've pondered for different reasons lately is whether the fbdev emulation
should grow a fake drm_file fpriv. With that we could just allocate a dumb
buffer and pass it to ->fb_create (it takes a handle, which means we need
a fake fpriv). I think that would fully demidlayer cma helpers. Or we
create a new helper function to do ->fb_create but with gem objects
instead of ids like what we've done for vc4. Or we just open-code it all.

Not sure which is better here.

> 
>         info = fb_helper->fbdev;
>         /*
>          * fb_deferred_io requires a vmalloc address or a physical address.
>          * drm_fbdev_cma_create() sets smem_start to the dma address which
> is
>          * a device address. This isn't always a physical address.
>          */
>         info->smem_start = page_to_phys(virt_to_page(info->screen_buffer));
> 
>         *fbops = *info->fbops;
>         info->fbops = fbops;
> 
>         info->fbdefio = fbdefio;
>         fbdefio->deferred_io = drm_fb_helper_deferred_io;
>         fbdefio->delay = msecs_to_jiffies(tdev->deferred->defer_ms);
>         /* delay=0 is turned into delay=HZ, so use 1 as a minimum */
>         if (!fbdefio->delay)
>                 fbdefio->delay = 1;
>         fb_deferred_io_init(info);
> 
>         return 0;
> }
> 
> static const struct drm_fb_helper_funcs tinydrm_fb_helper_funcs = {
>         .fb_probe = tinydrm_fbdev_create,
> };
> 
> int tinydrm_fbdev_init(struct tinydrm_device *tdev)
> {
>         struct drm_device *dev = tdev->base;
>         struct drm_fbdev_cma *cma;
> 
>         cma = drm_fbdev_cma_init_with_funcs(dev, 16,
> dev->mode_config.num_crtc,
> dev->mode_config.num_connector,
> &tinydrm_fb_helper_funcs);
>         if (IS_ERR(cma))
>                 return PTR_ERR(cma);

I was wondering whether we couldn't avoid the _with_funcs here by setting
up fbdefio iff ->dirty is set. Then you could add the generic defio setup
code to drm_fbdev_cma_create after helper->fb is allocated, but only if
helper->fb->funcs->dirty is set. Makes for a bit less boilerplate.

Or did I miss something?
-Daniel

> 
>         tdev->fbdev_cma = cma;
> 
>         return 0;
> }
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch