[RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

Fri Aug 9 09:15:46 PDT 2013

> > Turning to DRM/KMS, it seems the supported formats of a plane can be
> > queried using drm_mode_get_plane. However, there doesn't seem to be a
> > way to query the supported formats of a crtc? If display HW only
> > supports scanning out from a single buffer (like pl111 does), I think
> > it won't have any planes and a fb can only be set on the crtc. In
> > which case, how should user-space query which pixel formats that crtc
> > supports?
> 
> it is exposed for drm plane's.  What is missing is to expose the
> primary-plane associated with the crtc.

Cool - so a patch which adds a way to query what formats a crtc
supports would be welcome?

What about a way to query the stride alignment constraints?

Presumably using the drm_mode_get_property mechanism would be the
right way to implement that?
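
To make the stride question concrete, here is a rough sketch of the
arithmetic user-space would do if it could query each device's
alignment. Both helper names (align_up, min_pitch) and the alignment
values are made up for illustration; they are not an existing DRM
interface, and the sketch assumes all alignments are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of align (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: smallest pitch in bytes which satisfies both the
 * display controller's stride alignment and that of the other device
 * which will fill the buffer. For powers of two, the larger alignment
 * is also a multiple of the smaller one, so it satisfies both.
 */
static uint32_t min_pitch(uint32_t width, uint32_t bpp,
                          uint32_t display_align, uint32_t other_align)
{
    uint32_t align = display_align > other_align ? display_align : other_align;
    uint32_t row_bytes = (width * bpp + 7) / 8; /* bytes of pixel data per row */

    return align_up(row_bytes, align);
}
```

For example, a 1366 pixel wide 32bpp buffer has 5464 bytes of pixel
data per row, which would need padding to 5504 bytes if one of the
devices wanted a 64 byte stride alignment.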

> > As with v4l2, DRM doesn't appear to have a way to query the stride
> > constraints? Assuming there is a way to query the stride constraints,
> > there also isn't a way to specify them when creating a buffer with
> > DRM, though perhaps the existing pitch parameter of
> > drm_mode_create_dumb could be used to allow user-space to pass in a
> > minimum stride as well as receive the allocated stride?
> >
> 
> well, you really shouldn't be using create_dumb..  you should have a
> userspace piece that is specific to the drm driver, and knows how to
> use that driver's gem allocate ioctl.

Sorry, why does this need a driver-specific allocation function? It's
just a display controller driver and I just want to allocate a scan-
out buffer - all I'm asking is for the display controller driver to
use a minimum stride alignment so I can export the buffer and use
another device to fill it with data.

The whole point is to be able to allocate the buffer in such a way
that another device can access it. So the driver _can't_ use a
special, device specific format, nor can it allocate it from a
private memory pool because doing so would preclude it from being
shared with another device.

That other device doesn't need to be a GPU either, it could just as
easily be a camera/ISP or video decoder.

> >> > So presumably you're talking about a GPU driver being the exporter
> >> > here? If so, how could the GPU driver do these kind of tricks on
> >> > memory shared with another device?
> >>
> >> Yes, that is gpu-as-exporter.  If someone else is allocating
> >> buffers, it is up to them to do these tricks or not.  Probably 
> >> there is a pretty good chance that if you aren't a GPU you don't 
> >> need those sort of tricks for fast allocation of transient upload 
> >> buffers, staging textures, temporary pixmaps, etc.  Ie. I don't 
> >> really think a v4l camera or video decoder would benefit from that 
> >> sort of optimization.
> >
> > Right - but none of those are really buffers you'd want to export
> > with dma_buf to share with another device are they? In which case,
> > why not just have dma_buf figure out the constraints and allocate 
> > the memory?
>
> maybe not.. but (a) you don't necessarily know at creation time if it
> is going to be exported (maybe you know if it is definitely not going
> to be exported, but the converse is not true),

I can't actually think of an example where you would not know if a
buffer was going to be exported or not at allocation time? Do you have
a case in mind?

Regardless, you'd certainly have to know if a buffer will be exported
pretty quickly, before it's used so that you can import it into
whatever devices are going to access it. Otherwise if it gets
allocated before you export it, the allocation won't satisfy the
constraints of the other devices which will need to access it and
importing will fail. Assuming of course deferred allocation of the
backing pages as discussed earlier in the thread.

> and (b) there isn't
> really any reason to special case the allocation in the driver because
> it is going to be exported.

Not sure I follow you here? Surely you absolutely have to special-case
the allocation if the buffer is to be exported because you have to
take the other devices' constraints into account when you allocate? Or
do you mean you don't need to special-case the GEM buffer object
creation, only the allocation of the backing pages? Though I'm not
sure how that distinction is useful - at the end of the day, you need
to special-case allocation of the backing pages.
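
As a rough sketch of what I mean by taking the other devices'
constraints into account: assuming each importing device can describe
its requirements with something like the structure below, the backing
pages would be allocated against the merged result once every device
has attached. The structure, its fields and merge_constraints() are
hypothetical and only for illustration; nothing like this exists in
dma_buf today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device constraints on a shared buffer's backing memory. */
struct buf_constraints {
    bool     needs_contiguous; /* device has no IOMMU / scatter-gather */
    uint32_t addr_bits;        /* number of address bits the device can drive */
    size_t   alignment;        /* required start alignment, power of two */
};

/* Combine two devices' constraints into one which satisfies both. */
static struct buf_constraints merge_constraints(struct buf_constraints a,
                                                struct buf_constraints b)
{
    struct buf_constraints m;

    assert(a.alignment != 0 && b.alignment != 0);

    /* Contiguous if either device needs it. */
    m.needs_contiguous = a.needs_contiguous || b.needs_contiguous;
    /* Memory must be reachable by the device with the narrowest addressing. */
    m.addr_bits = a.addr_bits < b.addr_bits ? a.addr_bits : b.addr_bits;
    /* For powers of two the larger alignment satisfies both. */
    m.alignment = a.alignment > b.alignment ? a.alignment : b.alignment;

    return m;
}
```

If the pages have already been allocated by the time a second device
attaches, there is nothing left to merge against, which is why the
allocation has to be deferred or the export known up front.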

> helpers that can be used by simple drivers, yes.  Forcing the way the
> buffer is allocated, for sure not.  Currently, for example, there is
> no issue to export a buffer allocated from stolen-mem.

Where stolen-mem is the PC-world's version of a carveout? I.e. a chunk
of memory reserved at boot for the GPU which the OS can't touch? I
guess I view such memory as accessible to all media devices on the 
system and as such, needs to be managed by a central allocator which
dma_buf can use to allocate from.

I guess if that stolen-mem is managed by a single device then in
essence that device becomes the central allocator you have to use to
be able to allocate from that stolen mem?

> > If a driver needs to allocate memory in a special way for a
> > particular device, I can't really imagine how it would be able 
> > to share that buffer with another device using dma_buf? I guess 
> > a driver is likely to need some magic voodoo to configure access 
> > to the buffer for its device, but surely that would be done by 
> > the dma_mapping framework when dma_buf_map happens?
> >
> 
> if, what it has to configure actually manages to fit in the
> dma-mapping framework

But if it doesn't, surely that's an issue which needs to be addressed
in the dma_mapping framework or else you won't be able to import
buffers for use by that device anyway?

> anyways, where the pages come from has nothing to do with whether a
> buffer can be shared or not

Sure, but where they are located in physical memory really does
matter.

> >> At any rate, for both xorg and wayland/gbm, you know when a buffer
> >> is going to be a scanout buffer.  What I'd recommend is define a 
> >> small userspace API that your customers (the SoC vendors) implement
> >> to allocate a scanout buffer and hand you back a dmabuf fd.  That 
> >> could be used both for x11 and for gbm.  Inputs should be requested
> >> width/height and format.  And outputs pitch plus dmabuf fd.
> >>
> >> (Actually you might even just want to use gbm as your starting
> >> point. You could probably just use gbm from xf86-video-armsoc for
> >> allocation, to have one thing that works for both wayland and x11.
> >> Scanout and cursor buffers should go to vendor/SoC specific fxn, 
> >> rest can be allocated from mali kernel driver.)
> >
> > What does that buy us over just using drm_mode_create_dumb on the
> > display's DRM driver?
> 
> well, for example, if there was actually some hw w/ omap's dss + mali,
> you could actually have mali render transparently to tiled buffers
> which could be scanned out rotated.  Which would not be possible w/
> dumb buffers.

Why not? As you said earlier, the format is defined when you setup the
fb with drm_mode_fb_cmd2. If you wanted to share the buffer between
devices, you have to be explicit about what format that buffer is in,
so you'd have to add an entry to drm_fourcc.h for the tiled format.

So userspace queries what formats the GPU DRM supports and what
formats the OMAP DSS DRM supports, selects the tiled format and then 
uses drm_mode_create_dumb to allocate a buffer of the correct size and
sets the appropriate drm_fourcc.h enum value when creating an fb for
that buffer. Or have I missed something?
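
The format selection step might look something like the sketch below.
The two format lists are assumed to have been queried from each driver
already (e.g. via drm_mode_get_plane for the display). The FOURCC macro
mirrors the fourcc_code() encoding in drm_fourcc.h, but the tiled
format code and pick_common_format() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same little-endian packing as fourcc_code() in drm_fourcc.h. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define FMT_XRGB8888 FOURCC('X', 'R', '2', '4')
#define FMT_RGB565   FOURCC('R', 'G', '1', '6')
/* Hypothetical tiled format: NOT a real drm_fourcc.h entry. */
#define FMT_TILED    FOURCC('T', 'I', 'L', '0')

/*
 * Return the first format in the producer's list (ordered by preference)
 * which the consumer also supports, or 0 if they have none in common.
 */
static uint32_t pick_common_format(const uint32_t *producer, size_t n_producer,
                                   const uint32_t *consumer, size_t n_consumer)
{
    for (size_t i = 0; i < n_producer; i++)
        for (size_t j = 0; j < n_consumer; j++)
            if (producer[i] == consumer[j])
                return producer[i];
    return 0;
}
```

The chosen value is then what user-space would pass as the pixel format
when creating the fb, after sizing the dumb buffer to suit that format.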

> >> >> For example, on omapdrm, the SCANOUT flag does nothing on omap4+
> >> >> (where phys contig is not required for scanout), but causes CMA
> >> >> (dma_alloc_*()) to be used on omap3.  Userspace doesn't care.
> >> >> It just knows that it wants to be able to scanout that particular
> >> >> buffer.
> >> >
> >> > I think that's the idea? The omap3's allocator driver would use
> >> > contiguous memory when it detects the SCANOUT flag whereas the
> >> > omap4 allocator driver wouldn't have to. No complex negotiation
> >> > of constraints - it just "knows".
> >> >
> >>
> >> well, it is same allocating driver in both cases (although maybe
> >> that is unimportant).  The "it" that just knows it wants to scanout 
> >> is userspace.  The "it" that just knows that scanout translates to
> >> contiguous (or not) is the kernel.  Perhaps we are saying the same
> >> thing ;-)
> >
> > Yeah - I think we are... so what's the issue with having a per-SoC
> > allocation driver again?
> >
> 
> In a way the display driver is a per-SoC allocator.  But not
> necessarily the *central* allocator for everything.  Ie. no need for
> display driver to allocate vertex buffers for a separate gpu driver,
> and that sort of thing.

Again, I'm only talking about allocating buffers which will be shared
between different devices. At no point have I mentioned the allocation
of buffers which aren't to be shared between devices. Sorry if that's
not been clear.

So for buffers which are to be shared between devices, you're suggesting
that the display driver is the per-SoC allocator? But as I say, and
how this thread got started, the same display driver can be used on
different SoCs, so having _it_ be the central allocator isn't ideal.
Though this is our current solution and why we're "abusing" the dumb 
buffer allocation functions. :-)

Cheers,

Tom