[RFC 0/1] drm/pl111: Initial drm/kms driver for pl111

Mon Aug 5 11:07:12 PDT 2013

On Mon, Aug 5, 2013 at 1:10 PM, Tom Cooksey <tom.cooksey at arm.com> wrote:
> Hi Rob,
>
> +linux-media, +linaro-mm-sig for discussion of video/camera
> buffer constraints...
>
>
>> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey at arm.com>
>> wrote:
>> >> >  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to also
>> >> >    allocate buffers for the GPU. Still not sure how to resolve
>> >> >    this as we don't use DRM for our GPU driver.
>> >>
>> >> any thoughts/plans about a DRM GPU driver?  Ideally long term (esp.
>> >> once the dma-fence stuff is in place), we'd have gpu-specific drm
>> >> (gpu-only, no kms) driver, and SoC/display specific drm/kms driver,
>> >> using prime/dmabuf to share between the two.
>> >
>> > The "extra" buffers we were allocating from armsoc DDX were really
>> > being allocated through DRM/GEM so we could get an flink name
>> > for them and pass a reference to them back to our GPU driver on
>> > the client side. If it weren't for our need to access those
>> > extra off-screen buffers with the GPU we wouldn't need to
>> > allocate them with DRM at all. So, given they are really "GPU"
>> > buffers, it does absolutely make sense to allocate them in a
>> > different driver to the display driver.
>> >
>> > However, to avoid unnecessary memcpys & related cache
>> > maintenance ops, we'd also like the GPU to render into buffers
>> > which are scanned out by the display controller. So let's say
>> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
>> > out buffers with the display's DRM driver but a custom ioctl
>> > on the GPU's DRM driver to allocate non scanout, off-screen
>> > buffers. Sounds great, but I don't think that really works
>> > with DRI2. If we used two drivers to allocate buffers, which
>> > of those drivers do we return in DRI2ConnectReply? Even if we
>> > solve that somehow, GEM flink names are name-spaced to a
>> > single device node (AFAIK). So when we do a DRI2GetBuffers,
>> > how does the EGL in the client know which DRM device owns GEM
>> > flink name "1234"? We'd need some pretty dirty hacks.
>>
>> You would return the name of the display driver allocating the
>> buffers.  On the client side you can use generic ioctls to go from
>> flink -> handle -> dmabuf.  So the client side would end up opening
>> both the display drm device and the gpu, but without needing to know
>> too much about the display.
>
> I think the bit I was missing was that a GEM bo for a buffer imported
> using dma_buf/PRIME can still be flink'd. So the display controller's
> DRM driver allocates scan-out buffers via the DUMB buffer allocate
> ioctl. Those scan-out buffers than then be exported from the
> dispaly's DRM driver and imported into the GPU's DRM driver using
> PRIME. Once imported into the GPU's driver, we can use flink to get a
> name for that buffer within the GPU DRM driver's name-space to return
> to the DRI2 client. That same namespace is also what DRI2 back-buffers
> are allocated from, so I think that could work... Except...
>

(and.. the general direction is that things will move more to just use
dmabuf directly, ie. wayland or dri3)

>
>> > Anyway, that latter case also gets quite difficult. The "GPU"
>> > DRM driver would need to know the constraints of the display
>> > controller when allocating buffers intended to be scanned out.
>> > For example, pl111 typically isn't behind an IOMMU and so
>> > requires physically contiguous memory. We'd have to teach the
>> > GPU's DRM driver about the constraints of the display HW. Not
>> > exactly a clean driver model. :-(
>> >
>> > I'm still a little stuck on how to proceed, so any ideas
>> > would greatly appreciated! My current train of thought is
>> > having a kind of SoC-specific DRM driver which allocates
>> > buffers for both display and GPU within a single GEM
>> > namespace. That SoC-specific DRM driver could then know the
>> > constraints of both the GPU and the display HW. We could then
>> > use PRIME to export buffers allocated with the SoC DRM driver
>> > and import them into the GPU and/or display DRM driver.
>>
>> Usually if the display drm driver is allocating the buffers that might
>> be scanned out, it just needs to have minimal knowledge of the GPU
>> (pitch alignment constraints).  I don't think we need a 3rd device
>> just to allocate buffers.
>
> While Mali can render to pretty much any buffer, there is a mild
> performance improvement to be had if the buffer stride is aligned to
> the AXI bus's max burst length when drawing to the buffer.

I suspect the display controllers might frequently benefit if the
pitch is aligned to AXI burst length too..

> So in some respects, there is a constraint on how buffers which will
> be drawn to using the GPU are allocated. I don't really like the idea
> of teaching the display controller DRM driver about the GPU buffer
> constraints, even if they are fairly trivial like this. If the same
> display HW IP is being used on several SoCs, it seems wrong somehow
> to enforce those GPU constraints if some of those SoCs don't have a
> GPU.

Well, I suppose you could get min_pitch_alignment from devicetree, or
something like this..

In the end, the easy solution is just to make the display allocate to
the worst-case pitch alignment.  In the early days of dma-buf
discussions, we kicked around the idea of negotiating or
programatically describing the constraints, but that didn't really
seem like a bounded problem.

> We may also then have additional constraints when sharing buffers
> between the display HW and video decode or even camera ISP HW.
> Programmatically describing buffer allocation constraints is very
> difficult and I'm not sure you can actually do it - there's some
> pretty complex constraints out there! E.g. I believe there's a
> platform where Y and UV planes of the reference frame need to be in
> separate DRAM banks for real-time 1080p decode, or something like
> that?

yes, this was discussed.  This is different from pitch/format/size
constraints.. it is really just a placement constraint (ie. where do
the physical pages go).  IIRC the conclusion was to use a dummy
devices with it's own CMA pool for attaching the Y vs UV buffers.

> Anyway, I guess my point is that even if we solve how to allocate
> buffers which will be shared between the GPU and display HW such that
> both sets of constraints are satisfied, that may not be the end of
> the story.
>

that was part of the reason to punt this problem to userspace ;-)

In practice, the kernel drivers doesn't usually know too much about
the dimensions/format/etc.. that is really userspace level knowledge.
There are a few exceptions when the kernel needs to know how to setup
GTT/etc for tiled buffers, but normally this sort of information is up
at the next level up (userspace, and drm_framebuffer in case of
scanout).  Userspace media frameworks like GStreamer already have a
concept of format/caps negotiation.  For non-display<->gpu sharing, I
think this is probably where this sort of constraint negotiation
should be handled.

BR,
-R

>
> Cheers,
>
> Tom
>
>
>
>
>