GEM allocation for para-virtualized DRM driver

Oleksandr Andrushchenko andr2000 at gmail.com
Tue Mar 21 15:12:18 UTC 2017


On 03/20/2017 08:52 PM, Rob Clark wrote:
> On Mon, Mar 20, 2017 at 2:25 PM, Oleksandr Andrushchenko
> <andr2000 at gmail.com> wrote:
>> On 03/20/2017 08:17 PM, Rob Clark wrote:
>>> On Mon, Mar 20, 2017 at 2:01 PM, Oleksandr Andrushchenko
>>> <andr2000 at gmail.com> wrote:
>>>> On 03/20/2017 07:38 PM, Rob Clark wrote:
>>>>> On Mon, Mar 20, 2017 at 1:18 PM, Oleksandr Andrushchenko
>>>>> <andr2000 at gmail.com> wrote:
>>>>>>
>>>>>> On 03/18/2017 02:22 PM, Rob Clark wrote:
>>>>>>> On Fri, Mar 17, 2017 at 1:39 PM, Oleksandr Andrushchenko
>>>>>>> <andr2000 at gmail.com> wrote:
>>>>>>>> Hello,
>>>>>>>> I am writing a para-virtualized DRM driver for the Xen hypervisor.
>>>>>>>> It already works with the DRM CMA helpers, but I would also like
>>>>>>>> to make it work with non-contiguous memory: the virtual machine
>>>>>>>> that the driver runs in can't guarantee that CMA memory is actually
>>>>>>>> physically contiguous (that is not a problem because of the IPMMU
>>>>>>>> and other means; the only constraint I have is that I cannot mmap
>>>>>>>> with pgprot == noncached). So, I am planning to use
>>>>>>>> *drm_gem_get_pages* + *shmem_read_mapping_page_gfp* to allocate
>>>>>>>> memory for GEM objects
>>>>>>>> (scanout buffers + dma-bufs shared with the virtual GPU)
>>>>>>>>
>>>>>>>> Do you think this is the right approach to take?
>>>>>>> I guess if you had some case where you needed to "migrate" buffers
>>>>>>> between host and guest memory, then TTM might be useful.  Otherwise
>>>>>>> this sounds like the right approach.
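For reference, the allocation path I mean would look roughly like the
sketch below (the wrapper struct and all names are hypothetical, not
taken from the actual driver):

#include <drm/drmP.h>
#include <drm/drm_gem.h>

/* hypothetical driver-private wrapper around a shmem-backed GEM object */
struct pvdrm_gem_object {
        struct drm_gem_object base;
        struct page **pages;            /* filled by drm_gem_get_pages() */
};

#define to_pvdrm_gem_obj(gobj) \
        container_of(gobj, struct pvdrm_gem_object, base)

static struct pvdrm_gem_object *pvdrm_gem_create(struct drm_device *dev,
                                                 size_t size)
{
        struct pvdrm_gem_object *obj;
        int ret;

        obj = kzalloc(sizeof(*obj), GFP_KERNEL);
        if (!obj)
                return ERR_PTR(-ENOMEM);

        /* back the object with shmem so drm_gem_get_pages() can
         * populate it from the object's shmem mapping */
        ret = drm_gem_object_init(dev, &obj->base, PAGE_ALIGN(size));
        if (ret) {
                kfree(obj);
                return ERR_PTR(ret);
        }

        /* populate up front: the pages must be shared with the host */
        obj->pages = drm_gem_get_pages(&obj->base);
        if (IS_ERR(obj->pages)) {
                ret = PTR_ERR(obj->pages);
                drm_gem_object_release(&obj->base);
                kfree(obj);
                return ERR_PTR(ret);
        }

        return obj;
}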
>>>>>> I tried that today (drm_gem_get_pages); the results are interesting:
>>>>>>
>>>>>> 1. modetest
>>>>>> 1.1. Runs, I can see page flips
>>>>>> 1.2. vm_operations_struct.fault is called, I can vm_insert_page
>>>>>>
>>>>>> 2. kmscube (Rob, thanks for that :) + PowerVR SGX 6250
>>>>>> 2.1. Cannot initialize EGL
>>>>>> 2.2. vm_operations_struct.fault is NOT called
>>>>> jfwiw, pages will only get faulted in when the CPU accesses them..
>>>> indeed, good catch
>>>>> modetest "renders" the frame on the CPU but kmscube does it on gpu.
>>>> yes, I have already learned that modetest only renders once and
>>>> then just flips
>>>>> So not seeing vm_operations_struct.fault is normal.  The EGL fail is
>>>>> not..
>>>>>
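For completeness, a sketch of such a per-page fault handler (using the
~4.10 two-argument signature; newer kernels pass only the struct
vm_fault, and to_pvdrm_gem_obj() is the hypothetical helper from the
sketch above):

static int pvdrm_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
        struct drm_gem_object *gem_obj = vma->vm_private_data;
        struct pvdrm_gem_object *obj = to_pvdrm_gem_obj(gem_obj);
        pgoff_t pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
        int ret;

        if (pgoff >= gem_obj->size >> PAGE_SHIFT)
                return VM_FAULT_SIGBUS;

        /* insert exactly the page that faulted */
        ret = vm_insert_page(vma, vmf->address, obj->pages[pgoff]);
        switch (ret) {
        case 0:
        case -EBUSY:
                return VM_FAULT_NOPAGE;
        case -ENOMEM:
                return VM_FAULT_OOM;
        default:
                return VM_FAULT_SIGBUS;
        }
}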
>>>>>> In both cases two dumb buffers are created and successfully mmaped;
>>>>>> in the kmscube case there are also handle_to_fd IOCTLs issued
>>>>>> and no DRM errors observed. No DMA-BUF mmap attempt is seen.
>>>>>>
>>>>>> I re-checked 2) with alloc_pages + remap_pfn_range and it works
>>>>>> (it cannot unmap cleanly, but that could be because I didn't call
>>>>>> split_page after alloc_pages), so the setup itself is still good.
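That cross-check is roughly the following sketch (hypothetical names
again; the kernel helper is split_page(), which makes the sub-pages of
a higher-order allocation individually refcounted so they can later be
unmapped and freed cleanly):

/* physically contiguous fallback: allocate 2^order pages and map them
 * in one go with remap_pfn_range() */
static int pvdrm_mmap_contig(struct vm_area_struct *vma)
{
        unsigned long size = vma->vm_end - vma->vm_start;
        unsigned int order = get_order(size);
        struct page *page;

        page = alloc_pages(GFP_KERNEL, order);
        if (!page)
                return -ENOMEM;

        /* break the compound allocation into individual pages */
        split_page(page, order);

        return remap_pfn_range(vma, vma->vm_start, page_to_pfn(page),
                               size, vma->vm_page_prot);
}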
>>>>>>
>>>>>> Could it be that a buffer allocated with drm_gem_get_pages
>>>>>> doesn't suit PowerVR for some reason?
>>>>> I've no idea what the state of things is w/ pvr as far as gbm support
>>>>> (not required/used by modetest, but anything that uses the gpu on
>>>>> "bare metal" needs it).  Or what the state of dmabuf-import is with
>>>>> pvr.
>>>> Do you think there could be DMA-related problems with the way
>>>> the buffer is allocated with drm_gem_get_pages and then DMA-mapped,
>>>> so that the GPU is not able to handle those pages?
>>>>
>>>> The only source of knowledge I have at the moment is the
>>>> publicly available pvrsrvkm kernel module. But there are
>>>> other unknowns, e.g. the user-space libraries and firmware, which
>>>> are in binary form: the kernel driver is mostly a bridge
>>>> between the FW and the libs. That being said, do you think I should dig
>>>> deeper into the GPU use-case, or switch back to alloc_pages +
>>>> remap_pfn_range? ;)
>>> so, I suppose with pvr there is a whole host of potential pain... *but*..
>>>
>>> if the alloc_pages path actually works, then perhaps the issue is the
>>> deferred allocation.  I.e. most drivers don't drm_gem_get_pages() until
>>> the buffer is passed to hw or until it is faulted in.  You should make
>>> sure it ends up getting called (if it hasn't been called already)
>>> somewhere in gem_prime_pin.
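Sketched with the hypothetical names from above, I read that suggestion
as:

static int pvdrm_gem_prime_pin(struct drm_gem_object *gem_obj)
{
        struct pvdrm_gem_object *obj = to_pvdrm_gem_obj(gem_obj);

        /* no-op if the pages were already populated at create time */
        if (!obj->pages) {
                obj->pages = drm_gem_get_pages(gem_obj);
                if (IS_ERR(obj->pages)) {
                        int ret = PTR_ERR(obj->pages);

                        obj->pages = NULL;
                        return ret;
                }
        }
        return 0;
}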
>> I call drm_gem_get_pages as part of dumb buffer creation, because I
>> need to pass the pages to the host OS. So this is probably not
>> because of late allocation, but something else.
> hmm, well all the pvr gpu's that I've had to deal with in the past
> have MMUs, so there shouldn't be any specific issue with where the
> pages come from.  But I guess you have to poke around the kernel
> module to see where things go wrong with dmabuf import (or if it even
> gets that far)
well, if I do vm_insert_page in .mmap for the whole
buffer, then everything is ok for both the GPU and the CPU, so
I'll probably leave it that way.
I also removed the .fault handler, as it seems not to be needed
if we mmap the whole thing at once.
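That working variant looks roughly like the sketch below (hypothetical
names as above; note that drm_gem_mmap() sets VM_PFNMAP, which
vm_insert_page() refuses, hence the flag fix-up):

static int pvdrm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
{
        struct drm_gem_object *gem_obj;
        struct pvdrm_gem_object *obj;
        unsigned long addr;
        pgoff_t i;
        int ret;

        /* sets up vma->vm_private_data and takes a reference */
        ret = drm_gem_mmap(filp, vma);
        if (ret)
                return ret;

        gem_obj = vma->vm_private_data;
        obj = to_pvdrm_gem_obj(gem_obj);

        /* drm_gem_mmap() prepared a fault-based PFN mapping;
         * vm_insert_page() needs VM_MIXEDMAP instead of VM_PFNMAP */
        vma->vm_flags &= ~VM_PFNMAP;
        vma->vm_flags |= VM_MIXEDMAP;

        /* map the whole buffer up front, so no .fault handler is needed */
        addr = vma->vm_start;
        for (i = 0; i < gem_obj->size >> PAGE_SHIFT; i++) {
                ret = vm_insert_page(vma, addr, obj->pages[i]);
                if (ret)
                        return ret;
                addr += PAGE_SIZE;
        }
        return 0;
}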

> BR,
> -R
>
>>> BR,
>>> -R
>> Thank you!
Thank you for helping!

