A simple alternative to GEM

Thu Oct 3 19:00:14 CEST 2013

On Thu, Oct 3, 2013 at 7:48 AM, dm.leontiev7 <dm.leontiev7 at gmail.com> wrote:
> Hello
>
> In my opinion, the graphics stack would benefit from moving memory management to userspace, because there are tons of features not available in the kernel, like SIMD or C++.

both of which bring no benefit to memory management code

> Also, bugs in buffer management code will bite only one process, not the whole system.

As soon as you need to pin pages (which you need to do, except for the
hw that Jerome is targeting with his proposal, where the GPU can
really support virtual memory), memory management becomes a
whole-system issue.  Pinning pages can only be done from the kernel, and it
is pretty frowned upon to have a driver that lets userspace pin
arbitrary pages without being able to keep track of those pages and
clean up.

Anyways, it is much better to trust the kernel than userspace.  In
system design, you must assume userspace is untrusted.  If you have
enough tracking for random pages that userspace asks the kernel to pin
for the gpu in order to clean up when the userspace process dies, then you
have *more* complexity than what you have in GEM.  Trust me, it is far
easier for the kernel to deal with buffer handles than to have to go
figure out the pages backing a random vma (get_user_pages()) and
keep track of things on a per-page basis.
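To make the cleanup argument concrete, here is a minimal Python model (not kernel code) of per-process handle tracking. It assumes a GEM-like design in which each open DRM file owns a table of handles to refcounted buffer objects. `Buffer`, `FileContext` and their methods are hypothetical names chosen for illustration. When a process dies, cleanup is a single loop over its handle table, with no per-page bookkeeping of arbitrary user VMAs.

```python
class Buffer:
    """A GPU buffer object backed by a fixed set of pages (hypothetical model)."""

    def __init__(self, npages):
        self.npages = npages
        self.refcount = 0
        self.freed = False

    def get(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            # This is where a kernel would unpin/release the backing pages.
            self.freed = True


class FileContext:
    """Per-open-file handle table, loosely modelled on GEM handles (hypothetical)."""

    def __init__(self):
        self.handles = {}
        self.next_handle = 1

    def create_handle(self, buf):
        handle = self.next_handle
        self.next_handle += 1
        buf.get()                      # each handle holds one reference
        self.handles[handle] = buf
        return handle

    def close_handle(self, handle):
        self.handles.pop(handle).put()

    def release(self):
        """Called when the process exits or is killed: drop every handle it owned."""
        for buf in self.handles.values():
            buf.put()
        self.handles.clear()


shared = Buffer(npages=256)            # e.g. a buffer shared between two processes
proc_a, proc_b = FileContext(), FileContext()
proc_a.create_handle(shared)
proc_b.create_handle(shared)
proc_a.release()                       # process A dies: shared.freed is still False
proc_b.release()                       # last reference gone: shared.freed is now True
```

Because buffers are refcounted, a buffer shared between processes is only released when the last holder goes away, whether it closes the handle or dies.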

>
> However, tile-based page flipping can be implemented without major changes in the graphics stack, and it may improve double-buffered 2D rendering performance by reducing the number of blitted pixels through reuse of unchanged pages. If the GPU's ROP units can take pixels from one location (the front buffer) and write the results to another (the back buffer), blitting may be avoided completely when only a small area of a double-buffered window is updated.
>

Taking pixels from one location to another sounds like blitting to me.
But anyways, a client GL app blitting (or otherwise) directly into the
front buffer is basically defeating the purpose of dri2.

And tile-based page flipping is an orthogonal topic to userspace vs
kernel memory management.

> As for security, there are thousands of ways to perform a DoS attack. In Windows, one can eat so much RAM that the user is unable to kill an app because the task manager will not start. To avoid this, some memory must be reserved for emergency situations, enough to perform 2D rendering by a single client. Multiple clients will then be able to render their GUIs without caching window contents, even under stress conditions. Also, the kernel DRI module must be able to warn a client if it must return memory to the system, and reset its context on a task manager request.
>

With the current GEM design, buffers can be swapped out under memory
pressure, or the appropriate cleanup done if OOM killer kills a
userspace process.

Doing the memory management in userspace, there are just so many ways
that things can go wrong.  And once you've fixed those, you end up
with something more complex.   Sorry, it is just a really bad idea.

BR,
-R

> Regards, Dmitry.
>
>
>
> Rob Clark <robdclark at gmail.com> wrote:
>
>>right, but by the time you do that, you've implemented enough memory
>>tracking/management in the kernel, so you don't really win on
>>complexity.  Otherwise those pinned pages will remain pinned, and you
>>are still out of memory.
>>
>>BR,
>>-R
>>
>>
>>On Fri, Sep 27, 2013 at 7:53 PM, dm.leontiev7 <dm.leontiev7 at gmail.com> wrote:
>>> DoS from a client app is certainly a problem if we can't interrupt a program. But we can.
>>>
>>> The program ate all GPU RAM, OK. Let the WM invoke the OOM killer on the GPU RAM eater.
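Before an OOM killer can go after "the GPU RAM eater", something has to know how much each process has pinned. The following is a minimal Python sketch of that accounting, assuming a fixed pool of pinnable pages and a largest-pinner victim policy. `PinTracker` and its methods are hypothetical and do not correspond to any real kernel interface. The bookkeeping involved is the kind of kernel-side tracking that this thread argues you cannot avoid.

```python
class PinTracker:
    """Hypothetical per-process accounting of pages pinned for the GPU."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.pinned = {}  # pid -> number of pages currently pinned for that process

    def pinned_total(self):
        return sum(self.pinned.values())

    def pin(self, pid, npages):
        # Refuse the request if it would exceed the pool of pinnable pages.
        if self.pinned_total() + npages > self.total_pages:
            raise MemoryError(f"pid {pid}: cannot pin {npages} more pages")
        self.pinned[pid] = self.pinned.get(pid, 0) + npages

    def pick_victim(self):
        """Choose the process pinning the most pages, as an OOM killer might."""
        return max(self.pinned, key=self.pinned.get)

    def kill(self, pid):
        """Unpin everything the killed process held; return the number of pages freed."""
        return self.pinned.pop(pid, 0)


tracker = PinTracker(total_pages=1000)
tracker.pin(pid=1, npages=100)
tracker.pin(pid=2, npages=850)         # pid 2 plays the "GPU RAM eater"
try:
    tracker.pin(pid=3, npages=100)     # 950 + 100 > 1000, so this is refused
except MemoryError:
    victim = tracker.pick_victim()     # the largest pinner, pid 2
    tracker.kill(victim)               # its 850 pages become unpinned
    tracker.pin(pid=3, npages=100)
```

Without the `pinned` table, neither the victim choice nor the unpinning after the kill would be possible.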
>>>
>>> Rob Clark <robdclark at gmail.com> wrote:
>>>
>>>>sure, but userspace memory management is not a good idea for gpu's
>>>>which cannot support page fault & resume, as it requires pinning
>>>>pages.  In the best case (ignoring other issues), it allows any
>>>>userspace that can use the GPU to easily construct a DoS attack by pinning
>>>>all available memory.
>>>>
>>>>BR,
>>>>-R
>>>>
>>>>On Fri, Sep 27, 2013 at 6:54 PM, dm.leontiev7 <dm.leontiev7 at gmail.com> wrote:
>>>>> My idea targets not only new GPUs; it targets any GPU with an MMU.
>>>>>
>>>>>
>>>>> I just want the idea to not be patentable.
>>>>>
>>>>> Rob Clark <robdclark at gmail.com> wrote:
>>>>>
>>>>>>new gpu's can support coherency.. this is the HSA stuff (latest
>>>>>>generation of radeon can support, and I think latest nv stuff as
>>>>>>well.. probably not any current intel hw, though).  What Jerome was
>>>>>>talking about is a bit different from what you are trying to do.
>>>>>>
>>>>>>On Fri, Sep 27, 2013 at 6:41 PM, dm.leontiev7 <dm.leontiev7 at gmail.com> wrote:
>>>>>>> Passing structures... well, maybe someday in the future.
>>>>>>>
>>>>>>> But we are not living in the future NOW. Right now GPUs don't support cache snooping or memory coherence protocols like MESI or MOESI. The Radeon cache is read-only. And memory is NUMA. Just forget about coherence.
>>>>>>>
>>>>>>> I see no point in fighting self-made problems. Really.
>>>>>>>
>>>>>>> Rob Clark <robdclark at gmail.com> wrote:
>>>>>>>
>>>>>>>>Jerome's talk was about something above and beyond opencl, where you
>>>>>>>>can just pass data structures (which can include cpu userspace ptrs)
>>>>>>>>to the gpu for more transparent cpu/gpu interoperability.. (ie.
>>>>>>>>without explicit map step)
>>>>>>>>
>>>>>>>>BR,
>>>>>>>>-R
>>>>>>>>
>>>>>>>>On Fri, Sep 27, 2013 at 5:54 PM, dm.leontiev7 <dm.leontiev7 at gmail.com> wrote:
>>>>>>>>> In my opinion, GART support can be dropped because non-PCIe hardware is just not usable with modern Linux distros. It is too old and does not have enough RAM.
>>>>>>>>>
>>>>>>>>> About page faults: I don't really understand what the problem with page faults is. All pages referenced by a memory map must be locked before a GPU operation executes. The memory map must be locked (by an rwsem) while it is in use.
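The locking rule proposed here can be modelled with a reader/writer lock. GPU jobs take the read side, so several can run at once against the same map. Unmapping takes the write side, so a page cannot disappear while a job that may touch it is still running. This is a minimal Python sketch that uses `threading.Condition` to stand in for a kernel `rw_semaphore`. `RWSem` and `GpuMap` are hypothetical names, and the lock does not implement writer preference.

```python
import threading


class RWSem:
    """Minimal reader/writer lock standing in for an rw_semaphore (no writer preference)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def down_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def up_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting writer

    def down_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def up_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class GpuMap:
    """Hypothetical GPU memory map whose pages stay put while any job uses it."""

    def __init__(self, pages):
        self.sem = RWSem()
        self.pages = set(pages)

    def run_job(self, job):
        self.sem.down_read()             # the map cannot change while this is held
        try:
            return job(frozenset(self.pages))
        finally:
            self.sem.up_read()

    def unmap(self, page):
        self.sem.down_write()            # blocks until every running job has finished
        try:
            self.pages.discard(page)
        finally:
            self.sem.up_write()
```

Note that this only works if a job's page set is known and locked before submission. That is why the scheme needs an explicit map step and cannot cover the fault-and-resume case Jerome's talk was about.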
>>>>>>>>>
>>>>>>>>> Rob Clark <robdclark at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>For GL yes (ignoring some important details like GART size
>>>>>>>>>>limitations, alignment, etc)
>>>>>>>>>>
>>>>>>>>>>Jerome's talk was about doing things where an explicit map-to-gpu is
>>>>>>>>>>not required... think of things like passing a pointer to a linked
>>>>>>>>>>list to a shader.  For that you need to let the CPU intervene on page
>>>>>>>>>>fault from GPU.
>>>>>>>>>>
>>>>>>>>>>BR,
>>>>>>>>>>-R
>>>>>>>>>>
>>>>>>>>>>On Fri, Sep 27, 2013 at 4:48 PM, dm.leontiev7 <dm.leontiev7 at gmail.com> wrote:
>>>>>>>>>>> Hello
>>>>>>>>>>>
>>>>>>>>>>> Page fault support is not required: the virtual address space can be separated into 3 areas: read-only, write-only and read-write. So no read/write protection at the MMU level is required.
>>>>>>>>>>>
>>>>>>>>>>> Non-existent pages are not the problem, because an application has to allocate a page before mapping it. Pages must always exist.
>>>>>>>>>>>
>>>>>>>>>>> On page deallocation, the driver must invalidate all affected memory maps.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Dmitry
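Dmitry's three rules can be put into one small model: fixed address areas that determine access rights, pages that must be allocated before they are mapped, and invalidation of every map that references a freed page. The Python sketch below is illustrative only. The area boundaries in `REGIONS` are invented, since the proposal gives none, and `Driver`, `access_for` and the reverse map `users` are hypothetical names.

```python
from enum import Enum


class Access(Enum):
    READ_ONLY = "ro"
    WRITE_ONLY = "wo"
    READ_WRITE = "rw"


# Hypothetical layout: access rights follow from the address alone,
# so no per-page read/write protection bits are needed in the MMU.
REGIONS = [
    (0x0000_0000, 0x4000_0000, Access.READ_ONLY),
    (0x4000_0000, 0x8000_0000, Access.WRITE_ONLY),
    (0x8000_0000, 0xC000_0000, Access.READ_WRITE),
]


def access_for(gpu_va):
    for start, end, access in REGIONS:
        if start <= gpu_va < end:
            return access
    raise ValueError(f"address {gpu_va:#x} is outside every region")


class Driver:
    def __init__(self):
        self.allocated = set()  # pages the application has allocated
        self.maps = {}          # map id -> {gpu_va: page}
        self.users = {}         # reverse map: page -> set of map ids referencing it
        self.valid = set()      # map ids that are currently usable

    def alloc_page(self, page):
        self.allocated.add(page)

    def map_page(self, map_id, gpu_va, page):
        if page not in self.allocated:
            raise LookupError("page must be allocated before it is mapped")
        access_for(gpu_va)      # reject addresses outside the three areas
        if map_id not in self.maps:
            self.maps[map_id] = {}
            self.valid.add(map_id)
        elif map_id not in self.valid:
            raise RuntimeError(f"map {map_id} was invalidated")
        self.maps[map_id][gpu_va] = page
        self.users.setdefault(page, set()).add(map_id)

    def free_page(self, page):
        self.allocated.discard(page)
        # Invalidate every map that referenced the page being freed.
        for map_id in self.users.pop(page, set()):
            self.valid.discard(map_id)
```

The reverse map from pages to maps is extra per-page state the driver has to maintain for invalidation on free to work.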
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Rob Clark <robdclark at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>On Fri, Sep 27, 2013 at 3:08 PM, Christian König
>>>>>>>>>>>><deathsimple at vodafone.de> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> A different story is backing buffers with anonymous system memory. I was
>>>>>>>>>>>>> told that Jerome just recently did a very interesting talk at XDC about it
>>>>>>>>>>>>> (didn't have time to look at it myself).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>note that this requires a gpu which can page fault (and more
>>>>>>>>>>>>importantly, resume after cpu intervenes on page fault).. which I
>>>>>>>>>>>>think means modern(ish) radeon or nv..
>>>>>>>>>>>>
>>>>>>>>>>>>BR,
>>>>>>>>>>>>-R