[PATCH v2 00/25] AMDKFD kernel driver

Bridgman, John John.Bridgman at amd.com
Wed Jul 23 14:46:54 PDT 2014


>-----Original Message-----
>From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
>Of Jesse Barnes
>Sent: Wednesday, July 23, 2014 5:00 PM
>To: dri-devel at lists.freedesktop.org
>Subject: Re: [PATCH v2 00/25] AMDKFD kernel driver
>
>On Mon, 21 Jul 2014 19:05:46 +0200
>daniel at ffwll.ch (Daniel Vetter) wrote:
>
>> On Mon, Jul 21, 2014 at 11:58:52AM -0400, Jerome Glisse wrote:
>> > On Mon, Jul 21, 2014 at 05:25:11PM +0200, Daniel Vetter wrote:
>> > > > On Mon, Jul 21, 2014 at 03:39:09PM +0200, Christian König wrote:
>> > > > >On 21.07.2014 14:36, Oded Gabbay wrote:
>> > > > >On 20/07/14 20:46, Jerome Glisse wrote:
>
>[snip!!]
My BlackBerry thumb thanks you ;)
>
>> > > >
>> > > > The main questions here are whether it's avoidable to pin down the
>> > > > memory, and whether the memory is pinned down at driver load, by
>> > > > request from userspace or by anything else.
>> > > >
>> > > > As far as I can see only the "mqd per userspace queue" might be
>> > > > a bit questionable, everything else sounds reasonable.
>> > >
>> > > Aside, i915 perspective again (i.e. how we solved this): When
>> > > scheduling away from contexts we unpin them and put them into the
>> > > lru. And in the shrinker we have a last-ditch callback to switch
>> > > to a default context (since you can't ever have no context once
>> > > you've started) which means we can evict any context object if it's
>> > > getting in the way.
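That last-ditch shrinker idea sounds like roughly the following shape, if I'm reading it right. Just a sketch to check my understanding - every name below (my_dev_priv, my_evict_idle_contexts, ...) is made up, this is not the actual i915 code:

/* Illustrative only: evict idle contexts from the LRU first, and as a
 * last resort switch the hardware to the always-present default context
 * so the remaining user contexts become evictable too. */
static unsigned long
ctx_shrinker_scan(struct shrinker *s, struct shrink_control *sc)
{
        struct my_dev_priv *dev_priv =
                container_of(s, struct my_dev_priv, ctx_shrinker);
        unsigned long freed;

        freed = my_evict_idle_contexts(dev_priv, sc->nr_to_scan);

        if (!freed)
                freed = my_switch_to_default_ctx_and_evict(dev_priv);

        return freed;
}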
>> >
>> > So Intel hardware reports through some interrupt or some channel when
>> > it is not using a context? I.e. the kernel side gets a notification when
>> > some user context is done executing?
>>
>> Yes, as long as we do the scheduling with the cpu we get interrupts
>> for context switches. The mechanism is already published in the
>> execlist patches currently floating around. We get a special context
>> switch interrupt.
>>
>> But we have this unpin logic already in the current code, where we
>> switch contexts through in-line cs commands from the kernel. There we
>> obviously use the normal batch completion events.
>
>Yeah and we can continue that going forward.  And of course if your hw can
>do page faulting, you don't need to pin the normal data buffers.
>
>Usually there are some special buffers that need to be pinned for longer
>periods though, anytime the context could be active.  Sounds like in this case
>the userland queues, which makes some sense.  But maybe for smaller
>systems the size limit could be clamped to something smaller than 128M.  Or
>tie it into the rlimit somehow, just like we do for mlock() stuff.
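The rlimit idea seems doable; I'd expect the accounting to look something like the usual pinned-memory pattern below. This is only a sketch against the current (~3.x) mm fields, and kfd_account_pinned() is a placeholder name, not something from the patch set:

/* Sketch: charge pinned pages against RLIMIT_MEMLOCK before pinning
 * queue-related memory, same pattern other drivers use. */
static int kfd_account_pinned(unsigned long npages)
{
        unsigned long locked, limit;
        int ret = 0;

        down_write(&current->mm->mmap_sem);
        locked = current->mm->pinned_vm + npages;
        limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        if (locked > limit && !capable(CAP_IPC_LOCK))
                ret = -ENOMEM;
        else
                current->mm->pinned_vm = locked;
        up_write(&current->mm->mmap_sem);

        return ret;
}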
>
Yeah, even the queues are in pageable memory; it's just a ~256-byte structure per queue (the Memory Queue Descriptor) that describes the queue to hardware, plus a couple of pages for each process using HSA to hold things like doorbells. Current thinking is to limit the number of processes using HSA to ~256 and the number of queues per process to ~1024 by default in the initial code, although my guess is that we could take the queues-per-process default limit even lower.
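Back-of-the-envelope with those defaults, assuming 4 KB pages and reading "a couple of pages" as 2 per process (numbers straight from the paragraph above):

  256 processes * 1024 queues * ~256 B per MQD ~= 64 MB absolute worst case
  256 processes * 2 pages * 4 KB               ~=  2 MB for doorbells etc.

and the 64 MB figure only applies if every one of the 256 processes actually creates its full 1024 queues.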

>> > The issue with radeon hardware AFAICT is that the hardware does not
>> > report anything about the userspace context running, i.e. you do not
>> > get a notification when a context is not in use. Well, AFAICT. Maybe
>> > the hardware does provide that.
>>
>> I'm not sure whether we can do the same trick with the hw scheduler.
>> But then unpinning hw contexts will drain the pipeline anyway, so I
>> guess we can just stop feeding the hw scheduler until it runs dry. And
>> then unpin and evict.
>
>Yeah we should have an idea which contexts have been fed to the scheduler,
>at least with kernel based submission.  With userspace submission we'll be in a
>tougher spot...  but as you say we can always idle things and unpin everything
>under pressure.  That's a really big hammer to apply though.
>
>> > Like the VMID is a limited resource, so you have to dynamically bind
>> > them, so maybe we can allocate a pinned buffer only per VMID and then,
>> > when binding a PASID to a VMID, also copy the pinned buffer back to
>> > the PASID's unpinned copy.
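If I follow the VMID idea correctly, it would be something like the sketch below: a small pool of pinned per-VMID buffers plus a pageable per-PASID copy, with the contents swapped at bind time. All structs and helpers here are hypothetical, just to make sure we're picturing the same thing:

/* Illustrative only: save the previous PASID's state out of the pinned
 * per-VMID buffer into its pageable copy, then load the new PASID. */
struct vmid_slot {
        void            *pinned_buf;    /* pinned buffer owned by this VMID */
        unsigned int     pasid;         /* PASID currently bound, 0 if free */
};

static int bind_pasid_to_vmid(struct vmid_slot *slot, unsigned int pasid,
                              void *pasid_copy, size_t size)
{
        if (slot->pasid) {
                void *old_copy = lookup_pasid_copy(slot->pasid); /* placeholder */
                memcpy(old_copy, slot->pinned_buf, size);
        }
        memcpy(slot->pinned_buf, pasid_copy, size);
        slot->pasid = pasid;
        return 0;
}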
>>
>> Yeah, pasid assignment will be fun. Not sure whether Jesse's patches
>> will do this already. We _do_ already have fun with ctx id assignments
>> though since we move them around (and the hw id is the ggtt address
>> afaik). So we need to remap them already. Not sure on the details for
>> pasid mapping, iirc it's a separate field somewhere in the context
>> struct. Jesse knows the details.
>
>The PASID space is a bit bigger, 20 bits iirc.  So we probably won't run out
>quickly or often.  But when we do I thought we could apply the same trick
>Linux uses for ASID management on SPARC and ia64 (iirc on sparc anyway,
>> maybe MIPS too): "allocate" a PASID every time you need one, but don't tie it
>to the process at all, just use it as a counter that lets you know when you need
>to do a full TLB flush, then start the allocation process over.  This lets you
>minimize TLB flushing and gracefully handles oversubscription.

IIRC we have a 9-bit limit for PASID on current hardware, although that will go up in future. 
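For what it's worth, the counter/generation trick Jesse describes would look roughly like the sketch below; with only 9 bits we would just hit the wrap-and-flush path more often. Names and the TLB flush helper are placeholders, this is not from the patches:

/* Hand out PASIDs as a monotonically increasing counter; the low 9 bits
 * go to the hardware, the high bits act as a generation number so we
 * know when a full TLB flush is needed before values get reused. */
#define PASID_BITS      9
#define NUM_PASIDS      (1U << PASID_BITS)
#define PASID_MASK      (NUM_PASIDS - 1)

static DEFINE_SPINLOCK(pasid_lock);
static u64 pasid_counter = NUM_PASIDS;  /* generation 1, pasid 0 reserved */

static u64 alloc_pasid(void)
{
        u64 id;

        spin_lock(&pasid_lock);
        if ((pasid_counter & PASID_MASK) == PASID_MASK) {
                flush_all_device_tlbs();        /* placeholder */
                pasid_counter = (pasid_counter & ~(u64)PASID_MASK) + NUM_PASIDS;
        }
        id = ++pasid_counter;
        spin_unlock(&pasid_lock);

        return id;      /* hardware gets id & PASID_MASK */
}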
>
>My current code doesn't bother though; context creation will fail if we run out
>of PASIDs on a given device.
>
>--
>Jesse Barnes, Intel Open Source Technology Center

