[PATCH v2 00/25] AMDKFD kernel driver

Mon Jul 21 10:28:50 PDT 2014

On 21/07/14 20:05, Daniel Vetter wrote:
> On Mon, Jul 21, 2014 at 11:58:52AM -0400, Jerome Glisse wrote:
>> On Mon, Jul 21, 2014 at 05:25:11PM +0200, Daniel Vetter wrote:
>>> On Mon, Jul 21, 2014 at 03:39:09PM +0200, Christian König wrote:
>>>> On 21.07.2014 14:36, Oded Gabbay wrote:
>>>>> On 20/07/14 20:46, Jerome Glisse wrote:
>>>>>> On Thu, Jul 17, 2014 at 04:57:25PM +0300, Oded Gabbay wrote:
>>>>>>> Forgot to cc mailing list on cover letter. Sorry.
>>>>>>>
>>>>>>> As a continuation to the existing discussion, here is a v2 patch series
>>>>>>> restructured with a cleaner history and no
>>>>>>> totally-different-early-versions of the code.
>>>>>>>
>>>>>>> Instead of 83 patches, there are now a total of 25 patches, where 5 of
>>>>>>> them are modifications to radeon driver and 18 of them include only
>>>>>>> amdkfd code. There is no code going away or even modified between
>>>>>>> patches, only added.
>>>>>>>
>>>>>>> The driver was renamed from radeon_kfd to amdkfd and moved to reside
>>>>>>> under drm/radeon/amdkfd. This move was done to emphasize the fact that
>>>>>>> this driver is an AMD-only driver at this point. Having said that, we do
>>>>>>> foresee a generic hsa framework being implemented in the future and in
>>>>>>> that case, we will adjust amdkfd to work within that framework.
>>>>>>>
>>>>>>> As the amdkfd driver should support multiple AMD gfx drivers, we want to
>>>>>>> keep it as a separate driver from radeon. Therefore, the amdkfd code is
>>>>>>> contained in its own folder. The amdkfd folder was put under the radeon
>>>>>>> folder because the only AMD gfx driver in the Linux kernel at this point
>>>>>>> is the radeon driver. Having said that, we will probably need to move it
>>>>>>> (maybe to be directly under drm) after we integrate with additional AMD
>>>>>>> gfx drivers.
>>>>>>>
>>>>>>> For people who like to review using git, the v2 patch set is located at:
>>>>>>> http://cgit.freedesktop.org/~gabbayo/linux/log/?h=kfd-next-3.17-v2
>>>>>>>
>>>>>>> Written by Oded Gabbay <oded.gabbay at amd.com>
>>>>>>
>>>>>> So quick comments before I finish going over all patches. There are many
>>>>>> things that need more documentation, especially as right now there is
>>>>>> no userspace I can go look at.
>>>>> So quick comments on some of your questions but first of all, thanks for
>>>>> the time you dedicated to review the code.
>>>>>>
>>>>>> There are a few show stoppers. The biggest one is gpu memory pinning;
>>>>>> this is a big no, and it would need serious arguments for any hope of
>>>>>> convincing me on that side.
>>>>> We only do gpu memory pinning for kernel objects. There are no userspace
>>>>> objects that are pinned on the gpu memory in our driver. If that is the
>>>>> case, is it still a show stopper ?
>>>>>
>>>>> The kernel objects are:
>>>>> - pipelines (4 per device)
>>>>> - mqd per hiq (only 1 per device)
>>>>> - mqd per userspace queue. On KV, we support up to 1K queues per process,
>>>>> for a total of 512K queues. Each mqd is 151 bytes, but the allocation is
>>>>> done with 256-byte alignment. So the total *possible* memory is 128MB
>>>>> - kernel queue (only 1 per device)
>>>>> - fence address for kernel queue
>>>>> - runlists for the CP (1 or 2 per device)
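
The 128MB worst case in the list above can be checked with a few lines of
arithmetic. This is only a sketch: the constant and function names are made up
for illustration, the figures (151-byte MQD, 256-byte alignment, 512K queues)
are taken from the mail as stated, and 128MB is read as 128 * 2^20 bytes.

```python
# Hypothetical names; the numbers are the ones quoted in the mail.
MQD_SIZE_BYTES = 151        # size of one MQD
MQD_ALIGNMENT = 256         # allocation alignment in bytes
TOTAL_QUEUES = 512 * 1024   # "a total of 512K queues"


def align_up(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def worst_case_mqd_bytes(total_queues=TOTAL_QUEUES,
                         mqd_size=MQD_SIZE_BYTES,
                         alignment=MQD_ALIGNMENT):
    """Bytes pinned if every possible queue has an MQD allocated."""
    return total_queues * align_up(mqd_size, alignment)


total = worst_case_mqd_bytes()
print(f"{total // 2**20} MiB")  # prints "128 MiB"
```

Each 151-byte MQD occupies a full 256-byte slot, so the alignment, not the MQD
size itself, determines the worst-case footprint.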
>>>>
>>>> The main questions here are whether pinning down the memory is avoidable
>>>> and whether the memory is pinned down at driver load, by request from
>>>> userspace or by anything else.
>>>>
>>>> As far as I can see only the "mqd per userspace queue" might be a bit
>>>> questionable, everything else sounds reasonable.
>>>
>>> Aside, i915 perspective again (i.e. how we solved this): When scheduling
>>> away from contexts we unpin them and put them into the lru. And in the
>>> shrinker we have a last-ditch callback to switch to a default context
>>> (since you can't ever have no context once you've started) which means we
>>> can evict any context object if it's getting in the way.
>>
>> So Intel hardware reports through some interrupt or some channel when it is
>> not using a context? I.e. the kernel side gets a notification when some user
>> context is done executing?
> 
> Yes, as long as we do the scheduling with the cpu we get interrupts for
> context switches. The mechanism is already published in the execlist
> patches currently floating around. We get a special context switch
> interrupt.
> 
> But we have this unpin logic already on the current code where we switch
> contexts through in-line cs commands from the kernel. There we obviously
> use the normal batch completion events.
> 
>> The issue with radeon hardware AFAICT is that the hardware does not report
>> anything about the userspace context running, i.e. you do not get a
>> notification when a context is not in use. Well, AFAICT. Maybe the hardware
>> does provide that.
> 
> I'm not sure whether we can do the same trick with the hw scheduler. But
> then unpinning hw contexts will drain the pipeline anyway, so I guess we
> can just stop feeding the hw scheduler until it runs dry. And then unpin
> and evict.
So, I'm afraid we can't do this for AMD Kaveri because:

a. The hw scheduler doesn't inform us which queues it is going to
execute next. We feed it a runlist of queues, which can be very large
(we have a test that runs 1000 queues on the same runlist, but we can
put a lot more). All the MQDs of those queues must be pinned in memory
as long as the runlist is in effect. The runlist is in effect until
either a queue is deleted or a queue is added (or something more extreme
happens, like the process terminates).

b. The hw scheduler takes care of VMID to PASID mapping. We don't
program the ATC registers manually; the internal CP does that
dynamically, so we basically have over-subscription of processes as
well. Therefore, we can't pin MQDs based on VMID binding.
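
The pinning rule in (a) can be illustrated with a toy model. This is a sketch
under stated assumptions, not amdkfd code: the class and method names are
hypothetical, and the only behavior modeled is what is described above, namely
that every MQD on the runlist currently in effect stays pinned and that the
runlist is replaced whenever a queue is added or deleted.

```python
class RunlistModel:
    """Toy model (hypothetical): MQDs stay pinned while on the active runlist."""

    def __init__(self):
        self.queues = set()   # queue ids that currently exist
        self.pinned = set()   # queue ids whose MQD is on the active runlist
        self.generation = 0   # number of runlists submitted so far

    def _submit_runlist(self):
        # A new runlist replaces the old one. The driver is not told which
        # queue the hw scheduler will pick next, so every MQD on the list
        # has to remain pinned.
        self.pinned = set(self.queues)
        self.generation += 1

    def add_queue(self, queue_id):
        self.queues.add(queue_id)
        self._submit_runlist()

    def delete_queue(self, queue_id):
        self.queues.discard(queue_id)
        self._submit_runlist()

    def can_unpin(self, queue_id):
        # An MQD may be unpinned only once it is off the active runlist.
        return queue_id not in self.pinned


model = RunlistModel()
for queue_id in range(1000):  # same order of magnitude as the test mentioned
    model.add_queue(queue_id)
model.delete_queue(42)
```

In this model the only moment an MQD becomes unpinnable is when its queue
leaves the runlist, which is why an i915-style "unpin on context switch" has
no hook to attach to here.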

I don't see AMD moving back to SW scheduling, as it doesn't scale well
with the number of processes and queues, and our next gen APU will have a
lot more queues than what we have on KV.

	Oded
> 
>> Since VMIDs are a limited resource, you have to bind them dynamically, so
>> maybe we can allocate a pinned buffer only for each VMID, and then binding
>> a PASID to a VMID also copies the pinned buffer back to the PASID's
>> unpinned copy.
> 
> Yeah, pasid assignment will be fun. Not sure whether Jesse's patches will
> do this already. We _do_ already have fun with ctx id assignments though
> since we move them around (and the hw id is the ggtt address afaik). So we
> need to remap them already. Not sure on the details for pasid mapping,
> iirc it's a separate field somewhere in the context struct. Jesse knows
> the details.
> -Daniel
>