[PATCH 00/83] AMD HSA kernel driver

Tue Jul 15 10:06:56 PDT 2014

>-----Original Message-----
>From: Dave Airlie [mailto:airlied at gmail.com]
>Sent: Tuesday, July 15, 2014 12:35 AM
>To: Christian König
>Cc: Jerome Glisse; Bridgman, John; Lewycky, Andrew; linux-
>kernel at vger.kernel.org; dri-devel at lists.freedesktop.org; Deucher,
>Alexander; akpm at linux-foundation.org
>Subject: Re: [PATCH 00/83] AMD HSA kernel driver
>
>On 14 July 2014 18:37, Christian König <deathsimple at vodafone.de> wrote:
>>> I vote for HSA module that expose ioctl and is an intermediary with
>>> the kernel driver that handle the hardware. This gives a single point
>>> for HSA hardware and yes this enforce things for any hardware
>manufacturer.
>>> I am more than happy to tell them that this is it and nothing else if
>>> they want to get upstream.
>>
>> I think we should still discuss this single point of entry a bit more.
>>
>> Just to make it clear the plan is to expose all physical HSA capable
>> devices through a single /dev/hsa device node to userspace.
>
>This is why we don't design kernel interfaces in secret foundations, and
>expect anyone to like them.

Understood and agree. In this case though this isn't a cross-vendor interface designed by a secret committee, it's supposed to be more of an inoffensive little single-vendor interface designed *for* a secret committee. I'm hoping that's better ;)

>
>So before we go any further, how is this stuff planned to work for multiple
>GPUs/accelerators?

Three classes of "multiple" :

1. Single CPU with IOMMUv2 and multiple GPUs:

- all devices accessible via /dev/kfd
- topology information identifies CPU + GPUs, each has "node ID" at top of userspace API, "global ID" at user/kernel interface
 (don't think we've implemented CPU part yet though)
- userspace builds snapshot from sysfs info & exposes to HSAIL runtime, which in turn exposes the "standard" API
- kfd sets up ATC aperture so GPUs can access system RAM via IOMMUv2 (fast for APU, relatively less so for dGPU over PCIE)
- to-be-added memory operations allow allocation & residency control (within existing gfx driver limits) of buffers in VRAM & carved-out system RAM
- queue operations specify a node ID to userspace library, which translates to "global ID" before calling kfd

2. Multiple CPUs connected via fabric (eg HyperTransport) each with 0 or more GPUs:

- topology information exposes CPUs & GPUs, along with affinity info showing what is connected to what
- everything else works as in (1) above

3. Multiple CPUs not connected via fabric (eg a blade server) each with 0 or more GPUs

- no attempt to cover this with HSA topology, each CPU and associated GPUs is accessed independently via separate /dev/kfd instances

>
>Do we have a userspace to exercise this interface so we can see how such a
>thing would look?

Yes -- initial IP review done, legal stuff done, sanitizing WIP, hoping for final approval this week

There's a separate test harness to exercise the userspace lib calls, haven't started IP review or sanitizing for that but legal stuff is done

>
>Dave.