[PATCH v4 4/4] doc: add documentation for accel subsystem

Mon Nov 21 15:26:50 UTC 2022

On 11/21/2022 8:18 AM, Oded Gabbay wrote:
> On Mon, Nov 21, 2022 at 12:02 AM Jeffrey Hugo <quic_jhugo at quicinc.com> wrote:
>>
>> On 11/19/2022 1:44 PM, Oded Gabbay wrote:
>>> Add an introduction section for the accel subsystem. Most of the
>>> relevant data is in the DRM documentation, so the introduction only
>>> presents the why of the new subsystem, how are the compute accelerators
>>> exposed to user-space and what changes need to be done in a standard
>>> DRM driver to register it to the new accel subsystem.
>>>
>>> Signed-off-by: Oded Gabbay <ogabbay at kernel.org>
>>> ---
>>>    Documentation/accel/index.rst        |  17 +++++
>>>    Documentation/accel/introduction.rst | 109 +++++++++++++++++++++++++++
>>>    Documentation/subsystem-apis.rst     |   1 +
>>>    MAINTAINERS                          |   1 +
>>>    4 files changed, 128 insertions(+)
>>>    create mode 100644 Documentation/accel/index.rst
>>>    create mode 100644 Documentation/accel/introduction.rst
>>>
>>> diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
>>> new file mode 100644
>>> index 000000000000..2b43c9a7f67b
>>> --- /dev/null
>>> +++ b/Documentation/accel/index.rst
>>> @@ -0,0 +1,17 @@
>>> +.. SPDX-License-Identifier: GPL-2.0
>>> +
>>> +====================
>>> +Compute Accelerators
>>> +====================
>>> +
>>> +.. toctree::
>>> +   :maxdepth: 1
>>> +
>>> +   introduction
>>> +
>>> +.. only::  subproject and html
>>> +
>>> +   Indices
>>> +   =======
>>> +
>>> +   * :ref:`genindex`
>>> diff --git a/Documentation/accel/introduction.rst b/Documentation/accel/introduction.rst
>>> new file mode 100644
>>> index 000000000000..5a3963eae973
>>> --- /dev/null
>>> +++ b/Documentation/accel/introduction.rst
>>> @@ -0,0 +1,109 @@
>>> +.. SPDX-License-Identifier: GPL-2.0
>>> +
>>> +============
>>> +Introduction
>>> +============
>>> +
>>> +The Linux compute accelerators subsystem is designed to expose compute
>>> +accelerators in a common way to user-space and provide a common set of
>>> +functionality.
>>> +
>>> +These devices can be either stand-alone ASICs or IP blocks inside an SoC/GPU.
>>> +Although these devices are typically designed to accelerate Machine-Learning
>>> +and/or Deep-Learning computations, the accel layer is not limited to handling
>>
>> You use "DL" later on as a short form for Deep-Learning.  It would be
>> good to introduce that here.
>>
>>> +these types of accelerators.
>>> +
>>> +typically, a compute accelerator will belong to one of the following
>>
>> Typically
>>
>>> +categories:
>>> +
>>> +- Edge AI - doing inference at an edge device. It can be an embedded ASIC/FPGA,
>>> +  or an IP inside a SoC (e.g. laptop web camera). These devices
>>> +  are typically configured using registers and can work with or without DMA.
>>> +
>>> +- Inference data-center - single/multi user devices in a large server. This
>>> +  type of device can be stand-alone or an IP inside a SoC or a GPU. It will
>>> +  have on-board DRAM (to hold the DL topology), DMA engines and
>>> +  command submission queues (either kernel or user-space queues).
>>> +  It might also have an MMU to manage multiple users and might also enable
>>> +  virtualization (SR-IOV) to support multiple VMs on the same device. In
>>> +  addition, these devices will usually have some tools, such as profiler and
>>> +  debugger.
>>> +
>>> +- Training data-center - Similar to Inference data-center cards, but typically
>>> +  have more computational power and memory b/w (e.g. HBM) and will likely have
>>> +  a method of scaling-up/out, i.e. connecting to other training cards inside
>>> +  the server or in other servers, respectively.
>>> +
>>> +All these devices typically have different runtime user-space software stacks,
>>> +that are tailored-made to their h/w. In addition, they will also probably
>>> +include a compiler to generate programs to their custom-made computational
>>> +engines. Typically, the common layer in user-space will be the DL frameworks,
>>> +such as PyTorch and TensorFlow.
>>> +
>>> +Sharing code with DRM
>>> +=====================
>>> +
>>> +Because this type of devices can be an IP inside GPUs or have similar
>>> +characteristics as those of GPUs, the accel subsystem will use the
>>> +DRM subsystem's code and functionality. i.e. the accel core code will
>>> +be part of the DRM subsystem and an accel device will be a new type of DRM
>>> +device.
>>> +
>>> +This will allow us to leverage the extensive DRM code-base and
>>> +collaborate with DRM developers that have experience with this type of
>>> +devices. In addition, new features that will be added for the accelerator
>>> +drivers can be of use to GPU drivers as well.
>>> +
>>> +Differentiation from GPUs
>>> +=========================
>>> +
>>> +Because we want to prevent the extensive user-space graphic software stack
>>> +from trying to use an accelerator as a GPU, the compute accelerators will be
>>> +differentiated from GPUs by using a new major number and new device char files.
>>> +
>>> +Furthermore, the drivers will be located in a separate place in the kernel
>>> +tree - drivers/accel/.
>>> +
>>> +The accelerator devices will be exposed to the user space with the dedicated
>>> +261 major number and will have the following convention:
>>> +
>>> +- device char files - /dev/accel/accel*
>>> +- sysfs             - /sys/class/accel/accel*/
>>> +- debugfs           - /sys/kernel/debug/accel/accel*/
>>> +
>>> +Getting Started
>>> +===============
>>> +
>>> +First, read the DRM documentation. Not only it will explain how to write a new
>>
>> How about a link to the DRM documentation?
>>
>>> +DRM driver but it will also contain all the information on how to contribute,
>>> +the Code Of Conduct and what is the coding style/documentation. All of that
>>> +is the same for the accel subsystem.
>>> +
>>> +Second, make sure the kernel is configured with CONFIG_DRM_ACCEL.
>>> +
>>> +To expose your device as an accelerator, two changes are needed to
>>> +be done in your driver (as opposed to a standard DRM driver):
>>> +
>>> +- Add the DRIVER_COMPUTE_ACCEL feature flag in your drm_driver's
>>> +  driver_features field. It is important to note that this driver feature is
>>> +  mutually exclusive with DRIVER_RENDER and DRIVER_MODESET. Devices that want
>>
>> I don't remember seeing code that validates a driver with
>> DRIVER_COMPUTE_ACCEL does not also have DRIVER_MODESET.  What am I missing?
> 
> Look at drm_dev_init() (patch 3/4):
> 
> if (drm_core_check_feature(dev, DRIVER_COMPUTE_ACCEL) &&
>                 (drm_core_check_feature(dev, DRIVER_RENDER) ||
>                   drm_core_check_feature(dev, DRIVER_MODESET))) {
>              DRM_ERROR("DRM driver can't be both a compute acceleration
> and graphics driver\n");
>               return -EINVAL;
> }

Ah.  I saw "RENDER", but "MODESET" didn't register in my brain.  Thanks 
for pointing it out to me.  All good here.