[git pull] habanalabs for drm-next-6.7
Oded Gabbay
ogabbay at kernel.org
Tue Oct 10 09:55:21 UTC 2023
Hi Dave, Daniel.
Habanalabs pull request for 6.7.
It's a bit all over the place, a few uapi changes, mostly improvements and
bug fixes.
Notable things are the move to the accel subsystem in the code itself,
meaning we removed the habanalabs class and the code to created device char
and instead we are registering to accel. Also notable is moving some
firmware interface files to include/linux/habanalabs. This is needed as
a pre-requisite for upstreaming the Gaudi2 NIC drivers, which will include
those files.
Full details are in the signed tag.
Thanks,
Oded
The following changes since commit 389af786f92ecdff35883551d54bf4e507ffcccb:
Merge tag 'drm-intel-next-2023-09-29' of git://anongit.freedesktop.org/drm/drm-intel into drm-next (2023-10-04 13:55:19 +1000)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux.git tags/drm-habanalabs-next-2023-10-10
for you to fetch changes up to 4db74c0fdeb8138f6438d42a015c5dcdb2e6874c:
accel/habanalabs/gaudi2: fix spmu mask creation (2023-10-09 12:37:24 +0300)
----------------------------------------------------------------
This tag contains habanalabs driver changes for v6.7.
The notable changes are:
- uAPI changes:
- Expose tsc clock sampling to better sync clock information in profiler.
- Enhance engine error reporting in the info ioctl.
- Block access to the eventfd operations through the control device.
- Disable the option of the user to register multiple times with the same
offset for timestamp dump by the driver. If a user wants to use the same
offset in the timestamp buffer for different interrupt, it needs to first
de-register the offset.
- When exporting dma-buf (for p2p), force the user to specify size/offset
in multiples of PAGE_SIZE. This is instead of the driver doing the
rounding to PAGE_SIZE, which has caused the driver to map more memory
than was intended by the user.
- New features and improvements:
- Complete the move of the driver to the accel subsystem by removing the
custom habanalabs class and major and registering to accel subsystem.
- Move the firmware interface files to include/linux/habanalabs. This is
a pre-requisite for upstreaming the NIC drivers of Gaudi (as they need to
include those files).
- Perform device hard-reset upon PCIe AXI drain event to prevent the failure
from cascading to different IP blocks in the SoC. In secured environments,
this is done automatically by the firmware.
- Print device name when it is removed for better debuggability.
- Add support for trace of dma map sgtable operations.
- Optimize handling of user interrupts by splitting the interrupts to two
lists. One list for fast handling and second list for handling with
timestamp recording, which is slower.
- Prevent double device hard-reset due to 2 adjacent H/W events.
- Set device status 'malfunction' while in rmmod.
- Firmware related fixes:
- Extend preboot timeout because preboot loading might take longer than
expected in certain cases.
- Add a protection mechanism for the Event Queue. In case it is full, the
firmware will be able to notify about it through a dedicated interrupt.
- Perform device hard-reset in case scrubbing of memory has failed.
- Bug fixes and code cleanups:
- Small fixes of dma-buf handling in Gaudi2, such as handling an offset != 0,
using the correct exported size, creation of sg table.
- Fix spmu mask creation.
- Fix bug in wait for cs completion for decoder workloads.
- Cleanup Greco name from documentation.
- Fix bug in recording timestamp during cs completion interrupt handling.
- Fix CoreSight ETF configuration and flush logic.
- Fix small bug in hpriv_list handling (the list that contains the private
data per process that opens our device).
----------------------------------------------------------------
Ariel Suller (1):
accel/habanalabs: update boot status print
Arnd Bergmann (1):
accel/habanalabs: add missing debugfs function stubs
Benjamin Dotan (3):
accel/habanalabs/gaudi2 : remove psoc_arc access
accel/habanalabs: fix ETR/ETF flush logic
accel/habanalabs: improve etf configuration
Christophe JAILLET (1):
accel/habanalabs/gaudi2: Fix incorrect string length computation in gaudi2_psoc_razwi_get_engines()
Dafna Hirschfeld (5):
accel/habanalabs: disable events ioctls on control device
accel/habanalabs: fix inline doc typos
accel/habanalabs: add fw status SHUTDOWN_PREP
accel/habanalabs: extend preboot timeout when preboot might take longer
accel/habanalabs: remove wrong doc for init_phys_pg_pack_from_userptr
Dani Liberman (2):
accel/habanalabs: handle arc farm razwi
accel/habanalabs: handle f/w reserved dram space request
David Meriin (1):
accel/habanalabs: move cpucp interface to linux/habanalabs
Hen Alon (1):
accel/habanalabs: add tsc clock sampling to clock sync info
Igor Grinberg (2):
accel/habanalabs/gaudi2: prepare to remove soft_rst_irq
accel/habanalabs/gaudi2: prepare to remove cpu_rst_status
Ivan Orlov (1):
accel: make accel_class a static const structure
Juerg Haefliger (1):
accel/habanalabs/gaudi: Add MODULE_FIRMWARE macros
Justin Stitt (2):
accel/habanalabs: refactor deprecated strncpy to strscpy_pad
accel/habanalabs: refactor deprecated strncpy
Koby Elbaz (4):
accel/habanalabs: set device status 'malfunction' while in rmmod
accel/habanalabs: print return code when process termination fails
accel/habanalabs: call put_pid after hpriv list is updated
accel/habanalabs: rename fd_list to hpriv_list
Moti Haimovski (1):
accel/habanalabs/gaudi2: print power-mode changes
Oded Gabbay (14):
accel/habanalabs: remove pdev check on idle check
accel/habanalabs: reset device if scrubbing failed
accel/habanalabs/gaudi2: fix missing check of kernel ctx
accel/habanalabs: remove unused asic functions
accel/habanalabs: minor cosmetics update to cpucp_if.h
accel/habanalabs: minor cosmetics update to trace file
accel/habanalabs: change Greco to Gaudi2
accel/habanalabs/gaudi: remove unused structure definition
accel/habanalabs: remove unused field
accel/habanalabs: print device name when it is removed
accel/habanalabs: remove leftover code
accel/habanalabs/gaudi: remove define used for simulator
accel/habanalabs: minor cosmetic update to habanalabs.h
accel/habanalabs/gaudi2: fix spmu mask creation
Ofir Bitton (6):
accel/habanalabs: notify user about undefined opcode event
accel/habanalabs: stop fetching MME SBTE error cause
accel/habanalabs: dump temperature threshold boot error
accel/habanalabs/gaudi2: unsecure tpc count registers
accel/habanalabs: add info ioctl for engine error reports
accel/habanalabs/gaudi2: include block id in ECC error reporting
Ohad Sharabi (2):
accel/habanalabs: add traces for dma mappings
accel/habanalabs: trace dma map sgtable
Tomer Tayar (19):
accel/habanalabs: prevent immediate hard reset due to 2 adjacent H/W events
accel/habanalabs: update pending reset flags with new reset requests
accel/habanalabs: print task name and request code upon ioctl failure
accel/habanalabs: print task name upon creation of a user context
accel/habanalabs/gaudi2: un-secure register for engine cores interrupt
accel/habanalabs: set default device release watchdog T/O as 30 sec
accel/habanalabs: register compute device as an accel device
accel/habanalabs: update sysfs-driver-habanalabs with the accel path
accel/habanalabs: update debugfs-driver-habanalabs with the accel path
accel/habanalabs: Move ioctls to the device specific ioctls range
accel/habanalabs: always pass exported size to alloc_sgt_from_device_pages()
accel/habanalabs: use exported size from dma_buf and not from phys_pg_pack
accel/habanalabs: export dma-buf only if size/offset multiples of PAGE_SIZE
accel/habanalabs: tiny refactor of hl_map_dmabuf()
accel/habanalabs: fix SG table creation for dma-buf mapping
accel/habanalabs: set hl_dmabuf_priv.device_address only when needed
accel/habanalabs: add missing offset handling for dma-buf
accel/habanalabs: add debug prints to dump content of SG table for dma-buf
accel/habanalabs/gaudi2: perform hard-reset upon PCIe AXI drain event
farah kassabri (10):
accel/habanalabs: fix standalone preboot descriptor request
accel/habanalabs: Allow single timestamp registration request at a time
accel/habanalabs: fix wait_for_interrupt abortion flow
accel/habanalabs/gaudi2: handle eq health heartbeat check
accel/habanalabs/gaudi2: add eq health check using irq
accel/habanalabs: prevent sending heartbeat before events are enabled
accel/habanalabs: fix bug in timestamp interrupt handling
accel/habanalabs: optimize timestamp registration handler
accel/habanalabs: split user interrupts pending list
accel/habanalabs: fix bug in decoder wait for cs completion
.../ABI/testing/debugfs-driver-habanalabs | 82 ++--
Documentation/ABI/testing/sysfs-driver-habanalabs | 64 +--
MAINTAINERS | 1 +
drivers/accel/drm_accel.c | 21 +-
drivers/accel/habanalabs/common/command_buffer.c | 5 +-
.../accel/habanalabs/common/command_submission.c | 488 ++++++++++++---------
drivers/accel/habanalabs/common/context.c | 9 +-
drivers/accel/habanalabs/common/debugfs.c | 22 +-
drivers/accel/habanalabs/common/device.c | 425 +++++++++++-------
drivers/accel/habanalabs/common/firmware_if.c | 45 +-
drivers/accel/habanalabs/common/habanalabs.h | 212 +++++----
drivers/accel/habanalabs/common/habanalabs_drv.c | 186 ++++----
drivers/accel/habanalabs/common/habanalabs_ioctl.c | 112 +++--
drivers/accel/habanalabs/common/irq.c | 180 ++++++--
drivers/accel/habanalabs/common/memory.c | 308 +++++++------
drivers/accel/habanalabs/gaudi/gaudi.c | 17 +-
drivers/accel/habanalabs/gaudi/gaudiP.h | 2 +-
drivers/accel/habanalabs/gaudi/gaudi_coresight.c | 12 +
drivers/accel/habanalabs/gaudi2/gaudi2.c | 487 ++++++++++++++++----
drivers/accel/habanalabs/gaudi2/gaudi2P.h | 4 +-
drivers/accel/habanalabs/gaudi2/gaudi2_coresight.c | 46 +-
drivers/accel/habanalabs/gaudi2/gaudi2_security.c | 21 +-
drivers/accel/habanalabs/goya/goya.c | 10 +-
drivers/accel/habanalabs/goya/goyaP.h | 2 +-
drivers/accel/habanalabs/goya/goya_coresight.c | 10 +
.../accel/habanalabs/include/gaudi/gaudi_fw_if.h | 32 --
.../include/gaudi2/gaudi2_async_events.h | 7 +
.../include/gaudi2/gaudi2_async_ids_map_extended.h | 16 +-
.../common => include/linux/habanalabs}/cpucp_if.h | 36 +-
.../linux/habanalabs}/hl_boot_if.h | 7 +
include/trace/events/habanalabs.h | 45 +-
include/uapi/drm/habanalabs_accel.h | 68 +--
32 files changed, 1919 insertions(+), 1063 deletions(-)
rename {drivers/accel/habanalabs/include/common => include/linux/habanalabs}/cpucp_if.h (98%)
rename {drivers/accel/habanalabs/include/common => include/linux/habanalabs}/hl_boot_if.h (98%)
More information about the dri-devel
mailing list