*** New in v3:
Added device tree bindings for the new property. Rebased. ***
*** New in v2:
Added support for Tegra194. Use the standard iommu-map property instead of a custom mechanism. ***
This series adds support for Host1x 'context isolation'. When programming engines through Host1x, userspace can program in any addresses it wants, so we need some way to isolate the engines' memory spaces. Traditionally this has been done either imperfectly, with a single shared IOMMU domain, or by copying and verifying the programming command stream at submit time (the Host1x firewall).
Since Tegra186 there is a privileged (kernel-only) Host1x opcode that sets the stream ID the engine sends to the SMMU. By allocating a number of context banks and stream IDs for this purpose, and using this opcode at the beginning of each job, we can implement isolation. Because the number of context banks is limited, each process, rather than each channel, gets its own context.
This feature also allows sharing engines among multiple VMs when used with Host1x's hardware virtualization support: up to 8 VMs can be configured with a subset of allowed stream IDs, enforced at the hardware level.
To implement this, the series adds a new host1x context bus, which contains a 'struct device' for each context bank / stream ID; changes to the device tree and SMMU code to allow registering the devices and using the bus; the Host1x stream ID programming code; and support in TegraDRM.
Thanks, Mikko
Mikko Perttunen (9):
  dt-bindings: host1x: Add memory-contexts property
  gpu: host1x: Add context bus
  gpu: host1x: Add context device management code
  gpu: host1x: Program context stream ID on submission
  iommu/arm-smmu: Attach to host1x context device bus
  arm64: tegra: Add Host1x context stream IDs on Tegra186+
  drm/tegra: falcon: Set DMACTX field on DMA transactions
  drm/tegra: vic: Implement get_streamid_offset
  drm/tegra: Support context isolation
 .../display/tegra/nvidia,tegra20-host1x.yaml | 10 +
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 12 ++
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 12 ++
 drivers/gpu/Makefile | 3 +-
 drivers/gpu/drm/tegra/drm.h | 2 +
 drivers/gpu/drm/tegra/falcon.c | 8 +
 drivers/gpu/drm/tegra/falcon.h | 1 +
 drivers/gpu/drm/tegra/submit.c | 13 ++
 drivers/gpu/drm/tegra/uapi.c | 36 +++-
 drivers/gpu/drm/tegra/vic.c | 38 ++++
 drivers/gpu/host1x/Kconfig | 5 +
 drivers/gpu/host1x/Makefile | 2 +
 drivers/gpu/host1x/context.c | 174 ++++++++++++++++++
 drivers/gpu/host1x/context.h | 27 +++
 drivers/gpu/host1x/context_bus.c | 31 ++++
 drivers/gpu/host1x/dev.c | 12 +-
 drivers/gpu/host1x/dev.h | 2 +
 drivers/gpu/host1x/hw/channel_hw.c | 52 +++++-
 drivers/gpu/host1x/hw/host1x06_hardware.h | 10 +
 drivers/gpu/host1x/hw/host1x07_hardware.h | 10 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 ++
 include/linux/host1x.h | 21 +++
 include/linux/host1x_context_bus.h | 15 ++
 23 files changed, 500 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/host1x/context.c
 create mode 100644 drivers/gpu/host1x/context.h
 create mode 100644 drivers/gpu/host1x/context_bus.c
 create mode 100644 include/linux/host1x_context_bus.h
Add schema information for the memory-contexts property used to specify context stream IDs. This uses the standard iommu-map property inside a child node.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* New patch
---
 .../bindings/display/tegra/nvidia,tegra20-host1x.yaml | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
index 4fd513efb0f7..3ac0fde54a16 100644
--- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
+++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
@@ -144,6 +144,16 @@ allOf:
         reset-names:
           maxItems: 1
 
+        memory-contexts:
+          type: object
+          properties:
+            iommu-map:
+              description: Specification of stream IDs available for memory context device
+                use. Should be a mapping of IDs 0..n to IOMMU entries corresponding to
+                usable stream IDs.
+          required:
+            - iommu-map
+
       required:
         - reg-names
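For readers unfamiliar with the generic iommu-map binding: each entry is four cells, <id-base iommu-phandle iommu-base length>, and an input ID in [id-base, id-base + length) translates to iommu-base + (id - id-base). The following is a minimal userspace sketch of that lookup, assuming no iommu-map-mask is present. The struct and function names are hypothetical and only model what the kernel's of_map_id() does on behalf of of_dma_configure_id(); the phandle is kept as a plain integer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one entry: <id-base iommu-phandle iommu-base length>. */
struct iommu_map_entry {
	uint32_t id_base;
	uint32_t phandle;
	uint32_t iommu_base;
	uint32_t length;
};

/*
 * Decode a flat cell array into entries. Returns the number of entries
 * decoded (at most max_out), or -1 if the cell count is not a multiple of 4.
 */
static int iommu_map_decode(const uint32_t *cells, size_t ncells,
			    struct iommu_map_entry *out, size_t max_out)
{
	size_t i, n;

	if (ncells % 4 != 0)
		return -1;

	n = ncells / 4;
	if (n > max_out)
		n = max_out;

	for (i = 0; i < n; i++) {
		out[i].id_base = cells[4 * i + 0];
		out[i].phandle = cells[4 * i + 1];
		out[i].iommu_base = cells[4 * i + 2];
		out[i].length = cells[4 * i + 3];
	}

	return (int)n;
}

/*
 * Translate a context index to a stream ID: find the entry whose
 * [id_base, id_base + length) range covers id and offset into iommu_base.
 * Returns 0 on success, -1 if no entry matches.
 */
static int iommu_map_lookup(const struct iommu_map_entry *map, int n,
			    uint32_t id, uint32_t *sid)
{
	int i;

	for (i = 0; i < n; i++) {
		/* id >= id_base is checked first, so the subtraction cannot wrap. */
		if (id >= map[i].id_base && id - map[i].id_base < map[i].length) {
			*sid = map[i].iommu_base + (id - map[i].id_base);
			return 0;
		}
	}

	return -1;
}
```

For example, with the hypothetical cells { 0, 1, 0x38, 1, 1, 1, 0x39, 1 } (phandle 1, stream IDs 0x38 and 0x39), context 1 maps to stream ID 0x39 and context 2 has no mapping. Later in the series, host1x_context_list_init() sizes its list as the cell count divided by 4, i.e. one context device per entry.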
On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:
Add schema information for the memory-contexts property used to specify context stream IDs. This uses the standard iommu-map property inside a child node.
Couldn't you simply make "iommu-map" an allowed property on the host1x node itself? From a DT perspective I'm not sure the intermediate node really fits meaningfully, and I can't see that it serves much purpose in practice either, other than perhaps defeating fw_devlink.
Robin.
On 2/21/22 17:23, Robin Murphy wrote:
On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:
Add schema information for the memory-contexts property used to specify context stream IDs. This uses the standard iommu-map property inside a child node.
Couldn't you simply make "iommu-map" an allowed property on the host1x node itself? From a DT perspective I'm not sure the intermediate node really fits meaningfully, and I can't see that it serves much purpose in practice either, other than perhaps defeating fw_devlink.
Robin.
The stream IDs described here are not used by the host1x device itself, so I don't think I can. Host1x's memory transactions still go through the stream ID specified in its 'iommus' property; these stream IDs are used by engines (typically in addition to the stream ID specified in their own nodes).
Host1x 'iommus' -- Channel commands
Engine 'iommus' -- Engine firmware (and data if context isolation is not enabled)
memory-contexts 'iommu-map' -- Data used by engines.
(Perhaps I should add this information to various places in more abundance and clarity.)
Mikko
On 2022-02-21 15:28, Mikko Perttunen wrote:
On 2/21/22 17:23, Robin Murphy wrote:
On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:
Add schema information for the memory-contexts property used to specify context stream IDs. This uses the standard iommu-map property inside a child node.
Couldn't you simply make "iommu-map" an allowed property on the host1x node itself? From a DT perspective I'm not sure the intermediate node really fits meaningfully, and I can't see that it serves much purpose in practice either, other than perhaps defeating fw_devlink.
Robin.
The stream IDs described here are not used by the host1x device itself, so I don't think I can. Host1x's memory transactions still go through the stream ID specified in its 'iommus' property, these stream IDs are used by engines (typically in addition to the stream ID specified in their own nodes).
Host1x 'iommus' -- Channel commands
Engine 'iommus' -- Engine firmware (and data if context isolation is not enabled)
memory-contexts 'iommu-map' -- Data used by engines.
Right, that still appears to match my understanding, that as far as software sees, the host1x is effectively acting as a bridge to the engines in itself. Even if it's not physically routing traffic in and/or out, the host1x device is the place where the context IDs *logically* exist, and thus owns the mapping between context IDs and the StreamIDs emitted by any engine working in a given context.
Consider a PCIe root complex with integrated endpoints - chances are the RCiEPs have their own physical interfaces to issue DMA directly into the SoC interconnect, but that doesn't change how we describe the PCI Requester ID to StreamID mapping at the root complex, since the RC still logically owns the RID space. You can think of a RID as being "consumed" at the RC by indexing into config space to ultimately gain control of the corresponding endpoint, just like context IDs are "consumed" at the host1x by generating commands to ultimately cause some engine to operate in the correct address space.
You don't have to pretend the host1x uses a context for its own command-fetching (or whatever) traffic either - it's always been intended that the "iommus" and "iommu-map" properties should happily be able to coexist on the same node, since they serve distinctly different purposes. If it doesn't work in practice then we've got a bug to fix somewhere.
If the context-switching mechanism was some distinct self-contained thing bolted on beside the other host1x functionality then describing it as a separate level of DT hierarchy might be more justifiable, but that's not the impression I'm getting from skimming the rest of the series. Just reading the names of things in patch #6, my intuitive reaction is that clearly each host1x owns 9 StreamIDs, one for general stuff and 8 for contexts. Adding the knowledge that technically the context StreamIDs end up delegated to other host1x-controlled engines still doesn't shift the paradigm. I don't believe we need a level of DT structure purely to help document what the iommu-map means for host1x - the binding can do that just fine.
Thanks, Robin.
On 2/21/22 18:58, Robin Murphy wrote:
On 2022-02-21 15:28, Mikko Perttunen wrote:
On 2/21/22 17:23, Robin Murphy wrote:
On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:
Add schema information for the memory-contexts property used to specify context stream IDs. This uses the standard iommu-map property inside a child node.
Couldn't you simply make "iommu-map" an allowed property on the host1x node itself? From a DT perspective I'm not sure the intermediate node really fits meaningfully, and I can't see that it serves much purpose in practice either, other than perhaps defeating fw_devlink.
Robin.
The stream IDs described here are not used by the host1x device itself, so I don't think I can. Host1x's memory transactions still go through the stream ID specified in its 'iommus' property, these stream IDs are used by engines (typically in addition to the stream ID specified in their own nodes).
Host1x 'iommus' -- Channel commands
Engine 'iommus' -- Engine firmware (and data if context isolation is not enabled)
memory-contexts 'iommu-map' -- Data used by engines.
Right, that still appears to match my understanding, that as far as software sees, the host1x is effectively acting as a bridge to the engines in itself. Even if it's not physically routing traffic in and/or out, the host1x device is the place where the context IDs *logically* exist, and thus owns the mapping between context IDs and the StreamIDs emitted by any engine working in a given context.
Consider a PCIe root complex with integrated endpoints - chances are the RCiEPs have their own physical interfaces to issue DMA directly into the SoC interconnect, but that doesn't change how we describe the PCI Requester ID to StreamID mapping at the root complex, since the RC still logically owns the RID space. You can think of a RID as being "consumed" at the RC by indexing into config space to ultimately gain control of the corresponding endpoint, just like context IDs are "consumed" at the host1x by generating commands to ultimately cause some engine to operate in the correct address space.
You don't have to pretend the host1x uses a context for its own command-fetching (or whatever) traffic either - it's always been intended that the "iommus" and "iommu-map" properties should happily be able to coexist on the same node, since they serve distinctly different purposes. If it doesn't work in practice then we've got a bug to fix somewhere.
Interesting, I had assumed that they were mutually exclusive, but comparing with PCIe, this indeed makes sense. I'll look into it.
If the context-switching mechanism was some distinct self-contained thing bolted on beside the other host1x functionality then describing it as a separate level of DT hierarchy might be more justifiable, but that's not the impression I'm getting from skimming the rest of the series. Just reading the names of things in patch #6, my intuitive reaction is that clearly each host1x owns 9 StreamIDs, one for general stuff and 8 for contexts. Adding the knowledge that technically the context StreamIDs end up delegated to other host1x-controlled engines still doesn't shift the paradigm. I don't believe we need a level of DT structure purely to help document what the iommu-map means for host1x - the binding can do that just fine.
Theoretically there can be any number of these stream IDs, but indeed, there is quite specific HW support for this in Host1x.
Thanks for your help once again! Mikko
The context bus is a "dummy" bus that contains struct devices that correspond to IOMMU contexts assigned through Host1x to processes.
Even when host1x itself is built as a module, the bus is registered in built-in code so that the built-in ARM SMMU driver is able to reference it.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/Makefile | 3 +--
 drivers/gpu/host1x/Kconfig | 5 +++++
 drivers/gpu/host1x/Makefile | 1 +
 drivers/gpu/host1x/context_bus.c | 31 ++++++++++++++++++++++++++++++
 include/linux/host1x_context_bus.h | 15 +++++++++++++++
 5 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/host1x/context_bus.c
 create mode 100644 include/linux/host1x_context_bus.h
diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index 835c88318cec..8997f0096545 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -2,7 +2,6 @@
 # drm/tegra depends on host1x, so if both drivers are built-in care must be
 # taken to initialize them in the correct order. Link order is the only way
 # to ensure this currently.
-obj-$(CONFIG_TEGRA_HOST1X) += host1x/
-obj-y += drm/ vga/
+obj-y += host1x/ drm/ vga/
 obj-$(CONFIG_IMX_IPUV3_CORE) += ipu-v3/
 obj-$(CONFIG_TRACE_GPU_MEM) += trace/
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index 6815b4db17c1..1861a8180d3f 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -1,8 +1,13 @@
 # SPDX-License-Identifier: GPL-2.0-only
+
+config TEGRA_HOST1X_CONTEXT_BUS
+	bool
+
 config TEGRA_HOST1X
 	tristate "NVIDIA Tegra host1x driver"
 	depends on ARCH_TEGRA || (ARM && COMPILE_TEST)
 	select DMA_SHARED_BUFFER
+	select TEGRA_HOST1X_CONTEXT_BUS
 	select IOMMU_IOVA
 	help
 	  Driver for the NVIDIA Tegra host1x hardware.
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index d2b6f7de0498..c891a3e33844 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -18,3 +18,4 @@ host1x-y = \
 	hw/host1x07.o
 
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
+obj-$(CONFIG_TEGRA_HOST1X_CONTEXT_BUS) += context_bus.o
diff --git a/drivers/gpu/host1x/context_bus.c b/drivers/gpu/host1x/context_bus.c
new file mode 100644
index 000000000000..2625914f3c7d
--- /dev/null
+++ b/drivers/gpu/host1x/context_bus.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, NVIDIA Corporation.
+ */
+
+#include <linux/device.h>
+#include <linux/of.h>
+
+struct bus_type host1x_context_device_bus_type = {
+	.name = "host1x-context",
+};
+EXPORT_SYMBOL(host1x_context_device_bus_type);
+
+static int __init host1x_context_device_bus_init(void)
+{
+	int err;
+
+	if (!of_machine_is_compatible("nvidia,tegra186") &&
+	    !of_machine_is_compatible("nvidia,tegra194") &&
+	    !of_machine_is_compatible("nvidia,tegra234"))
+		return 0;
+
+	err = bus_register(&host1x_context_device_bus_type);
+	if (err < 0) {
+		pr_err("bus type registration failed: %d\n", err);
+		return err;
+	}
+
+	return 0;
+}
+postcore_initcall(host1x_context_device_bus_init);
diff --git a/include/linux/host1x_context_bus.h b/include/linux/host1x_context_bus.h
new file mode 100644
index 000000000000..72462737a6db
--- /dev/null
+++ b/include/linux/host1x_context_bus.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2021, NVIDIA Corporation. All rights reserved.
+ */
+
+#ifndef __LINUX_HOST1X_CONTEXT_BUS_H
+#define __LINUX_HOST1X_CONTEXT_BUS_H
+
+#include <linux/device.h>
+
+#ifdef CONFIG_TEGRA_HOST1X_CONTEXT_BUS
+extern struct bus_type host1x_context_device_bus_type;
+#endif
+
+#endif
18.02.2022 14:39, Mikko Perttunen wrote:
+config TEGRA_HOST1X_CONTEXT_BUS
+	bool
 config TEGRA_HOST1X
 	tristate "NVIDIA Tegra host1x driver"
 	depends on ARCH_TEGRA || (ARM && COMPILE_TEST)
 	select DMA_SHARED_BUFFER
+	select TEGRA_HOST1X_CONTEXT_BUS
What is the point of TEGRA_HOST1X_CONTEXT_BUS if it's selected unconditionally?
19.02.2022 20:54, Dmitry Osipenko wrote:
18.02.2022 14:39, Mikko Perttunen wrote:
+config TEGRA_HOST1X_CONTEXT_BUS
+	bool
 config TEGRA_HOST1X
 	tristate "NVIDIA Tegra host1x driver"
 	depends on ARCH_TEGRA || (ARM && COMPILE_TEST)
 	select DMA_SHARED_BUFFER
+	select TEGRA_HOST1X_CONTEXT_BUS
What is the point of TEGRA_HOST1X_CONTEXT_BUS if it's selected unconditionally?
I see now that it's used by arm-smmu.c, should be okay then.
Add code to register context devices from the device tree, hand them out to processes, and manage their reference counts.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v2:
* Directly set DMA mask instead of inheriting from Host1x.
* Use iommu-map instead of custom DT property.
---
 drivers/gpu/host1x/Makefile | 1 +
 drivers/gpu/host1x/context.c | 174 +++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/context.h | 27 ++++++
 drivers/gpu/host1x/dev.c | 12 ++-
 drivers/gpu/host1x/dev.h | 2 +
 include/linux/host1x.h | 17 ++++
 6 files changed, 232 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/host1x/context.c
 create mode 100644 drivers/gpu/host1x/context.h
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index c891a3e33844..8a65e13d113a 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -10,6 +10,7 @@ host1x-y = \
 	debug.o \
 	mipi.o \
 	fence.o \
+	context.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/context.c b/drivers/gpu/host1x/context.c
new file mode 100644
index 000000000000..987c08a1e2f2
--- /dev/null
+++ b/drivers/gpu/host1x/context.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, NVIDIA Corporation.
+ */
+
+#include <linux/device.h>
+#include <linux/kref.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/pid.h>
+#include <linux/slab.h>
+
+#include "context.h"
+#include "dev.h"
+
+/*
+ * Due to an issue with T194 NVENC, only 38 bits can be used.
+ * Anyway, 256GiB of IOVA ought to be enough for anyone.
+ */
+static dma_addr_t context_device_dma_mask = DMA_BIT_MASK(38);
+
+int host1x_context_list_init(struct host1x *host1x)
+{
+	struct host1x_context_list *cdl = &host1x->context_list;
+	struct host1x_context *ctx;
+	struct device_node *node;
+	int index;
+	int err;
+
+	node = of_get_child_by_name(host1x->dev->of_node, "memory-contexts");
+	if (!node)
+		return 0;
+
+	cdl->devs = NULL;
+	cdl->len = 0;
+	mutex_init(&cdl->lock);
+
+	err = of_property_count_u32_elems(node, "iommu-map");
+	if (err < 0) {
+		err = 0;
+		goto put_node;
+	}
+
+	cdl->devs = kcalloc(err, sizeof(*cdl->devs), GFP_KERNEL);
+	if (!cdl->devs) {
+		err = -ENOMEM;
+		goto put_node;
+	}
+	cdl->len = err / 4;
+
+	for (index = 0; index < cdl->len; index++) {
+		struct iommu_fwspec *fwspec;
+
+		ctx = &cdl->devs[index];
+
+		ctx->host = host1x;
+
+		device_initialize(&ctx->dev);
+
+		ctx->dev.dma_mask = &context_device_dma_mask;
+		ctx->dev.coherent_dma_mask = context_device_dma_mask;
+		dev_set_name(&ctx->dev, "host1x-ctx.%d", index);
+		ctx->dev.bus = &host1x_context_device_bus_type;
+		ctx->dev.parent = host1x->dev;
+
+		dma_set_max_seg_size(&ctx->dev, UINT_MAX);
+
+		err = device_add(&ctx->dev);
+		if (err) {
+			dev_err(host1x->dev, "could not add context device %d: %d\n", index, err);
+			goto del_devices;
+		}
+
+		err = of_dma_configure_id(&ctx->dev, node, true, &index);
+		if (err) {
+			dev_err(host1x->dev, "IOMMU configuration failed for context device %d: %d\n",
+				index, err);
+			device_del(&ctx->dev);
+			goto del_devices;
+		}
+
+		fwspec = dev_iommu_fwspec_get(&ctx->dev);
+		if (!fwspec) {
+			dev_err(host1x->dev, "Context device %d has no IOMMU!\n", index);
+			device_del(&ctx->dev);
+			goto del_devices;
+		}
+
+		ctx->stream_id = fwspec->ids[0] & 0xffff;
+	}
+
+	of_node_put(node);
+
+	return 0;
+
+del_devices:
+	while (--index >= 0)
+		device_del(&cdl->devs[index].dev);
+
+	kfree(cdl->devs);
+	cdl->len = 0;
+
+put_node:
+	of_node_put(node);
+
+	return err;
+}
+
+void host1x_context_list_free(struct host1x_context_list *cdl)
+{
+	int i;
+
+	for (i = 0; i < cdl->len; i++)
+		device_del(&cdl->devs[i].dev);
+
+	kfree(cdl->devs);
+	cdl->len = 0;
+}
+
+struct host1x_context *host1x_context_alloc(struct host1x *host1x,
+					     struct pid *pid)
+{
+	struct host1x_context_list *cdl = &host1x->context_list;
+	struct host1x_context *free = NULL;
+	int i;
+
+	if (!cdl->len)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mutex_lock(&cdl->lock);
+
+	for (i = 0; i < cdl->len; i++) {
+		struct host1x_context *cd = &cdl->devs[i];
+
+		if (cd->owner == pid) {
+			refcount_inc(&cd->ref);
+			mutex_unlock(&cdl->lock);
+			return cd;
+		} else if (!cd->owner && !free) {
+			free = cd;
+		}
+	}
+
+	if (!free) {
+		mutex_unlock(&cdl->lock);
+		return ERR_PTR(-EBUSY);
+	}
+
+	refcount_set(&free->ref, 1);
+	free->owner = get_pid(pid);
+
+	mutex_unlock(&cdl->lock);
+
+	return free;
+}
+EXPORT_SYMBOL(host1x_context_alloc);
+
+void host1x_context_get(struct host1x_context *cd)
+{
+	refcount_inc(&cd->ref);
+}
+EXPORT_SYMBOL(host1x_context_get);
+
+void host1x_context_put(struct host1x_context *cd)
+{
+	struct host1x_context_list *cdl = &cd->host->context_list;
+
+	if (refcount_dec_and_mutex_lock(&cd->ref, &cdl->lock)) {
+		put_pid(cd->owner);
+		cd->owner = NULL;
+		mutex_unlock(&cdl->lock);
+	}
+}
+EXPORT_SYMBOL(host1x_context_put);
diff --git a/drivers/gpu/host1x/context.h b/drivers/gpu/host1x/context.h
new file mode 100644
index 000000000000..268ecdf6b1bb
--- /dev/null
+++ b/drivers/gpu/host1x/context.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Host1x context devices
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef __HOST1X_CONTEXT_H
+#define __HOST1X_CONTEXT_H
+
+#include <linux/mutex.h>
+#include <linux/refcount.h>
+
+struct host1x;
+
+extern struct bus_type host1x_context_device_bus_type;
+
+struct host1x_context_list {
+	struct mutex lock;
+	struct host1x_context *devs;
+	unsigned int len;
+};
+
+int host1x_context_list_init(struct host1x *host1x);
+void host1x_context_list_free(struct host1x_context_list *cdl);
+
+#endif
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 6994f8c0e02e..40f64efd5865 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -28,6 +28,7 @@

 #include "bus.h"
 #include "channel.h"
+#include "context.h"
 #include "debug.h"
 #include "dev.h"
 #include "intr.h"
@@ -502,10 +503,16 @@ static int host1x_probe(struct platform_device *pdev)
 		goto iommu_exit;
 	}

+	err = host1x_context_list_init(host);
+	if (err) {
+		dev_err(&pdev->dev, "failed to initialize context list\n");
+		goto free_channels;
+	}
+
 	err = host1x_syncpt_init(host);
 	if (err) {
 		dev_err(&pdev->dev, "failed to initialize syncpts\n");
-		goto free_channels;
+		goto free_contexts;
 	}

 	err = host1x_intr_init(host, syncpt_irq);
@@ -549,6 +556,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host1x_intr_deinit(host);
 deinit_syncpt:
 	host1x_syncpt_deinit(host);
+free_contexts:
+	host1x_context_list_free(&host->context_list);
 free_channels:
 	host1x_channel_list_free(&host->channel_list);
 iommu_exit:
@@ -568,6 +577,7 @@ static int host1x_remove(struct platform_device *pdev)

 	host1x_intr_deinit(host);
 	host1x_syncpt_deinit(host);
+	host1x_context_list_free(&host->context_list);
 	host1x_iommu_exit(host);
 	host1x_bo_cache_destroy(&host->cache);

diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index ca4b082f0cd4..92f4804d8b70 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -14,6 +14,7 @@

 #include "cdma.h"
 #include "channel.h"
+#include "context.h"
 #include "intr.h"
 #include "job.h"
 #include "syncpt.h"
@@ -141,6 +142,7 @@ struct host1x {
 	struct mutex syncpt_mutex;

 	struct host1x_channel_list channel_list;
+	struct host1x_context_list context_list;

 	struct dentry *debugfs;

diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index e8dc5bc41f79..9d9f1711472b 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -440,4 +440,21 @@ int tegra_mipi_disable(struct tegra_mipi_device *device);
 int tegra_mipi_start_calibration(struct tegra_mipi_device *device);
 int tegra_mipi_finish_calibration(struct tegra_mipi_device *device);

+/* host1x context devices */
+
+struct host1x_context {
+	struct host1x *host;
+
+	refcount_t ref;
+	struct pid *owner;
+
+	struct device dev;
+	u32 stream_id;
+};
+
+struct host1x_context *host1x_context_alloc(struct host1x *host1x,
+					     struct pid *pid);
+void host1x_context_get(struct host1x_context *cd);
+void host1x_context_put(struct host1x_context *cd);
+
 #endif
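The allocation policy of host1x_context_alloc() and host1x_context_put() can be modelled in a few lines: a process that already owns a context shares it (reference count +1), otherwise the first unowned context is handed out, and -EBUSY is returned when every context is taken. The sketch below is a single-threaded simplification under stated assumptions: owners are plain non-zero integer PIDs rather than struct pid references, and the mutex and refcount_t are omitted. The model_* names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of struct host1x_context; owner 0 means "free". */
struct model_context {
	int owner;
	unsigned int ref;
};

/*
 * Return the index of the context assigned to pid (which must be non-zero),
 * or a negative errno. Mirrors the loop in host1x_context_alloc() without
 * the locking.
 */
static int model_context_alloc(struct model_context *ctxs, unsigned int len,
			       int pid)
{
	struct model_context *free_ctx = NULL;
	unsigned int i;

	if (!len)
		return -EOPNOTSUPP;	/* no memory-contexts node in DT */

	for (i = 0; i < len; i++) {
		if (ctxs[i].owner == pid) {
			/* The process already owns a context: share it. */
			ctxs[i].ref++;
			return (int)i;
		} else if (!ctxs[i].owner && !free_ctx) {
			/* Remember the first free context. */
			free_ctx = &ctxs[i];
		}
	}

	if (!free_ctx)
		return -EBUSY;

	free_ctx->ref = 1;
	free_ctx->owner = pid;

	return (int)(free_ctx - ctxs);
}

/* Drop one reference; the context becomes free again when it hits zero. */
static void model_context_put(struct model_context *ctx)
{
	if (--ctx->ref == 0)
		ctx->owner = 0;
}
```

With two contexts, PIDs 100 and 200 each get their own, a second request from 100 shares context 0, and a third PID gets -EBUSY until one owner drops its last reference. Keying by process rather than by channel is what keeps the number of stream IDs needed bounded by the number of concurrent clients.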
18.02.2022 14:39, Mikko Perttunen wrote:
...
+/*
+ * Due to an issue with T194 NVENC, only 38 bits can be used.
+ * Anyway, 256GiB of IOVA ought to be enough for anyone.
+ */
+static dma_addr_t context_device_dma_mask = DMA_BIT_MASK(38);
s/dma_addr_t/u64/ ? Apparently you should get compilation warning on ARM32.
https://elixir.bootlin.com/linux/v5.17-rc4/source/include/linux/device.h#L52...
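The type concern can be checked in isolation: struct device declares dma_mask as a u64 pointer, while dma_addr_t is only 32 bits wide on configurations without 64-bit DMA addressing (typically ARM32 without LPAE). There, &context_device_dma_mask has the wrong pointer type and a 38-bit value cannot even be stored. A minimal userspace sketch, using a copy of the kernel's DMA_BIT_MASK() definition with uint64_t standing in for u64; the helper name is hypothetical.

```c
#include <stdint.h>

/* Same definition as DMA_BIT_MASK() in the kernel's <linux/dma-mapping.h>. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Backing storage for a u64 * dma_mask should itself be 64 bits wide. */
static uint64_t context_device_dma_mask = DMA_BIT_MASK(38);

/* Hypothetical helper: size in bytes of the IOVA window a mask allows. */
static uint64_t dma_mask_window_bytes(uint64_t mask)
{
	return mask + 1;	/* valid for masks narrower than 64 bits */
}
```

DMA_BIT_MASK(38) is 0x3fffffffff, so the window is 2^38 bytes = 256 GiB, matching the comment in the patch; truncated to 32 bits it would silently become 0xffffffff (4 GiB).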
+int host1x_context_list_init(struct host1x *host1x)
+{
+	struct host1x_context_list *cdl = &host1x->context_list;
+	struct host1x_context *ctx;
+	struct device_node *node;
+	int index;
Nitpick: unsigned int
...
+del_devices:
+	while (--index >= 0)
Nitpick: while (index--)
...
+void host1x_context_list_free(struct host1x_context_list *cdl)
+{
+	int i;
Nitpick: unsigned int
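The while (index--) suggestion matters once index becomes unsigned int: an unsigned value is never negative, so while (--index >= 0) would never terminate (compilers typically warn that the comparison is always true). while (index--) tests the old value and then decrements, so the body sees index - 1 down to 0. A small hypothetical helper illustrating that unwind order:

```c
#include <stddef.h>

/*
 * Hypothetical unwind helper: undo steps [0, index) in reverse order,
 * recording each undone index in undone[] (up to max entries).
 * Returns the number of steps undone.
 */
static size_t unwind_in_reverse(unsigned int index, unsigned int *undone,
				size_t max)
{
	size_t n = 0;

	while (index--) {
		/* Stands in for device_del(&cdl->devs[index].dev). */
		if (n < max)
			undone[n] = index;
		n++;
	}

	return n;
}
```

Calling it with index 3 undoes steps 2, 1 and 0, and with index 0 it does nothing, which is exactly what the error path needs after a failure at a given loop iteration.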
On 2/19/22 19:48, Dmitry Osipenko wrote:
18.02.2022 14:39, Mikko Perttunen wrote:
...
+/*
+ * Due to an issue with T194 NVENC, only 38 bits can be used.
+ * Anyway, 256GiB of IOVA ought to be enough for anyone.
+ */
+static dma_addr_t context_device_dma_mask = DMA_BIT_MASK(38);
s/dma_addr_t/u64/ ? Apparently you should get compilation warning on ARM32.
https://elixir.bootlin.com/linux/v5.17-rc4/source/include/linux/device.h#L52...
+int host1x_context_list_init(struct host1x *host1x)
+{
+	struct host1x_context_list *cdl = &host1x->context_list;
+	struct host1x_context *ctx;
+	struct device_node *node;
+	int index;
Nitpick: unsigned int
...
+del_devices:
+	while (--index >= 0)
Nitpick: while (index--)
...
+void host1x_context_list_free(struct host1x_context_list *cdl)
+{
+	int i;
Nitpick: unsigned int
Thanks, fixed all.
Mikko
18.02.2022 14:39, Mikko Perttunen wrote:
+	for (index = 0; index < cdl->len; index++) {
+		struct iommu_fwspec *fwspec;
+		ctx = &cdl->devs[index];
+		ctx->host = host1x;
+		device_initialize(&ctx->dev);
+		ctx->dev.dma_mask = &context_device_dma_mask;
+		ctx->dev.coherent_dma_mask = context_device_dma_mask;
+		dev_set_name(&ctx->dev, "host1x-ctx.%d", index);
+		ctx->dev.bus = &host1x_context_device_bus_type;
host1x_context_device_bus_type will be an undefined symbol if CONFIG_TEGRA_HOST1X_CONTEXT_BUS=n? Please compile and test all combinations.
On 2/19/22 19:52, Dmitry Osipenko wrote:
18.02.2022 14:39, Mikko Perttunen wrote:
+	for (index = 0; index < cdl->len; index++) {
+		struct iommu_fwspec *fwspec;
+		ctx = &cdl->devs[index];
+		ctx->host = host1x;
+		device_initialize(&ctx->dev);
+		ctx->dev.dma_mask = &context_device_dma_mask;
+		ctx->dev.coherent_dma_mask = context_device_dma_mask;
+		dev_set_name(&ctx->dev, "host1x-ctx.%d", index);
+		ctx->dev.bus = &host1x_context_device_bus_type;
host1x_context_device_bus_type will be an undefined symbol if CONFIG_TEGRA_HOST1X_CONTEXT_BUS=n? Please compile and test all combinations.
But this file is only built if CONFIG_TEGRA_HOST1X is enabled, which selects CONFIG_TEGRA_HOST1X_CONTEXT_BUS?
Mikko
Add code to do stream ID switching at the beginning of a job. The stream ID is switched to the stream ID specified by the context passed in the job structure.
Before switching the stream ID, an OP_DONE wait is done on the channel's engine to ensure that there is no residual ongoing work that might do DMA using the new stream ID.
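This patch also moves the host1x_syncpt_incr_max() call for the user increments after host1x_channel_program_engine_streamid(), which may itself reserve one increment for the OP_DONE fence. A simplified model of that bookkeeping shows why the order matters: reserving the fence first makes job->syncpt_end cover every increment the job performs, and the gather base start after the fence. With the old order, both values would be one short whenever a context is in use. The struct and model_* names are hypothetical; only the ordering mirrors the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a syncpoint's max value (highest value queued work will reach). */
struct model_syncpt {
	uint32_t max;
};

/* Plays the role of host1x_syncpt_incr_max(): reserve incrs, return new max. */
static uint32_t model_incr_max(struct model_syncpt *sp, uint32_t incrs)
{
	sp->max += incrs;
	return sp->max;
}

/*
 * Reserve syncpoint values for one job in the patch's order: first the
 * optional OP_DONE fence increment, then the user increments. Returns the
 * job's end threshold; *gather_base receives the value the first user
 * increment builds on.
 */
static uint32_t model_submit(struct model_syncpt *sp, bool has_context,
			     uint32_t user_incrs, uint32_t *gather_base)
{
	uint32_t syncval;

	if (has_context)
		(void)model_incr_max(sp, 1);	/* fence for the OP_DONE wait */

	syncval = model_incr_max(sp, user_incrs);
	*gather_base = syncval - user_incrs;

	return syncval;	/* corresponds to job->syncpt_end */
}
```

Starting from max 10 with three user increments, a job with a context ends at 14 with its gathers based at 11, while a job without one ends at 13 with base 10.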
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c | 52 +++++++++++++++++++++--
 drivers/gpu/host1x/hw/host1x06_hardware.h | 10 +++++
 drivers/gpu/host1x/hw/host1x07_hardware.h | 10 +++++
 include/linux/host1x.h | 4 ++
 4 files changed, 72 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 6b40e9af1e88..e23e1395c9f4 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -180,6 +180,45 @@ static void host1x_enable_gather_filter(struct host1x_channel *ch)
 #endif
 }

+static void host1x_channel_program_engine_streamid(struct host1x_job *job)
+{
+#if HOST1X_HW >= 6
+	u32 fence;
+
+	if (!job->context)
+		return;
+
+	fence = host1x_syncpt_incr_max(job->syncpt, 1);
+
+	/* First, increment a syncpoint on OP_DONE condition.. */
+
+	host1x_cdma_push(&job->channel->cdma,
+		host1x_opcode_nonincr(HOST1X_UCLASS_INCR_SYNCPT, 1),
+		HOST1X_UCLASS_INCR_SYNCPT_INDX_F(job->syncpt->id) |
+			HOST1X_UCLASS_INCR_SYNCPT_COND_F(1));
+
+	/* Wait for syncpoint to increment */
+
+	host1x_cdma_push(&job->channel->cdma,
+		host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+			host1x_uclass_wait_syncpt_r(), 1),
+		host1x_class_host_wait_syncpt(job->syncpt->id, fence));
+
+	/*
+	 * Now that we know the engine is idle, return to class and
+	 * change stream ID.
+	 */
+
+	host1x_cdma_push(&job->channel->cdma,
+		host1x_opcode_setclass(job->class, 0, 0),
+		HOST1X_OPCODE_NOP);
+
+	host1x_cdma_push(&job->channel->cdma,
+		host1x_opcode_setpayload(job->context->stream_id),
+		host1x_opcode_setstreamid(job->engine_streamid_offset / 4));
+#endif
+}
+
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
@@ -236,18 +275,23 @@ static int channel_submit(struct host1x_job *job)
 	if (sp->base)
 		synchronize_syncpt_base(job);

-	syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
-
 	host1x_hw_syncpt_assign_to_channel(host, sp, ch);

-	job->syncpt_end = syncval;
-
 	/* add a setclass for modules that require it */
 	if (job->class)
 		host1x_cdma_push(&ch->cdma,
 				 host1x_opcode_setclass(job->class, 0, 0),
 				 HOST1X_OPCODE_NOP);

+	/*
+	 * Ensure engine DMA is idle and set new stream ID. May increment
+	 * syncpt max.
+	 */
+	host1x_channel_program_engine_streamid(job);
+
+	syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
+	job->syncpt_end = syncval;
+
 	submit_gathers(job, syncval - user_syncpt_incrs);

 	/* end CDMA submit & stash pinned hMems into sync queue */
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index 01a142a09800..5d515745eee7 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -127,6 +127,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
+static inline u32 host1x_opcode_setstreamid(unsigned streamid) +{ + return (7 << 28) | streamid; +} + +static inline u32 host1x_opcode_setpayload(unsigned payload) +{ + return (9 << 28) | payload; +} + static inline u32 host1x_opcode_gather_wide(unsigned count) { return (12 << 28) | count; diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h b/drivers/gpu/host1x/hw/host1x07_hardware.h index e6582172ebfd..82c0cc9bb0b5 100644 --- a/drivers/gpu/host1x/hw/host1x07_hardware.h +++ b/drivers/gpu/host1x/hw/host1x07_hardware.h @@ -127,6 +127,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count) return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count; }
+static inline u32 host1x_opcode_setstreamid(unsigned streamid) +{ + return (7 << 28) | streamid; +} + +static inline u32 host1x_opcode_setpayload(unsigned payload) +{ + return (9 << 28) | payload; +} + static inline u32 host1x_opcode_gather_wide(unsigned count) { return (12 << 28) | count; diff --git a/include/linux/host1x.h b/include/linux/host1x.h index 9d9f1711472b..185ce6c56365 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -321,6 +321,10 @@ struct host1x_job {
/* Whether host1x-side firewall should be ran for this job or not */ bool enable_firewall; + + /* Options for configuring engine data stream ID */ + struct host1x_context *context; + u32 engine_streamid_offset; };
struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
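The last two words that host1x_channel_program_engine_streamid() pushes can be checked in isolation. The user-space sketch below copies the two opcode helpers from the patch and builds the SETPAYLOAD/SETSTREAMID pair. The helper name encode_streamid_switch() and the stream ID 0x35 used in the usage example are hypothetical; 0x30 is the engine offset that the VIC patch later in this series returns. SETSTREAMID takes a word offset, which is why the byte offset is divided by 4.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as host1x_opcode_setstreamid() in host1x0[67]_hardware.h. */
static inline uint32_t host1x_opcode_setstreamid(unsigned int streamid)
{
	return (7u << 28) | streamid;
}

/* Same encoding as host1x_opcode_setpayload() in host1x0[67]_hardware.h. */
static inline uint32_t host1x_opcode_setpayload(unsigned int payload)
{
	return (9u << 28) | payload;
}

/*
 * Hypothetical helper: write the two words that switch the engine to
 * stream_id. engine_streamid_offset is a byte offset into the engine's
 * register space; SETSTREAMID expects a word offset.
 */
static void encode_streamid_switch(uint32_t out[2], uint32_t stream_id,
				   uint32_t engine_streamid_offset)
{
	assert(engine_streamid_offset % 4 == 0); /* must be word aligned */

	out[0] = host1x_opcode_setpayload(stream_id);
	out[1] = host1x_opcode_setstreamid(engine_streamid_offset / 4);
}
```

For a stream ID of 0x35 and an offset of 0x30, this yields 0x90000035 followed by 0x7000000c. The sketch uses unsigned literals (9u << 28) so that the shift is well defined in portable ISO C, where 9 << 28 overflows a 32-bit int.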
Set itself as the IOMMU for the host1x context device bus, containing "dummy" devices used for Host1x context isolation.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4bc75c4ce402..23082675d542 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -39,6 +39,7 @@
 
 #include <linux/amba/bus.h>
 #include <linux/fsl/mc.h>
+#include <linux/host1x_context_bus.h>
 
 #include "arm-smmu.h"
 
@@ -2051,8 +2052,20 @@ static int arm_smmu_bus_init(struct iommu_ops *ops)
 			goto err_reset_pci_ops;
 	}
 #endif
+#ifdef CONFIG_TEGRA_HOST1X_CONTEXT_BUS
+	if (!iommu_present(&host1x_context_device_bus_type)) {
+		err = bus_set_iommu(&host1x_context_device_bus_type, ops);
+		if (err)
+			goto err_reset_fsl_mc_ops;
+	}
+#endif
+
 	return 0;
 
+err_reset_fsl_mc_ops: __maybe_unused;
+#ifdef CONFIG_FSL_MC_BUS
+	bus_set_iommu(&fsl_mc_bus_type, NULL);
+#endif
 err_reset_pci_ops: __maybe_unused;
 #ifdef CONFIG_PCI
 	bus_set_iommu(&pci_bus_type, NULL);
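The new err_reset_fsl_mc_ops label extends the existing goto-based unwind ladder: a failure on a later bus falls through the labels of all earlier, successful registrations in reverse order. Below is a simplified user-space model of that pattern. The fake_bus type and set_iommu() are hypothetical stand-ins for bus_set_iommu(), and the #ifdef blocks and iommu_present() checks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the buses arm_smmu_bus_init() touches. */
enum bus_id { BUS_PCI, BUS_FSL_MC, BUS_HOST1X_CTX, NUM_BUSES };

struct fake_bus {
	bool has_iommu;
	bool fail_register; /* test knob: make registration fail */
};

static int set_iommu(struct fake_bus *bus, bool enable)
{
	if (enable && bus->fail_register)
		return -1;
	bus->has_iommu = enable;
	return 0;
}

/*
 * Register the IOMMU on each bus in order. On failure, undo the
 * registrations that already succeeded, newest first, by falling
 * through the labels below.
 */
static int bus_init(struct fake_bus buses[NUM_BUSES])
{
	int err;

	err = set_iommu(&buses[BUS_PCI], true);
	if (err)
		return err;

	err = set_iommu(&buses[BUS_FSL_MC], true);
	if (err)
		goto err_reset_pci;

	err = set_iommu(&buses[BUS_HOST1X_CTX], true);
	if (err)
		goto err_reset_fsl_mc;

	return 0;

err_reset_fsl_mc:
	set_iommu(&buses[BUS_FSL_MC], false);
err_reset_pci:
	set_iommu(&buses[BUS_PCI], false);
	return err;
}
```

In the driver itself the labels are written as `label: __maybe_unused;` because, depending on the kernel configuration, every goto that targets a given label may be compiled out, which would otherwise trigger an unused-label warning.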
Add Host1x context stream IDs on systems that support Host1x context isolation. Host1x and attached engines can use these stream IDs to allow isolation between memory used by different processes.
The specified stream IDs must match those configured by the hypervisor, if one is present.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v2:
* Added context devices on T194.
* Use iommu-map instead of custom property.
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 12 ++++++++++++
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 12 ++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index c91afff1b757..7c49a0281986 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -1406,6 +1406,18 @@ host1x@13e00000 {
 
 		iommus = <&smmu TEGRA186_SID_HOST1X>;
 
+		memory-contexts {
+			iommu-map = <
+				0 &smmu TEGRA186_SID_HOST1X_CTX0 1
+				1 &smmu TEGRA186_SID_HOST1X_CTX1 1
+				2 &smmu TEGRA186_SID_HOST1X_CTX2 1
+				3 &smmu TEGRA186_SID_HOST1X_CTX3 1
+				4 &smmu TEGRA186_SID_HOST1X_CTX4 1
+				5 &smmu TEGRA186_SID_HOST1X_CTX5 1
+				6 &smmu TEGRA186_SID_HOST1X_CTX6 1
+				7 &smmu TEGRA186_SID_HOST1X_CTX7 1>;
+		};
+
 		dpaux1: dpaux@15040000 {
 			compatible = "nvidia,tegra186-dpaux";
 			reg = <0x15040000 0x10000>;
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 2d48c3715fc6..240202f2669b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1686,6 +1686,18 @@ host1x@13e00000 {
 			interconnect-names = "dma-mem";
 			iommus = <&smmu TEGRA194_SID_HOST1X>;
 
+			memory-contexts {
+				iommu-map = <
+					0 &smmu TEGRA194_SID_HOST1X_CTX0 1
+					1 &smmu TEGRA194_SID_HOST1X_CTX1 1
+					2 &smmu TEGRA194_SID_HOST1X_CTX2 1
+					3 &smmu TEGRA194_SID_HOST1X_CTX3 1
+					4 &smmu TEGRA194_SID_HOST1X_CTX4 1
+					5 &smmu TEGRA194_SID_HOST1X_CTX5 1
+					6 &smmu TEGRA194_SID_HOST1X_CTX6 1
+					7 &smmu TEGRA194_SID_HOST1X_CTX7 1>;
+			};
+
 			nvdec@15140000 {
 				compatible = "nvidia,tegra194-nvdec";
 				reg = <0x15140000 0x00040000>;
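Each iommu-map entry has the form <id-base &iommu iommu-base length>: an input ID in [id-base, id-base + length) maps to iommu-base + (id - id-base) on the referenced IOMMU. Here the input ID is the context index, so the tables above map context N to the corresponding TEGRA186_SID_HOST1X_CTXn or TEGRA194_SID_HOST1X_CTXn stream ID. The sketch below implements that translation in user space. The struct, the function name and the stream ID values in example_map are placeholders, not the real SID constants. In the kernel, the generic parser for properties of this form is of_map_id().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One <id-base &iommu iommu-base length> entry, without the phandle. */
struct iommu_map_entry {
	uint32_t id_base;
	uint32_t iommu_base;
	uint32_t length;
};

/* Placeholder stream IDs; NOT the real TEGRA186/194 SID values. */
#define EXAMPLE_MAP_LEN 8
static const struct iommu_map_entry example_map[EXAMPLE_MAP_LEN] = {
	{ 0, 0x38, 1 }, { 1, 0x39, 1 }, { 2, 0x3a, 1 }, { 3, 0x3b, 1 },
	{ 4, 0x3c, 1 }, { 5, 0x3d, 1 }, { 6, 0x3e, 1 }, { 7, 0x3f, 1 },
};

/*
 * Translate an input ID (here: a context index) to an output stream ID.
 * Returns 0 and stores the result in *sid, or -1 if no entry covers id.
 */
static int iommu_map_lookup(const struct iommu_map_entry *map, size_t n,
			    uint32_t id, uint32_t *sid)
{
	for (size_t i = 0; i < n; i++) {
		/* id >= id_base is checked first, so the subtraction cannot wrap. */
		if (id >= map[i].id_base && id - map[i].id_base < map[i].length) {
			*sid = map[i].iommu_base + (id - map[i].id_base);
			return 0;
		}
	}
	return -1;
}
```

Because each entry has a length of 1, the per-entry form makes no assumption that the eight stream IDs are contiguous; if they were, a single entry with a length of 8 would describe the same mapping.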
The DMACTX field determines which DMA context, as specified in the TRANSCFG register, is used. During boot it does not matter which context is used, but it matters later on, because the firmware reuses this value for its own DMA transactions.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/falcon.c | 8 ++++++++
 drivers/gpu/drm/tegra/falcon.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/tegra/falcon.c b/drivers/gpu/drm/tegra/falcon.c
index 223ab2ceb7e6..8bdb72f08f58 100644
--- a/drivers/gpu/drm/tegra/falcon.c
+++ b/drivers/gpu/drm/tegra/falcon.c
@@ -48,6 +48,14 @@ static int falcon_copy_chunk(struct falcon *falcon,
 	if (target == FALCON_MEMORY_IMEM)
 		cmd |= FALCON_DMATRFCMD_IMEM;
 
+	/*
+	 * Use second DMA context (i.e. the one for firmware). Strictly
+	 * speaking, at this point both DMA contexts point to the firmware
+	 * stream ID, but this register's value will be reused by the firmware
+	 * for later DMA transactions, so we need to use the correct value.
+	 */
+	cmd |= FALCON_DMATRFCMD_DMACTX(1);
+
 	falcon_writel(falcon, offset, FALCON_DMATRFMOFFS);
 	falcon_writel(falcon, base, FALCON_DMATRFFBOFFS);
 	falcon_writel(falcon, cmd, FALCON_DMATRFCMD);
diff --git a/drivers/gpu/drm/tegra/falcon.h b/drivers/gpu/drm/tegra/falcon.h
index c56ee32d92ee..1955cf11a8a6 100644
--- a/drivers/gpu/drm/tegra/falcon.h
+++ b/drivers/gpu/drm/tegra/falcon.h
@@ -50,6 +50,7 @@
 #define FALCON_DMATRFCMD_IDLE			(1 << 1)
 #define FALCON_DMATRFCMD_IMEM			(1 << 4)
 #define FALCON_DMATRFCMD_SIZE_256B		(6 << 8)
+#define FALCON_DMATRFCMD_DMACTX(v)		(((v) & 0x7) << 12)
 
 #define FALCON_DMATRFFBOFFS			0x0000111c
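With the new macro, selecting DMA context 1 sets bits 14:12 of the DMATRFCMD value to 1. The sketch below copies the field definitions from falcon.h and composes a command word. falcon_dmatrfcmd() is a hypothetical helper, and starting from FALCON_DMATRFCMD_SIZE_256B is an assumption about how falcon_copy_chunk() initializes cmd, because that line is not part of the hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions copied from falcon.h (DMACTX is added by this patch). */
#define FALCON_DMATRFCMD_IMEM		(1 << 4)
#define FALCON_DMATRFCMD_SIZE_256B	(6 << 8)
#define FALCON_DMATRFCMD_DMACTX(v)	(((v) & 0x7) << 12)

/* Hypothetical helper: compose DMATRFCMD for one 256-byte chunk. */
static uint32_t falcon_dmatrfcmd(int to_imem, unsigned int dmactx)
{
	uint32_t cmd = FALCON_DMATRFCMD_SIZE_256B; /* assumed starting value */

	if (to_imem)
		cmd |= FALCON_DMATRFCMD_IMEM;

	/* Select the DMA context (TRANSCFG slot) used for this transfer. */
	cmd |= FALCON_DMATRFCMD_DMACTX(dmactx);

	return cmd;
}
```

The `& 0x7` mask keeps out-of-range context numbers from spilling into neighbouring fields. For example, a context value of 9 encodes the same as 1.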
Implement the get_streamid_offset required for supporting context isolation. Since old firmware cannot support context isolation without hacks that we don't want to implement, check the firmware binary to see if context isolation should be enabled.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/vic.c | 38 +++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 1e342fa3d27b..2863ee5e0e67 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -38,6 +38,8 @@ struct vic {
 	struct clk *clk;
 	struct reset_control *rst;
 
+	bool can_use_context;
+
 	/* Platform configuration */
 	const struct vic_config *config;
 };
@@ -229,6 +231,7 @@ static int vic_load_firmware(struct vic *vic)
 {
 	struct host1x_client *client = &vic->client.base;
 	struct tegra_drm *tegra = vic->client.drm;
+	u32 fce_bin_data_offset;
 	dma_addr_t iova;
 	size_t size;
 	void *virt;
@@ -277,6 +280,25 @@ static int vic_load_firmware(struct vic *vic)
 		vic->falcon.firmware.phys = phys;
 	}
 
+	/*
+	 * Check if firmware is new enough to not require mapping firmware
+	 * to data buffer domains.
+	 */
+	fce_bin_data_offset = *(u32 *)(virt + VIC_UCODE_FCE_DATA_OFFSET);
+
+	if (!vic->config->supports_sid) {
+		vic->can_use_context = false;
+	} else if (fce_bin_data_offset != 0x0 && fce_bin_data_offset != 0xa5a5a5a5) {
+		/*
+		 * Firmware will access FCE through STREAMID0, so context
+		 * isolation cannot be used.
+		 */
+		vic->can_use_context = false;
+		dev_warn_once(vic->dev, "context isolation disabled due to old firmware\n");
+	} else {
+		vic->can_use_context = true;
+	}
+
 	return 0;
 
 cleanup:
@@ -358,10 +380,26 @@ static void vic_close_channel(struct tegra_drm_context *context)
 	host1x_channel_put(context->channel);
 }
 
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = vic_load_firmware(vic);
+	if (err < 0)
+		return err;
+
+	if (vic->can_use_context)
+		return 0x30;
+	else
+		return -ENOTSUPP;
+}
+
 static const struct tegra_drm_client_ops vic_ops = {
 	.open_channel = vic_open_channel,
 	.close_channel = vic_close_channel,
 	.submit = tegra_drm_submit,
+	.get_streamid_offset = vic_get_streamid_offset,
 };
 
 #define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"
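The firmware check in vic_load_firmware() reduces to a small predicate over the firmware image. The engine must support stream IDs, and the 32-bit word at VIC_UCODE_FCE_DATA_OFFSET must be either 0 or the 0xa5a5a5a5 marker. A user-space sketch follows. The offset value is a placeholder, because the real VIC_UCODE_FCE_DATA_OFFSET is not shown in this thread. The bounds check and the memcpy()-based read are additions of this sketch and are not part of the patch, which dereferences a cast pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder: the real VIC_UCODE_FCE_DATA_OFFSET is defined in the driver. */
#define HYPOTHETICAL_FCE_DATA_OFFSET 0x10

/*
 * Mirror of the decision in vic_load_firmware(): context isolation is
 * usable only if the engine supports stream IDs and the FCE data offset
 * word is 0 or the 0xa5a5a5a5 marker (i.e. newer firmware).
 */
static bool vic_fw_can_use_context(const uint8_t *fw, size_t size,
				   bool supports_sid)
{
	uint32_t fce_bin_data_offset;

	if (!supports_sid)
		return false;

	/* Sketch-only safety check: a truncated image is treated as old. */
	if (size < HYPOTHETICAL_FCE_DATA_OFFSET + sizeof(fce_bin_data_offset))
		return false;

	/* memcpy() avoids an unaligned or type-punned load (host endianness). */
	memcpy(&fce_bin_data_offset, fw + HYPOTHETICAL_FCE_DATA_OFFSET,
	       sizeof(fce_bin_data_offset));

	return fce_bin_data_offset == 0x0 || fce_bin_data_offset == 0xa5a5a5a5;
}
```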
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
+	if (err < 0)
+		return err;
+
+	if (vic->can_use_context)
+		return 0x30;
+	else
+		return -ENOTSUPP;
if (!vic->can_use_context)
	return -ENOTSUPP;
return 0x30;
On 19.02.2022 21:49, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
+	if (err < 0)
+		return err;
+
+	if (vic->can_use_context)
+		return 0x30;
+	else
+		return -ENOTSUPP;
if (!vic->can_use_context)
	return -ENOTSUPP;
return 0x30;
and s/ENOTSUPP/EOPNOTSUPP/
On 2/19/22 20:54, Dmitry Osipenko wrote:
On 19.02.2022 21:49, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
Why not, I'm not seeing any HW accesses in vic_load_firmware? Although it looks like it might race with the vic_load_firmware call in vic_runtime_resume which probably needs to be fixed.
+	if (err < 0)
+		return err;
+
+	if (vic->can_use_context)
+		return 0x30;
+	else
+		return -ENOTSUPP;
if (!vic->can_use_context)
	return -ENOTSUPP;
return 0x30;
and s/ENOTSUPP/EOPNOTSUPP/
Ok.
Mikko
On 21.02.2022 14:44, Mikko Perttunen wrote:
On 2/19/22 20:54, Dmitry Osipenko wrote:
On 19.02.2022 21:49, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+ err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
Why not, I'm not seeing any HW accesses in vic_load_firmware? Although it looks like it might race with the vic_load_firmware call in vic_runtime_resume which probably needs to be fixed.
It was not clear from the function's name that h/w is untouched, I read "load" as "upload" and then looked at vic_runtime_resume(). I'd rename vic_load_firmware() to vic_prepare_firmware_image().
And yes, technically lock is needed.
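One way to close the race described above is to serialize firmware preparation with a mutex and make it idempotent, so that both the stream ID query and runtime resume can call it safely. The user-space sketch below shows that shape with pthreads. struct fake_vic, prepare_image_locked() and the use of the suggested name vic_prepare_firmware_image() are illustrative only; the real driver would use a kernel struct mutex and its actual firmware-loading code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the driver state. */
struct fake_vic {
	pthread_mutex_t lock;   /* serializes firmware preparation */
	bool fw_prepared;
	bool can_use_context;
	int prepare_count;      /* how often the expensive path ran */
};

/* Stand-in for reading and inspecting the firmware image. */
static int prepare_image_locked(struct fake_vic *vic)
{
	vic->prepare_count++;
	vic->can_use_context = true;
	vic->fw_prepared = true;
	return 0;
}

/*
 * Callable from both the runtime-resume path and the stream ID query:
 * the mutex ensures only one caller prepares the image, and later
 * callers observe the already-prepared state.
 */
static int vic_prepare_firmware_image(struct fake_vic *vic)
{
	int err = 0;

	pthread_mutex_lock(&vic->lock);
	if (!vic->fw_prepared)
		err = prepare_image_locked(vic);
	pthread_mutex_unlock(&vic->lock);

	return err;
}
```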
On 2/21/22 22:10, Dmitry Osipenko wrote:
On 21.02.2022 14:44, Mikko Perttunen wrote:
On 2/19/22 20:54, Dmitry Osipenko wrote:
On 19.02.2022 21:49, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+ err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
Why not, I'm not seeing any HW accesses in vic_load_firmware? Although it looks like it might race with the vic_load_firmware call in vic_runtime_resume which probably needs to be fixed.
It was not clear from the function's name that h/w is untouched, I read "load" as "upload" and then looked at vic_runtime_resume(). I'd rename vic_load_firmware() to vic_prepare_firmware_image().
And yes, technically lock is needed.
Yep, I'll consider renaming it.
Mikko
On 22.02.2022 11:27, Mikko Perttunen wrote:
On 2/21/22 22:10, Dmitry Osipenko wrote:
On 21.02.2022 14:44, Mikko Perttunen wrote:
On 2/19/22 20:54, Dmitry Osipenko wrote:
On 19.02.2022 21:49, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+ err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
Why not, I'm not seeing any HW accesses in vic_load_firmware? Although it looks like it might race with the vic_load_firmware call in vic_runtime_resume which probably needs to be fixed.
It was not clear from the function's name that h/w is untouched, I read "load" as "upload" and then looked at vic_runtime_resume(). I'd rename vic_load_firmware() to vic_prepare_firmware_image().
And yes, technically lock is needed.
Yep, I'll consider renaming it.
Looking at this all again, I'd suggest to change:
int get_streamid_offset(client)
to:
int get_streamid_offset(client, *offset)
and bail out if get_streamid_offset() returns error. It's never okay to ignore errors.
On 2/22/22 12:46, Dmitry Osipenko wrote:
On 22.02.2022 11:27, Mikko Perttunen wrote:
On 2/21/22 22:10, Dmitry Osipenko wrote:
On 21.02.2022 14:44, Mikko Perttunen wrote:
On 2/19/22 20:54, Dmitry Osipenko wrote:
On 19.02.2022 21:49, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
Why not, I'm not seeing any HW accesses in vic_load_firmware? Although it looks like it might race with the vic_load_firmware call in vic_runtime_resume which probably needs to be fixed.
It was not clear from the function's name that h/w is untouched, I read "load" as "upload" and then looked at vic_runtime_resume(). I'd rename vic_load_firmware() to vic_prepare_firmware_image().
And yes, technically lock is needed.
Yep, I'll consider renaming it.
Looking at this all again, I'd suggest to change:
int get_streamid_offset(client)
to:
int get_streamid_offset(client, *offset)
and bail out if get_streamid_offset() returns error. It's never okay to ignore errors.
Sure, seems reasonable. We'll still need some error code to indicate that context isolation isn't available for the engine and continue on in that case but that's better than just ignoring all of them.
Mikko
On 22.02.2022 13:54, Mikko Perttunen wrote:
On 2/22/22 12:46, Dmitry Osipenko wrote:
On 22.02.2022 11:27, Mikko Perttunen wrote:
On 2/21/22 22:10, Dmitry Osipenko wrote:
On 21.02.2022 14:44, Mikko Perttunen wrote:
On 2/19/22 20:54, Dmitry Osipenko wrote:
On 19.02.2022 21:49, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+static int vic_get_streamid_offset(struct tegra_drm_client *client)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = vic_load_firmware(vic);
You can't invoke vic_load_firmware() while RPM is suspended. Either replace this with RPM get/put or do something else.
Why not, I'm not seeing any HW accesses in vic_load_firmware? Although it looks like it might race with the vic_load_firmware call in vic_runtime_resume which probably needs to be fixed.
It was not clear from the function's name that h/w is untouched, I read "load" as "upload" and then looked at vic_runtime_resume(). I'd rename vic_load_firmware() to vic_prepare_firmware_image().
And yes, technically lock is needed.
Yep, I'll consider renaming it.
Looking at this all again, I'd suggest to change:
int get_streamid_offset(client)
to:
int get_streamid_offset(client, *offset)
and bail out if get_streamid_offset() returns error. It's never okay to ignore errors.
Sure, seems reasonable. We'll still need some error code to indicate that context isolation isn't available for the engine and continue on in that case but that's better than just ignoring all of them.
Yes, check for -EOPNOTSUPP and skip it.
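The proposed signature separates the stream ID offset from the return value, so that callers can tell "not supported" apart from real failures: -EOPNOTSUPP means "submit without a context", while any other error aborts. A user-space sketch of that contract follows; struct fake_client and setup_job_context() are hypothetical. It uses EOPNOTSUPP as requested earlier in the thread, since ENOTSUPP is a kernel-internal error code that is not defined for user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified client: only what the callback needs. */
struct fake_client {
	int load_err;             /* error from firmware preparation, or 0 */
	bool can_use_context;
	uint32_t streamid_offset;
};

/*
 * Proposed callback shape: return 0 and fill *offset on success,
 * -EOPNOTSUPP if the engine cannot use context isolation, or another
 * negative errno on a real failure.
 */
static int get_streamid_offset(struct fake_client *client, uint32_t *offset)
{
	if (client->load_err)
		return client->load_err;

	if (!client->can_use_context)
		return -EOPNOTSUPP;

	*offset = client->streamid_offset;
	return 0;
}

/*
 * Caller side: skip isolation on -EOPNOTSUPP, bail out on anything else.
 * *use_context reports whether a context should be attached to the job.
 */
static int setup_job_context(struct fake_client *client, bool *use_context,
			     uint32_t *offset)
{
	int err = get_streamid_offset(client, offset);

	*use_context = false;
	if (err == -EOPNOTSUPP)
		return 0;
	if (err < 0)
		return err;

	*use_context = true;
	return 0;
}
```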
On 2022-02-18 11:39, Mikko Perttunen via iommu wrote:
Implement the get_streamid_offset required for supporting context isolation. Since old firmware cannot support context isolation without hacks that we don't want to implement, check the firmware binary to see if context isolation should be enabled.
Signed-off-by: Mikko Perttunen mperttunen@nvidia.com
drivers/gpu/drm/tegra/vic.c | 38 +++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+)
 static const struct tegra_drm_client_ops vic_ops = {
 	.open_channel = vic_open_channel,
 	.close_channel = vic_close_channel,
 	.submit = tegra_drm_submit,
+	.get_streamid_offset = vic_get_streamid_offset,
The patch order seems off here, since the .get_streamid_offset member isn't defined yet.
Robin.
};
#define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"
On 2/21/22 19:27, Robin Murphy wrote:
The patch order seems off here, since the .get_streamid_offset member isn't defined yet.
Robin.
Indeed, will fix.
Thanks, Mikko
For engines that support context isolation, allocate a context when opening a channel, and set up stream ID offset and context fields when submitting a job.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.h    |  2 ++
 drivers/gpu/drm/tegra/submit.c | 13 ++++++++++++
 drivers/gpu/drm/tegra/uapi.c   | 36 ++++++++++++++++++++++++++++++++--
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index fc0a19554eac..717e9f81ee1f 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -80,6 +80,7 @@ struct tegra_drm_context {
 
 	/* Only used by new UAPI. */
 	struct xarray mappings;
+	struct host1x_context *memory_context;
 };
 
 struct tegra_drm_client_ops {
@@ -91,6 +92,7 @@ struct tegra_drm_client_ops {
 	int (*submit)(struct tegra_drm_context *context,
 		      struct drm_tegra_submit *args, struct drm_device *drm,
 		      struct drm_file *file);
+	int (*get_streamid_offset)(struct tegra_drm_client *client);
 };
 
 int tegra_drm_submit(struct tegra_drm_context *context,
diff --git a/drivers/gpu/drm/tegra/submit.c b/drivers/gpu/drm/tegra/submit.c
index 6d6dd8c35475..8d74b82b83a5 100644
--- a/drivers/gpu/drm/tegra/submit.c
+++ b/drivers/gpu/drm/tegra/submit.c
@@ -498,6 +498,9 @@ static void release_job(struct host1x_job *job)
 	struct tegra_drm_submit_data *job_data = job->user_data;
 	u32 i;
 
+	if (job->context)
+		host1x_context_put(job->context);
+
 	for (i = 0; i < job_data->num_used_mappings; i++)
 		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
 
@@ -599,6 +602,16 @@ int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
 	job->release = release_job;
 	job->timeout = 10000;
 
+	if (context->memory_context && context->client->ops->get_streamid_offset) {
+		int offset = context->client->ops->get_streamid_offset(context->client);
+
+		if (offset >= 0) {
+			job->context = context->memory_context;
+			job->engine_streamid_offset = offset;
+			host1x_context_get(job->context);
+		}
+	}
+
 	/*
 	 * job_data is now part of job reference counting, so don't release
 	 * it from here.
diff --git a/drivers/gpu/drm/tegra/uapi.c b/drivers/gpu/drm/tegra/uapi.c
index 9ab9179d2026..be33da54d12c 100644
--- a/drivers/gpu/drm/tegra/uapi.c
+++ b/drivers/gpu/drm/tegra/uapi.c
@@ -33,6 +33,9 @@ static void tegra_drm_channel_context_close(struct tegra_drm_context *context)
 	struct tegra_drm_mapping *mapping;
 	unsigned long id;
 
+	if (context->memory_context)
+		host1x_context_put(context->memory_context);
+
 	xa_for_each(&context->mappings, id, mapping)
 		tegra_drm_mapping_put(mapping);
 
@@ -72,6 +75,7 @@ static struct tegra_drm_client *tegra_drm_find_client(struct tegra_drm *tegra, u
 
 int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data, struct drm_file *file)
 {
+	struct host1x *host = tegra_drm_to_host1x(drm->dev_private);
 	struct tegra_drm_file *fpriv = file->driver_priv;
 	struct tegra_drm *tegra = drm->dev_private;
 	struct drm_tegra_channel_open *args = data;
@@ -102,10 +106,29 @@ int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data, struct drm_
 		}
 	}
 
+	/* Only allocate context if the engine supports context isolation. */
+	if (client->ops->get_streamid_offset &&
+	    client->ops->get_streamid_offset(client) >= 0) {
+		context->memory_context =
+			host1x_context_alloc(host, get_task_pid(current, PIDTYPE_TGID));
+		if (IS_ERR(context->memory_context)) {
+			if (PTR_ERR(context->memory_context) != -EOPNOTSUPP) {
+				err = PTR_ERR(context->memory_context);
+				goto put_channel;
+			} else {
+				/*
+				 * OK, HW does not support contexts or contexts
+				 * are disabled.
+				 */
+				context->memory_context = NULL;
+			}
+		}
+	}
+
 	err = xa_alloc(&fpriv->contexts, &args->context, context, XA_LIMIT(1, U32_MAX),
 		       GFP_KERNEL);
 	if (err < 0)
-		goto put_channel;
+		goto put_memctx;
 
 	context->client = client;
 	xa_init_flags(&context->mappings, XA_FLAGS_ALLOC1);
@@ -118,6 +141,9 @@ int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data, struct drm_
 
 	return 0;
 
+put_memctx:
+	if (context->memory_context)
+		host1x_context_put(context->memory_context);
 put_channel:
 	host1x_channel_put(context->channel);
 free:
@@ -156,6 +182,7 @@ int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data, struct drm_f
 	struct tegra_drm_mapping *mapping;
 	struct tegra_drm_context *context;
 	enum dma_data_direction direction;
+	struct device *mapping_dev;
 	int err = 0;
 
 	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READ_WRITE)
@@ -177,6 +204,11 @@ int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data, struct drm_f
 
 	kref_init(&mapping->ref);
 
+	if (context->memory_context)
+		mapping_dev = &context->memory_context->dev;
+	else
+		mapping_dev = context->client->base.dev;
+
 	mapping->bo = tegra_gem_lookup(file, args->handle);
 	if (!mapping->bo) {
 		err = -EINVAL;
@@ -201,7 +233,7 @@ int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data, struct drm_f
 		goto put_gem;
 	}
 
-	mapping->map = host1x_bo_pin(context->client->base.dev, mapping->bo, direction, NULL);
+	mapping->map = host1x_bo_pin(mapping_dev, mapping->bo, direction, NULL);
 	if (IS_ERR(mapping->map)) {
 		err = PTR_ERR(mapping->map);
 		goto put_gem;
On 18.02.2022 14:39, Mikko Perttunen wrote:
+	if (context->memory_context && context->client->ops->get_streamid_offset) {
^^^
+		int offset = context->client->ops->get_streamid_offset(context->client);
+
+		if (offset >= 0) {
+			job->context = context->memory_context;
+			job->engine_streamid_offset = offset;
+			host1x_context_get(job->context);
+		}
You should bump refcount unconditionally or you'll get refcnt underflow on put, when offset < 0.
+	}
+
 	/*
 	 * job_data is now part of job reference counting, so don't release
 	 * it from here.
diff --git a/drivers/gpu/drm/tegra/uapi.c b/drivers/gpu/drm/tegra/uapi.c
index 9ab9179d2026..be33da54d12c 100644
--- a/drivers/gpu/drm/tegra/uapi.c
+++ b/drivers/gpu/drm/tegra/uapi.c
@@ -33,6 +33,9 @@ static void tegra_drm_channel_context_close(struct tegra_drm_context *context)
 	struct tegra_drm_mapping *mapping;
 	unsigned long id;
 
+	if (context->memory_context)
+		host1x_context_put(context->memory_context);
The "if (context->memory_context && context->client->ops->get_streamid_offset)" above doesn't match the "if (context->memory_context)". You'll get refcount underflow.
On 2/19/22 20:35, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+	if (context->memory_context && context->client->ops->get_streamid_offset) {
^^^
+		int offset = context->client->ops->get_streamid_offset(context->client);
+
+		if (offset >= 0) {
+			job->context = context->memory_context;
+			job->engine_streamid_offset = offset;
+			host1x_context_get(job->context);
+		}
You should bump refcount unconditionally or you'll get refcnt underflow on put, when offset < 0.
This refcount is intended to be dropped from 'release_job', where it's dropped if job->context is set, which it is from this path.
+	}
+
 	/*
 	 * job_data is now part of job reference counting, so don't release
 	 * it from here.
diff --git a/drivers/gpu/drm/tegra/uapi.c b/drivers/gpu/drm/tegra/uapi.c
index 9ab9179d2026..be33da54d12c 100644
--- a/drivers/gpu/drm/tegra/uapi.c
+++ b/drivers/gpu/drm/tegra/uapi.c
@@ -33,6 +33,9 @@ static void tegra_drm_channel_context_close(struct tegra_drm_context *context)
 	struct tegra_drm_mapping *mapping;
 	unsigned long id;
 
+	if (context->memory_context)
+		host1x_context_put(context->memory_context);
The "if (context->memory_context && context->client->ops->get_streamid_offset)" above doesn't match the "if (context->memory_context)". You'll get refcount underflow.
And this drop is for the refcount implicitly added when allocating the memory context through host1x_context_alloc; so these two places should be independent.
Please elaborate if I missed something.
Thanks, Mikko
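The argument above can be checked with a small model of the two independent references: one taken by host1x_context_alloc() at channel open and dropped at channel close, and one taken per job only when job->context is set and dropped by release_job() under the same condition. In the sketch below, fake_ctx, job_attach() and job_release() are hypothetical stand-ins, and a plain int replaces the kernel's reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct host1x_context with a plain counter. */
struct fake_ctx {
	int refcount;
	bool freed;
};

static void ctx_get(struct fake_ctx *ctx)
{
	ctx->refcount++;
}

static void ctx_put(struct fake_ctx *ctx)
{
	assert(ctx->refcount > 0); /* an underflow would trip here */
	if (--ctx->refcount == 0)
		ctx->freed = true;
}

struct fake_job {
	struct fake_ctx *context; /* set only when a reference was taken */
};

/* Submit path: take a job reference only when isolation is in use. */
static void job_attach(struct fake_job *job, struct fake_ctx *ctx, int offset)
{
	job->context = NULL;
	if (ctx && offset >= 0) {
		job->context = ctx;
		ctx_get(ctx);
	}
}

/* release_job() path: drop exactly the reference job_attach() took. */
static void job_release(struct fake_job *job)
{
	if (job->context)
		ctx_put(job->context);
}
```

Because the get and the put are keyed on the same condition (job->context being set), a job that skipped isolation never drops a reference it did not take, and the allocation reference is dropped only once, at channel close.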
On 21.02.2022 15:06, Mikko Perttunen wrote:
On 2/19/22 20:35, Dmitry Osipenko wrote:
On 18.02.2022 14:39, Mikko Perttunen wrote:
+	if (context->memory_context && context->client->ops->get_streamid_offset) {
^^^
+		int offset = context->client->ops->get_streamid_offset(context->client);
+
+		if (offset >= 0) {
+			job->context = context->memory_context;
+			job->engine_streamid_offset = offset;
+			host1x_context_get(job->context);
+		}
You should bump refcount unconditionally or you'll get refcnt underflow on put, when offset < 0.
This refcount is intended to be dropped from 'release_job', where it's dropped if job->context is set, which it is from this path.
+	}
+
 	/*
 	 * job_data is now part of job reference counting, so don't release
 	 * it from here.
diff --git a/drivers/gpu/drm/tegra/uapi.c b/drivers/gpu/drm/tegra/uapi.c
index 9ab9179d2026..be33da54d12c 100644
--- a/drivers/gpu/drm/tegra/uapi.c
+++ b/drivers/gpu/drm/tegra/uapi.c
@@ -33,6 +33,9 @@ static void tegra_drm_channel_context_close(struct tegra_drm_context *context)
 	struct tegra_drm_mapping *mapping;
 	unsigned long id;
 
+	if (context->memory_context)
+		host1x_context_put(context->memory_context);
The "if (context->memory_context && context->client->ops->get_streamid_offset)" above doesn't match the "if (context->memory_context)". You'll get refcount underflow.
And this drop is for the refcount implicitly added when allocating the memory context through host1x_context_alloc; so these two places should be independent.
Please elaborate if I missed something.
You named the context 'memory_context' and then have job->context = context->memory_context. Please try to improve the variable names, e.g. drm_ctx->host1x_ctx.
I'm also not a big fan of the "if (ref) put(ref)" pattern. Wouldn't it be better to move all the NULL checks inside get()/put() and make the calls unconditional?
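A minimal sketch of the alternative suggested above, with the NULL checks moved into the get/put helpers so that callers can invoke them unconditionally. All names here (struct ctx, ctx_get(), ctx_put(), struct drm_ctx, drm_ctx_close()) are hypothetical and illustrative, not the host1x API; the field name host1x_ctx follows the naming suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object, not the real host1x memory context. */
struct ctx {
	int refcount;
};

/* NULL-tolerant get: passing NULL is a no-op that returns NULL. */
static struct ctx *ctx_get(struct ctx *c)
{
	if (c)
		c->refcount++;
	return c;
}

/* NULL-tolerant put, in the same spirit as kfree(NULL) being a no-op. */
static void ctx_put(struct ctx *c)
{
	if (!c)
		return;
	assert(c->refcount > 0); /* catches an underflow */
	if (--c->refcount == 0)
		free(c);
}

struct drm_ctx {
	struct ctx *host1x_ctx; /* may be NULL if no context was allocated */
};

/* Close path: no "if (ref) put(ref)" needed at the call site. */
static void drm_ctx_close(struct drm_ctx *drm_ctx)
{
	ctx_put(drm_ctx->host1x_ctx);
	drm_ctx->host1x_ctx = NULL;
}
```

The call site gets shorter, but it no longer shows that host1x_ctx may be NULL, which is the readability trade-off raised in the reply.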
On 2/21/22 22:02, Dmitry Osipenko wrote:
I agree that the naming here is confusing, with different kinds of contexts flying around, though I would prefer not to rename the original 'struct tegra_drm_context *context' since it's used throughout the code. But I'll try to make it clearer.
Regarding moving the NULL checks inside get/put: I personally dislike that pattern (also with, e.g., kfree), since it makes it harder to see, when reading the code, that the pointer can be NULL.
Mikko
dri-devel@lists.freedesktop.org