Hello,
Here are 2 fixes and one improvement for the page fault handling. Those bugs were found while working on indirect draw support, which requires the allocation of a big heap buffer for varyings, and the vertex/tiler shaders seem to have access patterns that trigger those issues. I remember discussing the first issue with Steve or Robin a while back, but we never hit it before (now we do :)).
The last patch is a perf improvement: no need to re-enable hardware interrupts if we know the threaded irq handler will be woken up right away.
Regards,
Boris
Boris Brezillon (3):
  drm/panfrost: Clear MMU irqs before handling the fault
  drm/panfrost: Don't try to map pages that are already mapped
  drm/panfrost: Stay in the threaded MMU IRQ handler until we've handled all IRQs
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)
When a fault is handled it will unblock the GPU which will continue executing its shader and might fault almost immediately on a different page. If we clear interrupts after handling the fault we might miss new faults, so clear them before.
Cc: stable@vger.kernel.org
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 7c1b3481b785..904d63450862 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -593,6 +593,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 		access_type = (fault_status >> 8) & 0x3;
 		source_id = (fault_status >> 16);

+		mmu_write(pfdev, MMU_INT_CLEAR, mask);
+
 		/* Page fault only */
 		ret = -1;
 		if ((status & mask) == BIT(i) && (exception_type & 0xF8) == 0xC0)
@@ -616,8 +618,6 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 			access_type, access_type_name(pfdev, fault_status),
 			source_id);

-		mmu_write(pfdev, MMU_INT_CLEAR, mask);
-
 		status &= ~mask;
 	}
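The ordering bug the patch fixes can be illustrated with a small user-space model (all names here are illustrative stand-ins, not the real driver API): the "GPU" latches fault bits into a raw status register, and the handler acknowledges them with a clear write.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated MMU_INT_RAWSTAT register. */
static uint32_t rawstat;

static void gpu_raise_fault(int bit) { rawstat |= 1u << bit; }
static void int_clear(uint32_t mask) { rawstat &= ~mask; }

/* Clearing AFTER resolving the fault: the unblocked GPU may fault
 * again while we are still in the handler, and the late clear wipes
 * the freshly latched bit, so the new fault is never serviced. */
static uint32_t handle_clear_after(uint32_t mask)
{
    /* ... resolve the fault; the GPU resumes and faults again ... */
    gpu_raise_fault(0);
    int_clear(mask);          /* oops: also clears the new fault */
    return rawstat;           /* 0: the new fault was lost */
}

/* Clearing BEFORE resolving: the new fault stays latched in the raw
 * status register and is picked up on the next handler pass. */
static uint32_t handle_clear_before(uint32_t mask)
{
    int_clear(mask);          /* ack only what we have already seen */
    gpu_raise_fault(0);
    return rawstat;           /* the new fault is still pending */
}
```

This is only a sketch of the race; in the real handler the "resolve" step is the heap-growth mapping that unblocks the shader.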
On 01/02/2021 08:21, Boris Brezillon wrote:
> When a fault is handled it will unblock the GPU which will continue
> executing its shader and might fault almost immediately on a
> different page. If we clear interrupts after handling the fault we
> might miss new faults, so clear them before.
>
> Cc: stable@vger.kernel.org
> Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Good catch (although this oddly rings a bell - so perhaps it was me you discussed it with before)
Reviewed-by: Steven Price steven.price@arm.com
We allocate 2MB chunks at a time, so it might appear that a page fault has already been handled by a previous page fault when we reach panfrost_mmu_map_fault_addr(). Bail out in that case to avoid mapping the same area twice.
Cc: stable@vger.kernel.org
Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 904d63450862..21e552d1ac71 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -488,8 +488,14 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,
 		}
 		bo->base.pages = pages;
 		bo->base.pages_use_count = 1;
-	} else
+	} else {
 		pages = bo->base.pages;
+		if (pages[page_offset]) {
+			/* Pages are already mapped, bail out. */
+			mutex_unlock(&bo->base.pages_lock);
+			goto out;
+		}
+	}

 	mapping = bo->base.base.filp->f_mapping;
 	mapping_set_unevictable(mapping);
@@ -522,6 +528,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as,

 	dev_dbg(pfdev->dev, "mapped page fault @ AS%d %llx", as, addr);

+out:
 	panfrost_gem_mapping_put(bomapping);

 	return 0;
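The "already mapped" situation comes from the 2MB chunk granularity: one fault populates 512 pages at once, so a second fault landing anywhere in that chunk finds its page already present. A minimal user-space sketch of the check (the `pages` array is a hypothetical stand-in for `bo->base.pages`):

```c
#include <assert.h>
#include <stddef.h>

#define PAGES_PER_CHUNK 512                 /* 2MB of 4K pages */

/* A NULL entry means that page has not been allocated/mapped yet.
 * The array covers two 2MB chunks for the example. */
static void *pages[2 * PAGES_PER_CHUNK];

/* Mirrors the check added by this patch: if an earlier fault already
 * populated this page (because whole 2MB chunks are mapped at a
 * time), bail out instead of mapping the same area twice.
 * Returns 1 if a chunk was mapped, 0 if we bailed out. */
static int map_fault(size_t page_offset)
{
    if (pages[page_offset])
        return 0;                           /* already mapped: bail */

    /* Map the whole 2MB chunk containing page_offset. */
    size_t base = page_offset & ~(size_t)(PAGES_PER_CHUNK - 1);
    for (size_t i = 0; i < PAGES_PER_CHUNK; i++)
        pages[base + i] = &pages[base + i]; /* mark as populated */
    return 1;
}
```

In the driver the bail-out path also has to drop `pages_lock` and the GEM mapping reference, which is what the `mutex_unlock()` and `goto out` in the diff take care of.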
On 01/02/2021 08:21, Boris Brezillon wrote:
> We allocate 2MB chunks at a time, so it might appear that a page
> fault has already been handled by a previous page fault when we
> reach panfrost_mmu_map_fault_addr(). Bail out in that case to avoid
> mapping the same area twice.
>
> Cc: stable@vger.kernel.org
> Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations")
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price steven.price@arm.com
Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay in the threaded irq handler as long as we can.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 21e552d1ac71..65bc20628c4e 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
 	int i, ret;

+again:
+
 	for (i = 0; status; i++) {
 		u32 mask = BIT(i) | BIT(i + 16);
 		u64 addr;
@@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
 		status &= ~mask;
 	}

+	/* If we received new MMU interrupts, process them before returning. */
+	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
+	if (status)
+		goto again;
+
 	mmu_write(pfdev, MMU_INT_MASK, ~0);
 	return IRQ_HANDLED;
 };
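The control flow of the patched handler can be modeled in user space. Here the simulated RAWSTAT register is just an array of values returned by successive reads (a hypothetical stand-in for `mmu_read(pfdev, MMU_INT_RAWSTAT)`), which makes it easy to model faults that arrive while the thread is still running:

```c
#include <assert.h>
#include <stdint.h>

/* Each call returns the next simulated RAWSTAT value. */
static uint32_t mmu_read_rawstat(const uint32_t *reads, int *idx)
{
    return reads[(*idx)++];
}

/* Returns the number of faults handled before the handler would
 * re-enable the hardware interrupt and return IRQ_HANDLED. */
static int irq_thread(const uint32_t *reads)
{
    int idx = 0, handled = 0;
    uint32_t status = mmu_read_rawstat(reads, &idx);

again:
    for (int i = 0; status; i++) {
        uint32_t mask = (1u << i) | (1u << (i + 16));

        if (!(status & mask))
            continue;

        handled++;                 /* resolve the fault on AS i */
        status &= ~mask;
    }

    /* If new interrupts arrived meanwhile, process them before
     * returning, exactly like the goto in the patch. */
    status = mmu_read_rawstat(reads, &idx);
    if (status)
        goto again;

    return handled;
}
```

With reads `{0x3, 0x4, 0x0}` the thread handles two faults, re-reads the register, loops once more for the late third fault, and only then would unmask the hardware interrupt.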
On 01/02/2021 08:21, Boris Brezillon wrote:
> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> in the threaded irq handler as long as we can.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Looks fine to me, but I'm interested to know if you actually saw a performance improvement. Back-to-back MMU faults should (hopefully) be fairly uncommon.
Regardless:
Reviewed-by: Steven Price steven.price@arm.com
On Mon, 1 Feb 2021 12:13:49 +0000 Steven Price steven.price@arm.com wrote:
> On 01/02/2021 08:21, Boris Brezillon wrote:
>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
>> in the threaded irq handler as long as we can.
>>
>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>
> Looks fine to me, but I'm interested to know if you actually saw a
> performance improvement. Back-to-back MMU faults should (hopefully)
> be fairly uncommon.
I actually didn't check the perf improvement or the actual number of back-to-back MMU faults, but dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000 seemed to generate a few of those, so I thought it'd be good to optimize that case given how trivial it is.
On 01/02/2021 12:59, Boris Brezillon wrote:
> On Mon, 1 Feb 2021 12:13:49 +0000
> Steven Price <steven.price@arm.com> wrote:
>> On 01/02/2021 08:21, Boris Brezillon wrote:
>>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
>>> in the threaded irq handler as long as we can.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>
>> Looks fine to me, but I'm interested to know if you actually saw a
>> performance improvement. Back-to-back MMU faults should (hopefully)
>> be fairly uncommon.
>
> I actually didn't check the perf improvement or the actual number of
> back-to-back MMU faults, but
> dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
> seemed to generate a few of those, so I thought it'd be good to
> optimize that case given how trivial it is.
Fair enough! I was just a little concerned that Panfrost was somehow provoking enough interrupts that this was a measurable performance improvement.
I assume you'll push these to drm-misc-next (/fixes) as appropriate.
Thanks,
Steve
On Mon, 1 Feb 2021 13:24:00 +0000 Steven Price steven.price@arm.com wrote:
> On 01/02/2021 12:59, Boris Brezillon wrote:
>> On Mon, 1 Feb 2021 12:13:49 +0000
>> Steven Price <steven.price@arm.com> wrote:
>>> On 01/02/2021 08:21, Boris Brezillon wrote:
>>>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
>>>> in the threaded irq handler as long as we can.
>>>>
>>>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>>>
>>> Looks fine to me, but I'm interested to know if you actually saw a
>>> performance improvement. Back-to-back MMU faults should (hopefully)
>>> be fairly uncommon.
>>
>> I actually didn't check the perf improvement or the actual number of
>> back-to-back MMU faults, but
>> dEQP-GLES31.functional.draw_indirect.compute_interop.large.drawelements_combined_grid_1000x1000_drawcount_5000
>> seemed to generate a few of those, so I thought it'd be good to
>> optimize that case given how trivial it is.
>
> Fair enough! I was just a little concerned that Panfrost was somehow
> provoking enough interrupts that this was a measurable performance
> improvement.
>
> I assume you'll push these to drm-misc-next (/fixes) as appropriate.
I think I'll just queue everything to misc-next and let the stable maintainers backport the fixes (no one complained about this issue so far).
On Mon, Feb 1, 2021 at 2:21 AM Boris Brezillon boris.brezillon@collabora.com wrote:
> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
> in the threaded irq handler as long as we can.
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> ---
>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> index 21e552d1ac71..65bc20628c4e 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>  	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>  	int i, ret;
>
> +again:
> +
>  	for (i = 0; status; i++) {
>  		u32 mask = BIT(i) | BIT(i + 16);
>  		u64 addr;
>
> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>  		status &= ~mask;
>  	}
>
> +	/* If we received new MMU interrupts, process them before returning. */
> +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
> +	if (status)
> +		goto again;
Can't we avoid the goto? Change the for loop like this:

i = 0;
while (status = mmu_read(pfdev, MMU_INT_RAWSTAT)) {
	...
	i++;
	if (i == 16)
		i = 0;
}
> +
>  	mmu_write(pfdev, MMU_INT_MASK, ~0);
>  	return IRQ_HANDLED;
>  };
>
> --
> 2.26.2
On 03/02/2021 14:45, Rob Herring wrote:
> On Mon, Feb 1, 2021 at 2:21 AM Boris Brezillon
> <boris.brezillon@collabora.com> wrote:
>> Doing a hw-irq -> threaded-irq round-trip is counter-productive, stay
>> in the threaded irq handler as long as we can.
>>
>> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
>> ---
>>  drivers/gpu/drm/panfrost/panfrost_mmu.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> index 21e552d1ac71..65bc20628c4e 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
>> @@ -580,6 +580,8 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>>  	u32 status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>>  	int i, ret;
>>
>> +again:
>> +
>>  	for (i = 0; status; i++) {
>>  		u32 mask = BIT(i) | BIT(i + 16);
>>  		u64 addr;
>>
>> @@ -628,6 +630,11 @@ static irqreturn_t panfrost_mmu_irq_handler_thread(int irq, void *data)
>>  		status &= ~mask;
>>  	}
>>
>> +	/* If we received new MMU interrupts, process them before returning. */
>> +	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
>> +	if (status)
>> +		goto again;
> Can't we avoid the goto? Change the for loop like this:
>
> i = 0;
> while (status = mmu_read(pfdev, MMU_INT_RAWSTAT)) {
> 	...
> 	i++;
> 	if (i == 16)
> 		i = 0;
> }
Well that reads from the RAWSTAT register excessively (which could be
expensive at low GPU clock speeds), but we could do:

	for (i = 0; status; i++) {
		...
		if (!status) {
			i = 0;
			status = mmu_read(pfdev, MMU_INT_RAWSTAT);
		}
	}

(or similar with a while() if you prefer). Of course we could even get
rid of the 'i' loop altogether:

	status = mmu_read(pfdev, MMU_INT_RAWSTAT);
	while (status) {
		int i = ffs(status | (status >> 16)) - 1;

		... existing code ...

		status &= ~mask;
		if (!status)
			status = mmu_read(pfdev, MMU_INT_RAWSTAT);
	}
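The ffs()-driven variant sketched above can be checked in user space. This is only a model of the bit-walking, with fault handling reduced to a counter; the `(status | (status >> 16))` fold matches the hardware convention that bit i is the page-fault bit and bit i+16 the bus-error bit for address space i:

```c
#include <assert.h>
#include <stdint.h>
#include <strings.h>   /* ffs() */

/* Walk only the set bits of a simulated RAWSTAT value instead of
 * iterating i over every possible address space.
 * Returns the number of distinct AS faults handled. */
static int count_faults_ffs(uint32_t status)
{
    int handled = 0;

    while (status) {
        int i = ffs((int)(status | (status >> 16))) - 1;
        uint32_t mask = (1u << i) | (1u << (i + 16));

        handled++;         /* handle the fault on AS i */
        status &= ~mask;   /* clear both bits for that AS */
    }
    return handled;
}
```

Note that a status with both the fault and bus-error bit set for the same AS (e.g. 0x10001) is consumed in a single iteration, since the mask clears the low and high halves together.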
Steve