[PATCH 1/3] drm/panthor: Fix tiler OOM handling to allow incremental rendering
Boris Brezillon
boris.brezillon at collabora.com
Thu Apr 25 09:45:23 UTC 2024
On Thu, 25 Apr 2024 10:28:49 +0100
Steven Price <steven.price at arm.com> wrote:
> On 25/04/2024 08:18, Boris Brezillon wrote:
> > From: Antonino Maniscalco <antonino.maniscalco at collabora.com>
> >
> > If the kernel couldn't allocate memory because we reached the maximum
> > number of chunks but no render passes are in flight
> > (panthor_heap_grow() returning -ENOMEM), we should defer the OOM
> > handling to the FW by returning a NULL chunk. The FW will then call
> > the tiler OOM exception handler, which is supposed to implement
> > incremental rendering (execute an intermediate fragment job to flush
> > the pending primitives, release the tiler memory that was used to
> > store those primitives, and start over from where it stopped).
> >
> > Fixes: de8548813824 ("drm/panthor: Add the scheduler logical block")
> > Signed-off-by: Antonino Maniscalco <antonino.maniscalco at collabora.com>
> > Signed-off-by: Boris Brezillon <boris.brezillon at collabora.com>
>
> Reviewed-by: Steven Price <steven.price at arm.com>
>
> Although I think the real issue here is that we haven't clearly defined
> the return values from panthor_heap_grow - it's a bit weird to have two
> different error codes for the same "try again later after incremental
> rendering" result. But as a fix this seems most clear.
Yeah, I actually considered returning -EBUSY for the 'max_chunks
reached' situation, but then realized we would also want to trigger
incremental rendering for actual memory allocation failures (when
chunk_count < max_chunks) once the failable/non-blocking allocation
logic is implemented. For that kind of failure, -ENOMEM makes more
sense, even though it means checking against two values instead of
one.
I guess returning -ENOMEM instead of -EBUSY for the case where we have
render passes in flight wouldn't be too awkward either, as this can be
seen as the kernel refusing to allocate more memory.
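
To make the return-value discussion concrete, here is a rough standalone
sketch of the classification the patch ends up doing. It is not the
driver code: struct toy_heap, toy_heap_grow(), enum oom_action and
classify_grow_result() are made-up names for illustration, and the toy
grow function only models the 'max_chunks reached' case described in
the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the heap state, not the real panthor struct. */
struct toy_heap {
	uint32_t chunk_count;
	uint32_t max_chunks;
};

/* What the OOM handler does with the result of a grow attempt. */
enum oom_action {
	OOM_CHUNK_PROVIDED,	/* hand the new chunk to the FW */
	OOM_DEFER_TO_FW,	/* return a NULL chunk, FW waits or runs the handler */
	OOM_FATAL,		/* flag the queue as faulty */
};

/*
 * Toy model of the return values discussed in this thread: when the chunk
 * limit is reached, -EBUSY if render passes are in flight, -ENOMEM otherwise.
 */
static int toy_heap_grow(struct toy_heap *heap, uint32_t renderpasses_in_flight,
			 uint64_t *new_chunk_va)
{
	assert(heap && new_chunk_va);

	if (heap->chunk_count >= heap->max_chunks)
		return renderpasses_in_flight ? -EBUSY : -ENOMEM;

	heap->chunk_count++;
	*new_chunk_va = 0x1000ull * heap->chunk_count; /* made-up address */
	return 0;
}

/* Both "try again later" codes map to the same action, as in the patch. */
static enum oom_action classify_grow_result(int ret)
{
	if (!ret)
		return OOM_CHUNK_PROVIDED;

	if (ret == -EBUSY || ret == -ENOMEM)
		return OOM_DEFER_TO_FW;

	return OOM_FATAL;
}
```

With a single error code for the 'try again later' result, the second
check in classify_grow_result() would collapse to one comparison, which
is the simplification being discussed above.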
>
> Steve
>
> > ---
> > drivers/gpu/drm/panthor/panthor_sched.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> > index b3a51a6de523..6de8c0c702cb 100644
> > --- a/drivers/gpu/drm/panthor/panthor_sched.c
> > +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> > @@ -1354,7 +1354,13 @@ static int group_process_tiler_oom(struct panthor_group *group, u32 cs_id)
> >  					pending_frag_count, &new_chunk_va);
> >  	}
> >
> > -	if (ret && ret != -EBUSY) {
> > +	/* If the kernel couldn't allocate memory because we reached the maximum
> > +	 * number of chunks (EBUSY if we have render passes in flight, ENOMEM
> > +	 * otherwise), we want to let the FW try to reclaim memory by waiting
> > +	 * for fragment jobs to land or by executing the tiler OOM exception
> > +	 * handler, which is supposed to implement incremental rendering.
> > +	 */
> > +	if (ret && ret != -EBUSY && ret != -ENOMEM) {
> >  		drm_warn(&ptdev->base, "Failed to extend the tiler heap\n");
> >  		group->fatal_queues |= BIT(cs_id);
> >  		sched_queue_delayed_work(sched, tick, 0);
>