[PATCH v2 1/3] drm/amdgpu: Reject submissions with too many IBs.
Christian König
ckoenig.leichtzumerken at gmail.com
Tue Apr 11 12:34:58 UTC 2023
On 09.04.23 20:59, Bas Nieuwenhuizen wrote:
> Instead of failing somewhere in the scheduler after the
> ioctl has already succeeded.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2498
> Signed-off-by: Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 9 +++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 5 +++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
> 3 files changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 995ee9ff65c9..8db6618b9049 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -113,6 +113,15 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> if (!entity)
> return 0;
>
> + if (entity->rq && entity->rq->sched) {
I've just double checked and this stuff here is not necessarily
initialized yet. We need to move this around a bit.
The best place for the check is probably in amdgpu_cs_submit(), after
calling drm_sched_job_arm().
Alternatively we could go the other way around: instead of keeping
max_ibs in the ring, we keep a max_ibs per ip_type in adev and make sure
during initialization that each ring can handle at least that many.
Then we can check whether num_ibs is valid in amdgpu_cs_p1_ib() when we
count them.
Thinking more about it, the latter is probably the better variant.
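Something along those lines (adev->max_ibs[] would be a new
AMDGPU_HW_IP_NUM sized array and ring_max_ibs the value computed from
emit_frame_size/emit_ib_size; names are just placeholders, untested):

	/* amdgpu_ring_init(): keep the smallest limit per IP type,
	 * assuming the array was initialized to UINT_MAX beforehand.
	 */
	adev->max_ibs[ring->funcs->type] =
		min(adev->max_ibs[ring->funcs->type], ring_max_ibs);

	/* amdgpu_cs_p1_ib(): reject before the IB is even counted */
	if (num_ibs[r] >= p->adev->max_ibs[chunk_ib->ip_type]) {
		DRM_DEBUG("Rejected a submission with too many IBs\n");
		return -EINVAL;
	}
	++(num_ibs[r]);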
Regards,
Christian.
> + struct amdgpu_ring *ring = to_amdgpu_ring(entity->rq->sched);
> +
> + if (num_ibs > ring->max_ibs) {
> + DRM_DEBUG("Rejected a submission with too many IBs");
> + return -EINVAL;
> + }
> + }
> +
> return drm_sched_job_init(&(*job)->base, entity, owner);
> }
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index dc474b809604..933cb95a0e30 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -324,6 +324,11 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
> ring->max_dw = max_dw;
> ring->hw_prio = hw_prio;
>
> + if (ring->funcs->emit_ib_size) {
> + ring->max_ibs =
> + (max_dw - ring->funcs->emit_frame_size) / ring->funcs->emit_ib_size;
> + }
> +
> if (!ring->no_scheduler) {
> hw_ip = ring->funcs->type;
> num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 3989e755a5b4..7a295d80728b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -245,6 +245,7 @@ struct amdgpu_ring {
> unsigned ring_size;
> unsigned max_dw;
> int count_dw;
> + unsigned max_ibs;
> uint64_t gpu_addr;
> uint64_t ptr_mask;
> uint32_t buf_mask;