[PATCH] drm/amdgpu: Return -EINVAL when whole gpu reset happened
Christian König
christian.koenig at amd.com
Wed Dec 9 10:06:24 UTC 2020
On 09.12.20 at 10:46, Liu ChengZhe wrote:
> If CS init returns -ECANCELED, the UMD will free the context and create a
> new one. Jobs in this new context could continue executing. In the case of
> a BACO or mode 1 reset, we can't allow this to happen: because VRAM is lost
> after a whole GPU reset, the job can't be guaranteed to succeed.
NAK, this is intentional.
When ECANCELED is returned, the UMD should create a new context after a GPU
reset to get back into a usable state and continue to submit jobs.
Regards,
Christian.
>
> Signed-off-by: Liu ChengZhe <ChengZhe.Liu at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 85e48c29a57c..2a98f58134ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -120,6 +120,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
>  	uint64_t *chunk_array;
>  	unsigned size, num_ibs = 0;
>  	uint32_t uf_offset = 0;
> +	uint32_t vramlost_count = 0;
>  	int i;
>  	int ret;
>
> @@ -140,7 +141,11 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
>
>  	/* skip guilty context job */
>  	if (atomic_read(&p->ctx->guilty) == 1) {
> -		ret = -ECANCELED;
> +		vramlost_count = atomic_read(&p->adev->vram_lost_counter);
> +		if (p->ctx->vram_lost_counter != vramlost_count)
> +			ret = -EINVAL;
> +		else
> +			ret = -ECANCELED;
>  		goto free_chunk;
>  	}
>
> @@ -246,7 +251,7 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
>  		goto free_all_kdata;
>
>  	if (p->ctx->vram_lost_counter != p->job->vram_lost_counter) {
> -		ret = -ECANCELED;
> +		ret = -EINVAL;
>  		goto free_all_kdata;
>  	}
>