[amd-gfx] [PATCH 1/3] drm/amdgpu: add disable_cu parameter

Nicolai Hähnle nhaehnle at gmail.com
Fri Jun 17 14:17:51 UTC 2016


On 17.06.2016 15:31, StDenis, Tom wrote:
> I wonder if some sort of self-test like the ring/ib tests we do is a
> good idea.  Either from the UMD or KMD.
>
>
> In this specific case though are you working around a CU that results in
> a GPU lockup?  Or does it just not respond correctly?

Computations in that CU flip bits occasionally. It actually wasn't 
noticeable at all in regular desktop use, and I didn't see traces of it 
with the usual benchmarks and games either -- only in hindsight did I 
notice some slightly wrong pixels when zooming into screenshots of the 
desktop.

I also hope to use this option to do more extensive stress tests of 
whether we can still run stably with many CUs disabled - I suspect an 
interaction between CU disabling and CU reservations for shader stages.
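
For anyone who wants to experiment with the option format outside the
kernel, here is a rough user-space sketch of the parsing done in the
quoted patch. It is not the kernel code: parse_disable_cu, its return
value, and MAX_CU_PER_SH are illustrative names that are not in the
patch, and fprintf stands in for DRM_ERROR.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CU_PER_SH 16 /* mirrors the "cu < 16" check in the patch */

/*
 * User-space sketch of the disable_cu parsing (hypothetical helper).
 * Fills mask[se * max_sh + sh] with the bitmask of CUs to disable.
 * Returns 0 on success, -1 if an entry cannot be parsed.
 * Out-of-range entries are reported and skipped.
 */
static int parse_disable_cu(const char *param, unsigned *mask,
                            unsigned max_se, unsigned max_sh)
{
    unsigned se, sh, cu;
    const char *p = param;

    memset(mask, 0, sizeof(*mask) * max_se * max_sh);

    /* No parameter given: nothing to disable. */
    if (!p || !*p)
        return 0;

    for (;;) {
        const char *next;

        /* Each entry has the form se.sh.cu */
        if (sscanf(p, "%u.%u.%u", &se, &sh, &cu) < 3)
            return -1;

        if (se < max_se && sh < max_sh && cu < MAX_CU_PER_SH)
            mask[se * max_sh + sh] |= 1u << cu;
        else
            fprintf(stderr, "disable_cu %u.%u.%u is out of range\n",
                    se, sh, cu);

        /* Entries are separated by commas. */
        next = strchr(p, ',');
        if (!next)
            break;
        p = next + 1;
    }
    return 0;
}
```

Assuming a part with max_se = 4 and max_sh = 2 (numbers picked only for
the example), the string "2.0.3,2.0.4" sets bits 3 and 4 in mask[4] and
leaves every other entry zero.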

I don't think an automatic self-test is feasible for the kernel module, 
and from user space, "stress testing" with Piglit is precisely how I 
found it :)

Nicolai

>
>
> Tom
>
>
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
> Nicolai Hähnle <nhaehnle at gmail.com>
> *Sent:* Friday, June 17, 2016 09:17
> *To:* amd-gfx at lists.freedesktop.org
> *Cc:* Haehnle, Nicolai
> *Subject:* [amd-gfx] [PATCH 1/3] drm/amdgpu: add disable_cu parameter
> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>
> This parameter will allow disabling individual CUs on module load, e.g.
> amdgpu.disable_cu=2.0.3,2.0.4 to disable CUs 3 and 4 of SE2.
>
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  4 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 44 +++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
>   4 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 01c36b8..2d35e11 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -87,6 +87,7 @@ extern int amdgpu_sched_hw_submission;
>   extern int amdgpu_powerplay;
>   extern unsigned amdgpu_pcie_gen_cap;
>   extern unsigned amdgpu_pcie_lane_cap;
> +extern char *amdgpu_disable_cu;
>
>   #define AMDGPU_WAIT_IDLE_TIMEOUT_IN_MS          3000
>   #define AMDGPU_MAX_USEC_TIMEOUT                 100000  /* 100 ms */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index f888c01..235f732 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -84,6 +84,7 @@ int amdgpu_sched_hw_submission = 2;
>   int amdgpu_powerplay = -1;
>   unsigned amdgpu_pcie_gen_cap = 0;
>   unsigned amdgpu_pcie_lane_cap = 0;
> +char *amdgpu_disable_cu = NULL;
>
>   MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
>   module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
> @@ -168,6 +169,9 @@ module_param_named(pcie_gen_cap, amdgpu_pcie_gen_cap, uint, 0444);
>   MODULE_PARM_DESC(pcie_lane_cap, "PCIE Lane Caps (0: autodetect (default))");
>   module_param_named(pcie_lane_cap, amdgpu_pcie_lane_cap, uint, 0444);
>
> +MODULE_PARM_DESC(disable_cu, "Disable CUs (se.sh.cu,...)");
> +module_param_named(disable_cu, amdgpu_disable_cu, charp, 0444);
> +
>   static const struct pci_device_id pciidlist[] = {
>   #ifdef CONFIG_DRM_AMDGPU_CIK
>           /* Kaveri */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 9f95da4..a074edd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -70,3 +70,47 @@ void amdgpu_gfx_scratch_free(struct amdgpu_device *adev, uint32_t reg)
>                   }
>           }
>   }
> +
> +/**
> + * amdgpu_gfx_parse_disable_cu - Parse the disable_cu module parameter
> + *
> + * @mask: array in which the per-shader array disable masks will be stored
> + * @max_se: number of SEs
> + * @max_sh: number of SHs
> + *
> + * The bitmask of CUs to be disabled in the shader array determined by se and
> + * sh is stored in mask[se * max_sh + sh].
> + */
> +void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_sh)
> +{
> +       unsigned se, sh, cu;
> +       const char *p;
> +
> +       memset(mask, 0, sizeof(*mask) * max_se * max_sh);
> +
> +       if (!amdgpu_disable_cu || !*amdgpu_disable_cu)
> +               return;
> +
> +       p = amdgpu_disable_cu;
> +       for (;;) {
> +               char *next;
> +               int ret = sscanf(p, "%u.%u.%u", &se, &sh, &cu);
> +               if (ret < 3) {
> +                       DRM_ERROR("amdgpu: could not parse disable_cu\n");
> +                       return;
> +               }
> +
> +               if (se < max_se && sh < max_sh && cu < 16) {
> +                       DRM_INFO("amdgpu: disabling CU %u.%u.%u\n", se, sh, cu);
> +                       mask[se * max_sh + sh] |= 1u << cu;
> +               } else {
> +                       DRM_ERROR("amdgpu: disable_cu %u.%u.%u is out of range\n",
> +                                 se, sh, cu);
> +               }
> +
> +               next = strchr(p, ',');
> +               if (!next)
> +                       break;
> +               p = next + 1;
> +       }
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index dc06cbd..51321e1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -27,4 +27,6 @@
>   int amdgpu_gfx_scratch_get(struct amdgpu_device *adev, uint32_t *reg);
>   void amdgpu_gfx_scratch_free(struct amdgpu_device *adev, uint32_t reg);
>
> +void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_sh);
> +
>   #endif
> --
> 2.7.4
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>