[amd-gfx] [PATCH 1/3] drm/amdgpu: add disable_cu parameter
Nicolai Hähnle
nhaehnle at gmail.com
Fri Jun 17 14:17:51 UTC 2016
On 17.06.2016 15:31, StDenis, Tom wrote:
> I wonder if some sort of self-test like the ring/ib tests we do is a
> good idea. Either from the UMD or KMD.
>
>
> In this specific case though are you working around a CU that results in
> a GPU lockup? Or does it just not respond correctly?
Computations in that CU flip bits occasionally. It actually wasn't
noticeable at all in regular desktop use, and I didn't see traces of it
with the usual benchmarks and games either -- only in hindsight did I
notice some slightly wrong pixels when zooming into screenshots of the
desktop.
I also hope to use this option to do more extensive stress tests of
whether we can still run stably with many CUs disabled - I suspect an
interaction between CU disabling and CU reservations for shader stages.
I don't think an automatic self-test is feasible for the kernel module,
and from user space, "stress testing" with Piglit is precisely how I
found it :)
Nicolai
>
>
> Tom
>
>
>
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
> Nicolai Hähnle <nhaehnle at gmail.com>
> *Sent:* Friday, June 17, 2016 09:17
> *To:* amd-gfx at lists.freedesktop.org
> *Cc:* Haehnle, Nicolai
> *Subject:* [amd-gfx] [PATCH 1/3] drm/amdgpu: add disable_cu parameter
> From: Nicolai Hähnle <nicolai.haehnle at amd.com>
>
> This parameter will allow disabling individual CUs on module load, e.g.
> amdgpu.disable_cu=2.0.3,2.0.4 to disable CUs 3 and 4 of SE2.
>
> Signed-off-by: Nicolai Hähnle <nicolai.haehnle at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 44 +++++++++++++++++++++++++++++++++
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 ++
> 4 files changed, 51 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 01c36b8..2d35e11 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -87,6 +87,7 @@ extern int amdgpu_sched_hw_submission;
> extern int amdgpu_powerplay;
> extern unsigned amdgpu_pcie_gen_cap;
> extern unsigned amdgpu_pcie_lane_cap;
> +extern char *amdgpu_disable_cu;
>
> #define AMDGPU_WAIT_IDLE_TIMEOUT_IN_MS 3000
> #define AMDGPU_MAX_USEC_TIMEOUT 100000 /* 100 ms */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index f888c01..235f732 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -84,6 +84,7 @@ int amdgpu_sched_hw_submission = 2;
> int amdgpu_powerplay = -1;
> unsigned amdgpu_pcie_gen_cap = 0;
> unsigned amdgpu_pcie_lane_cap = 0;
> +char *amdgpu_disable_cu = NULL;
>
> MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes");
> module_param_named(vramlimit, amdgpu_vram_limit, int, 0600);
> @@ -168,6 +169,9 @@ module_param_named(pcie_gen_cap, amdgpu_pcie_gen_cap, uint, 0444);
> MODULE_PARM_DESC(pcie_lane_cap, "PCIE Lane Caps (0: autodetect (default))");
> module_param_named(pcie_lane_cap, amdgpu_pcie_lane_cap, uint, 0444);
>
> +MODULE_PARM_DESC(disable_cu, "Disable CUs (se.sh.cu,...)");
> +module_param_named(disable_cu, amdgpu_disable_cu, charp, 0444);
> +
> static const struct pci_device_id pciidlist[] = {
> #ifdef CONFIG_DRM_AMDGPU_CIK
> /* Kaveri */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 9f95da4..a074edd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -70,3 +70,47 @@ void amdgpu_gfx_scratch_free(struct amdgpu_device *adev, uint32_t reg)
> }
> }
> }
> +
> +/**
> + * amdgpu_gfx_parse_disable_cu - Parse the disable_cu module parameter
> + *
> + * @mask: array in which the per-shader array disable masks will be stored
> + * @max_se: number of SEs
> + * @max_sh: number of SHs
> + *
> + * The bitmask of CUs to be disabled in the shader array determined by se and
> + * sh is stored in mask[se * max_sh + sh].
> + */
> +void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_sh)
> +{
> +	unsigned se, sh, cu;
> +	const char *p;
> +
> +	memset(mask, 0, sizeof(*mask) * max_se * max_sh);
> +
> +	if (!amdgpu_disable_cu || !*amdgpu_disable_cu)
> +		return;
> +
> +	p = amdgpu_disable_cu;
> +	for (;;) {
> +		char *next;
> +		int ret = sscanf(p, "%u.%u.%u", &se, &sh, &cu);
> +		if (ret < 3) {
> +			DRM_ERROR("amdgpu: could not parse disable_cu\n");
> +			return;
> +		}
> +
> +		if (se < max_se && sh < max_sh && cu < 16) {
> +			DRM_INFO("amdgpu: disabling CU %u.%u.%u\n", se, sh, cu);
> +			mask[se * max_sh + sh] |= 1u << cu;
> +		} else {
> +			DRM_ERROR("amdgpu: disable_cu %u.%u.%u is out of range\n",
> +				  se, sh, cu);
> +		}
> +
> +		next = strchr(p, ',');
> +		if (!next)
> +			break;
> +		p = next + 1;
> +	}
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index dc06cbd..51321e1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -27,4 +27,6 @@
> int amdgpu_gfx_scratch_get(struct amdgpu_device *adev, uint32_t *reg);
> void amdgpu_gfx_scratch_free(struct amdgpu_device *adev, uint32_t reg);
>
> +void amdgpu_gfx_parse_disable_cu(unsigned *mask, unsigned max_se, unsigned max_sh);
> +
> #endif
> --
> 2.7.4
>
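The parsing loop in amdgpu_gfx_parse_disable_cu() above can be tried out in
user space without loading the module. The following is only a minimal sketch
under stated assumptions, not the code from the patch: parse_disable_cu() is a
hypothetical name, the string is passed in as an argument instead of being read
from the module parameter, messages go to stderr instead of DRM_ERROR, and the
function returns -1 on a malformed entry so that a caller can check it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for amdgpu_gfx_parse_disable_cu().
 * Assumptions (not part of the patch): the parameter string is an
 * argument, errors are printed to stderr, and a malformed entry
 * makes the function return -1 instead of returning void.
 */
static int parse_disable_cu(const char *param, unsigned *mask,
			    unsigned max_se, unsigned max_sh)
{
	unsigned se, sh, cu;
	const char *p = param;

	assert(mask != NULL);
	memset(mask, 0, sizeof(*mask) * max_se * max_sh);

	/* An unset or empty parameter disables nothing. */
	if (!p || !*p)
		return 0;

	for (;;) {
		const char *next;

		/* Each entry has the form se.sh.cu */
		if (sscanf(p, "%u.%u.%u", &se, &sh, &cu) < 3) {
			fprintf(stderr, "could not parse disable_cu\n");
			return -1;
		}

		/* Same range check as the patch: at most 16 CUs per SH. */
		if (se < max_se && sh < max_sh && cu < 16)
			mask[se * max_sh + sh] |= 1u << cu;
		else
			fprintf(stderr, "disable_cu %u.%u.%u is out of range\n",
				se, sh, cu);

		/* Entries are separated by commas. */
		next = strchr(p, ',');
		if (!next)
			break;
		p = next + 1;
	}

	return 0;
}
```

Assuming a part with max_se = 4 and max_sh = 1 (illustrative values only), the
string "2.0.3,2.0.4" from the commit message leaves mask[2] == 0x18 and every
other entry zero. As in the patch, an out-of-range entry is reported and
skipped, while a malformed entry stops parsing and keeps the bits set so far.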
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>