[PATCH 4/4] drm/amdgpu: Implement ignore_bad_page_threshold parameter
Kent Russell
kent.russell at amd.com
Tue Oct 19 17:50:50 UTC 2021
If the ignore_bad_page_threshold kernel parameter is set to true,
continue to post the GPU. Print a warning to dmesg that this action has
been taken, and that page retirement will not work for this GPU.
Cc: Luben Tuikov <luben.tuikov at amd.com>
Cc: Mukul Joshi <Mukul.Joshi at amd.com>
Signed-off-by: Kent Russell <kent.russell at amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 7bb506a0ebd6..63a0548a05bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1108,11 +1108,16 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
res = amdgpu_ras_eeprom_correct_header_tag(control,
RAS_TABLE_HDR_VAL);
} else {
- *exceed_err_limit = true;
- dev_err(adev->dev,
- "RAS records:%d exceed threshold:%d, "
- "GPU will not be initialized. Replace this GPU or increase the threshold",
+ dev_err(adev->dev, "RAS records:%d exceed threshold:%d",
control->ras_num_recs, ras->bad_page_cnt_threshold);
+ if (amdgpu_ignore_bad_page_threshold) {
+ dev_warn(adev->dev, "GPU will be initialized due to ignore_bad_page_threshold.");
+ dev_warn(adev->dev, "Page retirement will not work for this GPU in this state.");
+ res = 0;
+ } else {
+ *exceed_err_limit = true;
+ dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
+ }
}
} else {
DRM_INFO("Creating a new EEPROM table");
--
2.25.1