[PATCH 12/12] drm/amdgpu: reset eeprom once specifying one bigger threshold

Guchun Chen guchun.chen at amd.com
Tue Jul 28 07:49:34 UTC 2020


During driver probe, the eeprom i2c init call may hit the bad GPU tag
in the eeprom table header. The tag is set when the number of reported
bad pages reached the bad page threshold during the driver's previous
run. There are several strategies for handling this case:

1. When the module parameter amdgpu_bad_page_threshold is 0, the page
retirement feature is disabled, so simply resetting the eeprom is fine.
2. When amdgpu_bad_page_threshold is not 0 and the user has set a
larger valid value so that the current boot can succeed, reset the
eeprom data and do not abort booting.
3. In all other cases, driver probe is aborted.
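
The three cases above can be sketched as a small standalone decision
function. This is only an illustrative model, not driver code: the enum,
the function name bad_tag_action() and its parameters are hypothetical
stand-ins for amdgpu_bad_page_threshold, ras->bad_page_cnt_threshold and
control->num_recs, and it assumes the bad GPU tag has already been found.

```c
#include <stdint.h>

/* Hypothetical outcome of probing a table that carries the bad GPU tag. */
enum probe_action {
	PROBE_RESET_TABLE,	/* reset the eeprom table and keep booting */
	PROBE_ABORT,		/* report exceeded limit, stop the probe */
};

/*
 * Illustrative model only (names are assumptions, not driver symbols).
 * module_threshold    stands in for amdgpu_bad_page_threshold
 * effective_threshold stands in for ras->bad_page_cnt_threshold
 * num_recs            stands in for control->num_recs
 */
static enum probe_action bad_tag_action(int module_threshold,
					uint32_t effective_threshold,
					uint32_t num_recs)
{
	/* Case 1: page retirement disabled, resetting is fine. */
	if (module_threshold == 0)
		return PROBE_RESET_TABLE;

	/* Case 2: threshold is now larger than the saved record count. */
	if (effective_threshold > num_recs)
		return PROBE_RESET_TABLE;

	/* Case 3: everything else breaks the probe. */
	return PROBE_ABORT;
}
```

Note that a threshold equal to the saved record count still aborts in
this model, matching the strict ">" comparison used in the patch.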

Signed-off-by: Guchun Chen <guchun.chen at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index be895dc2d739..02933050081b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -248,6 +248,7 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
 	struct amdgpu_device *adev = to_amdgpu_device(control);
 	unsigned char buff[EEPROM_ADDRESS_SIZE + EEPROM_TABLE_HEADER_SIZE] = { 0 };
 	struct amdgpu_ras_eeprom_table_header *hdr = &control->tbl_hdr;
+	struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
 	struct i2c_msg msg = {
 			.addr	= 0,
 			.flags	= I2C_M_RD,
@@ -287,9 +288,15 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
 
 	} else if ((hdr->header == EEPROM_TABLE_HDR_BAD) &&
 			(amdgpu_bad_page_threshold != 0)) {
-		*exceed_err_limit = true;
-		DRM_ERROR("Exceeding the bad_page_threshold parameter, "
+		if (ras->bad_page_cnt_threshold > control->num_recs) {
+			DRM_INFO("One valid bigger bad page threshold is "
+					"used, reset eeprom.\n");
+			ret = amdgpu_ras_eeprom_reset_table(control);
+		} else {
+			*exceed_err_limit = true;
+			DRM_ERROR("Exceeding the bad_page_threshold parameter, "
 				"disabling the GPU.\n");
+		}
 	} else {
 		DRM_INFO("Creating new EEPROM table");
 
-- 
2.17.1


