<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
{mso-style-priority:99;
mso-style-link:"Plain Text Char";
margin:0in;
margin-bottom:.0001pt;
font-size:14.0pt;
font-family:"Calibri",sans-serif;}
p.msonormal0, li.msonormal0, div.msonormal0
{mso-style-name:msonormal;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.PlainTextChar
{mso-style-name:"Plain Text Char";
mso-style-priority:99;
mso-style-link:"Plain Text";
font-family:"Calibri",sans-serif;}
p.msipheadera92e061b, li.msipheadera92e061b, div.msipheadera92e061b
{mso-style-name:msipheadera92e061b;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle21
{mso-style-type:personal-compose;
font-family:"Arial",sans-serif;
color:#0078D7;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="msipheadera92e061b" style="margin:0in;margin-bottom:.0001pt"><span style="font-size:10.0pt;font-family:'Arial',sans-serif;color:#0078D7">[AMD Official Use Only - Internal Distribution Only]</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">-----Original Message-----<br>
From: Andrey Grodzovsky <andrey.grodzovsky@amd.com> <br>
Sent: Thursday, December 12, 2019 4:39 AM<br>
To: dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org<br>
Cc: Deucher, Alexander <Alexander.Deucher@amd.com>; Ma, Le <Le.Ma@amd.com>; Zhang, Hawking <Hawking.Zhang@amd.com>; Quan, Evan <Evan.Quan@amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com><br>
Subject: [RESEND PATCH 1/5] drm/amdgpu: reverts commit b01245ff54db66073b104ac9d9fbefb7b264b36d.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">In preparation for doing XGMI reset synchronization using task barrier.<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com><o:p></o:p></p>
<p class="MsoPlainText">---<o:p></o:p></p>
<p class="MsoPlainText">drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 -<o:p></o:p></p>
<p class="MsoPlainText">drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 76 +++++-------------------------<o:p></o:p></p>
<p class="MsoPlainText">2 files changed, 12 insertions(+), 66 deletions(-)<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<o:p></o:p></p>
<p class="MsoPlainText">index a78a363..50bab33 100644<o:p></o:p></p>
<p class="MsoPlainText">--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h<o:p></o:p></p>
<p class="MsoPlainText">+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h<o:p></o:p></p>
<p class="MsoPlainText">@@ -1001,8 +1001,6 @@ struct amdgpu_device {<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"> bool pm_sysfs_en;<o:p></o:p></p>
<p class="MsoPlainText"> bool ucode_sysfs_en;<o:p></o:p></p>
<p class="MsoPlainText">-<o:p></o:p></p>
<p class="MsoPlainText">- bool in_baco;<o:p></o:p></p>
<p class="MsoPlainText">};<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"> static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)<o:p></o:p></p>
<p class="MsoPlainText">diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<o:p></o:p></p>
<p class="MsoPlainText">index 7324a5f..1d19edfa 100644<o:p></o:p></p>
<p class="MsoPlainText">--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<o:p></o:p></p>
<p class="MsoPlainText">+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c<o:p></o:p></p>
<p class="MsoPlainText">@@ -2667,7 +2667,7 @@ static void amdgpu_device_xgmi_reset_func(struct work_struct *__work)<o:p></o:p></p>
<p class="MsoPlainText"> if (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)<o:p></o:p></p>
<p class="MsoPlainText"> adev->asic_reset_res = (adev->in_baco == false) ?<o:p></o:p></p>
<p class="MsoPlainText"> amdgpu_device_baco_enter(adev->ddev) :<o:p></o:p></p>
<p class="MsoPlainText">- amdgpu_device_baco_exit(adev->ddev);<o:p></o:p></p>
<p class="MsoPlainText">+ qamdgpu_device_baco_exit(adev->ddev);<o:p></o:p></p>
<p class="MsoPlainText"><span style="color:#203864">[Le]: Typo here: a stray "q" was added in front of amdgpu_device_baco_exit. With it fixed, Reviewed-by: Le Ma <<a href="mailto:Le.Ma@amd.com"><span style="color:#203864">Le.Ma@amd.com</span></a>><o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black"><o:p> </o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Regards,<o:p></o:p></span></p>
<p class="MsoPlainText"><span style="color:black">Ma Le<o:p></o:p></span></p>
<p class="MsoPlainText"> else<o:p></o:p></p>
<p class="MsoPlainText"> adev->asic_reset_res = amdgpu_asic_reset(adev);<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">@@ -3796,18 +3796,13 @@ static int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,<o:p></o:p></p>
<p class="MsoPlainText"> return r;<o:p></o:p></p>
<p class="MsoPlainText">}<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">-static int amdgpu_do_asic_reset(struct amdgpu_device *adev,<o:p></o:p></p>
<p class="MsoPlainText">- struct amdgpu_hive_info *hive,<o:p></o:p></p>
<p class="MsoPlainText">+static int amdgpu_do_asic_reset(struct amdgpu_hive_info *hive,<o:p></o:p></p>
<p class="MsoPlainText"> struct list_head *device_list_handle,<o:p></o:p></p>
<p class="MsoPlainText"> bool *need_full_reset_arg)<o:p></o:p></p>
<p class="MsoPlainText">{<o:p></o:p></p>
<p class="MsoPlainText"> struct amdgpu_device *tmp_adev = NULL;<o:p></o:p></p>
<p class="MsoPlainText"> bool need_full_reset = *need_full_reset_arg, vram_lost = false;<o:p></o:p></p>
<p class="MsoPlainText"> int r = 0;<o:p></o:p></p>
<p class="MsoPlainText">- int cpu = smp_processor_id();<o:p></o:p></p>
<p class="MsoPlainText">- bool use_baco =<o:p></o:p></p>
<p class="MsoPlainText">- (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO) ?<o:p></o:p></p>
<p class="MsoPlainText">- true : false;<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"> /*<o:p></o:p></p>
<p class="MsoPlainText"> * ASIC reset has to be done on all HGMI hive nodes ASAP<o:p></o:p></p>
<p class="MsoPlainText">@@ -3815,62 +3810,22 @@ static int amdgpu_do_asic_reset(struct amdgpu_device *adev,<o:p></o:p></p>
<p class="MsoPlainText"> */<o:p></o:p></p>
<p class="MsoPlainText"> if (need_full_reset) {<o:p></o:p></p>
<p class="MsoPlainText"> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {<o:p></o:p></p>
<p class="MsoPlainText">- /*<o:p></o:p></p>
<p class="MsoPlainText">- * For XGMI run all resets in parallel to speed up the<o:p></o:p></p>
<p class="MsoPlainText">- * process by scheduling the highpri wq on different<o:p></o:p></p>
<p class="MsoPlainText">- * cpus. For XGMI with baco reset, all nodes must enter<o:p></o:p></p>
<p class="MsoPlainText">- * baco within close proximity before anyone exit.<o:p></o:p></p>
<p class="MsoPlainText">- */<o:p></o:p></p>
<p class="MsoPlainText">+ /* For XGMI run all resets in parallel to speed up the process */<o:p></o:p></p>
<p class="MsoPlainText"> if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {<o:p></o:p></p>
<p class="MsoPlainText">- if (!queue_work_on(cpu, system_highpri_wq,<o:p></o:p></p>
<p class="MsoPlainText">- &tmp_adev->xgmi_reset_work))<o:p></o:p></p>
<p class="MsoPlainText">+ if (!queue_work(system_highpri_wq, &tmp_adev->xgmi_reset_work))<o:p></o:p></p>
<p class="MsoPlainText"> r = -EALREADY;<o:p></o:p></p>
<p class="MsoPlainText">- cpu = cpumask_next(cpu, cpu_online_mask);<o:p></o:p></p>
<p class="MsoPlainText"> } else<o:p></o:p></p>
<p class="MsoPlainText"> r = amdgpu_asic_reset(tmp_adev);<o:p></o:p></p>
<p class="MsoPlainText">- if (r)<o:p></o:p></p>
<p class="MsoPlainText">- break;<o:p></o:p></p>
<p class="MsoPlainText">- }<o:p></o:p></p>
<p class="MsoPlainText">-<o:p></o:p></p>
<p class="MsoPlainText">- /* For XGMI wait for all work to complete before proceed */<o:p></o:p></p>
<p class="MsoPlainText">- if (!r) {<o:p></o:p></p>
<p class="MsoPlainText">- list_for_each_entry(tmp_adev, device_list_handle,<o:p></o:p></p>
<p class="MsoPlainText">- gmc.xgmi.head) {<o:p></o:p></p>
<p class="MsoPlainText">- if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {<o:p></o:p></p>
<p class="MsoPlainText">- flush_work(&tmp_adev->xgmi_reset_work);<o:p></o:p></p>
<p class="MsoPlainText">- r = tmp_adev->asic_reset_res;<o:p></o:p></p>
<p class="MsoPlainText">- if (r)<o:p></o:p></p>
<p class="MsoPlainText">- break;<o:p></o:p></p>
<p class="MsoPlainText">- if (use_baco)<o:p></o:p></p>
<p class="MsoPlainText">- tmp_adev->in_baco = true;<o:p></o:p></p>
<p class="MsoPlainText">- }<o:p></o:p></p>
<p class="MsoPlainText">- }<o:p></o:p></p>
<p class="MsoPlainText">- }<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">- /*<o:p></o:p></p>
<p class="MsoPlainText">- * For XGMI with baco reset, need exit baco phase by scheduling<o:p></o:p></p>
<p class="MsoPlainText">- * xgmi_reset_work one more time. PSP reset and sGPU skips this<o:p></o:p></p>
<p class="MsoPlainText">- * phase. Not assume the situation that PSP reset and baco reset<o:p></o:p></p>
<p class="MsoPlainText">- * coexist within an XGMI hive.<o:p></o:p></p>
<p class="MsoPlainText">- */<o:p></o:p></p>
<p class="MsoPlainText">-<o:p></o:p></p>
<p class="MsoPlainText">- if (!r && use_baco) {<o:p></o:p></p>
<p class="MsoPlainText">- cpu = smp_processor_id();<o:p></o:p></p>
<p class="MsoPlainText">- list_for_each_entry(tmp_adev, device_list_handle,<o:p></o:p></p>
<p class="MsoPlainText">- gmc.xgmi.head) {<o:p></o:p></p>
<p class="MsoPlainText">- if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {<o:p></o:p></p>
<p class="MsoPlainText">- if (!queue_work_on(cpu,<o:p></o:p></p>
<p class="MsoPlainText">- system_highpri_wq,<o:p></o:p></p>
<p class="MsoPlainText">- &tmp_adev->xgmi_reset_work))<o:p></o:p></p>
<p class="MsoPlainText">- r = -EALREADY;<o:p></o:p></p>
<p class="MsoPlainText">- if (r)<o:p></o:p></p>
<p class="MsoPlainText">- break;<o:p></o:p></p>
<p class="MsoPlainText">- cpu = cpumask_next(cpu, cpu_online_mask);<o:p></o:p></p>
<p class="MsoPlainText">- }<o:p></o:p></p>
<p class="MsoPlainText">+ if (r) {<o:p></o:p></p>
<p class="MsoPlainText">+ DRM_ERROR("ASIC reset failed with error, %d for drm dev, %s",<o:p></o:p></p>
<p class="MsoPlainText">+ r, tmp_adev->ddev->unique);<o:p></o:p></p>
<p class="MsoPlainText">+ break;<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">- if (!r && use_baco) {<o:p></o:p></p>
<p class="MsoPlainText">+ /* For XGMI wait for all PSP resets to complete before proceed */<o:p></o:p></p>
<p class="MsoPlainText">+ if (!r) {<o:p></o:p></p>
<p class="MsoPlainText"> list_for_each_entry(tmp_adev, device_list_handle,<o:p></o:p></p>
<p class="MsoPlainText"> gmc.xgmi.head) {<o:p></o:p></p>
<p class="MsoPlainText"> if (tmp_adev->gmc.xgmi.num_physical_nodes > 1) {<o:p></o:p></p>
<p class="MsoPlainText">@@ -3878,21 +3833,15 @@ static int amdgpu_do_asic_reset(struct amdgpu_device *adev,<o:p></o:p></p>
<p class="MsoPlainText"> r = tmp_adev->asic_reset_res;<o:p></o:p></p>
<p class="MsoPlainText"> if (r)<o:p></o:p></p>
<p class="MsoPlainText"> break;<o:p></o:p></p>
<p class="MsoPlainText">- tmp_adev->in_baco = false;<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText">-<o:p></o:p></p>
<p class="MsoPlainText">- if (r) {<o:p></o:p></p>
<p class="MsoPlainText">- DRM_ERROR("ASIC reset failed with error, %d for drm dev, %s",<o:p></o:p></p>
<p class="MsoPlainText">- r, tmp_adev->ddev->unique);<o:p></o:p></p>
<p class="MsoPlainText">- goto end;<o:p></o:p></p>
<p class="MsoPlainText">- }<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText"> if (!r && amdgpu_ras_intr_triggered())<o:p></o:p></p>
<p class="MsoPlainText"> amdgpu_ras_intr_cleared();<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
<p class="MsoPlainText">+<o:p></o:p></p>
<p class="MsoPlainText"> list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {<o:p></o:p></p>
<p class="MsoPlainText"> if (need_full_reset) {<o:p></o:p></p>
<p class="MsoPlainText"> /* post card */<o:p></o:p></p>
<p class="MsoPlainText">@@ -4181,8 +4130,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,<o:p></o:p></p>
<p class="MsoPlainText"> if (r)<o:p></o:p></p>
<p class="MsoPlainText"> adev->asic_reset_res = r;<o:p></o:p></p>
<p class="MsoPlainText"> } else {<o:p></o:p></p>
<p class="MsoPlainText">- r = amdgpu_do_asic_reset(adev, hive, device_list_handle,<o:p></o:p></p>
<p class="MsoPlainText">- &need_full_reset);<o:p></o:p></p>
<p class="MsoPlainText">+ r = amdgpu_do_asic_reset(hive, device_list_handle,<o:p></o:p></p>
<p class="MsoPlainText">+ &need_full_reset);<o:p></o:p></p>
<p class="MsoPlainText"> if (r && r == -EAGAIN)<o:p></o:p></p>
<p class="MsoPlainText"> goto retry;<o:p></o:p></p>
<p class="MsoPlainText"> }<o:p></o:p></p>
<p class="MsoPlainText">--<o:p></o:p></p>
<p class="MsoPlainText">2.7.4<o:p></o:p></p>
<p class="MsoPlainText"><o:p> </o:p></p>
</div>
</body>
</html>