CIK hangs with kernel 3.15, bisected

Marek Olšák maraeo at gmail.com
Tue May 27 14:55:21 PDT 2014


Hi Christian,

I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
fixed yet. They are very rare and appear random, so I have come up
with a patch that evicts the page directory and page tables to GTT
between IBs (see the attachment). With that patch applied, the system
starts fine, and compiz and glxgears work, but once I start playing
openarena, it locks up pretty quickly.

In theory, the patch shouldn't change behavior, because the page
tables are moved back to VRAM immediately afterwards. However, their
VRAM addresses may end up being different from before, which might be
the root cause.
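
To illustrate the suspicion above, here is a toy user-space model in C.
It is not radeon or TTM code: every name in it (toy_vram, toy_bo, toy_vm
and the toy_* functions) is made up for the illustration, and the
first-fit allocator is only an assumption about how a freed slot might
get reused. It sketches how an evicted page table can come back at a
different VRAM address, so that an address recorded before the eviction
no longer matches.

```c
/* Toy user-space model, NOT radeon/TTM code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_VRAM_SLOTS 4
#define TOY_NO_ADDR UINT64_MAX	/* marks a BO that is not in VRAM */

struct toy_vram {
	bool used[TOY_VRAM_SLOTS];
};

struct toy_bo {
	uint64_t addr;		/* VRAM slot index, or TOY_NO_ADDR if evicted */
};

struct toy_vm {
	struct toy_bo pt;	/* stands in for one page table BO */
	uint64_t flushed_addr;	/* address recorded at the last flush */
};

/* First-fit placement (an assumption): take the lowest free slot. */
static uint64_t toy_vram_alloc(struct toy_vram *v)
{
	for (uint64_t i = 0; i < TOY_VRAM_SLOTS; i++) {
		if (!v->used[i]) {
			v->used[i] = true;
			return i;
		}
	}
	return TOY_NO_ADDR;
}

/* Move a BO out of VRAM, freeing its slot for anyone else. */
static void toy_evict(struct toy_vram *v, struct toy_bo *bo)
{
	if (bo->addr == TOY_NO_ADDR)
		return;
	assert(bo->addr < TOY_VRAM_SLOTS);
	v->used[bo->addr] = false;
	bo->addr = TOY_NO_ADDR;
}

/* Bring a BO back into VRAM if it is not there already. */
static bool toy_validate(struct toy_vram *v, struct toy_bo *bo)
{
	if (bo->addr == TOY_NO_ADDR)
		bo->addr = toy_vram_alloc(v);
	return bo->addr != TOY_NO_ADDR;
}

/* A flush is needed whenever the page table address has changed. */
static bool toy_vm_needs_flush(const struct toy_vm *vm)
{
	return vm->pt.addr != vm->flushed_addr;
}

static void toy_vm_flush(struct toy_vm *vm)
{
	vm->flushed_addr = vm->pt.addr;
}

/* Evict the page table, let another BO grab its slot, then re-validate. */
static bool toy_scenario_address_moves(void)
{
	struct toy_vram vram = { { false } };
	struct toy_vm vm = { { TOY_NO_ADDR }, TOY_NO_ADDR };
	struct toy_bo other = { TOY_NO_ADDR };

	toy_validate(&vram, &vm.pt);	/* page table lands in slot 0 */
	toy_vm_flush(&vm);		/* slot 0 is recorded as flushed */
	toy_evict(&vram, &vm.pt);	/* slot 0 becomes free */
	toy_validate(&vram, &other);	/* another BO takes slot 0 */
	toy_validate(&vram, &vm.pt);	/* page table comes back in slot 1 */

	return toy_vm_needs_flush(&vm);
}
```

In this model toy_scenario_address_moves() returns true: the page table
comes back in a different slot than the one recorded at the last flush.
If some path kept using the old address without noticing the move, that
would be the kind of bug suspected here, but the model says nothing
about whether the real driver has it.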

Marek

On Wed, May 14, 2014 at 2:11 PM, Christian König
<deathsimple at vodafone.de> wrote:
> Crap, any chance you can narrow it down a bit more?
>
> I've just tried a piglit quick test on my Bonaire and it seems to work
> perfectly fine.
>
> What hw do you test on?
>
> Regards,
> Christian.
>
> On 13.05.2014 23:21, Marek Olšák wrote:
>
>> Hi Christian,
>>
>> Even though some regressions are fixed by these patches:
>>
>> drm/radeon: fix page directory update size estimation
>> drm/radeon: fix buffer placement under memory pressure v2
>>
>> and indeed, the texelFetch tests no longer hang, there is one more
>> hang which needs to be fixed. :( All I know is that the exact same
>> commit causes it and that it can only be reproduced by running the
>> whole piglit suite with concurrency enabled.
>>
>> My kernel git log:
>>
>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>> (10 hours ago) <Christian König>
>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>> hours ago) <Christian König>
>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>> months ago) <Christian König>
>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>> months ago) <Christian König>
>>
>> fa68834 doesn't hang, but 2ba22c8 hangs, which means the first bad
>> commit is 6d2f294 or one of the two fixes on top of it.
>>
>> Marek
>>
>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>
>>> Hi Christian,
>>>
>>> This commit, which first appeared in 3.15-rc1, causes hangs on Bonaire:
>>>
>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>> Author: Christian König <christian.koenig at amd.com>
>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>
>>>      drm/radeon: use normal BOs for the page tables v4
>>>
>>>      No need to make it more complicated than necessary,
>>>      just allocate the page tables as normal BO and
>>>      flush whenever the address change.
>>>
>>>      v2: update comments and function name
>>>      v3: squash bug fixes, page directory and tables patch
>>>      v4: rebased on Mareks changes
>>>
>>>      Signed-off-by: Christian König <christian.koenig at amd.com>
>>>
>>>
>>> Reverting the commit gives me a lot of merge conflicts.
>>>
>>> The simplest way to reproduce the hangs is to run piglit with these
>>> parameters:
>>> -t texelFetch.fs
>>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>>
>>> Any idea what is wrong with it?
>>>
>>> Thanks,
>>>
>>> Marek
>
>
-------------- next part --------------
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index d9ab99f..365e36f 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -116,6 +116,19 @@ void radeon_vm_manager_fini(struct radeon_device *rdev)
 	rdev->vm_manager.enabled = false;
 }
 
+static void force_gtt(struct radeon_bo *bo)
+{
+	if (radeon_bo_reserve(bo, false))
+		return;
+
+	radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT);
+
+	if (ttm_bo_validate(&bo->tbo, &bo->placement, true, false)) {
+		DRM_ERROR("failed to force a GTT placement\n");
+	}
+	radeon_bo_unreserve(bo);
+}
+
 /**
  * radeon_vm_get_bos - add the vm BOs to a validation list
  *
@@ -147,6 +160,8 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 	list[0].handle = 0;
 	list_add(&list[0].tv.head, head);
 
+	force_gtt(vm->page_directory);
+
 	for (i = 0, idx = 1; i <= vm->max_pde_used; i++) {
 		if (!vm->page_tables[i].bo)
 			continue;
@@ -159,6 +174,8 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 		list[idx].tiling_flags = 0;
 		list[idx].handle = 0;
 		list_add(&list[idx++].tv.head, head);
+
+		force_gtt(vm->page_tables[i].bo);
 	}
 
 	return list;

