<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 12/29/2017 03:19 PM, Koenig,
Christian wrote:<br>
</div>
<blockquote type="cite"
cite="mid:f817cb0f-9c6f-472b-8bdc-025b34da595a@email.android.com">
<div dir="auto">The difference is that the OOM killer doesn't know
of the pages an application allocates through the driver.
<div dir="auto"><br>
</div>
<div dir="auto">This results in a bad decision about which process to
kill.</div>
<div dir="auto"><br>
</div>
<div dir="auto">I had patches to fix this a long time ago on the
list, but never found time to clean them up and push them
upstream.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Andrey is now working on this, but I don't know
the status offhand.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Christian.</div>
</div>
</blockquote>
<br>
I don't have any updates since the last status I sent on this (I am
currently on vacation), so I will just reiterate that status here: <br>
<br>
"Worked a bit more on this, did find a function to properly get free
swap space. However, this solution does not work when evicting to
swap a BO from the LRU list that is larger than available RAM (as you
predicted); it causes an OOM in the swapper work thread. As you can
see from the attachment, shmem_read_mapping_page uses the default
allocation policy for swap pages, while I think we should have used
__GFP_RETRY_MAYFAIL here. The way to do that is to use
shmem_read_mapping_page_gfp, which allows setting the GFP flags.
<br>
<br>
In general, I think the approach you suggested (and the one I
was advised to take on the #mm channel) is the right one: to avoid the
OOM killer we should not try to estimate free RAM or swap, since there
is no reliable way to assume that things will not change by the time
we allocate. What we should do instead is allocate pages without
triggering the OOM killer. I think I should try again to set all the
system page allocation code paths we use to __GFP_RETRY_MAYFAIL and
debug why it didn't work last time. One reason could be
that I missed the swap page allocation; another is a possible
memory leak when an allocation fails and all previously
allocated pages for the BO are rolled back, which leads to OOM anyway."<br>
<br>
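A note on the rollback path mentioned in the status above: the
following is a minimal userspace C sketch, not the TTM code. All names
in it (bo_pages, bo_populate, bo_release, sketch_alloc, sketch_free)
are hypothetical. It only illustrates allocating with an allocator
that is allowed to fail and freeing every previously allocated page on
failure, so that a failed populate does not leak.<br>
<pre>
```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for a buffer object's backing pages. */
struct bo_pages {
    void **pages;
    size_t count;
};

/* Hypothetical hooks: alloc may return NULL instead of forcing reclaim,
 * loosely mirroring the intent of __GFP_RETRY_MAYFAIL. */
typedef void *(*page_alloc_fn)(size_t size);
typedef void (*page_free_fn)(void *page);

/* Budgeted example allocator used to exercise the failure path. */
size_t sketch_budget; /* allocations still allowed to succeed */
size_t sketch_live;   /* pages currently allocated and not freed */

void *sketch_alloc(size_t size)
{
    void *p;

    if (sketch_budget == 0)
        return NULL;
    p = malloc(size);
    if (p) {
        sketch_budget--;
        sketch_live++;
    }
    return p;
}

void sketch_free(void *page)
{
    if (page) {
        free(page);
        sketch_live--;
    }
}

/* Allocate n pages; on any failure free everything and return -1. */
int bo_populate(struct bo_pages *bo, size_t n,
                page_alloc_fn alloc, page_free_fn release)
{
    size_t i;

    bo->pages = NULL;
    bo->count = 0;
    if (n == 0)
        return 0;

    bo->pages = calloc(n, sizeof(*bo->pages));
    if (!bo->pages)
        return -1;

    for (i = 0; i < n; i++) {
        bo->pages[i] = alloc(SKETCH_PAGE_SIZE);
        if (!bo->pages[i])
            goto rollback;
    }
    bo->count = n;
    return 0;

rollback:
    /* i is the index that failed; free pages i-1 down to 0. */
    while (i--)
        release(bo->pages[i]);
    free(bo->pages);
    bo->pages = NULL;
    return -1;
}

/* Free all pages of a successfully populated object. */
void bo_release(struct bo_pages *bo, page_free_fn release)
{
    size_t i;

    for (i = 0; i < bo->count; i++)
        release(bo->pages[i]);
    free(bo->pages);
    bo->pages = NULL;
    bo->count = 0;
}
```
</pre>
Setting sketch_budget below the requested page count makes
bo_populate fail part-way through; sketch_live should be back at zero
afterwards if the rollback is complete.<br>
<br>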
Thanks,<br>
Andrey<br>
<br>
<blockquote type="cite"
cite="mid:f817cb0f-9c6f-472b-8bdc-025b34da595a@email.android.com">
<div dir="auto">
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On 29.12.2017 20:36, "Kuehling,
Felix" <a class="moz-txt-link-rfc2396E" href="mailto:Felix.Kuehling@amd.com"><Felix.Kuehling@amd.com></a> wrote:<br type="attribution">
<blockquote class="quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><font size="2"><span style="font-size:11pt">
<div>Is it possible that the test is broken? A test
that allocates memory to<br>
exhaustion may well trigger the OOM killer. A test
can do that by using<br>
malloc. Why not by using the graphics driver? The
OOM killer does what<br>
it's supposed to do, and kills the broken
application.<br>
<br>
As I understand it, this change adds artificial
limitations to<br>
work around a bug in a user mode test. However, it
ends up limiting the<br>
memory available for well behaved applications, more
than necessary.<br>
<br>
For compute applications that work with huge data
sets, we want to be<br>
able to allocate lots of system memory. Tying
available system memory to<br>
the VRAM size makes no sense for compute
applications that want to work<br>
with such huge data sets.<br>
<br>
Regards,<br>
Felix<br>
<br>
<br>
On 2017-12-15 02:09 PM, Andrey Grodzovsky wrote:<br>
> This reverts commit
ba851eed895c76be0eb4260bdbeb7e26f9ccfaa2.<br>
> With that change piglit max size tests (running
with -t max.*size) are causing<br>
> OOM and hard hang on my CZ with 1GB RAM.<br>
><br>
> Signed-off-by: Andrey Grodzovsky
<a class="moz-txt-link-rfc2396E" href="mailto:andrey.grodzovsky@amd.com"><andrey.grodzovsky@amd.com></a><br>
> ---<br>
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8
+++++---<br>
> 1 file changed, 5 insertions(+), 3
deletions(-)<br>
><br>
> diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c<br>
> index c307a7d..814a9c1 100644<br>
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c<br>
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c<br>
> @@ -1329,9 +1329,11 @@ int
amdgpu_ttm_init(struct amdgpu_device *adev)<br>
> struct sysinfo si;<br>
> <br>
> si_meminfo(&si);<br>
> - gtt_size =
max(AMDGPU_DEFAULT_GTT_SIZE_MB << 20,<br>
> - (uint64_t)si.totalram *
si.mem_unit * 3/4);<br>
> - } else<br>
> + gtt_size =
min(max((AMDGPU_DEFAULT_GTT_SIZE_MB << 20),<br>
> +
adev->mc.mc_vram_size),<br>
> +
((uint64_t)si.totalram * si.mem_unit * 3/4));<br>
> + }<br>
> + else<br>
> gtt_size =
(uint64_t)amdgpu_gtt_size << 20;<br>
> r =
ttm_bo_init_mm(&adev->mman.bdev, TTM_PL_TT,
gtt_size >> PAGE_SHIFT);<br>
> if (r) {<!-- --><br>
<br>
</div>
</span></font></div>
</blockquote>
</div>
<br>
</div>
</blockquote>
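<br>
For reference, the GTT sizing in the hunk quoted above can be sketched
in userspace C. The function names are made up for illustration, and
the 3072 MB default used in the example figures is an assumption about
AMDGPU_DEFAULT_GTT_SIZE_MB, not taken from this thread.<br>
<pre>
```c
#include <stdint.h>

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/* The "-" lines of the hunk: max(default, 3/4 of system RAM). */
uint64_t gtt_size_ram_only(uint64_t default_gtt_mb, uint64_t total_ram_bytes)
{
    return max_u64(default_gtt_mb << 20, total_ram_bytes * 3 / 4);
}

/* The "+" lines of the hunk: min(max(default, VRAM), 3/4 of system RAM). */
uint64_t gtt_size_vram_clamped(uint64_t default_gtt_mb,
                               uint64_t vram_bytes,
                               uint64_t total_ram_bytes)
{
    return min_u64(max_u64(default_gtt_mb << 20, vram_bytes),
                   total_ram_bytes * 3 / 4);
}
```
</pre>
With 1 GiB of RAM, 512 MiB of VRAM and the assumed 3072 MB default,
the first form yields 3 GiB (more than RAM) and the second 768 MiB.
With 64 GiB of RAM and 4 GiB of VRAM they yield 48 GiB and 4 GiB,
which is the limitation Felix is objecting to.<br>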
<br>
</body>
</html>