Plan: BO move throttling for visible VRAM evictions

Thu May 18 08:17:04 UTC 2017

On 17/05/17 09:35 PM, Marek Olšák wrote:
> On May 16, 2017 3:57 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
>     On 15/05/17 07:11 PM, Marek Olšák wrote:
>     > On May 15, 2017 4:29 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
>     >
>     >     I think the next step should be to make radeonsi keep track of
>     >     how much VRAM it's trying to use that's expected to be accessed
>     >     by the CPU, and to use GTT instead when that exceeds a threshold
>     >     (probably derived from vram_vis_size).
>     >
>     > That's difficult to estimate. There are apps with 600MB of mapped
>     > VRAM that don't experience any performance issues, and some apps
>     > with 300MB of mapped VRAM that do. It only depends on the CPU access
>     > pattern, not on what radeonsi sees.
> 
>     What I mean is keeping track of the total size of resources which have
>     RADEON_DOMAIN_VRAM and RADEON_FLAG_CPU_ACCESS set, and if it exceeds a
>     threshold, create new ones having those flags in GTT instead. Even
>     though this might not be strictly necessary with amdgpu in the long run,
>     it probably is for radeon anyway, and in the short term it might help
>     even with amdgpu.
> 
> 
> That might hurt us more than it can help.

You may be right, but I think I'll play with that idea a little anyway
to see how it goes. :)
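
A minimal C sketch of the accounting described above, assuming a single counter (per screen, say) of the bytes in live BOs that request both VRAM and CPU access. The names (`cpu_vram_tracker`, `choose_domain`, `release_bo`, `SKETCH_DOMAIN_*`) are hypothetical and not radeonsi code, and how the threshold is derived from vram_vis_size is left open, as in the mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical domains for illustration; these are NOT Mesa's
 * RADEON_DOMAIN_* values. */
enum sketch_domain { SKETCH_DOMAIN_VRAM, SKETCH_DOMAIN_GTT };

struct cpu_vram_tracker {
    uint64_t cpu_access_vram_bytes; /* live VRAM BOs with CPU access */
    uint64_t threshold_bytes;       /* assumed to be derived from vram_vis_size */
};

/* Choose the domain for a new BO. Only BOs that ask for VRAM *and* CPU
 * access are redirected; everything else keeps its requested domain. */
static enum sketch_domain
choose_domain(struct cpu_vram_tracker *t, enum sketch_domain requested,
              bool cpu_access, uint64_t size)
{
    if (requested != SKETCH_DOMAIN_VRAM || !cpu_access)
        return requested;
    if (t->cpu_access_vram_bytes + size > t->threshold_bytes)
        return SKETCH_DOMAIN_GTT; /* over the threshold: use GTT instead */
    t->cpu_access_vram_bytes += size;
    return SKETCH_DOMAIN_VRAM;
}

/* Call when a BO created via choose_domain() is destroyed, passing the
 * domain it was actually placed in. */
static void
release_bo(struct cpu_vram_tracker *t, enum sketch_domain placed,
           bool cpu_access, uint64_t size)
{
    if (placed != SKETCH_DOMAIN_VRAM || !cpu_access)
        return;
    assert(t->cpu_access_vram_bytes >= size);
    t->cpu_access_vram_bytes -= size;
}
```

With a toy threshold of 256 bytes, a 200-byte CPU-accessible VRAM BO stays in VRAM, a following 100-byte one is created in GTT, and once the first BO is released a new 100-byte BO fits in VRAM again.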

> All mappable buffers have the CPU access flag set, but many of them are
> immutable.

You mean they're only written to once by the CPU? We shouldn't set the
RADEON_FLAG_CPU_ACCESS flag for BOs where we expect that, because it
currently prevents them from being placed in the CPU-invisible part of VRAM.
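
A sketch of deriving the flag from the expected CPU usage, so that write-once BOs remain eligible for CPU-invisible VRAM. The usage categories and the `SKETCH_FLAG_CPU_ACCESS` bit are hypothetical stand-ins, not Mesa's actual RADEON_FLAG_* definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit and usage hints, for illustration only. */
#define SKETCH_FLAG_CPU_ACCESS (1u << 0)

enum sketch_usage {
    SKETCH_USAGE_IMMUTABLE, /* CPU writes the contents once, at creation */
    SKETCH_USAGE_DYNAMIC,   /* CPU maps and updates it repeatedly */
    SKETCH_USAGE_STAGING    /* CPU reads or writes it every frame */
};

static unsigned
bo_flags_for_usage(enum sketch_usage usage)
{
    switch (usage) {
    case SKETCH_USAGE_IMMUTABLE:
        /* A single initial write does not justify keeping the BO in
         * CPU-visible VRAM for its whole lifetime. */
        return 0;
    case SKETCH_USAGE_DYNAMIC:
    case SKETCH_USAGE_STAGING:
        return SKETCH_FLAG_CPU_ACCESS;
    }
    return SKETCH_FLAG_CPU_ACCESS; /* unknown usage: be conservative */
}

/* Models the behaviour described in the mail: a BO with the CPU access
 * flag cannot be placed in the CPU-invisible part of VRAM. */
static bool
may_use_invisible_vram(unsigned flags)
{
    return !(flags & SKETCH_FLAG_CPU_ACCESS);
}
```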

> The only place where this can be handled is the kernel.

Ideally, the placement of a BO should be determined based on how it's
actually being used by the GPU vs CPU. But I'm not sure how to determine
that in a useful way.

> Even if it's as simple as: if (bo->numcpufaults > 10) domain = GTT_WC;

I'm skeptical about the number of CPU page faults per se being a useful
metric. It doesn't tell us much about how the BO is used even by the
CPU, let alone the GPU. But let's see where this leads you.
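
Marek's one-liner, expanded into a compilable C sketch. `numcpufaults` and `GTT_WC` are the names used in the mail; neither is an existing amdgpu or TTM field, and the surrounding structure and fault hook are hypothetical. As the comment notes, the counter only captures CPU faults, which is the limitation pointed out above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model; SKETCH_DOMAIN_GTT_WC stands for the "GTT_WC"
 * domain named in the mail. */
enum sketch_domain { SKETCH_DOMAIN_VRAM, SKETCH_DOMAIN_GTT_WC };

struct sketch_bo {
    enum sketch_domain domain;
    uint32_t numcpufaults;
};

#define CPU_FAULT_THRESHOLD 10 /* the example value from the mail */

/* Called from a (hypothetical) CPU page-fault hook for the BO. The counter
 * only records how often the CPU faulted; it says nothing about how the
 * GPU uses the BO. */
static void
sketch_on_cpu_fault(struct sketch_bo *bo)
{
    bo->numcpufaults++;
    if (bo->domain == SKETCH_DOMAIN_VRAM &&
        bo->numcpufaults > CPU_FAULT_THRESHOLD)
        bo->domain = SKETCH_DOMAIN_GTT_WC; /* stop competing for visible VRAM */
}
```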

One thing that might help would be if we could swap individual memory
nodes between visible and invisible VRAM on CPU page faults, instead of
moving/evicting whole BOs. Christian, do you think something like that
would be possible?
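
A toy model of the node-swapping idea, assuming VRAM is managed in equally sized nodes of which the first few are CPU-visible. On a CPU fault in invisible VRAM, the faulting node is exchanged with one visible node (picked round-robin here, an arbitrary choice), so only two nodes' worth of data would move rather than a whole BO. Everything in it is hypothetical; a real implementation would also have to copy the contents and update both BOs' GPU and CPU mappings.

```c
#include <assert.h>

/* Toy VRAM: VRAM_NODES equally sized nodes, the first VIS_NODES of which
 * are CPU-visible. owner[i] is the id of the BO whose data lives in node i
 * (0 = free). All names here are hypothetical. */
#define VRAM_NODES 8
#define VIS_NODES  2

struct vram_model {
    int owner[VRAM_NODES];
    int victim; /* round-robin index into the visible nodes */
};

/* Handle a CPU fault on physical node `node`. Visible nodes need no move.
 * Otherwise swap the node with one visible node instead of moving or
 * evicting the whole BO. Returns where the faulting data now lives. */
static int
fault_swap_node(struct vram_model *v, int node)
{
    assert(node >= 0 && node < VRAM_NODES);
    if (node < VIS_NODES)
        return node;
    int target = v->victim;
    v->victim = (v->victim + 1) % VIS_NODES;
    /* Swap ownership; a real driver would also swap contents and fix up
     * the page tables of both BOs. */
    int tmp = v->owner[target];
    v->owner[target] = v->owner[node];
    v->owner[node] = tmp;
    return target;
}
```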

Another idea (to avoid issues such as the recent one with Rocket League)
was to make VRAM CPU mappings write-only, and move the BO to GTT if
there's a read fault. But I'm not sure whether this is possible at all,
or how much effort it would take.
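
A sketch of the write-only mapping idea as a small fault-handler state machine. Whether CPU page tables can express a write-only mapping at all is exactly the open question here, so the sketch simply assumes read and write faults can be told apart; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: VRAM BOs are CPU-mapped write-only, and a read
 * fault migrates the BO to GTT before mapping it readable. */
enum sketch_domain { SKETCH_VRAM, SKETCH_GTT };
enum fault_kind { FAULT_WRITE, FAULT_READ };

struct sketch_bo {
    enum sketch_domain domain;
    bool cpu_readable; /* whether the current CPU mapping allows reads */
};

static void
sketch_cpu_fault(struct sketch_bo *bo, enum fault_kind kind)
{
    if (bo->domain == SKETCH_VRAM) {
        if (kind == FAULT_READ) {
            /* CPU reads from VRAM over PCIe are very slow: move to GTT. */
            bo->domain = SKETCH_GTT;
            bo->cpu_readable = true;
        } else {
            bo->cpu_readable = false; /* keep the VRAM mapping write-only */
        }
    } else {
        bo->cpu_readable = true; /* GTT mappings allow reads */
    }
}
```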

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer