[RFC PATCH v3 09/11] drm, cgroup: Add per cgroup bw measure and control
Kenny Ho
y2kenny at gmail.com
Thu Jun 27 04:34:05 UTC 2019
On Wed, Jun 26, 2019 at 12:25 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>
> On Wed, Jun 26, 2019 at 11:05:20AM -0400, Kenny Ho wrote:
> > The bandwidth is measured by keeping track of the amount of bytes moved
> > by ttm within a time period. We defined two type of bandwidth: burst
> > and average. Average bandwidth is calculated by dividing the total
> > amount of bytes moved within a cgroup by the lifetime of the cgroup.
> > Burst bandwidth is similar except that the byte and time measurement is
> > reset after a user configurable period.
>
> So I'm not too sure exposing this is a great idea, at least depending upon
> what you're trying to do with it. There's a few concerns here:
>
> - I think bo movement stats might be useful, but they're not telling you
> everything. Applications can also copy data themselves and put buffers
> where they want them, especially with more explicit apis like vk.
>
> - which kind of moves are we talking about here? Eviction related bo moves
> seem not counted here, and if you have lots of gpus with funny
> interconnects you might also get other kinds of moves, not just system
> ram <-> vram.
Eviction move is counted but I think I placed the delay in the wrong
place (the tracking of byte moved is in previous patch in
ttm_bo_handle_move_mem, which is common to all move as far as I can
tell.)
> - What happens if we slow down, but someone else needs to evict our
> buffers/move them (ttm is atm not great at this, but Christian König is
> working on patches). I think there's lots of priority inversion
> potential here.
>
> - If the goal is to avoid thrashing the interconnects, then this isn't the
> full picture by far - apps can use copy engines and explicit placement,
> again that's how vulkan at least is supposed to work.
>
> I guess these all boil down to: What do you want to achieve here? The
> commit message doesn't explain the intended use-case of this.
Thrashing prevention is the intent. I am not familiar with Vulkan so
I will have to get back to you on that. I don't know how those
explicit placement translate into the kernel. At this stage, I think
it's still worth while to have this as a resource even if some
applications bypass the kernel. I certainly welcome more feedback on
this topic.
Regards,
Kenny
More information about the amd-gfx
mailing list