[Intel-gfx] [RFC PATCH 00/42] Introduce memory region concept (including device local memory)

Alex Deucher alexdeucher at gmail.com
Tue Feb 26 19:14:22 UTC 2019


On Tue, Feb 26, 2019 at 12:20 PM Alex Deucher <alexdeucher at gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 7:17 AM Joonas Lahtinen
> <joonas.lahtinen at linux.intel.com> wrote:
> >
> > Quoting Alex Deucher (2019-02-25 21:31:43)
> > > On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
> > > <joonas.lahtinen at linux.intel.com> wrote:
> > > >
> > > > Quoting Dave Airlie (2019-02-25 12:24:48)
> > > > > On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
> > > > > <joonas.lahtinen at linux.intel.com> wrote:
> > > > > >
> > > > > > + dri-devel mailing list, especially for the buddy allocator part
> > > > > >
> > > > > > Quoting Dave Airlie (2019-02-15 02:47:07)
> > > > > > > On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld at intel.com> wrote:
> > > > > > > >
> > > > > > > > In preparation for upcoming devices with device local memory, introduce the
> > > > > > > > concept of different memory regions, and a simple buddy allocator to manage
> > > > > > > > them.
> > > > > > >
> > > > > > > This is missing the information on why it's not TTM.
> > > > > > >
> > > > > > > Nothing against extending i915 GEM into doing stuff we already
> > > > > > > have examples of in the tree, but before you do that it would be
> > > > > > > good to have a public discussion of why TTM can't be used.
> > > > > >
> > > > > > Glad that you asked. It's my fault that it was not included in
> > > > > > the cover letter. I anticipated the question, but was travelling
> > > > > > for a couple of days at the time this was sent. I didn't want
> > > > > > to write a hasty explanation and then disappear, leaving others to
> > > > > > take the heat.
> > > > > >
> > > > > > So here goes the less-hasty version:
> > > > > >
> > > > > > We did an analysis of the effort needed vs. the benefit gained from
> > > > > > using TTM when this was initially started. The conclusion was that we
> > > > > > already share the interesting bits of code through core DRM, really.
> > > > > >
> > > > > > Rewriting the memory handling on top of TTM would buy us more
> > > > > > fine-grained locking. But that is more a benefit of rewriting the
> > > > > > memory handling with the information we have learned than of
> > > > > > rewriting it to use TTM :)
> > > > > >
> > > > > > And further, we've been getting rid of struct_mutex at a steady pace
> > > > > > over the past years, so we have a clear path to the fine-grained
> > > > > > locking already in the not-so-distant future. With all this we did
> > > > > > not see much gained from converting over, as the code sharing is
> > > > > > already substantial.
> > > > > >
> > > > > > We also wanted to have the buddy allocator instead of a for loop
> > > > > > making drm_mm allocations, to make sure we can keep memory
> > > > > > fragmentation at bay. The intent is to move the buddy allocator to
> > > > > > core DRM, to the benefit of all the drivers, if there is interest
> > > > > > from the community. It has been written as a strictly separate
> > > > > > component with that in mind.
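> > > > > >
> > > > > > To illustrate the idea, here is a minimal sketch of the order
> > > > > > computation at the heart of a buddy allocator. It is purely
> > > > > > illustrative, not the i915 code; the names and the 4 KiB minimum
> > > > > > block size are assumptions:
> > > > > >
> > > > > > /* Illustrative sketch only, not the i915 implementation. A buddy
> > > > > >  * allocator hands out power-of-two blocks so that freed "buddies"
> > > > > >  * can be merged back together, keeping physical memory as
> > > > > >  * contiguous as possible.
> > > > > >  */
> > > > > > #include <stdint.h>
> > > > > > #include <stdio.h>
> > > > > >
> > > > > > #define MIN_ORDER 12 /* assumed 4 KiB minimum block */
> > > > > >
> > > > > > /* Smallest power-of-two block (as an order above MIN_ORDER)
> > > > > >  * that fits the requested size.
> > > > > >  */
> > > > > > static unsigned int size_to_order(uint64_t size)
> > > > > > {
> > > > > >         unsigned int order = MIN_ORDER;
> > > > > >
> > > > > >         while (((uint64_t)1 << order) < size)
> > > > > >                 order++;
> > > > > >         return order - MIN_ORDER;
> > > > > > }
> > > > > >
> > > > > > int main(void)
> > > > > > {
> > > > > >         /* A 3 MiB request is served from a 4 MiB block: order 10. */
> > > > > >         printf("order for 3 MiB: %u\n", size_to_order(3ull << 20));
> > > > > >         return 0;
> > > > > > }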
> > > > > >
> > > > > > And if you take the buddy allocator out of the patch set, the rest
> > > > > > is mostly just vfuncing things up to be able to have different
> > > > > > backing storage for objects. We took the opportunity to move over to
> > > > > > the more valgrind-friendly mmap while touching things, but it's
> > > > > > something we have been contemplating anyway. And yeah, loads of
> > > > > > selftests.
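> > > > > >
> > > > > > As a rough, hypothetical sketch of what that vfuncing looks like
> > > > > > (illustrative names only, not the actual i915 structures):
> > > > > >
> > > > > > /* Hypothetical sketch: each memory region supplies its own ops,
> > > > > >  * so a GEM object can be backed by system memory or device local
> > > > > >  * memory without the rest of the driver caring which.
> > > > > >  */
> > > > > > #include <stdint.h>
> > > > > >
> > > > > > struct mem_region;
> > > > > >
> > > > > > struct mem_region_ops {
> > > > > >         int  (*get_pages)(struct mem_region *mr, void *obj);
> > > > > >         void (*put_pages)(struct mem_region *mr, void *obj);
> > > > > > };
> > > > > >
> > > > > > struct mem_region {
> > > > > >         const struct mem_region_ops *ops;
> > > > > >         const char *name;           /* e.g. "smem" or "lmem" */
> > > > > >         uint64_t total, available;  /* region size accounting */
> > > > > > };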
> > > > > >
> > > > > > That's really all that needed adding, and most of it is internal to
> > > > > > i915 and nothing to do with uAPI. This means porting over a
> > > > > > userspace driver doesn't require a substantial rewrite, only adding
> > > > > > a few new IOCTLs to set the preferred backing storage placements.
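> > > > > >
> > > > > > Purely as an illustration of the shape of such an IOCTL payload
> > > > > > (this is not the proposed interface, and the names are made up):
> > > > > >
> > > > > > /* Hypothetical sketch: userspace passes an ordered list of
> > > > > >  * preferred memory regions when creating an object, most
> > > > > >  * preferred first, and the kernel picks the first one that fits.
> > > > > >  */
> > > > > > #include <stdint.h>
> > > > > >
> > > > > > struct example_create_placements {
> > > > > >         uint64_t size;          /* object size in bytes */
> > > > > >         uint32_t num_regions;   /* entries used in regions[] */
> > > > > >         uint32_t regions[4];    /* region ids, e.g. LMEM before SMEM */
> > > > > > };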
> > > > > >
> > > > > > All the previous GEM abstractions keep applying, so we did not see
> > > > > > a justification to rewrite the kernel driver and userspace drivers.
> > > > > > It would just have made things look like TTM, when we already
> > > > > > have the important parts of the code shared with TTM drivers
> > > > > > behind the GEM interfaces which all our drivers sit on top of.
> > > > >
> > > > > a) you guys should be the community as well. If the buddy allocator
> > > > > is useful in core DRM, get out there and see if anyone else has a
> > > > > use case for it, like the GPU scheduler we have now (can i915 use
> > > > > that yet? :-)
> > > >
> > > > Well, the buddy allocator should be useful for anybody wishing to have
> > > > allocations that are as physically contiguous as possible. I have
> > > > naively assumed that would be almost everyone. So it's only a question
> > > > of whether others see the amount of work required to convert over as
> > > > justified for them.
> > > >
> > > > For the common DRM scheduler, I think a solid move from the beginning
> > > > would have been to factor out the i915 scheduler, as it was the most
> > > > advanced in features :) Now there is a much more basic common scheduler
> > > > core with no easy way to transition over without a feature regression.
> > >
> > > Can you elaborate?  What features are missing from the drm gpu scheduler?
> >
> > Priority-based preemption is the big thing that comes to mind. But maybe
> > we should not derail the discussion in this thread. The discussion should
> > be in the archives, or we can start a new thread.
>
> Probably not worth continuing here, but I think there are probably
> features that the drm scheduler has that the i915 scheduler does not.
> For example, engine load balancing.  So I wouldn't necessarily say
> it's more advanced, they just have different feature sets.  We could
> all be enjoying all of these features if we all worked on the same
> infrastructure.
>
> >
> > > > We'd have to rewrite many of the more advanced features for that
> > > > codebase before we could transition over. It's hard to justify such
> > > > work, as it would buy us very little compared to the amount of work
> > > > required.
> > > >
> > > > Situation would be different if there was something gained from
> > > > switching over. This would be the situation if the more advanced
> > > > scheduler was picked as the shared codebase.
> > > >
> > > > > b) however, these last two paragraphs fill me with no confidence
> > > > > that you've looked at TTM at all. It sounds like you took comments
> > > > > about TTM made 10 years ago and didn't update them. There should be
> > > > > no major reason for a uAPI change just because you adopt TTM. TTM
> > > > > hasn't ever had a common uAPI across drivers upstream; one was
> > > > > proposed initially, more than 10 years ago.
> > > >
> > > > This is partly my confusion about what the question was for and
> > > > partly bad wording on my behalf.
> > > >
> > > > So an attempt to re-answer: When this effort was started, it was
> > > > obvious that the amount of new code required was low (as you can see).
> > > > Feedback about what converting to TTM would require vs. what it would
> > > > give us was gathered from folks including Daniel and Chris, and the
> > > > unanimous decision was that it would not be justifiable.
> > > >
> > > > > All the current TTM using drivers except
> > > > > vmware use a GEM based API as well. TTM is an internal driver helper
> > > > > for managing pools of RAM.
> > > > >
> > > > > I'm just not sure what rebuilding a chunk of shared code inside the
> > > > > i915 driver is buying you, except a transition path into divergence
> > > > > from all the other discrete RAM drivers.
> > > >
> > > > I guess what I'm trying to say is that there isn't that much code
> > > > being added that wouldn't be there anyway. The i915 GEM code had
> > > > already grown to what it is before this.
> > > >
> > > > Adding the suggested smaller amount of code vs. doing a much bigger
> > > > rewrite is something of a straightforward choice given the number of
> > > > platforms and features in flight, especially when the end result is
> > > > the same.
> > >
> > > Because you will probably never do it.  It's almost always easier to
> > > just incrementally add on to existing code.  One could argue that GEM
> > > evolved into more or less the exact same thing as TTM anyway so why
> > > not bite the bullet and switch at some point?  TTM works fine even for
> > > UMA hardware.
> >
> > In my understanding there are quite a few differences between the two in
> > low-on-memory handling and other more subtle aspects.
> > Converting i915 over to TTM would be a pretty big rewrite, and the
> > code sharing is already happening for the important parts.
> >
> > > > > Like the gallium aversion in
> > > > > userspace, having a TTM aversion in kernel space is going to be the
> > > > > wrong path, and I'd rather not have to clean it up in 5 years when you
> > > > > guys eventually realise it.
> > > > >
> > > > > > The i915 GEM code gets rewritten and refactored quite often
> > > >
> > > > Well, we continually try to improve things and accommodate upcoming
> > > > product requirements and hardware features. The thing is that we do it
> > > > upstream first, so it's hard to avoid the churn.
> > > >
> > > > When moving rapidly, it's hard to project what the shared portion of
> > > > the code might be or exactly what it will look like. I'm myself much
> > > > more a believer in extracting the generic portions once the code is
> > > > there and working.
> > > >
> > > > The churn would be even bigger if there is a strict requirement to
> > > > refactor the common parts out straight from the beginning, before
> > > > anything is even proven in practice.
> > > >
> > > > That'd just make it much harder to sell doing things upstream
> > > > first to management.
> > >
> > > What does upstream first have to do with contributing to shared code?
> > > There is a common misconception in big companies that if you utilize
> > > shared infrastructure it will slow you down or you'll lose control of
> > > your code, which I think is what you are referring to.
> >
> > That'd be the wrong impression. Quite the contrary: the biggest chunk
> > of code here, the buddy allocator, was written specifically with the
> > fact in mind that we can easily move it to the DRM core.
> >
> > And I think we have a solid track history of contributing to the DRM
> > core, so I'm not sure why you'd think that.
>
> I didn't say you don't have a track record of contributing; my point
> was that you only seem to use what you contributed in the first place.
> This is another example.  I haven't seen Intel use any infrastructure
> they didn't write directly.  Off the top of my head:
> - drm GPU scheduler
> - TTM
> - Gallium
> - Multi-media (creating VAAPI when there were arguably superior
> alternatives, VDPAU and OpenMAX)
> - OpenCL
>
> In all those cases Intel did something equivalent in parallel, pushing
> its projects while trying to downplay the existing projects.  This
> just seems like more of that.
>
> >
> > > Ultimately, it
> > > does sometimes raise the bar, but in the long term it benefits
> > > everyone and usually it doesn't really add that much overhead.
> >
> > That we both agree on. But in the case of this specific patch series
> > it is about rewriting the driver to TTM, which is quite an overhead.
> >
> > And if most of the important code is shared anyway through the DRM core,
> > doing the rewrite plus covering all the subtle differences does not
> > sound too compelling.
>
> Sometimes it is a lot of overhead.  That's the pact you make when you
> put code upstream.  The community wants the best solution for everyone
> and to be able to maintain it if you don't.  Sometimes that means
> extra overhead.
>
> At the end of the day, I don't really care that much.  I get it, we
> all have large projects with scarce resources.  I just think a few
> years down the road we'll all regret it as a community.

AMD and others have also spent years tuning TTM for both UMA and VRAM,
but especially VRAM.  It comes across as a bit daft to complain about the
effort to move to TTM, but say nothing about the effort needed to tune GEM
for optimal VRAM performance, effort that has already been expended on TTM
and that you could take advantage of.

Alex

