[Intel-gfx] [RFC PATCH 00/42] Introduce memory region concept (including device local memory)

Christian König ckoenig.leichtzumerken at gmail.com
Tue Feb 26 18:52:17 UTC 2019


Am 26.02.19 um 18:20 schrieb Alex Deucher:
> On Tue, Feb 26, 2019 at 7:17 AM Joonas Lahtinen
> <joonas.lahtinen at linux.intel.com> wrote:
>> Quoting Alex Deucher (2019-02-25 21:31:43)
>>> On Mon, Feb 25, 2019 at 9:35 PM Joonas Lahtinen
>>> <joonas.lahtinen at linux.intel.com> wrote:
>>>> Quoting Dave Airlie (2019-02-25 12:24:48)
>>>>> On Tue, 19 Feb 2019 at 23:32, Joonas Lahtinen
>>>>> <joonas.lahtinen at linux.intel.com> wrote:
>>>>>> + dri-devel mailing list, especially for the buddy allocator part
>>>>>>
>>>>>> Quoting Dave Airlie (2019-02-15 02:47:07)
>>>>>>> On Fri, 15 Feb 2019 at 00:57, Matthew Auld <matthew.auld at intel.com> wrote:
>>>>>>>> In preparation for upcoming devices with device local memory, introduce the
>>>>>>>> concept of different memory regions, and a simple buddy allocator to manage
>>>>>>>> them.
>>>>>>> This is missing the information on why it's not TTM.
>>>>>>>
>>>>>>> Nothing against extending i915 gem off into doing stuff we already
>>>>>>> have examples off in tree, but before you do that it would be good to
>>>>>>> have a why we can't use TTM discussion in public.
>>>>>> Glad that you asked. It's my fault that it was not included in
>>>>>> the cover letter. I anticipated the question, but was travelling
>>>>>> for a couple of days at the time this was sent. I didn't want
>>>>>> to write a hasty explanation and then disappear, leaving others to
>>>>>> take the heat.
>>>>>>
>>>>>> So here goes the less-hasty version:
>>>>>>
>>>>>> We did an analysis on the effort needed vs benefit gained of using
>>>>>> TTM when this was started initially. The conclusion was that we
>>>>>> already share the interesting bits of code through core DRM, really.
>>>>>>
>>>>>> Re-writing the memory handling to TTM would buy us more fine-grained
>>>>>> locking. But it's more a trait of rewriting the memory handling with
>>>>>> the information we have learned, than rewriting it to use TTM :)
>>>>>>
>>>>>> And further, we've been getting rid of struct_mutex at a steady phase
>>>>>> in the past years, so we have a clear path to the fine-grained locking
>>>>>> already in the not-so-distant future. With all this we did not see
>>>>>> much gained from converting over, as the code sharing is already
>>>>>> substantial.
>>>>>>
>>>>>> We also wanted to have the buddy allocator instead of a for loop making
>>>>>> drm_mm allocations to make sure we can keep the memory fragmentation
>>>>>> at bay. The intent is to move the buddy allocator to core DRM, to the
>>>>>> benefit of all the drivers, if there is interest from community. It has
>>>>>> been written as a strictly separate component with that in mind.
>>>>>>
>>>>>> And if you take the buddy allocator out of the patch set, the rest is
>>>>>> mostly just vfuncing things up to be able to have different backing
>>>>>> storages for objects. We took the opportunity to move over to the more
>>>>>> valgrind friendly mmap while touching things, but it's something we
>>>>>> have been contemplating anyway. And yeah, loads of selftests.
>>>>>>
>>>>>> That's really all that needed adding, and most of it is internal to
>>>>>> i915 and not to do with uAPI. This means porting over an userspace
>>>>>> driver doesn't require a substantial rewrite, but adding new a few
>>>>>> new IOCTLs to set the preferred backing storage placements.
>>>>>>
>>>>>> All the previous GEM abstractions keep applying, so we did not see
>>>>>> a justification to rewrite the kernel driver and userspace drivers.
>>>>>> It would have just to made things look like TTM, when we already
>>>>>> have the important parts of the code shared with TTM drivers
>>>>>> behind the GEM interfaces which all our drivers sit on top of.
>>>>> a) you guys should be the community as well, if the buddy allocator is
>>>>> useful in the core DRM get out there and try and see if anyone else
>>>>> has a use case for it, like the GPU scheduler we have now (can i915
>>>>> use that yet? :-)
>>>> Well, the buddy allocator should be useful for anybody wishing to have
>>>> as continuous physical allocations as possible. I have naively assumed
>>>> that would be almost everyone. So it would be only a question if others
>>>> see the amount of work required to convert over is justified for them.
>>>>
>>>> For the common DRM scheduler, I think a solid move from the beginning
>>>> would have been to factor out the i915 scheduler as it was most advanced
>>>> in features :) Now there is a way more trivial common scheduler core with
>>>> no easy path to transition without a feature regression.
>>> Can you elaborate?  What features are missing from the drm gpu scheduler?
>> Priority based pre-emption is the big thing coming to mind. But maybe we
>> should not derail the discussion in this thread. The discussion should
>> be in the archives, or we can start a new thread.
> Probably not worth continuing here, but I think there are probably
> features that the drm scheduler has that the i915 scheduler does not.
> For example, engine load balancing.  So i wouldn't necessarily say
> it's more advanced, just different feature sets.  We could all be
> enjoying all of these features if we all worked on the same
> infrastructure.

Actually it is worth continuing. To be honest the DRM scheduler came 
first and the i915 scheduler should have been pushed back a bit more.

Additional to that it does support priority based pre-emption from the 
very beginning, you guys just didn't cared to take a look :(

Regards,
Christian.

>
>>>> We'd have to rewrite many of the more advanced features for that codebase
>>>> before we could transition over. It's hard to justify such work, for
>>>> that it would buy us very little compared to amount of work.
>>>>
>>>> Situation would be different if there was something gained from
>>>> switching over. This would be the situation if the more advanced
>>>> scheduler was picked as the shared codebase.
>>>>
>>>>> b) however this last two paragraphs fill me with no confidence that
>>>>> you've looked at TTM at all. It sounds like you took comments about
>>>>> TTM made 10 years ago, and didn't update them. There should be no
>>>>> major reason for a uapi change just because you adopt TTM. TTM hasn't
>>>>> ever had a common uapi across drivers upstream, one was proposed
>>>>> initially > 10 years ago.
>>>> This is one part my confusion on what the question was for and other
>>>> part bad wording on my behalf.
>>>>
>>>> So an attempt to re-answer: When this effort was started it was obvious
>>>> that the amount of new code required was low (as you can see). Feedback
>>>> about what converting to TTM would require vs. give us was gathered from
>>>> folks including Daniel and Chris and the unanimous decision was that it
>>>> would not be justifiable.
>>>>
>>>>> All the current TTM using drivers except
>>>>> vmware use a GEM based API as well. TTM is an internal driver helper
>>>>> for managing pools of RAM.
>>>>>
>>>>> I'm just not sure what rebuilding a chunk of shared code inside the
>>>>> i915 driver is buying you, except a transition path into divergence
>>>>> from all the other discrete RAM drivers.
>>>> I guess what I'm trying to say, there isn't that much code being added
>>>> that wouldn't be there anyway. The i915 GEM code has already grown to
>>>> what it is before this.
>>>>
>>>> Adding the suggested smaller amount of code vs. doing a much bigger
>>>> rewrite is something of a straightforward choice with the amount of
>>>> platforms and features in flight, especially when the end result is
>>>> the same.
>>> Because you will probably never do it.  It's almost always easier to
>>> just incrementally add on to existing code.  One could argue that GEM
>>> evolved into more or less the exact same thing as TTM anyway so why
>>> not bite the bullet and switch at some point?  TTM works fine even for
>>> UMA hardware.
>> By my understanding there are quite a few differences in low on memory
>> handling and other more subtle aspects between the two.
>> Converting over to TTM would be a pretty big rewrite of i915, and the
>> code sharing is already happening for important parts.
>>
>>>>> Like the gallium aversion in
>>>>> userspace, having a TTM aversion in kernel space is going to be the
>>>>> wrong path, and I'd rather not have to clean it up in 5 years when you
>>>>> guys eventually realise it.
>>>>>
>>>>> The i915 GEM code get rewritten and refactored quite often
>>>> Well, we continually try to improve things and accommodate upcoming
>>>> product requirements and hardware feature. The thing is that we do it
>>>> upstream first, so hard to avoid the churn.
>>>>
>>>> When moving rapidly, it's hard to project what the shared portion of
>>>> the code might be or exactly look. I'm myself much more believer in
>>>> extracting the generic portions out when the code is there and working.
>>>>
>>>> The churn would be even bigger if there is a strict requirement to
>>>> refactor the common parts out straight from the beginning, before
>>>> anything is even proven in practice.
>>>>
>>>> That'd just mean it gets much harder to sell doing things upstream
>>>> first to the management.
>>> What does upstream first have to do with contributing to shared code?
>>> There is a common misconception in big companies that if you utilize
>>> shared infrastructure it will slow you down or you'll lose control of
>>> your code which is I think what you are referring to.
>> That'd be the wrong impression. Quite the contrary, the biggest chunk
>> of code here, buddy allocator, was written specifically with the fact
>> in mind that we can easily move it to DRM core.
>>
>> And I think we have a solid track history of contributing to the DRM
>> core, so I'm not sure why you'd think that.
> I didn't say you don't have a track record of contributing, my point
> was that you only seem to use what you contributed in the first place.
> This is another example.  I haven't seen Intel use any infrastructure
> they didn't write directly.  Off the top of my head:
> - drm GPU scheduler
> - TTM
> - Gallium
> - Multi-media (creating VAAPI when there were arguably superior
> alternatives, VDPAU and OpenMAX)
> - OpenCL
>
> In all those cases Intel did something equivalent in parallel, pushing
> its projects while trying to downplay the existing projects.  This
> just seems like more of that.
>
>>> Ultimately, it
>>> does sometimes raise the bar, but in the long term it benefits
>>> everyone and usually it doesn't really add that much overhead.
>> That we both agree on. But in the case of this specific patch series
>> it is about rewriting the driver to TTM, which is quite an overhead.
>>
>> And if most of the important code is shared anyway through DRM core,
>> doing the rewrite + covering all the subtle differences, does not
>> sound too compelling.
> Sometimes it is a lot of overhead.  That's the pact you make when you
> put code upstream.  The community wants the best solution for everyone
> and to be able to maintain it if you don't.  Sometimes that means
> extra overhead.
>
> At the end of the day, I don't really care that much.  I get it, we
> all have large projects with scarce resources.  I just think a few
> years down the road we'll all regret it as a community.
>
> Alex
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



More information about the dri-devel mailing list