[Intel-gfx] [RFC] drm/i915: store all mmio bases in intel_engines
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Mon Mar 12 11:04:16 UTC 2018
On 09/03/2018 19:44, Tvrtko Ursulin wrote:
>
> On 09/03/2018 18:47, Daniele Ceraolo Spurio wrote:
>> On 09/03/18 01:53, Tvrtko Ursulin wrote:
>>>
>>> On 08/03/2018 18:46, Daniele Ceraolo Spurio wrote:
>>>> On 08/03/18 01:31, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 07/03/2018 19:45, Daniele Ceraolo Spurio wrote:
>>>>>> The mmio bases we're currently storing in the intel_engines array are
>>>>>> only valid for a subset of gens, so we need to ignore them and use
>>>>>> different values in some cases. Instead of doing that, we can have a
>>>>>> table of [starting gen, mmio base] pairs for each engine in
>>>>>> intel_engines and select the correct one based on the gen we're
>>>>>> running
>>>>>> on in a consistent way.
>>>>>>
>>>>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>>>> Signed-off-by: Daniele Ceraolo Spurio
>>>>>> <daniele.ceraolospurio at intel.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/i915/intel_engine_cs.c | 77
>>>>>> +++++++++++++++++++++------------
>>>>>> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 -
>>>>>> 2 files changed, 49 insertions(+), 29 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c
>>>>>> b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>>>> index 4ba139c27fba..1dd92cac8d67 100644
>>>>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>>>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>>>> @@ -81,12 +81,16 @@ static const struct engine_class_info
>>>>>> intel_engine_classes[] = {
>>>>>> },
>>>>>> };
>>>>>> +#define MAX_MMIO_BASES 3
>>>>>> struct engine_info {
>>>>>> unsigned int hw_id;
>>>>>> unsigned int uabi_id;
>>>>>> u8 class;
>>>>>> u8 instance;
>>>>>> - u32 mmio_base;
>>>>>> + struct engine_mmio_base {
>>>>>> + u32 gen : 8;
>>>>>> + u32 base : 24;
>>>>>> + } mmio_bases[MAX_MMIO_BASES];
>>>>>> unsigned irq_shift;
>>>>>> };
>>>>>
>>>>> It's not bad, but I am just wondering if it is too complicated
>>>>> versus simply duplicating the tables.
>>>>>
>>>>> Duplicated tables would certainly be much less code and complexity,
>>>>> but what about the size of pure data?
>>>>>
>>>>> In this patch sizeof(struct engine_info) grows by MAX_MMIO_BASES *
>>>>> sizeof(u32), so 12, to the total of 30 if I am not mistaken. Times
>>>>> NUM_ENGINES equals 240 bytes for intel_engines[] array with this
>>>>> scheme.
>>>>>
>>>>
>>>> we remove a u32 (the old mmio base) so we only grow 8 per engine,
>>>> but the compiler rounds up to a multiple of u32 so 28 per engine,
>>>> for a total of 224.
>>>>
>>>>> Separate tables per gens would be:
>>>>>
>>>>> sizeof(struct engine_info) is 18 bytes.
>>>>>
>>>>> For < gen6 we'd need 3 engines * 18 = 54
>>>>> < gen11 = 5 engines * 18 = 90
>>>>> >= gen11 = 8 engines * 18 = 144
>>>>>
>>>>> 54 + 90 + 144 = 288 bytes
>>>>>
>>>>> So a little bit bigger. Hm. Plus we would need to refactor so
>>>>> intel_engines[] is not indexed by engine->id but is contiguous
>>>>> array with engine->id in the elements. Code to lookup the compact
>>>>> array should be simpler than combined new code from this patch,
>>>>> especially if you add the test as Chris suggested. So separate
>>>>> engine info arrays would win I think overall - when considering
>>>>> size of text + size of data + size of source code.
>>>>>
>>>>
>>>> Not sure. I would exclude the selftest, which is usually not
>>>> compiled for released kernels, which leads to:
>>>
>>> Yes, but we cannot exclude its source since selftests for separate
>>> tables wouldn't be needed.
>>>
>>>> add/remove: 0/0 grow/shrink: 3/1 up/down: 100/-7 (93)
>>>> Function old new delta
>>>> intel_engines 160 224 +64
>>>> __func__ 13891 13910 +19
>>>> intel_engines_init_mmio 1247 1264 +17
>>>> intel_init_bsd_ring_buffer 142 135 -7
>>>> Total: Before=1479626, After=1479719, chg +0.01%
>>>>
>>>> Total growth is 93, which is less then your estimation for the
>>>> growth introduced by replicating the table.
>>>>
>>>>> But on the other hand your solution might be more future proof. So
>>>>> I don't know. Use the crystal ball to decide? :)
>>>>>
>>>>
>>>> I think we should expect that the total engine count could grow with
>>>> future gens. In this case to me adding a new entry to a unified
>>>> table seems much cleaner (and uses less data) than adding a
>>>> completely new table each time.
>>>
>>> Okay I was off in my estimates. But in general I was coming from the
>>> angle of thinking that every new mmio base difference in this scheme
>>> grows the size for all engines. So if just VCS grows one new base,
>>> impact is multiplied by NUM_ENGINES. But maybe separate tables are
>>> also not a solution. Perhaps pulling out mmio_base arrays to outside
>>> struct engine_info in another set of static tables, so they could be
>>> null terminated would be best of both worlds?
>>>
>>> struct engine_mmio_base {
>>> u32 gen : 8;
>>> u32 base : 24;
>>> };
>>>
>>> static const struct engine_mmio_base vcs0_mmio_bases[] = {
>>> { .gen = 11, .base = GEN11_BSD_RING_BASE },
>>> { .gen = 6, .base = GEN6_BSD_RING_BASE },
>>> { .gen = 4, .base = BSD_RING_BASE },
>>> { },
>>> };
>>>
>>> And then in intel_engines array, for BSD:
>>>
>>> {
>>> ...
>>> .mmio_bases = &vcs0_mmio_bases;
>>> ...
>>> },
>>>
>>> This way we limit data growth only to engines which change their mmio
>>> bases frequently.
>>>
>>> Just an idea.
>>>
>>
>> I gave this a try and the code actually grows:
>>
>> add/remove: 8/0 grow/shrink: 2/0 up/down: 120/0 (120)
>> Function old new delta
>> intel_engines 224 256 +32
>> vcs0_mmio_bases - 16 +16
>> vecs1_mmio_bases - 12 +12
>> vecs0_mmio_bases - 12 +12
>> vcs1_mmio_bases - 12 +12
>> vcs3_mmio_bases - 8 +8
>> vcs2_mmio_bases - 8 +8
>> rcs_mmio_bases - 8 +8
>> intel_engines_init_mmio 1264 1272 +8
>> bcs_mmio_bases - 4 +4
>> Total: Before=1479719, After=1479839, chg +0.01%
>>
>> I have no idea what the compiler is doing to grow intel_engines, since
>> by replacing and array of 3 u32s with a pointer I would expect a
>> shrink of 4 bytes per-engine. Anyway, even without the compiler
>> weirdness with
>
> It probably has to align the pointer to 8 bytes so that creates a hole.
> Moving mmio_base pointer higher up, either to the top or after two
> unsigned ints should work. Or packing the struct.
>
>> this method we would grow the code size of at least 4 bytes per engine
>> because we replace an array of 3 u32 (12B) with a pointer (8B) and an
>> array of at 2 or more u32 (8B+). I guess we can reconsider if/when one
>> engine reaches more than 3 mmio bases.
>
> Yeah it's fine. I was thinking that since you are almost there it makes
> sense to future proof it more, since no one will probably remember it
> later. But OK.
One idea to compact more, in addition to avoiding alignment holes, could
be to store the engine_mmio_base directly in the engine_mmio_base
pointer when it is a single entry, othewrise use a pointer to null
terminated array. Actually we could store two without table indirection
to be most optimal on 64-bit. Something like:
struct engine_mmio_base {
u32 base : 24; /* Must be LSB, */
u32 gen : 8;
};
union engine_mmio {
struct engine_mmio_base mmio_base[2];
struct engine_mmio_base *mmio_base_list;
};
#define MMIO_PTR_LIST BIT(0)
{
... render engine ...
.mmio = { (struct engine_mmio_base){.base = ..., ..gen = ...} },
...
},
{
...
.mmio = { .mmio_base_list = &vcs0_mmio_bases | MMIO_PTR_LIST }
...
},
This could be the best of both worlds, with up to two bases built-in,
and engines with more point to an array.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list