[Intel-gfx] [RFC] drm/i915: store all mmio bases in intel_engines
Daniele Ceraolo Spurio
daniele.ceraolospurio at intel.com
Fri Mar 9 18:47:46 UTC 2018
On 09/03/18 01:53, Tvrtko Ursulin wrote:
>
> On 08/03/2018 18:46, Daniele Ceraolo Spurio wrote:
>> On 08/03/18 01:31, Tvrtko Ursulin wrote:
>>>
>>> On 07/03/2018 19:45, Daniele Ceraolo Spurio wrote:
>>>> The mmio bases we're currently storing in the intel_engines array are
>>>> only valid for a subset of gens, so we need to ignore them and use
>>>> different values in some cases. Instead of doing that, we can have a
>>>> table of [starting gen, mmio base] pairs for each engine in
>>>> intel_engines and select the correct one based on the gen we're running
>>>> on in a consistent way.
>>>>
>>>> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/intel_engine_cs.c | 77
>>>> +++++++++++++++++++++------------
>>>> drivers/gpu/drm/i915/intel_ringbuffer.c | 1 -
>>>> 2 files changed, 49 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c
>>>> b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>> index 4ba139c27fba..1dd92cac8d67 100644
>>>> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
>>>> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
>>>> @@ -81,12 +81,16 @@ static const struct engine_class_info
>>>> intel_engine_classes[] = {
>>>> },
>>>> };
>>>> +#define MAX_MMIO_BASES 3
>>>> struct engine_info {
>>>> unsigned int hw_id;
>>>> unsigned int uabi_id;
>>>> u8 class;
>>>> u8 instance;
>>>> - u32 mmio_base;
>>>> + struct engine_mmio_base {
>>>> + u32 gen : 8;
>>>> + u32 base : 24;
>>>> + } mmio_bases[MAX_MMIO_BASES];
>>>> unsigned irq_shift;
>>>> };
>>>
>>> It's not bad, but I am just wondering if it is too complicated versus
>>> simply duplicating the tables.
>>>
>>> Duplicated tables would certainly be much less code and complexity,
>>> but what about the size of pure data?
>>>
>>> In this patch sizeof(struct engine_info) grows by MAX_MMIO_BASES *
>>> sizeof(u32), so 12, to the total of 30 if I am not mistaken. Times
>>> NUM_ENGINES equals 240 bytes for intel_engines[] array with this scheme.
>>>
>>
>> we remove a u32 (the old mmio base) so we only grow 8 per engine, but
>> the compiler rounds up to a multiple of u32 so 28 per engine, for a
>> total of 224.
>>
>>> Separate tables per gens would be:
>>>
>>> sizeof(struct engine_info) is 18 bytes.
>>>
>>> For < gen6 we'd need 3 engines * 18 = 54
>>> < gen11 = 5 engines * 18 = 90
>>> >= gen11 = 8 engines * 18 = 144
>>>
>>> 54 + 90 + 144 = 288 bytes
>>>
>>> So a little bit bigger. Hm. Plus we would need to refactor so
>>> intel_engines[] is not indexed by engine->id but is contiguous array
>>> with engine->id in the elements. Code to lookup the compact array
>>> should be simpler than combined new code from this patch, especially
>>> if you add the test as Chris suggested. So separate engine info
>>> arrays would win I think overall - when considering size of text +
>>> size of data + size of source code.
>>>
>>
>> Not sure. I would exclude the selftest, which is usually not compiled
>> for released kernels, which leads to:
>
> Yes, but we cannot exclude its source since selftests for separate
> tables wouldn't be needed.
>
>> add/remove: 0/0 grow/shrink: 3/1 up/down: 100/-7 (93)
>> Function old new delta
>> intel_engines 160 224 +64
>> __func__ 13891 13910 +19
>> intel_engines_init_mmio 1247 1264 +17
>> intel_init_bsd_ring_buffer 142 135 -7
>> Total: Before=1479626, After=1479719, chg +0.01%
>>
>> Total growth is 93, which is less then your estimation for the growth
>> introduced by replicating the table.
>>
>>> But on the other hand your solution might be more future proof. So I
>>> don't know. Use the crystal ball to decide? :)
>>>
>>
>> I think we should expect that the total engine count could grow with
>> future gens. In this case to me adding a new entry to a unified table
>> seems much cleaner (and uses less data) than adding a completely new
>> table each time.
>
> Okay I was off in my estimates. But in general I was coming from the
> angle of thinking that every new mmio base difference in this scheme
> grows the size for all engines. So if just VCS grows one new base,
> impact is multiplied by NUM_ENGINES. But maybe separate tables are also
> not a solution. Perhaps pulling out mmio_base arrays to outside struct
> engine_info in another set of static tables, so they could be null
> terminated would be best of both worlds?
>
> struct engine_mmio_base {
> u32 gen : 8;
> u32 base : 24;
> };
>
> static const struct engine_mmio_base vcs0_mmio_bases[] = {
> { .gen = 11, .base = GEN11_BSD_RING_BASE },
> { .gen = 6, .base = GEN6_BSD_RING_BASE },
> { .gen = 4, .base = BSD_RING_BASE },
> { },
> };
>
> And then in intel_engines array, for BSD:
>
> {
> ...
> .mmio_bases = &vcs0_mmio_bases;
> ...
> },
>
> This way we limit data growth only to engines which change their mmio
> bases frequently.
>
> Just an idea.
>
I gave this a try and the code actually grows:
add/remove: 8/0 grow/shrink: 2/0 up/down: 120/0 (120)
Function old new delta
intel_engines 224 256 +32
vcs0_mmio_bases - 16 +16
vecs1_mmio_bases - 12 +12
vecs0_mmio_bases - 12 +12
vcs1_mmio_bases - 12 +12
vcs3_mmio_bases - 8 +8
vcs2_mmio_bases - 8 +8
rcs_mmio_bases - 8 +8
intel_engines_init_mmio 1264 1272 +8
bcs_mmio_bases - 4 +4
Total: Before=1479719, After=1479839, chg +0.01%
I have no idea what the compiler is doing to grow intel_engines, since
by replacing and array of 3 u32s with a pointer I would expect a shrink
of 4 bytes per-engine. Anyway, even without the compiler weirdness with
this method we would grow the code size of at least 4 bytes per engine
because we replace an array of 3 u32 (12B) with a pointer (8B) and an
array of at 2 or more u32 (8B+). I guess we can reconsider if/when one
engine reaches more than 3 mmio bases.
Thanks,
Daniele
> Regards,
>
> Tvrtko
More information about the Intel-gfx
mailing list