[Intel-gfx] [PATCH RESEND 1/2] drm/i915: make device info bitfield flags bools
Jani Nikula
jani.nikula at linux.intel.com
Mon May 16 15:24:36 UTC 2016
On Mon, 16 May 2016, Dave Gordon <david.s.gordon at intel.com> wrote:
> On 13/05/16 16:05, Tvrtko Ursulin wrote:
>>
>> On 13/05/16 15:47, Jani Nikula wrote:
>>> On Fri, 13 May 2016, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>>>> On Fri, May 13, 2016 at 03:25:05PM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>> On 13/05/16 15:04, Jani Nikula wrote:
>>>>>> This is more robust for assignments and comparisons.
>>>>>>
>>>>>> Signed-off-by: Jani Nikula <jani.nikula at intel.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/i915/i915_drv.h | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>>>>> b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> index d9d07b70f05c..bb0b6f64000e 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> @@ -752,7 +752,7 @@ struct intel_csr {
>>>>>> func(has_ddi) sep \
>>>>>> func(has_fpga_dbg)
>>>>>>
>>>>>> -#define DEFINE_FLAG(name) u8 name:1
>>>>>> +#define DEFINE_FLAG(name) bool name:1
>>>>>> #define SEP_SEMICOLON ;
>>>>>>
>>>>>> struct intel_device_info {
>>>>>>
>>>>>
>>>>> The churn virus spreads? :)
>>>>>
>>>>> I tried that but it was negatively impacting the compiler. For some
>>>>> reason it increases .text by 2.5k here. Don't see anything obvious,
>>>>> would have to look at the code more closely to spot exactly why.
>>>>
>>>> Oh, that's not fun. bool:1 holds such promise for a clear explanation of
>>>> the most common form of bitfield.
>>>
>>> Really a bummer, especially since assigning any positive even number to
>>> unsigned int foo:1 will result in 0.
>>
>> That is a pretty strong argument to go for this rather than make sure
>> all places which set them are correct.
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> Regards,
>> Tvrtko
>
> Unfortunately, the compiler doesn't generate such good code when using
> 'bool' vs. 'u8' for these bitfields.
I've pushed patch 2/2 while the jury seems to be still out for 1/2.
Generally we might prefer bools anyway, but this struct is const in
dev_priv with the idea that this is immutable data, and we override the
const briefly on driver load in intel_device_info_runtime_init().
Checking all usage is not a huge effort, although we might still screw
this up later on...
BR,
Jani.
>
> Firstly, it looks like the compiler DOESN'T combine bool-bitfield tests
> such as "if (IS_A() || IS_B())", whereas it does if they're u8. Example:
>
> With bitfields as 'u8'
>
> if (IS_HASWELL(dev) || IS_BROADWELL(dev))
> 48 8b 57 28 mov 0x28(%rdi),%rdx # dev->dev_priv
> 0f b6 52 30 movzbl 0x30(%rdx),%edx # dev_priv->flag byte
> f6 c2 60 test $0x60,%dl # (haswell|broadwell)
> 0f 85 bb 00 00 00 jne 12e4 # branch to if-true
>
> else if (IS_VALLEYVIEW(dev) || IS_CHERRYVIEW(dev))
> 83 e2 18 and $0x18,%edx # test two more flags
> 0f 84 1c 01 00 00 je 134e # skip if neither
> ... code for VLV/CHV ...
> (code for HSW/BDW is out-of-line, at the end of the function)
>
> With bitfields as bools:
>
> 48 8b 57 28 mov 0x28(%rdi),%rdx # dev->dev_priv
> 0f b6 52 30 movzbl 0x30(%rdx),%edx # dev_priv->flag byte
> f6 c2 20 test $0x20,%dl # test one flag
> 75 09 jne 1241 # jump if true
> f6 c2 40 test $0x40,%dl # test other flag
> 0f 84 b7 00 00 00 je 12f8 # skip if not
> ... code for HSW/BDW ...
> (code for VLV/CHV/other is later)
>
> So that would suggest that the compiler generates more efficient code
> for 'u8' (at least it didn't reload the flag byte even in the 'bool'
> case). Here's another case:
>
> With bitfields as bools:
>
> dev_priv->psr_mmio_base = IS_HASWELL(dev_priv) ?
> HSW_EDP_PSR_BASE : BDW_EDP_PSR_BASE;
> 0f b6 43 30 movzbl 0x30(%rbx),%eax # flags byte
> c0 e8 05 shr $0x5,%al # haswell->bit 0
> 83 e0 01 and $0x1,%eax # leaves bit 0 only
> 3c 01 cmp $0x1,%al # compare to 1u
> 19 c0 sbb %eax,%eax # convert to 0/-1
> 25 00 b0 00 00 and $0xb000,%eax # convert to 0/b000
> 05 00 48 06 00 add $0x64800,%eax # final result
> 89 83 e8 20 00 00 mov %eax,0x20e8(%rbx)# store it
>
> This is ingenious code, avoiding any branching. Now with 'u8':
>
> 0f b6 43 30 movzbl 0x30(%rbx),%eax
> 83 e0 20 and $0x20,%eax # leaves bit 5 only
> 3c 01 cmp $0x1,%al # compare to 1u
> 19 c0 sbb %eax,%eax # convert to 0/-1
> 25 00 b0 00 00 and $0xb000,%eax # etc
> 05 00 48 06 00 add $0x64800,%eax
> 89 83 e8 20 00 00 mov %eax,0x20e8(%rbx)
>
> Again it's shorter, because the compiler has been able to extract a
> truth-value without shifting the "haswell" bit down to bit 0 first.
>
> .Dave.
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Jani Nikula, Intel Open Source Technology Center
More information about the Intel-gfx
mailing list