[PATCH] drm/panic: Use a decimal fifo to avoid u64 by u64 divide

Andrei Lalaev andrey.lalaev at gmail.com
Thu Jun 26 15:16:15 UTC 2025


On 26.06.25 16:19, Jocelyn Falempe wrote:
> On 25/06/2025 00:18, Jocelyn Falempe wrote:
>> On 24/06/2025 20:55, Andrei Lalaev wrote:
>>> On 18.04.25 18:48, Jocelyn Falempe wrote:
>>>> On 32-bit ARM, u64/u64 division is not supported [1], so change the algorithm
>>>> to use a simple fifo of decimal digits, stored as u8, instead.
>>>> This is slower but should compile on all architectures.
>>>>
>>>> Link: https://lore.kernel.org/dri-devel/CANiq72ke45eOwckMhWHvmwxc03dxr4rnxxKvx+HvWdBLopZfrQ@mail.gmail.com/ [1]
>>>> Signed-off-by: Jocelyn Falempe <jfalempe at redhat.com>
>>>> ---
>>>>   drivers/gpu/drm/drm_panic_qr.rs | 71 ++++++++++++++++++++++-----------
>>>>   1 file changed, 48 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
>>>> index 6025a705530e..dd55b1cb764d 100644
>>>> --- a/drivers/gpu/drm/drm_panic_qr.rs
>>>> +++ b/drivers/gpu/drm/drm_panic_qr.rs
>>>> @@ -366,8 +366,48 @@ fn iter(&self) -> SegmentIterator<'_> {
>>>>           SegmentIterator {
>>>>               segment: self,
>>>>               offset: 0,
>>>> -            carry: 0,
>>>> -            carry_len: 0,
>>>> +            decfifo: Default::default(),
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>> +/// Max fifo size is 17 (max push) + 2 (max remaining)
>>>> +const MAX_FIFO_SIZE: usize = 19;
>>>> +
>>>> +/// A simple Decimal digit FIFO
>>>> +#[derive(Default)]
>>>> +struct DecFifo {
>>>> +    decimals: [u8; MAX_FIFO_SIZE],
>>>> +    len: usize,
>>>> +}
>>>> +
>>>> +impl DecFifo {
>>>> +    fn push(&mut self, data: u64, len: usize) {
>>>> +        let mut chunk = data;
>>>> +        for i in (0..self.len).rev() {
>>>> +            self.decimals[i + len] = self.decimals[i];
>>>> +        }
>>>> +        for i in 0..len {
>>>> +            self.decimals[i] = (chunk % 10) as u8;
>>>> +            chunk /= 10;
>>>> +        }
>>>> +        self.len += len;
>>>> +    }
>>>> +
>>>> +    /// Pop 3 decimal digits from the FIFO
>>>> +    fn pop3(&mut self) -> Option<(u16, usize)> {
>>>> +        if self.len == 0 {
>>>> +            None
>>>> +        } else {
>>>> +            let poplen = 3.min(self.len);
>>>> +            self.len -= poplen;
>>>> +            let mut out = 0;
>>>> +            let mut exp = 1;
>>>> +            for i in 0..poplen {
>>>> +                out += self.decimals[self.len + i] as u16 * exp;
>>>> +                exp *= 10;
>>>> +            }
>>>> +            Some((out, NUM_CHARS_BITS[poplen]))
>>>>           }
>>>>       }
>>>>   }
>>>> @@ -375,8 +415,7 @@ fn iter(&self) -> SegmentIterator<'_> {
>>>>   struct SegmentIterator<'a> {
>>>>       segment: &'a Segment<'a>,
>>>>       offset: usize,
>>>> -    carry: u64,
>>>> -    carry_len: usize,
>>>> +    decfifo: DecFifo,
>>>>   }
>>>>   impl Iterator for SegmentIterator<'_> {
>>>> @@ -394,31 +433,17 @@ fn next(&mut self) -> Option<Self::Item> {
>>>>                   }
>>>>               }
>>>>               Segment::Numeric(data) => {
>>>> -                if self.carry_len < 3 && self.offset < data.len() {
>>>> -                    // If there are less than 3 decimal digits in the carry,
>>>> -                    // take the next 7 bytes of input, and add them to the carry.
>>>> +                if self.decfifo.len < 3 && self.offset < data.len() {
>>>> +                    // If there are less than 3 decimal digits in the fifo,
>>>> +                    // take the next 7 bytes of input, and push them to the fifo.
>>>>                       let mut buf = [0u8; 8];
>>>>                       let len = 7.min(data.len() - self.offset);
>>>>                       buf[..len].copy_from_slice(&data[self.offset..self.offset + len]);
>>>>                       let chunk = u64::from_le_bytes(buf);
>>>> -                    let pow = u64::pow(10, BYTES_TO_DIGITS[len] as u32);
>>>> -                    self.carry = chunk + self.carry * pow;
>>>> +                    self.decfifo.push(chunk, BYTES_TO_DIGITS[len]);
>>>>                       self.offset += len;
>>>> -                    self.carry_len += BYTES_TO_DIGITS[len];
>>>> -                }
>>>> -                match self.carry_len {
>>>> -                    0 => None,
>>>> -                    len => {
>>>> -                        // take the next 3 decimal digits of the carry
>>>> -                        // and return 10bits of numeric data.
>>>> -                        let out_len = 3.min(len);
>>>> -                        self.carry_len -= out_len;
>>>> -                        let pow = u64::pow(10, self.carry_len as u32);
>>>> -                        let out = (self.carry / pow) as u16;
>>>> -                        self.carry %= pow;
>>>> -                        Some((out, NUM_CHARS_BITS[out_len]))
>>>> -                    }
>>>>                   }
>>>> +                self.decfifo.pop3()
>>>>               }
>>>>           }
>>>>       }
>>>>
>>>> base-commit: 74757ad1c105c8fc00b4cac0b7918fe3262cdb18
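To see the digit ordering the new iterator produces, here is a standalone sketch of DecFifo plus a hypothetical numeric_groups() helper that mirrors the Numeric branch of SegmentIterator::next(). The NUM_CHARS_BITS and BYTES_TO_DIGITS tables are not shown in the patch; the values below (4/7/10 bits for 1/2/3 digits, and the maximum decimal width of a 0..=7 byte little-endian integer) are reconstructed as assumptions.

```rust
// Assumed tables (drm_panic_qr.rs defines its own):
/// QR numeric mode: bits needed to encode 0, 1, 2 or 3 decimal digits.
const NUM_CHARS_BITS: [usize; 4] = [0, 4, 7, 10];
/// Max decimal digits of a little-endian integer made of 0..=7 bytes.
const BYTES_TO_DIGITS: [usize; 8] = [0, 3, 5, 8, 10, 13, 15, 17];
/// 17 (max push) + 2 (max remaining), as in the patch.
const MAX_FIFO_SIZE: usize = 19;

#[derive(Default)]
struct DecFifo {
    decimals: [u8; MAX_FIFO_SIZE],
    len: usize,
}

impl DecFifo {
    /// Shift the stored digits up by `len` slots, then write the `len`
    /// low-order decimal digits of `data` at the bottom, least significant first.
    fn push(&mut self, data: u64, len: usize) {
        let mut chunk = data;
        for i in (0..self.len).rev() {
            self.decimals[i + len] = self.decimals[i];
        }
        for i in 0..len {
            self.decimals[i] = (chunk % 10) as u8;
            chunk /= 10;
        }
        self.len += len;
    }

    /// Remove up to 3 of the oldest (most significant) digits and return
    /// them as a number together with their QR bit length.
    fn pop3(&mut self) -> Option<(u16, usize)> {
        if self.len == 0 {
            return None;
        }
        let poplen = 3.min(self.len);
        self.len -= poplen;
        let mut out = 0u16;
        let mut exp = 1u16;
        for i in 0..poplen {
            out += self.decimals[self.len + i] as u16 * exp;
            exp *= 10;
        }
        Some((out, NUM_CHARS_BITS[poplen]))
    }
}

/// Hypothetical helper: collect everything the Numeric branch would yield.
fn numeric_groups(data: &[u8]) -> Vec<(u16, usize)> {
    let mut fifo = DecFifo::default();
    let mut offset = 0;
    let mut out = Vec::new();
    loop {
        // Refill with the next (up to) 7 bytes when fewer than 3 digits remain.
        if fifo.len < 3 && offset < data.len() {
            let mut buf = [0u8; 8];
            let len = 7.min(data.len() - offset);
            buf[..len].copy_from_slice(&data[offset..offset + len]);
            fifo.push(u64::from_le_bytes(buf), BYTES_TO_DIGITS[len]);
            offset += len;
        }
        match fifo.pop3() {
            Some(group) => out.push(group),
            None => break,
        }
    }
    out
}
```

Because push() moves older digits up before writing the new ones at index 0, pop3() always takes the oldest, most significant digits first: the 2-byte chunk 0x3039 (12345) comes out as 123 then 45. Each chunk is also zero-padded to BYTES_TO_DIGITS[len] digits, so a single byte 7 is emitted as "007".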
>>>
>>> Hi Jocelyn,
>>>
>>> Apologies for reviving this old thread, but I'm still encountering
>>> the same issue with the latest master (78f4e737a53e).
>>>
>>> When compiling this module for ARM32 (multi_v7_defconfig),
>>> I get the following error:
>>>
>>>      ld.lld: error: undefined symbol: __aeabi_uldivmod
>>>      >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/drm_panic_qr.rs:392)
>>>      >>>               drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>>      >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/drm_panic_qr.rs:392)
>>>      >>>               drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>>      >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/drm_panic_qr.rs:392)
>>>      >>>               drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>>      >>> referenced 14 more times
>>>      >>> did you mean: __aeabi_uidivmod
>>>      >>> defined in: vmlinux.a(arch/arm/lib/lib1funcs.o)
>>>
>>> Since no one else has reported this in two months, I’m wondering
>>> if this might be a configuration issue on my end.
>>
>> OK, that's surprising; lines 391 and 392 are:
>>
>> self.decimals[i] = (chunk % 10) as u8;
>> chunk /= 10;
>>
>> So the compiler should be smart enough to turn the division by 10 into a
>> multiplication by a constant, without calling a division routine.
>> I will try to reproduce, and see if I can fix that.
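The optimization expected here is the standard lowering of division by a constant: multiply by a precomputed fixed-point reciprocal and keep the high bits of the product. The host-side sketch below illustrates it with the constant commonly used for unsigned 64-bit division by 10, 0xCCCC_CCCC_CCCC_CCCD (ceil(2^67 / 10)), and a shift of 67. The function names are hypothetical, and the u128 arithmetic is meant only as a userspace illustration, not as a drop-in for drm_panic_qr.rs.

```rust
/// Hypothetical helper: floor(x / 10) via multiply-high, the way compilers
/// usually lower `x / 10`. 0xCCCC_CCCC_CCCC_CCCD == ceil(2^67 / 10); taking
/// the 128-bit product and shifting right by 67 is exact for every u64.
fn div10_reciprocal(x: u64) -> u64 {
    ((x as u128 * 0xCCCC_CCCC_CCCC_CCCD_u128) >> 67) as u64
}

/// Hypothetical helper: x % 10 recovered from the quotient, no `%` needed.
fn rem10(x: u64) -> u8 {
    (x - div10_reciprocal(x) * 10) as u8
}
```

The remainder falls out of the quotient (x - q * 10), which is why a single reciprocal multiply can replace both the `%` and the `/` in push().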
> 
> I reproduced the issue; it looks like clang doesn't do this optimization on ARM32:
> 
> https://github.com/llvm/llvm-project/issues/37280
> 
> So I've made a quick test with the following changes, and it builds:
> 
> 
> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
> index dd55b1cb764d..57bd3c6465bb 100644
> --- a/drivers/gpu/drm/drm_panic_qr.rs
> +++ b/drivers/gpu/drm/drm_panic_qr.rs
> @@ -381,6 +381,20 @@ struct DecFifo {
>      len: usize,
>  }
> 
> +fn div10(val: u64) -> u64
> +{
> +    let val_h = val >> 32;
> +    let val_l = val & 0xFFFFFFFF;
> +    let b_h: u64 = 0x66666666;
> +    let b_l: u64 = 0x66666667;
> +
> +    let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
> +    let tmp2 = val_l * b_h + (tmp1 & 0xffffffff);
> +    let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);
> +
> +    tmp3 >> 2
> +}
> +
>  impl DecFifo {
>      fn push(&mut self, data: u64, len: usize) {
>          let mut chunk = data;
> @@ -389,7 +403,7 @@ fn push(&mut self, data: u64, len: usize) {
>          }
>          for i in 0..len {
>              self.decimals[i] = (chunk % 10) as u8;
> -            chunk /= 10;
> +            chunk = div10(chunk);
>          }
>          self.len += len;
>      }
> 
> 
> Best regards,
> 
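Before relying on div10(), it can be checked on the host against plain `/ 10`. The helper builds the high 64 bits of val * 0x6666_6666_6666_6667 from four 32x32-bit partial products and shifts that right by 2, i.e. it computes floor(val * 0x6666_6666_6666_6667 / 2^66). A minimal sketch, assuming an ordinary std Rust toolchain; check_div10_below_2_56() and its sampling scheme are hypothetical, and it only covers inputs below 2^56, the range push() can receive from 7-byte chunks.

```rust
/// div10() as posted above (brace style adjusted): high 64 bits of
/// val * 0x6666_6666_6666_6667 assembled from 32-bit halves, then >> 2.
fn div10(val: u64) -> u64 {
    let val_h = val >> 32;
    let val_l = val & 0xFFFF_FFFF;
    let b_h: u64 = 0x6666_6666;
    let b_l: u64 = 0x6666_6667;

    // Partial products; none of these sums can overflow a u64.
    let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
    let tmp2 = val_l * b_h + (tmp1 & 0xFFFF_FFFF);
    let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);

    tmp3 >> 2
}

/// Hypothetical checker: compare div10() with `/ 10` on a spread of inputs
/// below 2^56 (the largest 7-byte chunk is 2^56 - 1). Returns the first
/// mismatching input, or None if all sampled inputs agree.
fn check_div10_below_2_56() -> Option<u64> {
    const LIMIT: u64 = 1 << 56;
    // Small values, exhaustively.
    for v in 0..100_000u64 {
        if div10(v) != v / 10 {
            return Some(v);
        }
    }
    // Values around each power of two and just below the top of the range.
    for shift in 0..56 {
        let base = 1u64 << shift;
        for delta in 0..20u64 {
            for v in [base.wrapping_sub(delta), base + delta, LIMIT - 1 - delta] {
                if v < LIMIT && div10(v) != v / 10 {
                    return Some(v);
                }
            }
        }
    }
    None
}
```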

It compiles for me too with clang 20.1.6.

Thanks a lot!


-- 
Best regards,
Andrei Lalaev


More information about the dri-devel mailing list