[PATCH] drm/panic: Use a decimal fifo to avoid u64 by u64 divide
Andrei Lalaev
andrey.lalaev at gmail.com
Thu Jun 26 15:16:15 UTC 2025
On 26.06.25 16:19, Jocelyn Falempe wrote:
> On 25/06/2025 00:18, Jocelyn Falempe wrote:
>> On 24/06/2025 20:55, Andrei Lalaev wrote:
>>> On 18.04.25 18:48, Jocelyn Falempe wrote:
>>>> On 32bits ARM, u64/u64 is not supported [1], so change the algorithm
>>>> to use a simple fifo with decimal digits as u8 instead.
>>>> This is slower but should compile on all architecture.
>>>>
>>>> Link: https://lore.kernel.org/dri-devel/ CANiq72ke45eOwckMhWHvmwxc03dxr4rnxxKvx+HvWdBLopZfrQ at mail.gmail.com/ [1]
>>>> Signed-off-by: Jocelyn Falempe <jfalempe at redhat.com>
>>>> ---
>>>> drivers/gpu/drm/drm_panic_qr.rs | 71 ++++++++++++++++++++++-----------
>>>> 1 file changed, 48 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/ drm_panic_qr.rs
>>>> index 6025a705530e..dd55b1cb764d 100644
>>>> --- a/drivers/gpu/drm/drm_panic_qr.rs
>>>> +++ b/drivers/gpu/drm/drm_panic_qr.rs
>>>> @@ -366,8 +366,48 @@ fn iter(&self) -> SegmentIterator<'_> {
>>>> SegmentIterator {
>>>> segment: self,
>>>> offset: 0,
>>>> - carry: 0,
>>>> - carry_len: 0,
>>>> + decfifo: Default::default(),
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +/// Max fifo size is 17 (max push) + 2 (max remaining)
>>>> +const MAX_FIFO_SIZE: usize = 19;
>>>> +
>>>> +/// A simple Decimal digit FIFO
>>>> +#[derive(Default)]
>>>> +struct DecFifo {
>>>> + decimals: [u8; MAX_FIFO_SIZE],
>>>> + len: usize,
>>>> +}
>>>> +
>>>> +impl DecFifo {
>>>> + fn push(&mut self, data: u64, len: usize) {
>>>> + let mut chunk = data;
>>>> + for i in (0..self.len).rev() {
>>>> + self.decimals[i + len] = self.decimals[i];
>>>> + }
>>>> + for i in 0..len {
>>>> + self.decimals[i] = (chunk % 10) as u8;
>>>> + chunk /= 10;
>>>> + }
>>>> + self.len += len;
>>>> + }
>>>> +
>>>> + /// Pop 3 decimal digits from the FIFO
>>>> + fn pop3(&mut self) -> Option<(u16, usize)> {
>>>> + if self.len == 0 {
>>>> + None
>>>> + } else {
>>>> + let poplen = 3.min(self.len);
>>>> + self.len -= poplen;
>>>> + let mut out = 0;
>>>> + let mut exp = 1;
>>>> + for i in 0..poplen {
>>>> + out += self.decimals[self.len + i] as u16 * exp;
>>>> + exp *= 10;
>>>> + }
>>>> + Some((out, NUM_CHARS_BITS[poplen]))
>>>> }
>>>> }
>>>> }
>>>> @@ -375,8 +415,7 @@ fn iter(&self) -> SegmentIterator<'_> {
>>>> struct SegmentIterator<'a> {
>>>> segment: &'a Segment<'a>,
>>>> offset: usize,
>>>> - carry: u64,
>>>> - carry_len: usize,
>>>> + decfifo: DecFifo,
>>>> }
>>>> impl Iterator for SegmentIterator<'_> {
>>>> @@ -394,31 +433,17 @@ fn next(&mut self) -> Option<Self::Item> {
>>>> }
>>>> }
>>>> Segment::Numeric(data) => {
>>>> - if self.carry_len < 3 && self.offset < data.len() {
>>>> - // If there are less than 3 decimal digits in the carry,
>>>> - // take the next 7 bytes of input, and add them to the carry.
>>>> + if self.decfifo.len < 3 && self.offset < data.len() {
>>>> + // If there are less than 3 decimal digits in the fifo,
>>>> + // take the next 7 bytes of input, and push them to the fifo.
>>>> let mut buf = [0u8; 8];
>>>> let len = 7.min(data.len() - self.offset);
>>>> buf[..len].copy_from_slice(&data[self.offset..self.offset + len]);
>>>> let chunk = u64::from_le_bytes(buf);
>>>> - let pow = u64::pow(10, BYTES_TO_DIGITS[len] as u32);
>>>> - self.carry = chunk + self.carry * pow;
>>>> + self.decfifo.push(chunk, BYTES_TO_DIGITS[len]);
>>>> self.offset += len;
>>>> - self.carry_len += BYTES_TO_DIGITS[len];
>>>> - }
>>>> - match self.carry_len {
>>>> - 0 => None,
>>>> - len => {
>>>> - // take the next 3 decimal digits of the carry
>>>> - // and return 10bits of numeric data.
>>>> - let out_len = 3.min(len);
>>>> - self.carry_len -= out_len;
>>>> - let pow = u64::pow(10, self.carry_len as u32);
>>>> - let out = (self.carry / pow) as u16;
>>>> - self.carry %= pow;
>>>> - Some((out, NUM_CHARS_BITS[out_len]))
>>>> - }
>>>> }
>>>> + self.decfifo.pop3()
>>>> }
>>>> }
>>>> }
>>>>
>>>> base-commit: 74757ad1c105c8fc00b4cac0b7918fe3262cdb18
>>>
>>> Hi Jocelyn,
>>>
>>> Apologies for reviving this old thread, but I'm still encountering
>>> the same issue with the latest master (78f4e737a53e).
>>>
>>> When compiling this module for ARM32 (multi_v7_defconfig),
>>> I get the following error:
>>>
>>> ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ drm_panic_qr.rs:392)
>>> >>> drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ drm_panic_qr.rs:392)
>>> >>> drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/ drm_panic_qr.rs:392)
>>> >>> drivers/gpu/drm/drm_panic_qr.o: (<drm_panic_qr::SegmentIterator as core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>>> >>> referenced 14 more times
>>> >>> did you mean: __aeabi_uidivmod
>>> >>> defined in: vmlinux.a(arch/arm/lib/lib1funcs.o)
>>>
>>> Since no one else has reported this in two months, I’m wondering
>>> if this might be a configuration issue on my end.
>>
>> Ok, that's surprising, the lines 391 and 392 are:
>>
>> self.decimals[i] = (chunk % 10) as u8;
>> chunk /= 10;
>>
>> So the compiler should be smart enough to do that without using a division.
>> I will try to reproduce, and see if I can fix that.
>
> I reproduced the issues, it looks like clang doesn't do the optimization on ARM32:
>
> https://github.com/llvm/llvm-project/issues/37280
>
> So I've made a quick test with the following changes, and it builds:
>
>
> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/drm_panic_qr.rs
> index dd55b1cb764d..57bd3c6465bb 100644
> --- a/drivers/gpu/drm/drm_panic_qr.rs
> +++ b/drivers/gpu/drm/drm_panic_qr.rs
> @@ -381,6 +381,20 @@ struct DecFifo {
> len: usize,
> }
>
> +fn div10(val: u64) -> u64
> +{
> + let val_h = val >> 32;
> + let val_l = val & 0xFFFFFFFF;
> + let b_h: u64 = 0x66666666;
> + let b_l: u64 = 0x66666667;
> +
> + let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
> + let tmp2 = val_l * b_h + (tmp1 & 0xffffffff);
> + let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);
> +
> + tmp3 >> 2
> +}
> +
> impl DecFifo {
> fn push(&mut self, data: u64, len: usize) {
> let mut chunk = data;
> @@ -389,7 +403,7 @@ fn push(&mut self, data: u64, len: usize) {
> }
> for i in 0..len {
> self.decimals[i] = (chunk % 10) as u8;
> - chunk /= 10;
> + chunk = div10(chunk);
> }
> self.len += len;
> }
>
>
> Best regards,
>
Compiles for me too on clang 20.1.6.
Thanks a lot!
--
Best regards,
Andrei Lalaev
More information about the dri-devel
mailing list