[PATCH] drm/panic: Use a decimal fifo to avoid u64 by u64 divide
Jocelyn Falempe
jfalempe at redhat.com
Thu Jun 26 14:19:00 UTC 2025
On 25/06/2025 00:18, Jocelyn Falempe wrote:
> On 24/06/2025 20:55, Andrei Lalaev wrote:
>> On 18.04.25 18:48, Jocelyn Falempe wrote:
>>> On 32bits ARM, u64/u64 is not supported [1], so change the algorithm
>>> to use a simple fifo with decimal digits as u8 instead.
>>> This is slower but should compile on all architecture.
>>>
>>> Link: https://lore.kernel.org/dri-devel/
>>> CANiq72ke45eOwckMhWHvmwxc03dxr4rnxxKvx+HvWdBLopZfrQ at mail.gmail.com/ [1]
>>> Signed-off-by: Jocelyn Falempe <jfalempe at redhat.com>
>>> ---
>>> drivers/gpu/drm/drm_panic_qr.rs | 71 ++++++++++++++++++++++-----------
>>> 1 file changed, 48 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_panic_qr.rs b/drivers/gpu/drm/
>>> drm_panic_qr.rs
>>> index 6025a705530e..dd55b1cb764d 100644
>>> --- a/drivers/gpu/drm/drm_panic_qr.rs
>>> +++ b/drivers/gpu/drm/drm_panic_qr.rs
>>> @@ -366,8 +366,48 @@ fn iter(&self) -> SegmentIterator<'_> {
>>> SegmentIterator {
>>> segment: self,
>>> offset: 0,
>>> - carry: 0,
>>> - carry_len: 0,
>>> + decfifo: Default::default(),
>>> + }
>>> + }
>>> +}
>>> +
>>> +/// Max fifo size is 17 (max push) + 2 (max remaining)
>>> +const MAX_FIFO_SIZE: usize = 19;
>>> +
>>> +/// A simple Decimal digit FIFO
>>> +#[derive(Default)]
>>> +struct DecFifo {
>>> + decimals: [u8; MAX_FIFO_SIZE],
>>> + len: usize,
>>> +}
>>> +
>>> +impl DecFifo {
>>> + fn push(&mut self, data: u64, len: usize) {
>>> + let mut chunk = data;
>>> + for i in (0..self.len).rev() {
>>> + self.decimals[i + len] = self.decimals[i];
>>> + }
>>> + for i in 0..len {
>>> + self.decimals[i] = (chunk % 10) as u8;
>>> + chunk /= 10;
>>> + }
>>> + self.len += len;
>>> + }
>>> +
>>> + /// Pop 3 decimal digits from the FIFO
>>> + fn pop3(&mut self) -> Option<(u16, usize)> {
>>> + if self.len == 0 {
>>> + None
>>> + } else {
>>> + let poplen = 3.min(self.len);
>>> + self.len -= poplen;
>>> + let mut out = 0;
>>> + let mut exp = 1;
>>> + for i in 0..poplen {
>>> + out += self.decimals[self.len + i] as u16 * exp;
>>> + exp *= 10;
>>> + }
>>> + Some((out, NUM_CHARS_BITS[poplen]))
>>> }
>>> }
>>> }
>>> @@ -375,8 +415,7 @@ fn iter(&self) -> SegmentIterator<'_> {
>>> struct SegmentIterator<'a> {
>>> segment: &'a Segment<'a>,
>>> offset: usize,
>>> - carry: u64,
>>> - carry_len: usize,
>>> + decfifo: DecFifo,
>>> }
>>> impl Iterator for SegmentIterator<'_> {
>>> @@ -394,31 +433,17 @@ fn next(&mut self) -> Option<Self::Item> {
>>> }
>>> }
>>> Segment::Numeric(data) => {
>>> - if self.carry_len < 3 && self.offset < data.len() {
>>> - // If there are less than 3 decimal digits in
>>> the carry,
>>> - // take the next 7 bytes of input, and add them
>>> to the carry.
>>> + if self.decfifo.len < 3 && self.offset < data.len() {
>>> + // If there are less than 3 decimal digits in
>>> the fifo,
>>> + // take the next 7 bytes of input, and push them
>>> to the fifo.
>>> let mut buf = [0u8; 8];
>>> let len = 7.min(data.len() - self.offset);
>>>
>>> buf[..len].copy_from_slice(&data[self.offset..self.offset + len]);
>>> let chunk = u64::from_le_bytes(buf);
>>> - let pow = u64::pow(10, BYTES_TO_DIGITS[len] as
>>> u32);
>>> - self.carry = chunk + self.carry * pow;
>>> + self.decfifo.push(chunk, BYTES_TO_DIGITS[len]);
>>> self.offset += len;
>>> - self.carry_len += BYTES_TO_DIGITS[len];
>>> - }
>>> - match self.carry_len {
>>> - 0 => None,
>>> - len => {
>>> - // take the next 3 decimal digits of the carry
>>> - // and return 10bits of numeric data.
>>> - let out_len = 3.min(len);
>>> - self.carry_len -= out_len;
>>> - let pow = u64::pow(10, self.carry_len as u32);
>>> - let out = (self.carry / pow) as u16;
>>> - self.carry %= pow;
>>> - Some((out, NUM_CHARS_BITS[out_len]))
>>> - }
>>> }
>>> + self.decfifo.pop3()
>>> }
>>> }
>>> }
>>>
>>> base-commit: 74757ad1c105c8fc00b4cac0b7918fe3262cdb18
>>
>> Hi Jocelyn,
>>
>> Apologies for reviving this old thread, but I'm still encountering
>> the same issue with the latest master (78f4e737a53e).
>>
>> When compiling this module for ARM32 (multi_v7_defconfig),
>> I get the following error:
>>
>> ld.lld: error: undefined symbol: __aeabi_uldivmod
>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/
>> drm_panic_qr.rs:392)
>> >>> drivers/gpu/drm/drm_panic_qr.o:
>> (<drm_panic_qr::SegmentIterator as
>> core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/
>> drm_panic_qr.rs:392)
>> >>> drivers/gpu/drm/drm_panic_qr.o:
>> (<drm_panic_qr::SegmentIterator as
>> core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>> >>> referenced by drm_panic_qr.rs:392 (drivers/gpu/drm/
>> drm_panic_qr.rs:392)
>> >>> drivers/gpu/drm/drm_panic_qr.o:
>> (<drm_panic_qr::SegmentIterator as
>> core::iter::traits::iterator::Iterator>::next) in archive vmlinux.a
>> >>> referenced 14 more times
>> >>> did you mean: __aeabi_uidivmod
>> >>> defined in: vmlinux.a(arch/arm/lib/lib1funcs.o)
>>
>> Since no one else has reported this in two months, I’m wondering
>> if this might be a configuration issue on my end.
>
> Ok, that's surprising, the lines 391 and 392 are:
>
> self.decimals[i] = (chunk % 10) as u8;
> chunk /= 10;
>
> So the compiler should be smart enough to do that without using a division.
> I will try to reproduce, and see if I can fix that.
I reproduced the issues, it looks like clang doesn't do the optimization
on ARM32:
https://github.com/llvm/llvm-project/issues/37280
So I've made a quick test with the following changes, and it builds:
diff --git a/drivers/gpu/drm/drm_panic_qr.rs
b/drivers/gpu/drm/drm_panic_qr.rs
index dd55b1cb764d..57bd3c6465bb 100644
--- a/drivers/gpu/drm/drm_panic_qr.rs
+++ b/drivers/gpu/drm/drm_panic_qr.rs
@@ -381,6 +381,20 @@ struct DecFifo {
len: usize,
}
+fn div10(val: u64) -> u64
+{
+ let val_h = val >> 32;
+ let val_l = val & 0xFFFFFFFF;
+ let b_h: u64 = 0x66666666;
+ let b_l: u64 = 0x66666667;
+
+ let tmp1 = val_h * b_l + ((val_l * b_l) >> 32);
+ let tmp2 = val_l * b_h + (tmp1 & 0xffffffff);
+ let tmp3 = val_h * b_h + (tmp1 >> 32) + (tmp2 >> 32);
+
+ tmp3 >> 2
+}
+
impl DecFifo {
fn push(&mut self, data: u64, len: usize) {
let mut chunk = data;
@@ -389,7 +403,7 @@ fn push(&mut self, data: u64, len: usize) {
}
for i in 0..len {
self.decimals[i] = (chunk % 10) as u8;
- chunk /= 10;
+ chunk = div10(chunk);
}
self.len += len;
}
Best regards,
--
Jocelyn
>
> Best regards,
>
>>
>> Thanks a lot!
>>
>
More information about the dri-devel
mailing list