Screen corruption using radeon kernel driver
Mikhail Krylov
sqarert at gmail.com
Wed Nov 30 19:59:49 UTC 2022
On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:
> On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy <robin.murphy at arm.com> wrote:
> >
> > On 2022-11-30 14:28, Alex Deucher wrote:
> > > On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy <robin.murphy at arm.com> wrote:
> > >>
> > >> On 2022-11-29 17:11, Mikhail Krylov wrote:
> > >>> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> > >>>> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov <sqarert at gmail.com> wrote:
> > >>>>>
> > >>>>> On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > >>>>>> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov <sqarert at gmail.com> wrote:
> > >>>>>>>
> > >>>>>>> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> > >>>>>>>
> > >>>>>>>>>> [excessive quoting removed]
> > >>>>>>>
> > >>>>>>>>> So, is there any progress on this issue? I do understand it's not a high
> > >>>>>>>>> priority one, and today I've checked it on 6.0 kernel, and
> > >>>>>>>>> unfortunately, it still persists...
> > >>>>>>>>>
> > >>>>>>>>> I'm considering writing a patch that will allow user to override
> > >>>>>>>>> need_dma32/dma_bits setting with a module parameter. I'll have some time
> > >>>>>>>>> after the New Year for that.
> > >>>>>>>>>
> > >>>>>>>>> Is it at all possible that such a patch will be merged into kernel?
> > >>>>>>>>>
> > >>>>>>>> On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov <sqarert at gmail.com> wrote:
> > >>>>>>>> Unless someone familiar with HIMEM can figure out what is going wrong
> > >>>>>>>> we should just revert the patch.
> > >>>>>>>>
> > >>>>>>>> Alex
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Okay, I was suggesting that mostly because
> > >>>>>>>
> > >>>>>>> a) it works for me with dma_bits = 40 (I understand that's what it is
> > >>>>>>> without the original patch applied);
> > >>>>>>>
> > >>>>>>> b) there's a hint of uncertainty on this line
> > >>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > >>>>>>> saying that for AGP dma_bits = 32 is the safest option, so apparently there are
> > >>>>>>> setups, unlike mine, where dma_bits = 32 is better than 40.
> > >>>>>>>
> > >>>>>>> But I'm in no position to argue, just wanted to make myself clear.
> > >>>>>>> I'm okay with rebuilding the kernel for my machine until the original
> > >>>>>>> patch is reverted or any other fix is applied.
> > >>>>>>
> > >>>>>> What GPU do you have and is it AGP? If it is AGP, does setting
> > >>>>>> radeon.agpmode=-1 also fix it?
> > >>>>>>
> > >>>>>> Alex
> > >>>>>
> > >>>>> That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
> > >>>>> help, it just makes 3D acceleration in games such as OpenArena stop
> > >>>>> working.
> > >>>>
> > >>>> Just to confirm, is the board AGP or PCIe?
> > >>>>
> > >>>> Alex
> > >>>
> > >>> It is AGP. That's an old machine.
> > >>
> > >> Can you check whether dma_addressing_limited() is actually returning the
> > >> expected result at the point of radeon_ttm_init()? Disabling highmem is
> > >> presumably just hiding whatever problem exists, by throwing away all
> > >> >32-bit RAM such that use_dma32 doesn't matter.
> > >
> > > The device in question only supports a 32 bit DMA mask so
> > > dma_addressing_limited() should return true. Bounce buffers are not
> > > really usable on GPUs because they map so much memory. If
> > > dma_addressing_limited() returns false, that would explain it.
> >
> > Right, it appears to be the only part of the offending commit that
> > *could* reasonably make any difference, so I'm primarily wondering if
> > dma_get_required_mask() somehow gets confused.
>
> Mikhail,
>
> Can you see what dma_addressing_limited() and dma_get_required_mask()
> return in this case?
>
> Alex
>
>
> >
> > Thanks,
> > Robin.
Unfortunately, I don't have enough time right now for kernel
modifications and rebuilds (I will later!), so I did some quick-and-dirty
research with kprobe.

The problem is that dma_addressing_limited() seems to be inlined, so
kprobe fails to intercept it.

But I did manage to get the result of dma_get_required_mask(). It returns
0x7fffffff (!) on the vanilla kernel (with the patch, i.e. the buggy one):
$ sudo kprobe-perf 'r:dma_get_required_mask $retval'
Tracing kprobe dma_get_required_mask. Ctrl-C to end.
modprobe-1244 [000] d... 105.582816: dma_get_required_mask: (radeon_ttm_init+0x61/0x240 [radeon] <- dma_get_required_mask) arg1=0x7fffffff

This function does not even get called in the kernel that I built myself
without the patch. I believe that's because ttm_bo_device_init()
doesn't call it without the patch.
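
As a rough illustration of why that return value matters: as far as I recall, dma_addressing_limited() in include/linux/dma-mapping.h (around 6.0) compares the device's effective DMA mask against dma_get_required_mask(). The sketch below is a userspace model of that comparison, not kernel code. The names min_not_zero_u64(), model_addressing_limited() and check_reported_case() are made up for illustration, the 32-bit device mask is taken from Alex's description above, and a bus_dma_limit of 0 (no bus limit) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Smaller of two values, ignoring a value of zero.
 * Intended to mirror the kernel's min_not_zero() macro. */
static uint64_t min_not_zero_u64(uint64_t a, uint64_t b)
{
    if (a == 0)
        return b;
    if (b == 0)
        return a;
    return a < b ? a : b;
}

/* Hypothetical userspace model of dma_addressing_limited():
 * "limited" means the device mask cannot cover the required mask. */
static bool model_addressing_limited(uint64_t dev_mask, uint64_t bus_limit,
                                     uint64_t required_mask)
{
    return min_not_zero_u64(dev_mask, bus_limit) < required_mask;
}

/* Self-check for the case reported in this thread:
 * 32-bit device mask (assumed), no bus limit (assumed),
 * required mask 0x7fffffff as seen in the kprobe trace. */
static void check_reported_case(void)
{
    assert(!model_addressing_limited(0xffffffffULL, 0, 0x7fffffffULL));
}
```

Under these assumptions the model evaluates to false for the reported value, because a 32-bit device mask already covers a required mask of 0x7fffffff. That would match the scenario Alex described, where dma_addressing_limited() returns false, but it still needs to be confirmed on the real kernel.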

I hope that helps at least a bit. If not, I should be able to do more
thorough research in a couple of weeks.