[PATCH 2/2] drm/xe/display: Unify display page table mapping
Tvrtko Ursulin
tursulin at ursulin.net
Tue Feb 11 11:22:26 UTC 2025
On 07/02/2025 14:33, Juha-Pekka Heikkilä wrote:
> On Tue, Feb 4, 2025 at 8:05 PM Tvrtko Ursulin <tursulin at ursulin.net> wrote:
>>
>>
>> On 04/02/2025 13:11, Juha-Pekka Heikkilä wrote:
>>> On Tue, Feb 4, 2025 at 2:31 PM Tvrtko Ursulin <tursulin at ursulin.net> wrote:
>>>>
>>>>
>>>> On 04/02/2025 12:10, Juha-Pekka Heikkilä wrote:
>>>>> On Tue, Feb 4, 2025 at 1:38 PM Tvrtko Ursulin <tursulin at ursulin.net> wrote:
>>>>>>
>>>>>>
>>>>>> On 04/02/2025 11:25, Tvrtko Ursulin wrote:
>>>>>>>
>>>>>>> On 14/01/2025 18:04, Juha-Pekka Heikkila wrote:
>>>>>>>> Unify writing of remapped and straight DPTs. Take out writing of
>>>>>>>> rotated DPT since Xe doesn't support any platform that does
>>>>>>>> 90 degree rotated framebuffers.
>>>>>>>
>>>>>>> Consider giving my series which fixes AuxCSS a look instead? Given there
>>>>>>> is external interest to have all that work on ADL-P. I could respin with
>>>>>>> rotated cleaned up in a similar fashion I did for remapped. Maintenance
>>>>>>> cost is minimal since it is all very contained.
>>>>>>
>>>>>> Oh ADL-P does not support hardware rotation. I missed the fact it is
>>>>>> display version 13..
>>>>>>
>>>>>> But the point regarding changes to remapping code are still valid I
>>>>>> think, since to suport aux plane the refactoring from this patch looks
>>>>>> in the wrong direction.
>>>>>>
>>>>>
>>>>> Yea, I kinda agree here with Tvrtko. I did earlier mention this to
>>>>> JSaa. That aux thing will need to be checked.
>>>>
>>>> Ah cool, fingers crossed.
>>>>
>>>>> Though, how far are we with your set Tvrtko? It did seem lot kms_ccs
>>>>> tests failed.
>>>>> https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-144186v1/shards-all.html?testfilter=kms_ccs.*crc.*gen12&hosts=adlp
>>>>>
>>>>> Did you check you had compression when you had clean image on your
>>>>> machine? (aux surface was not zeroes) Those failures on my link ci are
>>>>> with failing crc which mean the picture on screen was not correct.
>>>>
>>>> I was looking into this just this morning and posted some
>>>> half-conclusions here:
>>>>
>>>> https://lore.kernel.org/intel-xe/eb3a3729-e8ef-4868-b8ff-b6efd66ef4fa@igalia.com/T/#u
>>>>
>>>> When you say test with a clear image that I did not. I smoke tested it
>>>> in desktop usage and looked at kms_flip_tiling which visually looks
>>>> correct but as the above msg-id talks about, something is potentially
>>>> fishy with crc collections and/or vblank handling. Since display isn't
>>>> my area of expertise that is as far as I could get.
>>>
>>> Those test explicitly testing for flipping and such may show there
>>> something interesting but imo unrelated to this situation (for now).
>>>
>>> If you have adlp machine where you see the screen you could try those
>>> kms_ccs crc tests, you can run the tests in interactive mode with
>>> '--i' parameter where the test always stop at commit so you'll see
>>> what's there on screen. On kms_ccs I would not expect any vblank/other
>>> issues to interfere crc because kms_ccs uses
>>> igt_pipe_crc_collect_crc() to get the crc, what this mean is once the
>>> test image is on screen crc counting is started, collected one crc and
>>> crc counting is stopped .. to get the first crc this way few vblanks
>>> have passed already since getting image on screen. Then on kms_ccs
>>> test there is -c flag which should check there is something written on
>>> aux surface. Getting these to work reliably I think is the first
>>> thing...and on that subject I'd at first concentrate just one gen12rc
>>> ccs, not ccs-cc and not mc ccs. I'd get these results by myself but
>>> I'm bit of recovering at home and don't have needed hw in my hands.
>>
>> kms_crc with and without -c are both a complete pass.
>
> I tried this too, the result is..interesting. I tried adlp and mtlp.
> Reading crc on adlp seems to do something unexpected for display on
> Xe. If you run kms_ccs on adlp with interactive parameter and step
> forward those images you'll likely see lot of broken images but crc
> still matches as if image was correct.
>
> On adlp it look like image is given to display hw too early, some
> timing somewhere go wrong, I often saw images where primary surface
> was written but aux surface was not yet fully written (you'll get
> compression markers on screen from main surface because unwritten aux
> says nothing is compressed). If I fix kms_ccs to read crcs
> continuously it will never pass crc checks, same as is seen in many
> other tests. It did look like enabling crc creation caused update on
> frontbuffer which fixed the image since at that moment compressors
> were already done. It did seem mc ccs was much more easy to get this
> effect to show, rc ccs was 50-50 on Lenovo laptop I tried while mc ccs
> was far more frequent.
>
> I did also try run kms_plane pixel-format tests and there I see lot of
> broken images. On pixel-format tests there's supposing to be 64x64 fb
> with either horizontal lines or full color (for planar formats) but I
> see lot of broken images which make me think this issue is not about
> ccs but maybe some timing or caching issue. Essential difference with
> kms_plane pixel format tests and kms_ccs is kms_ccs use render engine
> (and media engine for mc) while pixel format tests use copy engine and
> they failed quite equally.
>
> On mtlp there are similar issues but mc ccs cannot run at all, any try
> to use media engine (which is needed to compress mc ccs) just crashed
> the engine. Probably nobody ever tried any of these with Xe.
>
>>
>>> On that lore.kernel.org thread where you mention "Bad CRC also varies
>>> run to run, while visually things look fine to me on screen. Also a
>>> delay before collecting the crc seems to improve things." could be
>>> quite much anything, my first quess with those always is some
>>> alignment missed somewhere.
>>
>> Going back to kms_flip_tiling it is hard to figure out what's up. Good
>> news is that I have spotted some visual glitching but it mostly only
>> happens withing the first few frames after commit of the reference frame.
>>
>> For instance when I add a loop such as:
>>
>> diff --git a/tests/intel/kms_flip_tiling.c b/tests/intel/kms_flip_tiling.c
>> index e937c21716ad..5538fe88c5a0 100644
>> --- a/tests/intel/kms_flip_tiling.c
>> +++ b/tests/intel/kms_flip_tiling.c
>> @@ -131,7 +131,10 @@ test_flip_tiling(data_t *data, enum pipe pipe,
>> igt_output_t *output, uint64_t mo
>> "commit failed with " IGT_MODIFIER_FMT "\n",
>> IGT_MODIFIER_ARGS(modifier[1]));
>> pipe_crc_new(data, pipe);
>> - igt_pipe_crc_get_current(data->drm_fd, data->pipe_crc,
>> &reference_crc);
>> + ret = 20;
>> + while (ret--)
>> + igt_pipe_crc_get_current(data->drm_fd, data->pipe_crc,
>> + &reference_crc);
>
> This matches to what I saw, on adlp/mtlp framebuffers are given to
> display too early or something need to be triggered before display get
> correct image. Once the image is in frontbuffer it is on screen and if
> caches/engines/.. were not ready then that show on screen as broken
> image.
>
>>
>> /* Commit the first fb. */
>> igt_plane_set_fb(primary, &data->fb[0]);
>>
>> All tests pass.
>>
>> Could this be interacting with some other display feature like fbc or psr?
>
> I wouldn't expect this to be fbc/psr issue. While those can be source
> for screen glithes, when I see partially ccs compressed images on
> screen I know all updating was correct for fbc/psr, just that the
> original was broken.
>
>>
>> There is also something interesting with CRC if I put a sleep into that
>> loop. Apparently something gets powered off and CRC garbage such as
>> 0xffffffff starts appearing. But only if there is a sleep for like
>> 250ms. A busy loop can go for 3000 frames and not hit that.
>
> I have hunch this relate to intel_crtc_crc_setup_workarounds but can't
> say for sure. I've been trying to find crc setup related bugs
> recently.
>
> Anyway, with those random bugs it's difficult to say how far adlp is
> from being supported on Xe and not sure how media engine really work
> there since it doesn't work at all on mtlp with Xe.
Thanks for looking into this!
I explored a bit more and eventually figured out it is most probably
about the missing AuxCCS invalidation. I'll have a go transplanting that
code over from i915 and see how that will go.
Regards,
Tvrtko
>> Regards,
>>
>> Tvrtko
>>
>>>>>>>> Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
>>>>>>>> ---
>>>>>>>> drivers/gpu/drm/xe/display/xe_fb_pin.c | 141 +++++++++----------------
>>>>>>>> 1 file changed, 48 insertions(+), 93 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>>>>> b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>>>>> index c28885316986..70322f28eee5 100644
>>>>>>>> --- a/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>>>>> +++ b/drivers/gpu/drm/xe/display/xe_fb_pin.c
>>>>>>>> @@ -14,66 +14,44 @@
>>>>>>>> #include "xe_ggtt.h"
>>>>>>>> #include "xe_pm.h"
>>>>>>>> -static void
>>>>>>>> -write_dpt_rotated(struct xe_bo *bo, struct iosys_map *map, u32
>>>>>>>> *dpt_ofs, u32 bo_ofs,
>>>>>>>> - u32 width, u32 height, u32 src_stride, u32 dst_stride)
>>>>>>>> +static void encode_and_write_pte(struct xe_bo *bo, struct iosys_map
>>>>>>>> *map,
>>>>>>>> + u32 *ofs, u32 src_idx, struct xe_device *xe)
>>>>>>>> {
>>>>>>>> - struct xe_device *xe = xe_bo_device(bo);
>>>>>>>> struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>>>>>>>> - u32 column, row;
>>>>>>>> -
>>>>>>>> - /* TODO: Maybe rewrite so we can traverse the bo addresses
>>>>>>>> sequentially,
>>>>>>>> - * by writing dpt/ggtt in a different order?
>>>>>>>> - */
>>>>>>>> -
>>>>>>>> - for (column = 0; column < width; column++) {
>>>>>>>> - u32 src_idx = src_stride * (height - 1) + column + bo_ofs;
>>>>>>>> -
>>>>>>>> - for (row = 0; row < height; row++) {
>>>>>>>> - u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx *
>>>>>>>> XE_PAGE_SIZE,
>>>>>>>> - xe->pat.idx[XE_CACHE_NONE]);
>>>>>>>> -
>>>>>>>> - iosys_map_wr(map, *dpt_ofs, u64, pte);
>>>>>>>> - *dpt_ofs += 8;
>>>>>>>> - src_idx -= src_stride;
>>>>>>>> - }
>>>>>>>> -
>>>>>>>> - /* The DE ignores the PTEs for the padding tiles */
>>>>>>>> - *dpt_ofs += (dst_stride - height) * 8;
>>>>>>>> - }
>>>>>>>> -
>>>>>>>> - /* Align to next page */
>>>>>>>> - *dpt_ofs = ALIGN(*dpt_ofs, 4096);
>>>>>>>> + u64 pte = ggtt->pt_ops->pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
>>>>>>>> + xe->pat.idx[XE_CACHE_NONE]);
>>>>>>>> + iosys_map_wr(map, *ofs, u64, pte);
>>>>>>>> + *ofs += 8;
>>>>>>>> }
>>>>>>>> -static void
>>>>>>>> -write_dpt_remapped(struct xe_bo *bo, struct iosys_map *map, u32
>>>>>>>> *dpt_ofs,
>>>>>>>> - u32 bo_ofs, u32 width, u32 height, u32 src_stride,
>>>>>>>> - u32 dst_stride)
>>>>>>>> +static void write_dpt(struct xe_bo *bo, struct iosys_map *map, u32
>>>>>>>> *dpt_ofs,
>>>>>>>> + const struct intel_remapped_plane_info *plane,
>>>>>>>> + enum i915_gtt_view_type type)
>>>>>>>> {
>>>>>>>> struct xe_device *xe = xe_bo_device(bo);
>>>>>>>> - struct xe_ggtt *ggtt = xe_device_get_root_tile(xe)->mem.ggtt;
>>>>>>>> - u64 (*pte_encode_bo)(struct xe_bo *bo, u64 bo_offset, u16 pat_index)
>>>>>>>> - = ggtt->pt_ops->pte_encode_bo;
>>>>>>>> - u32 column, row;
>>>>>>>> -
>>>>>>>> - for (row = 0; row < height; row++) {
>>>>>>>> - u32 src_idx = src_stride * row + bo_ofs;
>>>>>>>> -
>>>>>>>> - for (column = 0; column < width; column++) {
>>>>>>>> - iosys_map_wr(map, *dpt_ofs, u64,
>>>>>>>> - pte_encode_bo(bo, src_idx * XE_PAGE_SIZE,
>>>>>>>> - xe->pat.idx[XE_CACHE_NONE]));
>>>>>>>> -
>>>>>>>> - *dpt_ofs += 8;
>>>>>>>> - src_idx++;
>>>>>>>> + const u32 dpt_even = (plane->dst_stride - plane->width) * 8;
>>>>>>>> + const u32 dest_width = plane->width;
>>>>>>>> + const u32 dest_height = plane->height;
>>>>>>>> + u32 src_idx;
>>>>>>>> +
>>>>>>>> + for (u32 row = 0; row < dest_height; ++row) {
>>>>>>>> + for (u32 column = 0; column < dest_width; ++column) {
>>>>>>>> + switch (type) {
>>>>>>>> + case I915_GTT_VIEW_NORMAL:
>>>>>>>> + src_idx = plane->offset + column;
>>>>>>>> + break;
>>>>>>>> + case I915_GTT_VIEW_REMAPPED:
>>>>>>>> + src_idx = plane->offset +
>>>>>>>> + row * plane->src_stride + column;
>>>>>>>> + break;
>>>>>>>> + default:
>>>>>>>> + WARN(1, "Unsupported GTT view type: %d", type);
>>>>>>>> + return;
>>>>>>>> + }
>>>>>>>> + encode_and_write_pte(bo, map, dpt_ofs, src_idx, xe);
>>>>>>>> }
>>>>>>>> -
>>>>>>>> - /* The DE ignores the PTEs for the padding tiles */
>>>>>>>> - *dpt_ofs += (dst_stride - width) * 8;
>>>>>>>> + *dpt_ofs += dpt_even;
>>>>>>>> }
>>>>>>>> -
>>>>>>>> - /* Align to next page */
>>>>>>>> *dpt_ofs = ALIGN(*dpt_ofs, 4096);
>>>>>>>> }
>>>>>>>> @@ -125,59 +103,36 @@ static int __xe_pin_fb_vma_dpt(const struct
>>>>>>>> intel_framebuffer *fb,
>>>>>>>> {
>>>>>>>> struct xe_device *xe = to_xe_device(fb->base.dev);
>>>>>>>> struct xe_tile *tile0 = xe_device_get_root_tile(xe);
>>>>>>>> - struct xe_ggtt *ggtt = tile0->mem.ggtt;
>>>>>>>> struct drm_gem_object *obj = intel_fb_bo(&fb->base);
>>>>>>>> struct xe_bo *bo = gem_to_xe_bo(obj), *dpt;
>>>>>>>> u32 dpt_size, size = bo->ttm.base.size;
>>>>>>>> + const struct intel_remapped_plane_info *plane;
>>>>>>>> + u32 i, plane_count, dpt_ofs = 0;
>>>>>>>> + struct intel_remapped_plane_info normal_plane;
>>>>>>>> - if (view->type == I915_GTT_VIEW_NORMAL)
>>>>>>>> + if (view->type == I915_GTT_VIEW_NORMAL) {
>>>>>>>> dpt_size = ALIGN(size / XE_PAGE_SIZE * 8, XE_PAGE_SIZE);
>>>>>>>> - else if (view->type == I915_GTT_VIEW_REMAPPED)
>>>>>>>> - dpt_size =
>>>>>>>> ALIGN(intel_remapped_info_size(&fb->remapped_view.gtt.remapped) * 8,
>>>>>>>> - XE_PAGE_SIZE);
>>>>>>>> - else
>>>>>>>> - /* display uses 4K tiles instead of bytes here, convert to
>>>>>>>> entries.. */
>>>>>>>> - dpt_size = ALIGN(intel_rotation_info_size(&view->rotated) * 8,
>>>>>>>> + normal_plane.offset = 0;
>>>>>>>> + normal_plane.width = size / XE_PAGE_SIZE;
>>>>>>>> + normal_plane.height = 1;
>>>>>>>> + normal_plane.src_stride = size / XE_PAGE_SIZE;
>>>>>>>> + normal_plane.dst_stride = size / XE_PAGE_SIZE;
>>>>>>>> + plane = &normal_plane;
>>>>>>>> + plane_count = 1;
>>>>>>>> + } else {
>>>>>>>> + dpt_size = ALIGN(intel_remapped_info_size(&view->remapped) * 8,
>>>>>>>> XE_PAGE_SIZE);
>>>>>>>> + plane = view->remapped.plane;
>>>>>>>> + plane_count = ARRAY_SIZE(view->remapped.plane);
>>>>>>>> + }
>>>>>>>> dpt = xe_alloc_dpt_bo(xe, tile0, dpt_size, physical_alignment);
>>>>>>>> if (IS_ERR(dpt))
>>>>>>>> return PTR_ERR(dpt);
>>>>>>>> - if (view->type == I915_GTT_VIEW_NORMAL) {
>>>>>>>> - u32 x;
>>>>>>>> -
>>>>>>>> - for (x = 0; x < size / XE_PAGE_SIZE; x++) {
>>>>>>>> - u64 pte = ggtt->pt_ops->pte_encode_bo(bo, x * XE_PAGE_SIZE,
>>>>>>>> - xe->pat.idx[XE_CACHE_NONE]);
>>>>>>>> -
>>>>>>>> - iosys_map_wr(&dpt->vmap, x * 8, u64, pte);
>>>>>>>> - }
>>>>>>>> - } else if (view->type == I915_GTT_VIEW_REMAPPED) {
>>>>>>>> - const struct intel_remapped_info *remap_info = &view->remapped;
>>>>>>>> - u32 i, dpt_ofs = 0;
>>>>>>>> -
>>>>>>>> - for (i = 0; i < ARRAY_SIZE(remap_info->plane); i++)
>>>>>>>> - write_dpt_remapped(bo, &dpt->vmap, &dpt_ofs,
>>>>>>>> - remap_info->plane[i].offset,
>>>>>>>> - remap_info->plane[i].width,
>>>>>>>> - remap_info->plane[i].height,
>>>>>>>> - remap_info->plane[i].src_stride,
>>>>>>>> - remap_info->plane[i].dst_stride);
>>>>>>>> -
>>>>>>>> - } else {
>>>>>>>> - const struct intel_rotation_info *rot_info = &view->rotated;
>>>>>>>> - u32 i, dpt_ofs = 0;
>>>>>>>> -
>>>>>>>> - for (i = 0; i < ARRAY_SIZE(rot_info->plane); i++)
>>>>>>>> - write_dpt_rotated(bo, &dpt->vmap, &dpt_ofs,
>>>>>>>> - rot_info->plane[i].offset,
>>>>>>>> - rot_info->plane[i].width,
>>>>>>>> - rot_info->plane[i].height,
>>>>>>>> - rot_info->plane[i].src_stride,
>>>>>>>> - rot_info->plane[i].dst_stride);
>>>>>>>> - }
>>>>>>>> + for (i = 0; i < plane_count; i++)
>>>>>>>> + write_dpt(bo, &dpt->vmap, &dpt_ofs, &plane[i], view->type);
>>>>>>>> vma->dpt = dpt;
>>>>>>>> vma->node = dpt->ggtt_node[tile0->id];
More information about the Intel-xe
mailing list