[Intel-gfx] [PATCH] drm/i915: Do NOT skip the first 4k of stolen memory for pre-allocated buffers

Hans de Goede hdegoede at redhat.com
Fri Mar 30 14:26:53 UTC 2018


Hi,

On 30-03-18 15:25, Hans de Goede wrote:
> Hi,
> 
> On 30-03-18 14:44, Chris Wilson wrote:
>> Quoting Hans de Goede (2018-03-30 13:37:40)
>>> Hi,
>>>
>>> On 30-03-18 14:30, Chris Wilson wrote:
>>>> Quoting Hans de Goede (2018-03-30 13:27:15)
>>>>> Before this commit the WaSkipStolenMemoryFirstPage workaround code was
>>>>> skipping the first 4k by passing 4096 as start of the address range passed
>>>>> to drm_mm_init(). This means that calling drm_mm_reserve_node() to try and
>>>>> reserve the firmware framebuffer so that we can inherit it would always
>>>>> fail, as the firmware framebuffer starts at address 0.
>>>>>
>>>>> Commit d43537610470 ("drm/i915: skip the first 4k of stolen memory on
>>>>> everything >= gen8") says in its commit message: "This is confirmed to fix
>>>>> Skylake screen flickering issues (probably caused by the fact that we
>>>>> initialized a ring in the first page of stolen, but I didn't 100% confirm
>>>>> this theory)."
>>>>>
>>>>> Which suggests that it is safe to use the first page for a linear
>>>>> framebuffer as the firmware is doing.
>>>>>
>>>>> This commit always passes 0 as start to drm_mm_init() and works around
>>>>> WaSkipStolenMemoryFirstPage in i915_gem_stolen_insert_node_in_range()
>>>>> by insuring the start address passed by to drm_mm_insert_node_in_range()
>>>>> is always 4k or more. All entry points to i915_gem_stolen.c go through
>>>>> i915_gem_stolen_insert_node_in_range(), so that any newly allocated
>>>>> objects such as ring-buffers will not be allocated in the first 4k.
>>>>>
>>>>> The one exception is i915_gem_object_create_stolen_for_preallocated()
>>>>> which directly calls drm_mm_reserve_node() which now will be able to
>>>>> use the first 4k.
>>>>>
>>>>> This fixes the i915 driver no longer being able to inherit the firmware
>>>>> framebuffer on gen8+, which fixes the video output changing from the
>>>>> vendor logo to a black screen as soon as the i915 driver is loaded
>>>>> (on systems without fbcon).
>>>>
>>>> We've been told by the HW guys not to use the first page. (That's my
>>>> understanding from every time this gets questioned.)
>>>
>>> Yet the GOP is happily using the first page. I think we may need to make
>>> a difference here between the GPU not using the first page and the
>>> display engine/pipeline not using the first page. Note that my patch
>>> only influences the inheriting of the initial framebuffer as allocated
>>> by the GOP. It does not influence any other allocations from the
>>> reserved range, those will still all avoid the first page.
>>>
>>> Without this patch fastboot / flickerfree support is essentially broken
>>> on any gen8+ hardware given that one of the goals of atomic is to be
>>> able to do flickerfree transitions I think that this warrants a closer
>>> look then just simply saying never use the first page.
>>
>> The concern is what else (i.e. nothing that we allocated ourselves) that
>> may be in the first page...
> 
> Given that the GOP has put its framebuffer there at least at boot there
> is nothing there, otherwise it would show up on the display.
> 
> We have a whole bunch of code to inherit the BIOS fb in intel_display.c
> and AFAIK that code is there because this inheriting the BIOS fb is
> deemed an important feature. So I'm not happy at all with the handwavy
> best to not touch it answer I'm getting to this patch.
> 
> Unless there are some clear answer from the hardware folks which specifically
> say we cannot put a framebuffer there (and then why is the GOP doing it?)
> then I believe that the best way forward here is to get various people to
> test with this patch and the best way to do that is probably to put it
> in next. Note I deliberately did not add a Cc stable.

To elaborate on this, the excluding of the first 4k of the stolen memory
region causes intel_alloc_initial_plane_obj() from intel_display.c to fail,
which in turn causes intel_find_initial_plane_obj() to call
intel_plane_disable_noatomic(intel_crtc, intel_plane); which temporarily
completely turns off the display which leads to a very ugly flickering
of the display at boot (as well as replacing the vendor logo with a
black screen).

I think we can all agree that this behavior is undesirable and even a
regression in behavior caused by the fix to exclude the first 4k.

Chris, if my patch is not an acceptable way to fix this, then how do you
suggest that we fix this?

Digging a bit deeper I found this:

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol16-workarounds.pdf

Which says:

"WaSkipStolenMemoryFirstPage:

WA to skip the first page of stolen
memory due to sporadic HW write on *CS Idle"

And also about FBC:

"First line of FBC getting corrupted when FBC
compressed frame buffer offset is programmed to
zero. Command streamers are doing flush writes to
base of stolen.
WA: New restriction to program FBC compressed
frame buffer offset to at least 4KB."

So using the first 4kB for the *framebuffer* as done by the GOP will
not cause any major problems (freezes, hangs, etc.), and commit
d43537610470 ("drm/i915: skip the first 4k of stolen memory on
everything >= gen8") was correct in deducing that the problem was
likely that some *vital* information was being stored i the first 4k
and that go overwritten.

But the contents of the (first lines of) the framebuffer may become
corrupted once we actually start using the command-streamers, which
is still very much not wanted.

In practice Xorg or Wayland will likely have setup another framebuffer
by the time the command-streamers will start to get used.

Alternatively we could start with inheriting the BIOS framebuffer
(as my patch allows) so that we don't get the flicker and then soon
afterwards atomically transit to a new framebuffer (which should
contain a copy of the BIOS fb contents) at a different location.

I think it might be worthwhile to first just try my patch and then if
we see or receive reports of problems with the fb becoming corrupt
because in some cases it ends up being used for longer then expected,
we can do the atomic transition to a new fb thing.

Regards,

Hans


p.s.

Fi.CI.BAT and Fi.CI.IGT have run successfully for this patch, I realize
that this does not mean that much, but it is an indication that it is
not completely broken.


More information about the dri-devel mailing list