[Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)
Kristian Høgsberg
krh at bitplanet.net
Wed Nov 18 14:53:07 PST 2015
On Wed, Oct 14, 2015 at 5:11 AM, Michel Thierry
<michel.thierry at intel.com> wrote:
> On 10/14/2015 8:19 AM, Daniel Vetter wrote:
>>
>> On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote:
>>>
>>> On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry
>>> <michel.thierry at intel.com> wrote:
>>>>
>>>> On 10/13/2015 3:13 PM, Emil Velikov wrote:
>>>>>
>>>>>
>>>>> On 13 October 2015 at 13:16, Michel Thierry <michel.thierry at intel.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 10/6/2015 2:12 PM, Michel Thierry wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
>>>>>>>> <michel.thierry at intel.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 9/14/2015 2:54 PM, Michał Winiarski wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Gen8+ supports 48-bit virtual addresses, but some objects must
>>>>>>>>>>> always be
>>>>>>>>>>> allocated inside the 32-bit address range.
>>>>>>>>>>>
>>>>>>>>>>> In specific, any resource used with flat/heapless
>>>>>>>>>>> (0x00000000-0xfffff000)
>>>>>>>>>>> General State Heap (GSH) or Instruction State Heap (ISH) must be
>>>>>>>>>>> in
>>>>>>>>>>> a
>>>>>>>>>>> 32-bit range, because the General State Offset and Instruction
>>>>>>>>>>> State
>>>>>>>>>>> Offset
>>>>>>>>>>> are limited to 32-bits.
>>>>>>>>>>>
>>>>>>>>>>> The i915 driver has been modified to provide a flag to set when
>>>>>>>>>>> the
>>>>>>>>>>> 4GB
>>>>>>>>>>> limit is not necessary in a given bo
>>>>>>>>>>> (EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
>>>>>>>>>>> 48-bit range will only be used when explicitly requested.
>>>>>>>>>>>
>>>>>>>>>>> Callers to the existing drm_intel_bo_emit_reloc function should
>>>>>>>>>>> set
>>>>>>>>>>> the
>>>>>>>>>>> use_48b_address_range flag beforehand, in order to use full ppgtt
>>>>>>>>>>> range.
>>>>>>>>>>>
>>>>>>>>>>> v2: Make set/clear functions nops on pre-gen8 platforms, and use
>>>>>>>>>>> them
>>>>>>>>>>> internally in emit_reloc functions (Ben)
>>>>>>>>>>> s/48BADDRESS/48B_ADDRESS/ (Dave)
>>>>>>>>>>> v3: Keep set/clear functions internal, no-one needs to use them
>>>>>>>>>>> directly.
>>>>>>>>>>> v4: Don't set 48bit-support flag in emit reloc, check for ppgtt
>>>>>>>>>>> type
>>>>>>>>>>> before enabling set/clear function, print full offsets in
>>>>>>>>>>> debug
>>>>>>>>>>> statements, using port of lower_32_bits and upper_32_bits
>>>>>>>>>>> from linux
>>>>>>>>>>> kernel (Michał)
>>>>>>>>>>>
>>>>>>>>>>> References:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
>>>>>>>>>>> Cc: Ben Widawsky <ben at bwidawsk.net>
>>>>>>>>>>> Cc: Michał Winiarski <michal.winiarski at intel.com>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +Kristian
>>>>>>>>>>
>>>>>>>>>> LGTM.
>>>>>>>>>> Acked-by: Michał Winiarski <michal.winiarski at intel.com>
>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Kristian,
>>>>>>>>>
>>>>>>>>> More comments on this?
>>>>>>>>> I've resent the mesa patch with the last changes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Michał has agreed with the interface and will use it for his work.
>>>>>>>>> If mesa doesn't plan to use this for now, it's ok. The kernel
>>>>>>>>> changes
>>>>>>>>> have
>>>>>>>>> been done to prevent any regressions when the support 48-bit flag
>>>>>>>>> is
>>>>>>>>> not
>>>>>>>>> set.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I didn't get any replies to my last comments on this:
>>>>>>>>
>>>>>>>>
>>>>>>>> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html
>>>>>>>>
>>>>>>>> Basically, I tried explicitly placing buffers above 8G and didn't
>>>>>>>> see
>>>>>>>> the HW problem described (ie it all worked fine). I still think
>>>>>>>> there's some confusion as to what the W/A is about.
>>>>>>>>
>>>>>>>> Here's my take: the W/A is a SW-only workaround to handle the cases
>>>>>>>> where a driver uses heapless and 48-bit ppgtt. The problem is that
>>>>>>>> the
>>>>>>>> heaps are limited to 4G but with 48bit ppgtt a buffer can be placed
>>>>>>>> anywhere it the 48 bit address space. If that happens it's
>>>>>>>> consideredd
>>>>>>>> out-of-bounds for the heap and access fails. To prevent this we need
>>>>>>>> to make sure the bos we address in a heapless fashion are placed
>>>>>>>> below
>>>>>>>> 4g.
>>>>>>>>
>>>>>>>> For mesa, we only configure general state base as heap-less, which
>>>>>>>> affects scratch bos. What this boils down to is that we need the 4G
>>>>>>>> restriction only for the scratch bos set up on 3DSTATE_VS,
>>>>>>>> 3DSTATE_GS
>>>>>>>> and 3DSTATE_PS (and tesselation stage eventually). Look for the
>>>>>>>> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c,
>>>>>>>> gen8_gs_state.c
>>>>>>>> and gen8_ps_state.c
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe it
>>>>>>> isn't exclusive to the scratch bos (the error state points to that
>>>>>>> instruction).
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Anymore inputs about this patch?
>>>>>> AFAIK, if changes are required based on further comments from
>>>>>> Kristian,
>>>>>> these will be in the mesa patch[1], not libdrm. Also, Michał will use
>>>>>> this
>>>>>> flag in another project.
>>>>>>
>>>>> The comment seems quite brief and I'm not sure it fully addresses
>>>>> Kristian's concern. I'd suspect that providing reference to the HW
>>>>> documentation (confirming your assumption) might be beneficial.
>>>>>
>>>>
>>>> Sure, attached is the hang I get if I don't set the restriction in
>>>> gen8_misc_state.c and try to use the full 48-bit range
>>>> (i915_error_state_nowa.txt). This is just running gfxbench t-rex.
>>>>
>>>> I see the same hang signature when it is only applied to the scratch bos
>>>> (in
>>>> gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c -
>>>> i915_error_state_scratchbo.txt).
>>>>
>>>> 3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x78230000) is defined here:
>>>>
>>>> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf
>>>> (page 256)
>>>
>>>
>>> I applied your mesa and libdrm patches and then backed out the 4G
>>> restriction in the STATE_BASE_ADDRESS relocations. I was not able to
>>> reproduce any hang with trex or glxgears. I confirmed (using
>>> INTEL_DEBUG=bat) that gem actually placed BOs above the 4G mark and
>>> got this:
>>>
>>> 0: 8 @ 0xfffffff90000 (miptree)
>>> 1: 9 @ 0xfffffff78000 (hiz)
>>> 2: 4 @ 0xfffffff77000 (pipe_control workaround)
>>> 3: 5 @ 0xfffffff76000 (program cache)
>>> 4: 12 @ 0x000000000000 (miptree)
>>> 5: 7 @ 0x000000001000 (image)
>>> 6: 10 @ 0xfffffff5a000 (miptree)
>>> 7: 13 @ 0xfffffff59000 (bufferobj)
>>> 8: 11 @ 0xfffffff51000 (bufferobj)
>>> 9: 14 @ 0xfffffff50000 (bufferobj)
>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000020 -> 8
>>> (miptree)@0x0000ffff fff90000 + 0x00000000
>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000040 -> 9
>>> (hiz)@0x0000ffff fff78000 + 0x00000000
>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000098 -> 4
>>> (pipe_control workaround)@0x0000ffff fff77000 + 0x00000000
>>> ...
>>>
>>> In particular, the mesa batchbuffer is placed at 0xfffffff40000, and
>>> all the STATE_BASE_ADDRESS heaps are in that bo. In other words, all
>>> heap bases are set to 0xfffffff40000.
>>>
>>> The offset in 3DSTATE_VIEWPORT_STATE_POINTERS_CC is indeed only 32
>>> bits, but that only affects how far from the dynamic state base
>>> address we can place that. In mesas case, it's always inside the batch
>>> buffer, which is at most 32k, so that's never a problem. If a driver
>>> uses dynamic state in a heapless fashion, then you need to be careful
>>> to place the CC viewport state below 4g.
>>>
>>> What I was able to confirm is that scratch buffer I/O (which we use
>>> for spilling) does break with 48 bit ppgtt. If you run glxgears with
>>> INTEL_DEBUG=spill_fs set, you can see lots of rendering artifacts as
>>> the 4G limit on the general state heap caps attempts to spill and fill
>>> from the vertex shader. Using OUT_RELOC64_INSIDE_4G() when setting up
>>> the scratch BOs in 3DSTATE_VS and 3DSTATE_PS fixes the problem.
>>
>>
>> Ok, so we do still need the 4G limit bit (i.e. I don't have to revert the
>> kernel side for lack of open-source userspace), just not at the place we
>> originally thought. I guess lesson learned is to actually test stuff first
>> before I pull in the kernel side. Michel, did you not see the spilling
>> corruptions when testing the mesa patches? Kristian, does this also blow
>> up with just a piglit run?
>>
>
> There's must be something else we have different in our systems.
>
> Kristian doesn't get the hang, and if I back out the 4G restriction in the
> STATE_BASE_ADDRESS, I don't see any corruption while running
> INTEL_DEBUG=spill_fs glxgears.
I don't want to block this, in fact, I very much want to see this move
forward. However, I can't ack patches that 1) fix a gpu hang I can't
reproduce and I don't agree should be a problem (the
STATE_BASE_ADDRESS probllem) and 2) don't fix the issue that I see and
reproduced on my BDW (the spilling problem).
I'll try a full piglit run tomorrow with both my mesa patch and
Michels and see what happens. However, this isn't the kind of problem
that occurs once in a blue moon or for some corner case piglit test.
If Michels understanding of the hw and the workaround is correct,
nothing should work without his patch.
To move forward on this, unless something new comes up when running
piglit, I can take ownership of making mesa work one way or the other.
The kernel and libdrm work is fine and most of Michels mesa patch is
good. If mesa breaks down the road because we need to limit the
STATE_BASE_ADDRESS relocs, I'll own the problem then.
Kristian
More information about the mesa-dev
mailing list