[Mesa-dev] [PATCH v4 1/2] intel: 48b ppgtt support (EXEC_OBJECT_SUPPORTS_48B_ADDRESS flag)
Kristian Høgsberg
krh at bitplanet.net
Thu Dec 10 11:40:40 PST 2015
On Fri, Dec 4, 2015 at 6:24 AM, Michel Thierry <michel.thierry at intel.com> wrote:
> On 11/18/2015 10:53 PM, Kristian Høgsberg wrote:
>>
>> On Wed, Oct 14, 2015 at 5:11 AM, Michel Thierry
>> <michel.thierry at intel.com> wrote:
>>>
>>> On 10/14/2015 8:19 AM, Daniel Vetter wrote:
>>>>
>>>>
>>>> On Tue, Oct 13, 2015 at 02:51:36PM -0700, Kristian Høgsberg wrote:
>>>>>
>>>>>
>>>>> On Tue, Oct 13, 2015 at 7:55 AM, Michel Thierry
>>>>> <michel.thierry at intel.com> wrote:
>>>>>>
>>>>>>
>>>>>> On 10/13/2015 3:13 PM, Emil Velikov wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 13 October 2015 at 13:16, Michel Thierry
>>>>>>> <michel.thierry at intel.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/6/2015 2:12 PM, Michel Thierry wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/5/2015 7:06 PM, Kristian Høgsberg wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 5, 2015 at 7:03 AM, Michel Thierry
>>>>>>>>>> <michel.thierry at intel.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 9/14/2015 2:54 PM, Michał Winiarski wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 03, 2015 at 03:23:58PM +0100, Michel Thierry wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Gen8+ supports 48-bit virtual addresses, but some objects must
>>>>>>>>>>>>> always be
>>>>>>>>>>>>> allocated inside the 32-bit address range.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In specific, any resource used with flat/heapless
>>>>>>>>>>>>> (0x00000000-0xfffff000)
>>>>>>>>>>>>> General State Heap (GSH) or Instruction State Heap (ISH) must
>>>>>>>>>>>>> be
>>>>>>>>>>>>> in
>>>>>>>>>>>>> a
>>>>>>>>>>>>> 32-bit range, because the General State Offset and Instruction
>>>>>>>>>>>>> State
>>>>>>>>>>>>> Offset
>>>>>>>>>>>>> are limited to 32-bits.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The i915 driver has been modified to provide a flag to set when
>>>>>>>>>>>>> the
>>>>>>>>>>>>> 4GB
>>>>>>>>>>>>> limit is not necessary in a given bo
>>>>>>>>>>>>> (EXEC_OBJECT_SUPPORTS_48B_ADDRESS).
>>>>>>>>>>>>> 48-bit range will only be used when explicitly requested.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Callers to the existing drm_intel_bo_emit_reloc function should
>>>>>>>>>>>>> set
>>>>>>>>>>>>> the
>>>>>>>>>>>>> use_48b_address_range flag beforehand, in order to use full
>>>>>>>>>>>>> ppgtt
>>>>>>>>>>>>> range.
>>>>>>>>>>>>>
>>>>>>>>>>>>> v2: Make set/clear functions nops on pre-gen8 platforms, and
>>>>>>>>>>>>> use
>>>>>>>>>>>>> them
>>>>>>>>>>>>> internally in emit_reloc functions (Ben)
>>>>>>>>>>>>> s/48BADDRESS/48B_ADDRESS/ (Dave)
>>>>>>>>>>>>> v3: Keep set/clear functions internal, no-one needs to use them
>>>>>>>>>>>>> directly.
>>>>>>>>>>>>> v4: Don't set 48bit-support flag in emit reloc, check for ppgtt
>>>>>>>>>>>>> type
>>>>>>>>>>>>> before enabling set/clear function, print full offsets
>>>>>>>>>>>>> in
>>>>>>>>>>>>> debug
>>>>>>>>>>>>> statements, using port of lower_32_bits and
>>>>>>>>>>>>> upper_32_bits
>>>>>>>>>>>>> from linux
>>>>>>>>>>>>> kernel (Michał)
>>>>>>>>>>>>>
>>>>>>>>>>>>> References:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://lists.freedesktop.org/archives/intel-gfx/2015-July/072612.html
>>>>>>>>>>>>> Cc: Ben Widawsky <ben at bwidawsk.net>
>>>>>>>>>>>>> Cc: Michał Winiarski <michal.winiarski at intel.com>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +Kristian
>>>>>>>>>>>>
>>>>>>>>>>>> LGTM.
>>>>>>>>>>>> Acked-by: Michał Winiarski <michal.winiarski at intel.com>
>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Michel Thierry <michel.thierry at intel.com>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Kristian,
>>>>>>>>>>>
>>>>>>>>>>> More comments on this?
>>>>>>>>>>> I've resent the mesa patch with the last changes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> (http://lists.freedesktop.org/archives/dri-devel/2015-October/091752.html).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Michał has agreed with the interface and will use it for his
>>>>>>>>>>> work.
>>>>>>>>>>> If mesa doesn't plan to use this for now, it's ok. The kernel
>>>>>>>>>>> changes
>>>>>>>>>>> have
>>>>>>>>>>> been done to prevent any regressions when the support 48-bit flag
>>>>>>>>>>> is
>>>>>>>>>>> not
>>>>>>>>>>> set.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I didn't get any replies to my last comments on this:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://lists.freedesktop.org/archives/mesa-dev/2015-August/091398.html
>>>>>>>>>>
>>>>>>>>>> Basically, I tried explicitly placing buffers above 8G and didn't
>>>>>>>>>> see
>>>>>>>>>> the HW problem described (ie it all worked fine). I still think
>>>>>>>>>> there's some confusion as to what the W/A is about.
>>>>>>>>>>
>>>>>>>>>> Here's my take: the W/A is a SW-only workaround to handle the
>>>>>>>>>> cases
>>>>>>>>>> where a driver uses heapless and 48-bit ppgtt. The problem is that
>>>>>>>>>> the
>>>>>>>>>> heaps are limited to 4G but with 48bit ppgtt a buffer can be
>>>>>>>>>> placed
>>>>>>>>>> anywhere it the 48 bit address space. If that happens it's
>>>>>>>>>> consideredd
>>>>>>>>>> out-of-bounds for the heap and access fails. To prevent this we
>>>>>>>>>> need
>>>>>>>>>> to make sure the bos we address in a heapless fashion are placed
>>>>>>>>>> below
>>>>>>>>>> 4g.
>>>>>>>>>>
>>>>>>>>>> For mesa, we only configure general state base as heap-less, which
>>>>>>>>>> affects scratch bos. What this boils down to is that we need the
>>>>>>>>>> 4G
>>>>>>>>>> restriction only for the scratch bos set up on 3DSTATE_VS,
>>>>>>>>>> 3DSTATE_GS
>>>>>>>>>> and 3DSTATE_PS (and tesselation stage eventually). Look for the
>>>>>>>>>> OUT_RELOC64 for stage->scratch_bo in gen8_vs_state.c,
>>>>>>>>>> gen8_gs_state.c
>>>>>>>>>> and gen8_ps_state.c
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think it also affects _3DSTATE_VIEWPORT_STATE_POINTERS_CC, maybe
>>>>>>>>> it
>>>>>>>>> isn't exclusive to the scratch bos (the error state points to that
>>>>>>>>> instruction).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Anymore inputs about this patch?
>>>>>>>> AFAIK, if changes are required based on further comments from
>>>>>>>> Kristian,
>>>>>>>> these will be in the mesa patch[1], not libdrm. Also, Michał will
>>>>>>>> use
>>>>>>>> this
>>>>>>>> flag in another project.
>>>>>>>>
>>>>>>> The comment seems quite brief and I'm not sure it fully addresses
>>>>>>> Kristian's concern. I'd suspect that providing reference to the HW
>>>>>>> documentation (confirming your assumption) might be beneficial.
>>>>>>>
>>>>>>
>>>>>> Sure, attached is the hang I get if I don't set the restriction in
>>>>>> gen8_misc_state.c and try to use the full 48-bit range
>>>>>> (i915_error_state_nowa.txt). This is just running gfxbench t-rex.
>>>>>>
>>>>>> I see the same hang signature when it is only applied to the scratch
>>>>>> bos
>>>>>> (in
>>>>>> gen8_vs_state.c, gen8_gs_state.c and gen8_ps_state.c -
>>>>>> i915_error_state_scratchbo.txt).
>>>>>>
>>>>>> 3DSTATE_VIEWPORT_STATE_POINTERS_CC (0x78230000) is defined here:
>>>>>>
>>>>>>
>>>>>> https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol02a-commandreference-instructions_0.pdf
>>>>>> (page 256)
>>>>>
>>>>>
>>>>>
>>>>> I applied your mesa and libdrm patches and then backed out the 4G
>>>>> restriction in the STATE_BASE_ADDRESS relocations. I was not able to
>>>>> reproduce any hang with trex or glxgears. I confirmed (using
>>>>> INTEL_DEBUG=bat) that gem actually placed BOs above the 4G mark and
>>>>> got this:
>>>>>
>>>>> 0: 8 @ 0xfffffff90000 (miptree)
>>>>> 1: 9 @ 0xfffffff78000 (hiz)
>>>>> 2: 4 @ 0xfffffff77000 (pipe_control workaround)
>>>>> 3: 5 @ 0xfffffff76000 (program cache)
>>>>> 4: 12 @ 0x000000000000 (miptree)
>>>>> 5: 7 @ 0x000000001000 (image)
>>>>> 6: 10 @ 0xfffffff5a000 (miptree)
>>>>> 7: 13 @ 0xfffffff59000 (bufferobj)
>>>>> 8: 11 @ 0xfffffff51000 (bufferobj)
>>>>> 9: 14 @ 0xfffffff50000 (bufferobj)
>>>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000020 -> 8
>>>>> (miptree)@0x0000ffff fff90000 + 0x00000000
>>>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000040 -> 9
>>>>> (hiz)@0x0000ffff fff78000 + 0x00000000
>>>>> 10: 15 @ 0xfffffff40000 (batchbuffer)@0x00000000 00000098 -> 4
>>>>> (pipe_control workaround)@0x0000ffff fff77000 + 0x00000000
>>>>> ...
>>>>>
>>>>> In particular, the mesa batchbuffer is placed at 0xfffffff40000, and
>>>>> all the STATE_BASE_ADDRESS heaps are in that bo. In other words, all
>>>>> heap bases are set to 0xfffffff40000.
>>>>>
>>>>> The offset in 3DSTATE_VIEWPORT_STATE_POINTERS_CC is indeed only 32
>>>>> bits, but that only affects how far from the dynamic state base
>>>>> address we can place that. In mesas case, it's always inside the batch
>>>>> buffer, which is at most 32k, so that's never a problem. If a driver
>>>>> uses dynamic state in a heapless fashion, then you need to be careful
>>>>> to place the CC viewport state below 4g.
>>>>>
>>>>> What I was able to confirm is that scratch buffer I/O (which we use
>>>>> for spilling) does break with 48 bit ppgtt. If you run glxgears with
>>>>> INTEL_DEBUG=spill_fs set, you can see lots of rendering artifacts as
>>>>> the 4G limit on the general state heap caps attempts to spill and fill
>>>>> from the vertex shader. Using OUT_RELOC64_INSIDE_4G() when setting up
>>>>> the scratch BOs in 3DSTATE_VS and 3DSTATE_PS fixes the problem.
>>>>
>>>>
>>>>
>>>> Ok, so we do still need the 4G limit bit (i.e. I don't have to revert
>>>> the
>>>> kernel side for lack of open-source userspace), just not at the place we
>>>> originally thought. I guess lesson learned is to actually test stuff
>>>> first
>>>> before I pull in the kernel side. Michel, did you not see the spilling
>>>> corruptions when testing the mesa patches? Kristian, does this also blow
>>>> up with just a piglit run?
>>>>
>>>
>>> There's must be something else we have different in our systems.
>>>
>>> Kristian doesn't get the hang, and if I back out the 4G restriction in
>>> the
>>> STATE_BASE_ADDRESS, I don't see any corruption while running
>>> INTEL_DEBUG=spill_fs glxgears.
>>
>>
>> I don't want to block this, in fact, I very much want to see this move
>> forward. However, I can't ack patches that 1) fix a gpu hang I can't
>> reproduce and I don't agree should be a problem (the
>> STATE_BASE_ADDRESS probllem) and 2) don't fix the issue that I see and
>> reproduced on my BDW (the spilling problem).
>>
>> I'll try a full piglit run tomorrow with both my mesa patch and
>> Michels and see what happens. However, this isn't the kind of problem
>> that occurs once in a blue moon or for some corner case piglit test.
>> If Michels understanding of the hw and the workaround is correct,
>> nothing should work without his patch.
>>
>> To move forward on this, unless something new comes up when running
>> piglit, I can take ownership of making mesa work one way or the other.
>> The kernel and libdrm work is fine and most of Michels mesa patch is
>> good. If mesa breaks down the road because we need to limit the
>> STATE_BASE_ADDRESS relocs, I'll own the problem then.
>>
>> Kristian
>>
>
> Thanks Kristian,
>
> Let me know if you need something from i915 or libdrm side.
I've reviewed and pushed these two patches. I'll send out the mesa
patch for review in a bit and once that lands we can do a libdrm
release.
Kristian
More information about the dri-devel
mailing list