[Intel-gfx] [PATCH] drm/i915/hsw: enable atomic in L3 for some steppings.
Francisco Jerez
currojerez at riseup.net
Tue Jan 6 06:19:08 PST 2015
Zhigang Gong <zhigang.gong at linux.intel.com> writes:
> On Mon, Jan 05, 2015 at 07:27:26PM +0200, Francisco Jerez wrote:
>> Zhigang Gong <zhigang.gong at linux.intel.com> writes:
>>
>> > On Mon, Jan 05, 2015 at 05:03:16AM +0200, Francisco Jerez wrote:
>> >> Zhigang Gong <zhigang.gong at intel.com> writes:
>> >>
>> >> > According to bspec, ROW_CHICKEN3's bit 6 which is to disable
>> >> > move of cacheable global atomics to L3 is needed for GT3 D
>> >> > stepping.
>> >> >
>> >> > I enabled it and tested it with HSW GT2 D stepping and GT3 E stepping.
>> >> > The atomics works fine in beignet. And it could get more than 10x performance
>> >> > improvement with some workload, for an example, the "splat" kernel in darktable,
>> >> > without this patch, it consumes 50 seconds in one large raw picture processing.
>> >> > But with this patch, the same process only takes less than 1 second.
>> >> >
>> >>
>> >> I tried this already (on HSW GT2 D as well) and I don't think it's
>> >> enough to get L3 atomics working reliably. Even though they did seem to
>> >> work OK at first glance I observed some corruption issues (e.g. atomic
>> >> writes not landing in system memory) when doing atomic writes to
>> >> contiguous (as in within the same cache-line) locations in memory. The
>> >> "unused" ARB_shader_image_load_store test [1] I sent to the Piglit
>> >> mailing list some time ago exposes this IIRC, and probably a couple of
>> >> other tests too.
>> > Ok, I will find that case and have a try on my systems. I just tested all
>> > the atomic related OpenCL conformance test cases without any issues.
>
> I just found the patchset hasn't been accepted by piglit. I took a look at
> the source code. And realize that that shader will not work correctly if
> mesa doesn't allocate some DC in L3 space. Don't know whether you already
> allocated some DC when you did that test.
>
Of course, and it still fails no matter how you allocate the L3.
>> >
>> >>
>> >> Also this change is going to cause an instant lock-up anytime Mesa uses
>> >> atomics because Mesa doesn't change the default L3 way allocation for
>> >> the DC, which turns out to be 0 on HSW.
>> >
>> > This is another issue, IMHO, if the application wants to use atomics,
>> > it's better to allocate some L3 space for DC. Otherwise, it could
>> > never leverage the "atomics in L3 feature". Based on my test,
>> > the performance impact as huge as more than 10x for some workloads.
>> >
>> Sure, but this change alone will cause a regression (an irrecoverable
>> system hang) with current releases of Mesa.
>>
>> I agree that Mesa should be fixed eventually to assign L3 space to the
>> DC for some workloads, but it doesn't seem like we have a satisfactory
>> API to do that right now -- The current mechanism used by Beignet
>> (poking the L3 control registers from the batch buffer) is slightly
>> concerning from a security point of view because it allows an arbitrary
>> userspace application to cause misrendering or impact the performance of
> I agree with you that the L3 configuration is very dangerous. It's better
> to do more checking in the kernel space and make sure the user space
> application will not crash the system via L3 configuration.
> And, if the kernel provide new API we will use it in beignet ASAP in beignet.
>
>> other clients (since the L3 config registers are not being saved and
>> restored as part of the context) and even crash the whole system. It
> I found the L3 config registers are part of the IVB context, but not part
> of HSW's. So if two user applications want to use different L3 config,
> the user application need to do the config registers' store/restore manually.
> Is there a way to do this type of thing gracefully in KMD? The new API
> may need to consider the safely context switching regards to L3 config change.
>
Yeah, relying on userspace saving and restoring the L3 config values
sounds rather fragile to me, and may not work if e.g. a context hangs
and gets kicked out. IMHO it would be preferrable to keep track of the
L3 config and save/restore it from the kernel on context switch.
>> also doesn't look like it could be made to work with the hardware
>> command checker.
>
> As to the hardware command checker:
> I just found with current nightly kernel, if we boot the kernel with
> "i915.enable_ppgtt=2", then we can configure the L3 related registers
> in user space. But the command checker still don't allow user space to
> chagne L3 config by default.
>
Ugh, apparently that's because the hardware command checker is not
enabled at all in that case...
> Thanks,
> Zhigang Gong.
>
>>
>> > Thanks,
>> > Zhigang Gong.
>> >
>> >>
>> >> [1] http://lists.freedesktop.org/archives/piglit/2014-December/013571.html
>> >>
>> >> > Signed-off-by: Zhigang Gong <zhigang.gong at intel.com>
>> >> > ---
>> >> > drivers/gpu/drm/i915/intel_pm.c | 10 ++++++----
>> >> > 1 file changed, 6 insertions(+), 4 deletions(-)
>> >> >
>> >> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> >> > index 7d99a9c..8a27802 100644
>> >> > --- a/drivers/gpu/drm/i915/intel_pm.c
>> >> > +++ b/drivers/gpu/drm/i915/intel_pm.c
>> >> > @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev)
>> >> >
>> >> > ilk_init_lp_watermarks(dev);
>> >> >
>> >> > - /* L3 caching of data atomics doesn't work -- disable it. */
>> >> > - I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> >> > - I915_WRITE(HSW_ROW_CHICKEN3,
>> >> > - _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> >> > + if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) {
>> >> > + /* L3 caching of data atomics doesn't work -- disable it. */
>> >> > + I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);
>> >> > + I915_WRITE(HSW_ROW_CHICKEN3,
>> >> > + _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE));
>> >> > + }
>> >> >
>> >> > /* This is required by WaCatErrorRejectionIssue:hsw */
>> >> > I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG,
>> >> > --
>> >> > 1.8.3.2
>> >
>> >
>> >
>> >
>> >> _______________________________________________
>> >> Intel-gfx mailing list
>> >> Intel-gfx at lists.freedesktop.org
>> >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/intel-gfx/attachments/20150106/05bdfe0f/attachment.sig>
More information about the Intel-gfx
mailing list