[Intel-gfx] [PATCH v16 08/17] drm/i915/perf: Add OA unit support for Gen 8+
Lionel Landwerlin
lionel.g.landwerlin at intel.com
Mon Jun 12 15:14:48 UTC 2017
On 12/06/17 15:02, Matthew Auld wrote:
> On 5 June 2017 at 15:48, Lionel Landwerlin
> <lionel.g.landwerlin at intel.com> wrote:
>> From: Robert Bragg <robert at sixbynine.org>
>>
>> Enables access to OA unit metrics for BDW, CHV, SKL and BXT which all
>> share (more-or-less) the same OA unit design.
>>
>> Of particular note in comparison to Haswell: some OA unit HW config
>> state has become per-context state and as a consequence it is somewhat
>> more complicated to manage synchronous state changes from the cpu while
>> there's no guarantee of what context (if any) is currently actively
>> running on the gpu.
>>
>> The periodic sampling frequency which can be particularly useful for
>> system-wide analysis (as opposed to command stream synchronised
>> MI_REPORT_PERF_COUNT commands) is perhaps the most surprising state to
>> have become per-context save and restored (while the OABUFFER
>> destination is still a shared, system-wide resource).
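To make "periodic sampling frequency" concrete, here is a minimal sketch that turns a sampling exponent into a period. It assumes the unit emits one periodic report every 2^(exponent + 1) timestamp ticks and a 12.5 MHz (80 ns) timestamp clock. Both the formula and the frequency are illustrative assumptions, not per-platform facts taken from this patch. The helper name `oa_exponent_to_period_ns` and the macro `ASSUMED_TIMESTAMP_HZ` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed timestamp frequency in Hz (12.5 MHz, i.e. an 80 ns tick).
 * Illustrative only; real platforms may use a different clock. */
#define ASSUMED_TIMESTAMP_HZ 12500000ULL

/* Hypothetical helper: convert an OA sampling exponent into a period in
 * nanoseconds, assuming one periodic report every 2^(exponent + 1) ticks. */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	assert(exponent < 32); /* keeps the shift and multiply in range */
	uint64_t ticks = 2ULL << exponent; /* 2^(exponent + 1) */
	return ticks * 1000000000ULL / ASSUMED_TIMESTAMP_HZ;
}
```

Under these assumptions, exponent 0 gives 160 ns and exponent 16 gives 10485760 ns (about 10.5 ms). On Gen8+ this value lives in per-context saved state, so every context image must carry the same exponent for system-wide sampling to stay uniform.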
>>
>> This support for gen8+ takes care to consider a number of timing
>> challenges involved in synchronously updating per-context state
>> primarily by programming all config state from the cpu and updating all
>> current and saved contexts synchronously while the OA unit is still
>> disabled.
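The ordering described above (rewrite every saved context while the unit is still off) can be sketched as follows. This is a simplified model, not the driver's code. The `struct ctx_image` with a single `oactxcontrol` slot, the `struct oa_unit` enable flag, and the OACTXCONTROL bit positions (resume in bit 0, timer enable in bit 1, a 6-bit period exponent from bit 2) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed OACTXCONTROL bit layout, for illustration only. */
#define OACTX_COUNTER_RESUME     (1u << 0)
#define OACTX_TIMER_ENABLE       (1u << 1)
#define OACTX_TIMER_PERIOD_SHIFT 2
#define OACTX_TIMER_PERIOD_MASK  0x3fu /* assumed 6-bit exponent field */

/* Hypothetical, heavily simplified saved context image. */
struct ctx_image {
	uint32_t oactxcontrol;
};

/* Hypothetical view of the shared OA unit state. */
struct oa_unit {
	bool enabled;
};

static uint32_t oactxcontrol_value(unsigned int exponent, bool periodic)
{
	return ((exponent & OACTX_TIMER_PERIOD_MASK) << OACTX_TIMER_PERIOD_SHIFT) |
	       (periodic ? OACTX_TIMER_ENABLE : 0) |
	       OACTX_COUNTER_RESUME;
}

/* Write the same per-context config into every saved context image.
 * Refuses (returns false, writes nothing) while the unit is enabled, since
 * a context restore could otherwise load out-of-date sampling state while
 * the unit is running. */
static bool update_all_contexts(const struct oa_unit *oa,
				struct ctx_image *ctxs, size_t n,
				unsigned int exponent, bool periodic)
{
	if (oa->enabled)
		return false;

	uint32_t val = oactxcontrol_value(exponent, periodic);
	for (size_t i = 0; i < n; i++)
		ctxs[i].oactxcontrol = val;
	return true;
}
```

Only after every image agrees would the unit be enabled, so no restore can reintroduce a stale period.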
>>
>> The driver intentionally avoids depending on command streamer
>> programming to update OA state considering the lack of synchronization
>> between the automatic loading of OACTXCONTROL state (that includes the
>> periodic sampling state and enable state) on context restore and the
>> parsing of any general purpose BB the driver can control. I.e. this
>> implementation is careful to avoid the possibility of a context restore
>> temporarily enabling any out-of-date periodic sampling state. In
>> addition to the risk of transiently-out-of-date state being loaded
>> automatically, there are also internal HW latencies involved in the
>> loading of MUX configurations which would be difficult to account for
>> from the command streamer (and we only want to enable the unit once
>> the MUX configuration is complete).
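The CPU-driven sequencing argued for above (apply the complete MUX configuration first, enable last) is sketched below against a simulated MMIO write log, so the order of writes can be checked. The `struct reg_write` pairs, the `mmio_write` logger, and the control register address and enable bit passed in are hypothetical stand-ins, not i915 interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register/value pair, as a MUX config might be expressed. */
struct reg_write {
	uint32_t addr;
	uint32_t value;
};

/* Simulated MMIO: records writes in order so the sequence can be checked. */
#define MAX_LOG 64
struct mmio_log {
	struct reg_write writes[MAX_LOG];
	size_t count;
};

static void mmio_write(struct mmio_log *log, uint32_t addr, uint32_t value)
{
	assert(log->count < MAX_LOG);
	log->writes[log->count].addr = addr;
	log->writes[log->count].value = value;
	log->count++;
}

/* Program the whole MUX config from the CPU, and only then set the enable
 * bit, so the unit is never enabled with a half-applied configuration. */
static void configure_and_enable(struct mmio_log *log,
				 const struct reg_write *mux, size_t n_mux,
				 uint32_t ctrl_addr, uint32_t enable_bit)
{
	for (size_t i = 0; i < n_mux; i++)
		mmio_write(log, mux[i].addr, mux[i].value);
	/* Any settling delay the HW might need would go here (v14 notes one
	 * did not seem necessary on Gen8+). */
	mmio_write(log, ctrl_addr, enable_bit);
}
```

Doing this from the CPU keeps the ordering explicit, whereas a batch buffer would race with the automatic OACTXCONTROL restore described above.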
>>
>> Since the Gen8+ OA unit design no longer supports clock gating the unit
>> off for a single given context (which effectively stopped any progress
>> of counters while any other context was running) and instead supports
>> tagging OA reports with a context ID for filtering on the CPU, it means
>> we can no longer hide the system-wide progress of counters from a
>> non-privileged application only interested in metrics for its own
>> context. Although we could theoretically try and subtract the progress
>> of other contexts before forwarding reports via read() we aren't in a
>> position to filter reports captured via MI_REPORT_PERF_COUNT commands.
>> As a result, for Gen8+, we always require the
>> dev.i915.perf_stream_paranoid to be unset for any access to OA metrics
>> if not root.
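The CPU-side filtering mentioned above can be sketched as a loop that keeps only reports tagged with a given context ID. The layout is an assumption: fixed-size 256-byte reports with the context ID in dword 2, masked by a hypothetical `ctx_id_mask`. The sketch also shows why filtering is not enough for isolation: a kept report's counter values still include progress made by every other context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed report layout, for illustration only. */
#define REPORT_DWORDS 64 /* a 256-byte report */
#define CTX_ID_DWORD  2  /* dword assumed to carry the context ID */

/* Copy reports whose masked context ID equals want_ctx_id from src to dst
 * and return how many were kept. The counters inside a kept report remain
 * system-wide totals, so this hides other contexts' reports but not their
 * contribution to the counters. */
static size_t filter_reports_by_ctx(const uint32_t *src, size_t n_reports,
				    uint32_t ctx_id_mask, uint32_t want_ctx_id,
				    uint32_t *dst)
{
	size_t kept = 0;

	for (size_t i = 0; i < n_reports; i++) {
		const uint32_t *report = src + i * REPORT_DWORDS;

		if ((report[CTX_ID_DWORD] & ctx_id_mask) != want_ctx_id)
			continue;
		memcpy(dst + kept * REPORT_DWORDS, report,
		       REPORT_DWORDS * sizeof(uint32_t));
		kept++;
	}
	return kept;
}
```

Reports written by MI_REPORT_PERF_COUNT land directly in a user-supplied buffer and never pass through such a filter, which is the gap the paragraph above points out.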
>>
>> v5: Drain submitted requests when enabling metric set to ensure no
>> lite-restore erases the context image we just updated (Lionel)
>>
>> v6: In addition to drain, switch to kernel context & update all
>> contexts in place (Chris)
>>
>> v7: Add missing mutex_unlock() if switching to kernel context fails
>> (Matthew)
>>
>> v8: Simplify OA period/flex-eu-counters programming by using the
>> batchbuffer instead of modifying ctx-image (Lionel)
>>
>> v9: Back to updating the context image (due to erroneous testing,
>> batchbuffer programming the OA unit doesn't actually work)
>> (Lionel)
>> Pin context before updating context image (Chris)
>> Drop MMIO programming now that we switch to a kernel context with
>> right values in initial context image (Chris)
>>
>> v10: Just pin_map the contexts we want to modify or let the
>> configuration happen on first use (Chris)
>>
>> v11: Update kernel context OA config through the batchbuffer rather
>> than on the fly ctx-image update (Lionel)
>>
>> v12: Rework OA context registers update again by switching away from
>> user contexts and reconfiguring the kernel context through the
>> batchbuffer and updating all the other contexts' context image.
>> Also take care to lock slice/subslice configuration when OA is
>> on. (Lionel)
>>
>> v13: Request rpcs updates on all engines when updating the OA config
>> (Lionel)
>>
>> v14: Drop any kind of rpcs management now that we monitor sseu
>> configuration changes in a later patch (Lionel)
>> Remove usleep after programming the NOA configs on Gen8+, this
>> doesn't seem to be needed (Lionel)
>>
>> v15: Respect coding style for block comments (Chris)
>>
>> Signed-off-by: Robert Bragg <robert at sixbynine.org>
>> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin at intel.com>
>> Reviewed-by: Matthew Auld <matthew.auld at intel.com> \o/
>> ---
> <SNIP>
>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
>> index 52b3a1fd4059..eb7bb51c0a4a 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.h
>> +++ b/drivers/gpu/drm/i915/intel_lrc.h
>> @@ -62,6 +62,8 @@ enum {
>> INTEL_CONTEXT_SCHEDULE_OUT,
>> };
>>
>> +struct sseu_dev_info;
>> +
> Seems bogus, no?
>
Indeed!
Will drop that.