[igt-dev] [PATCH i-g-t] gem_wsim: Distinguish particular engines during calculating nop calibration.

Fri Jan 24 11:45:49 UTC 2020

On 24/01/2020 11:41, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-01-24 11:35:35)
>>
>> On 24/01/2020 11:18, Anna Karas wrote:
>>> Extend handling of the -n parameter to accept multiple per-engine
>>> nop calibration values. Add raw number handling to set default
>>> calibration values. Print a copy-pastable string with the
>>> calibrations. Allow switching between calculating in parallel
>>> and calculating sequentially.
>>>
>>> Accepted input values:
>>> -n 123456
>>> All calibrations will be set to 123456.
>>>
>>> -n ENG=value,ENG2=value2,value3
>>> e.g.
>>> -n RCS=123456,BCS=345678,999999
>>> RCS engine's value is set to 123456, BCS engine's value is set to
>>> 345678, 999999 is copied to rest of engines. All engines must be set;
>>> you can either provide values for each of the engines, or you can set
>>> specific values and provide a default value for the others.
>>>
>>> -n value,ENG1=value1,ENG2=value2
>>> First, value is copied to all engines, then value1 overrides ENG1, and
>>> finally value2 overrides ENG2.
>>>
>>> New output follows the pattern:
>>> Nop calibrations for 1000us delay is: <eng1>=<v1>,<eng2>=<v2>,...
>>> So you can easily copy-paste it to the next invocation.
>>>
>>> Switching between calculation modes:
>>> Run program with -T parameter to calculate calibrations in parallel.
>>> The calculations are performed sequentially by default.
>>>
>>> v2: Get rid of trailing whitespaces. Skip DEFAULT and VCS engines
>>> when printing out calibrations. Reject them in the string passed
>>> to -n. Re-align rest of help text. Fix accepting unknown engines.
>>>
>>> v3: Consider all cases of arguments for -n (Tvrtko).
>>>
>>> -n 10 (raw number)
>>> -n RCS (engine without calib)
>>> -n AA (neither the engine nor the number)
>>> -n RCS=500 (valid eng=val pair)
>>> -n RCS=AA (calib is not a number)
>>> -n XYZ=10 (engine is not an engine)
>>> -n XYZ=AA (combo)
>>>
>>> v4: Print calculated values (Chris). Do not make any assumptions
>>> about the order of the engines (Tvrtko).
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> An interesting, or maybe entirely expected, observation is that parallel
>> calibration gives different values. I think the more engines there are, the
>> slower they get in parallel, probably because of memory bandwidth. That's why
>> the suggested default is sequential. Although in practice it depends a lot on
>> the workload to be executed which mode matches reality better. Calibrating
>> for the workload sounded like a step too far. In theory it could be doable
>> using Chris' wsim to theoretical throughput parser, or something along those
>> lines. But it is definitely too complicated for now.
> 
> I haven't checked but have you accounted for the overhead in submission
> serialisation? Finite memory bandwidth is definitely plausible.
> 
> If we can put a timing MI_MATH loop inside 128bytes, there's a chance we
> could benefit from the tiny CS prefetch and avoid having N engines all
> competing for finite CS throughput.

For calibration and workload execution? Sounds intriguing. Would the max batch
length be limited by u32 CS timestamp cycles? Assuming you would do some sort
of counting along those lines. Possibly that limit would be long enough for
any practical usage; I haven't tried to remember what it is. But anyway, is it
okay to merge this and leave future improvements for the future?
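
As a rough sanity check of the u32 limit, here is a back-of-the-envelope
sketch in Python. The timestamp frequency below is purely an assumed value
for illustration (the real CS timestamp frequency is platform dependent and
is not stated anywhere in this thread), and the function name is made up.

```python
def max_measurable_seconds(timestamp_hz, counter_bits=32):
    """Longest interval a free-running counter can time before it wraps.

    timestamp_hz -- counter tick rate in Hz (platform dependent)
    counter_bits -- counter width; 32 matches the "u32" mentioned above
    """
    if timestamp_hz <= 0:
        raise ValueError("timestamp frequency must be positive")
    return (1 << counter_bits) / timestamp_hz


# ASSUMPTION: 12 MHz is only an example tick rate, not a value taken from
# this thread or from any particular platform.
ASSUMED_TIMESTAMP_HZ = 12_000_000

if __name__ == "__main__":
    limit = max_measurable_seconds(ASSUMED_TIMESTAMP_HZ)
    print(f"u32 counter at {ASSUMED_TIMESTAMP_HZ} Hz wraps after {limit:.1f} s")
```

Under that assumed rate the counter would wrap after roughly 358 seconds,
which would be far longer than the 1000us delay the calibration targets. A
faster clock shrinks the limit proportionally.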

Regards,

Tvrtko
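
The -n forms listed in the quoted commit message can be sketched in Python
as below. This is an illustrative model, not the patch itself (which is C):
the engine list, the function names and the rule that a raw number fills
every engine not yet set explicitly are assumptions chosen to match the
examples in the commit message.

```python
# ASSUMPTION: illustrative list of physical engines. DEFAULT and VCS are
# deliberately absent because the commit message says they are rejected.
ENGINES = ("RCS", "BCS", "VCS1", "VCS2", "VECS")


def _to_calibration(text):
    """Convert text to a positive integer or raise ValueError."""
    if not (text.isascii() and text.isdigit()) or int(text) == 0:
        raise ValueError(f"'{text}' is not a valid calibration number")
    return int(text)


def parse_nop_calibration(arg, engines=ENGINES):
    """Parse '-n' input such as 'RCS=123456,BCS=345678,999999'."""
    calib = {}
    explicit = set()  # engines set through an ENG=value pair
    for token in arg.split(","):
        token = token.strip()
        if "=" in token:
            name, _, value = token.partition("=")
            if name not in engines:
                raise ValueError(f"'{name}' is not a known engine")
            calib[name] = _to_calibration(value)
            explicit.add(name)
        else:
            # Raw number: default for every engine without an explicit value.
            default = _to_calibration(token)
            for name in engines:
                if name not in explicit:
                    calib[name] = default
    missing = [name for name in engines if name not in calib]
    if missing:
        raise ValueError("no calibration for: " + ",".join(missing))
    return calib


def format_calibration(calib, engines=ENGINES):
    """Build the copy-pastable '<eng1>=<v1>,<eng2>=<v2>,...' string."""
    return ",".join(f"{name}={calib[name]}" for name in engines)


if __name__ == "__main__":
    parsed = parse_nop_calibration("RCS=123456,BCS=345678,999999")
    print(format_calibration(parsed))
```

In this sketch the v3 cases behave as follows: "10" and a complete set of
ENG=value pairs are accepted, while "RCS", "AA", "RCS=AA", "XYZ=10" and
"XYZ=AA" all raise an error. "RCS=500" on its own is also rejected here
because the remaining engines would be left unset.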