<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 12/11/2016 9:57 PM, Dave Airlie
wrote:<br>
</div>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">On 8 December 2016 at 12:02, Harry Wentland <a class="moz-txt-link-rfc2396E" href="mailto:harry.wentland@amd.com"><harry.wentland@amd.com></a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">We propose to use the Display Core (DC) driver for display support on
AMD's upcoming GPU (referred to by uGPU in the rest of the doc). In order to
avoid a flag day the plan is to only support uGPU initially and transition
to older ASICs gradually.
</pre>
</blockquote>
<pre wrap="">[FAQ: from past few days]
1) Hey you replied to Daniel, you never addressed the points of the RFC!
I've read it being said that I hadn't addressed the RFC, and you know
I've realised I actually had, because the RFC is great but it
presupposes the codebase as designed can get upstream eventually, and
I don't think it can. The code is too littered with midlayering and
other problems, that actually addressing the individual points of the
RFC would be missing the main point I'm trying to make.
This code needs rewriting, not cleaning, not polishing, it needs to be
split into its constituent parts, and reintegrated in a form more
Linux process friendly.
I feel that if I reply to the individual points Harry has raised in
this RFC, that it means the code would then be suitable for merging,
which it still won't, and I don't want people wasting another 6
months.
If DC was ready for the next-gen GPU it would be ready for the current
GPU, it's not the specific ASIC code that is the problem, it's the
huge midlayer sitting in the middle.</pre>
</blockquote>
<br>
We would love to upstream DC for all supported asic! We made enough
change to make Sea Island work but it's really not validate to the
extend we validate Polaris on linux and no where close to what we do
for 2017 ASICs. With DC the display hardware programming, resource
optimization, power management and interaction with rest of system
will be fully validated across multiple OSs. Therefore we have high
confidence that the quality is going to better than what we have
upstreammed today.<br>
<br>
I don't have a baseline to say if DC is in good enough quality for
older generation compare to upstream. For example we don't have HW
generate bandwidth_calc for DCE 8/10 (Sea/Vocanic island family) but
our code is structured in a way that we assume bandwidth_calc is
there. None of us feel like go untangle the formulas in windows
driver at this point to create our own version of bandwidth_calc.
It sort of work with HW default values but some mode / config is
likely to underflows. If community is okay with uncertain quality,
sure we would love to upstream everything to reduce our maintaince
overhead. You do get audio with DC on DCE8 though.<br>
<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
2) We really need to share all of this code between OSes, why does
Linux not want it?
Sharing code is a laudable goal and I appreciate the resourcing
constraints that led us to the point at which we find ourselves, but
the way forward involves finding resources to upstream this code,
dedicated people (even one person) who can spend time on a day by day
basis talking to people in the open and working upstream, improving
other pieces of the drm as they go, reading atomic patches and
reviewing them, and can incrementally build the DC experience on top
of the Linux kernel infrastructure. Then having the corresponding
changes in the DC codebase happen internally to correspond to how the
kernel code ends up looking. Lots of this code overlaps with stuff the
drm already does, lots of is stuff the drm should be doing, so patches
to the drm should be sent instead.
</pre>
</blockquote>
<br>
Maybe let me share what we are doing and see if we can come up with
something to make DC work for both upstream and our internal need.
We are sharing code not just on Linux and we will do our best to
make our code upstream friendly. Last year we focussed on having
enough code to prove that our DAL rewrite works and get more people
contributing to it. We rush a bit as a result we had a few legacy
component we port from Windows driver and we know it's bloat that
needed to go. <br>
<br>
We designed DC so HW can contribute bandwidth_calc magic and psuedo
code to program the HW blocks. The HW blocks on the bottom of
DC.JPG in models our HW blocks and the programming sequence are
provided by HW engineers. If a piece of HW need a bit toggled 7
times during power up I rather have HW engineer put that in their
psedo code rather than me trying to find that sequence in some
document. Afterall they did simulate the HW with the toggle
sequence. I guess these are back-end code Daniel talked about. Can
we agree that DRM core is not interested in how things are done in
that layer and we can upstream these as it?<br>
<br>
The next is dce_hwseq.c to program the HW blocks in correct
sequence. Some HW block can be programmed in any sequence, but some
requires strict sequence to be followed. For example Display CLK
and PHY CLK need to be up before we enable timing generator. I
would like these sequence to remain in DC as it's really not DRM's
business to know how to program the HW. In a way you can consider
hwseq as a helper to commit state to HW. <br>
<br>
Above hwseq is the dce*_resource.c. It's job is to come up with the
HW state required to realize given config. For example we would use
the exact same HW resources with same optimization setting to drive
any same given config. If 4 x 4k@60 is supported with resource
setting A on HW diagnositc suite during bring up setting B on Linux
then we have a problem. It know which HW block work with which
block and their capability and limitations. I hope you are not
asking this stuff to move up to core because in reality we should
probably hide this in some FW, as HW expose the register to config
them differently that doesn't mean all combination of HW usage is
validated. To me resource is more of a helper to put together
functional pipeline and does not make any decision that any OS might
be interested in. <br>
<br>
These yellow boxes in DC.JPG are really specific to each generation
of HW and changes frequently. These are things that HW has consider
hiding it in FW before. Can we agree on those code (under /dc/dce*)
can stay?<br>
<br>
DAL3.JPG shows how we put this all together. The core part is
design to behave like helper, except we try to limit the entry point
and opted for caller to build desire state we want DC to commit to.
It didn't make sense for us to expose hundred of function (our
windows dal interface did) and require caller to call these helpers
in correct sequence. Caller builds absolute state it want to get to
and DC will make it happen with the HW available.<br>
<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">3) Then how do we upstream it?
Resource(s) need(s) to start concentrating at splitting this thing up
and using portions of it in the upstream kernel. We don't land fully
formed code in the kernel if we can avoid it. Because you can't review
the ideas and structure as easy as when someone builds up code in
chunks and actually develops in the Linux kernel. This has always
produced better more maintainable code. Maybe the result will end up
improving the AMD codebase as well.
</pre>
</blockquote>
Is this about demonstration how basic functionality work and add
more features with series of patches to make review eaiser? If so I
don't think we are staff to do this kind of rewrite. For example it
make no sense to hooking up bandwidth_calc to calculate HW magic if
we don't have mem_input to program the memory settings. We need
portion of hw_seq to ensure these blocks are programming in correct
sequence. We will need to feed bandwidth_calc it's required inputs,
which is basically the whole system state tracked in
validate_context today, which means we basically need big bulk of
resource.c. This effort might have benefit in reviewing the code,
but we will end up with pretty much similar if not the same as what
we already have.<br>
<br>
Or is the objection that we have the white boxes in DC.JPG instead
of using DRM objects? We can probably workout something to have the
white boxes derive from DRM objects and extend atomic state with our
validate_context where dce*_resource.c stores the constructed
pipelines.<br>
<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
4) Why can't we put this in staging?
People have also mentioned staging, Daniel has called it a dead end,
I'd have considered staging for this code base, and I still might.
However staging has rules, and the main one is code in staging needs a
TODO list, and agreed criteria for exiting staging, I don't think we'd
be able to get an agreement on what the TODO list should contain and
how we'd ever get all things on it done. If this code ended up in
staging, it would most likely require someone dedicated to recreating
it in the mainline driver in an incremental fashion, and I don't see
that resource being available.
5) Why is a midlayer bad?
I'm not going to go into specifics on the DC midlayer, but we abhor
midlayers for a fair few reasons. The main reason I find causes the
most issues is locking. When you have breaks in code flow between
multiple layers, but having layers calling back into previous layers
it becomes near impossible to track who owns the locking and what the
current locking state is.
Consider
drma -> dca -> dcb -> drmb
drmc -> dcc -> dcb -> drmb
We have two codes paths that go back into drmb, now maybe drma has a
lock taken, but drmc doesn't, but we've no indication when we hit drmb
of what the context pre entering the DC layer is. This causes all
kinds of problems. The main requirement is the driver maintains the
execution flow as much as possible. The only callback behaviour should
be from an irq or workqueue type situations where you've handed
execution flow to the hardware to do something and it is getting back
to you. The pattern we use to get our of this sort of hole is helper
libraries, we structure code as much as possible as leaf nodes that
don't call back into the parents if we can avoid it (we don't always
succeed).
</pre>
</blockquote>
Okay. by the way DC does behave like a helper for most part. There
is no locking in DC. We work enough with different OS to know they
all have different synchronization primatives and interrupt handling
and have DC lock anything is just shooting ourself in the foot. We
do have function with lock in their function name in DC but those
are HW register lock to ensure that the HW register update
atomically. ie have 50 register write latch in HW at next vsync to
ensure HW state change on vsync boundary.<br>
<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
So the above might becomes
drma-> dca_helper
-> dcb_helper
-> drmb.
In this case the code flow is controlled by drma, dca/dcb might be
modifying data or setting hw state but when we get to drmb it's easy
to see what data is needs and what locking.
DAL/DC goes against this in so many ways, and when I look at the code
I'm never sure where to even start pulling the thread to unravel it.
</pre>
</blockquote>
I don't know where we go against it. In the case we do callback to
DRM for MST case we have<br>
<br>
amdgpu_dm_atomic_commit (implement atomic_commit)<br>
dc_commit_targets (commit helper)<br>
dce110_apply_ctx_to_hw (hw_seq)<br>
core_link_enable_stream (part of MST enable sequence)<br>
allocate_mst_payload (helper for above func in same file)<br>
dm_helpers_dp_mst_write_payload_allocation_table (glue code to call
DRM)<br>
drm_dp_mst_allocate_vcpi (DRM)<br>
<br>
As you see even in this case we are only 6 level deep before we
callback to DRM, and 2 of those functions are in same file as helper
func of the bigger sequence.<br>
<br>
Can you clarify the distinction between what you would call a mid
layer vs helper. We consulted Alex a lot and we know about this
inversion of control pattern and we are trying our best to do it.
Is it the way functions are named and files folder structure? Would
it help if we flatten amdgpu_dm_atomic_commit and
dc_commit_targets? Even if we do I would imagine we want some
helper in commit rather a giant 1000 line function. Is there any
concern that we put dc_commit_targets under /dc folder as we want
other platform to run exact same helper? Or this is about the state
dc_commit_targets is too big? or the state is stored
validate_context rather than drm_atomic_state? <br>
<br>
I don't think it make sense for DRM to get into how we decide to use
our HW blocks. For example any refactor done in core should not
result in us using different pipeline to drive the same config. We
would like to have control over how our HW pipeline is constructed.<br>
<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
Some questions I have for AMD engineers that also I'd want to see
addressed before any consideration of merging would happen!
How do you plan on dealing with people rewriting or removing code
upstream that is redundant in the kernel, but required for internal
stuff?</pre>
</blockquote>
<br>
Honestly I don't know what these are. Like you and Jerome remove
func ptr abstraction (I know it was bad, that was one of the
component we ported from windows) and we need to keep it as function
pointer so we can still run our code on FPGA before we see first
silicon? I don't think if we nak the function ptr removal will be a
problem for community. The rest is valued and we took with open
arm. <br>
<br>
Or this is more like we have code duplication after DRM added some
functionality we can use? I would imaging its more of moving what
we got working in our code to DRM core if we are upstreamed and we
have no problem accomodate for that as the code moved out to DRM
core can be included in other platforms. We don't have any private
ioctl today and we don't plan to have any outside of using DRM
object properties.<br>
<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
How are you going to deal with new Linux things that overlap
incompatibly with your internally developed stuff?</pre>
</blockquote>
I really don't know what those new linux things can be that could
cause us problem. If anything the new things will be probably come
from us if we are upstreammed. <br>
<br>
atomic: we had that on windows 8 years ago for windows vista, yes
sematic/abstraction is different but concept is the same. We could
have easily settled with DRM semantics or DRM could easily take some
form of our pattern.<br>
<br>
DP MST: AMD was the first source certified and we work closely with
the first branch certified. I was a part of that team and we had a
very solid implementation. If we were upstreamed I don't see you
would want to reinvent the wheel and not try to massage what we have
into shape for DRM core for other driver to reuse.<br>
<br>
drm_plane: windows multi-plane overlay and Andriod HW composer? We
had that working 2 years ago. If you are upstreammed and you are
first you usually have a say in how it should go down don't you?<br>
<br>
The new thing coming are Free Sync HDR, 8k@60 with DP DSC etc. I
would imaging we would beat all other vendor to the first open
source solution if we leverage effort from our extended display
team.<br>
<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
If the code is upstream will it be tested in the kernel by some QA
group, or will there be some CI infrastructure used to maintain and to
watch for Linux code that breaks assumptions in the DC code?</pre>
</blockquote>
We have tester that runs set of display test every other day on
linux. We don't run on DRM_Next tree yet and Alex is working out a
plan to allow us use DRM_Next as our development branch. Upstream
is not likely to be tested by QA though.<br>
<br>
DC does not assume anything. DC require full state given in
dc_commit_targets / dc_commit_surfaces_to_target. we do whatever is
specified in the data structure. dc_commit_surfaces_to_target can
be considered as a helper function to change plane without visual
side effect on vsync bondary. dc_commit_targets can be considered
as a helper function to light up a display with black screen. DRM
core has full control if you want to light up to black screen as
soon as monitor is plugged in or you want to light up after someone
does a mode set. Hotplug interrupt goes to amdgpu_dm, and it will
take the require lock in DRM object because calling DC to detect. <br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
Can you show me you understand that upstream code is no longer 100% in
your control and things can happen to it that you might not expect and
you need to deal with it?</pre>
</blockquote>
I think so, other than we haven't been spanning the mailing list.
We already dealing with we don't control 100% our code to some
extend. We don't control bandwidth_calc. Trust me we are not
keeping up with the updates that HW is doing with it for next gen
hw. Everytime we pull there is a new term they added and we have to
find a way to feed that input. We had to clean up linux style for
them everytime we pull. Our HW diagnostic suite has different set
of requirements and they frequently contribute to our code. We took
you and Jerome's patch. If it's validated we want that code. <br>
<br>
At end of the day I think the architecture is really about what's HW
and what's DRM core. Like I said all the yellow boxes has been
proposed to running on firmware but we decide to keep them in driver
as it's easier to debug on x86 than uC. I can tell you that our HW
guys were happy when I decide to open source bandwidth_calc but we
did it anyways. I feel like because we are opening up the
complexity and inner working of our HW, we are somehow getting
penalized for being open.<br>
<blockquote
cite="mid:CAPM=9tx+j9-3fZNY=peLjdsVqyLS6i3V-sV3XrnYsK2YuhWRBA@mail.gmail.com"
type="cite">
<pre wrap="">
Dave.
</pre>
</blockquote>
Tony<br>
</body>
</html>