[RFC] Using DC in amdgpu for upcoming GPU
Harry Wentland
harry.wentland at amd.com
Thu Dec 8 14:33:25 UTC 2016
Hi Daniel,
just a quick clarification in-line about "validation" inside atomic_commit.
On 2016-12-08 04:59 AM, Daniel Vetter wrote:
> Hi Harry,
>
> On Wed, Dec 07, 2016 at 09:02:13PM -0500, Harry Wentland wrote:
>> We propose to use the Display Core (DC) driver for display support on
>> AMD's upcoming GPU (referred to by uGPU in the rest of the doc). In order to
>> avoid a flag day the plan is to only support uGPU initially and transition
>> to older ASICs gradually.
>>
>> The DC component has received extensive testing within AMD for DCE8, 10, and
>> 11 GPUs and is being prepared for uGPU. Support should be better than
>> amdgpu's current display support.
>>
>> * All of our QA effort is focused on DC
>> * All of our CQE effort is focused on DC
>> * All of our OEM preloads and custom engagements use DC
>> * DC behavior mirrors what we do for other OSes
>>
>> The new asic utilizes a completely re-designed atom interface, so we cannot
>> easily leverage much of the existing atom-based code.
>>
>> We've introduced DC to the community earlier in 2016 and received a fair
>> amount of feedback. Some of what we've addressed so far are:
>>
>> * Self-contain ASIC specific code. We did a bunch of work to pull
>> common sequences into dc/dce and leave ASIC specific code in
>> separate folders.
>> * Started to expose AUX and I2C through generic kernel/drm
>> functionality and are mostly using that. Some of that code is still
>> needlessly convoluted. This cleanup is in progress.
>> * Integrated Dave and Jerome’s work on removing abstraction in bios
>> parser.
>> * Retire adapter service and asic capability
>> * Remove some abstraction in GPIO
>>
>> Since a lot of our code is shared with pre- and post-silicon validation
>> suites changes need to be done gradually to prevent breakages due to a major
>> flag day. This, coupled with adding support for new asics and lots of new
>> feature introductions means progress has not been as quick as we would have
>> liked. We have made a lot of progress none the less.
>>
>> The remaining concerns that were brought up during the last review that we
>> are working on addressing:
>>
>> * Continue to cleanup and reduce the abstractions in DC where it
>> makes sense.
>> * Removing duplicate code in I2C and AUX as we transition to using the
>> DRM core interfaces. We can't fully transition until we've helped
>> fill in the gaps in the drm core that we need for certain features.
>> * Making sure Atomic API support is correct. Some of the semantics of
>> the Atomic API were not particularly clear when we started this,
>> however, that is improving a lot as the core drm documentation
>> improves. Getting this code upstream and in the hands of more
>> atomic users will further help us identify and rectify any gaps we
>> have.
>>
>> Unfortunately we cannot expose code for uGPU yet. However refactor / cleanup
>> work on DC is public. We're currently transitioning to a public patch
>> review. You can follow our progress on the amd-gfx mailing list. We value
>> community feedback on our work.
>>
>> As an appendix I've included a brief overview of the how the code currently
>> works to make understanding and reviewing the code easier.
>>
>> Prior discussions on DC:
>>
>> * https://lists.freedesktop.org/archives/dri-devel/2016-March/103398.html
>> *
>> https://lists.freedesktop.org/archives/dri-devel/2016-February/100524.html
>>
>> Current version of DC:
>>
>> * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7
>>
>> Once Alex pulls in the latest patches:
>>
>> * https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/display?h=amd-staging-4.7
>>
>> Best Regards,
>> Harry
>>
>>
>> ************************************************
>> *** Appendix: A Day in the Life of a Modeset ***
>> ************************************************
>>
>> Below is a high-level overview of a modeset with dc. Some of this might be a
>> little out-of-date since it's based on my XDC presentation but it should be
>> more-or-less the same.
>>
>> amdgpu_dm_atomic_commit()
>> {
>> /* setup atomic state */
>> drm_atomic_helper_prepare_planes(dev, state);
>> drm_atomic_helper_swap_state(dev, state);
>> drm_atomic_helper_update_legacy_modeset_state(dev, state);
>>
>> /* create or remove targets */
>>
>> /********************************************************************
>> * *** Call into DC to commit targets with list of all known targets
>> ********************************************************************/
>> /* DC is optimized not to do anything if 'targets' didn't change. */
>> dc_commit_targets(dm->dc, commit_targets, commit_targets_count)
>> {
>> /******************************************************************
>> * *** Build context (function also used for validation)
>> ******************************************************************/
>> result = core_dc->res_pool->funcs->validate_with_context(
>> core_dc,set,target_count,context);
>
> I can't dig into details of DC, so this is not a 100% assessment, but if
> you call a function called "validate" in atomic_commit, you're very, very
> likely breaking atomic. _All_ validation must happen in ->atomic_check,
> if that's not the case TEST_ONLY mode is broken. And atomic userspace is
> relying on that working.
>
This function is not really named correctly. What it does is it builds a
context and validates at the same time. In commit we simply care that it
builds the context. Validate should never fail here (since this was
already validated in atomic_check).
We call the same function at atomic_check
amdgpu_dm_atomic_check ->
dc_validate_resources ->
core_dc->res_pool->funcs->validate_with_context
As for the rest, I hear you and appreciate your feedback. Let me get
back to you on that later.
Thanks,
Harry
> The only thing that you're allowed to return from ->atomic_commit is
> out-of-memory, hw-on-fire and similar unforeseen and catastrophic issues.
> Kerneldoc expklains this.
>
> Now the reason I bring this up (and we've discussed it at length in
> private) is that DC still suffers from a massive abstraction midlayer. A
> lot of the back-end stuff (dp aux, i2c, abstractions for allocation,
> timers, irq, ...) have been cleaned up, but the midlayer is still there.
> And I understand why you have it, and why it's there - without some OS
> abstraction your grand plan of a unified driver across everything doesn't
> work out so well.
>
> But in a way the backend stuff isn't such a big deal. It's annoying since
> lots of code, and bugfixes have to be duplicated and all that, but it's
> fairly easy to fix case-by-case, and as long as AMD folks stick around
> (which I fully expect) not a maintainance issue. It makes it harder for
> others to contribute, but then since it's mostly the leaf it's generally
> easy to just improve the part you want to change (as an outsider). And if
> you want to improve shared code the only downside is that you can't also
> improve amd, but that's not so much a problem for non-amd folks ;-)
>
> The problem otoh with the abstraction layer between drm core and the amd
> driver is that you can't ignore if you want to refactor shared code. And
> because it's an entire world of its own, it's much harder to understand
> what the driver is doing (without reading it all). Some examples of what I
> mean:
>
> - All other drm drivers subclass drm objects (by embedding them) into the
> corresponding hw part that most closely matches the drm object's
> semantics. That means even when you have 0 clue about how a given piece
> of hw works, you have a reasonable chance of understanding code. If it's
> all your own stuff you always have to keep in minde the special amd
> naming conventions. That gets old real fast if you trying to figure out
> what 20+ (or are we at 30 already?) drivers are doing.
>
> - This is even more true for atomic. Atomic has a pretty complicated
> check/commmit transactional model for updating display state. It's a
> standardized interface, and it's extensible, and we want generic
> userspace to be able to run on any driver. Fairly often we realize that
> semantics of existing or newly proposed properties and state isn't
> well-defined enough, and then we need to go&read all the drivers and
> figure out how to fix up the mess. DC has it's entirely separate state
> structures which again don't subclass the atomic core structures (afaik
> at least). Again the same problems apply that you can't find things, and
> that figuring out the exact semantics and spotting differences in
> behaviour is almost impossible.
>
> - The trouble isn't just in reading code and understanding it correctly,
> it's also in finding it. If you have your own completely different world
> then just finding the right code is hard - cscope and grep fail to work.
>
> - Another issue is that very often we unify semantics in drivers by adding
> some new helpers that at least dtrt for most of the drivers. If you have
> your own world then the impendance mismatch will make sure that amd
> drivers will have slightly different semantics, and I think that's not
> good for the ecosystem and kms - people want to run a lot more than just
> a boot splash with generic kms userspace, stuff like xf86-video-$vendor
> is going out of favour heavily.
>
> Note that all this isn't about amd walking away and leaving an
> unmaintainable mess behind. Like I've said I don't think this is a big
> risk. The trouble is that having your own world makes it harder for
> everyone else to understand the amd driver, and understanding all drivers
> is very often step 1 in some big refactoring or feature addition effort.
> Because starting to refactor without understanding the problem generally
> doesn't work ;_) And you can't make this step 1 easier for others by
> promising to always maintain DC and update it to all the core changes,
> because that's only step 2.
>
> In all the DC discussions we've had thus far I haven't seen anyone address
> this issue. And this isn't just an issue in drm, it's pretty much
> established across all linux subsystems with the "no midlayer or OS
> abstraction layers in drivers" rule. There's some real solid reasons why
> such a HAl is extremely unpopular with upstream. And I haven't yet seen
> any good reason why amd needs to be different, thus far it looks like a
> textbook case, and there's been lots of vendors in lots of subsystems who
> tried to push their HAL.
>
> Thanks, Daniel
>
>>
>> /******************************************************************
>> * *** Apply safe power state
>> ******************************************************************/
>> pplib_apply_safe_state(core_dc);
>>
>> /****************************************************************
>> * *** Apply the context to HW (program HW)
>> ****************************************************************/
>> result = core_dc->hwss.apply_ctx_to_hw(core_dc,context)
>> {
>> /* reset pipes that need reprogramming */
>> /* disable pipe power gating */
>> /* set safe watermarks */
>>
>> /* for all pipes with an attached stream */
>> /************************************************************
>> * *** Programming all per-pipe contexts
>> ************************************************************/
>> status = apply_single_controller_ctx_to_hw(...)
>> {
>> pipe_ctx->tg->funcs->set_blank(...);
>> pipe_ctx->clock_source->funcs->program_pix_clk(...);
>> pipe_ctx->tg->funcs->program_timing(...);
>> pipe_ctx->mi->funcs->allocate_mem_input(...);
>> pipe_ctx->tg->funcs->enable_crtc(...);
>> bios_parser_crtc_source_select(...);
>>
>> pipe_ctx->opp->funcs->opp_set_dyn_expansion(...);
>> pipe_ctx->opp->funcs->opp_program_fmt(...);
>>
>> stream->sink->link->link_enc->funcs->setup(...);
>> pipe_ctx->stream_enc->funcs->dp_set_stream_attribute(...);
>> pipe_ctx->tg->funcs->set_blank_color(...);
>>
>> core_link_enable_stream(pipe_ctx);
>> unblank_stream(pipe_ctx,
>>
>> program_scaler(dc, pipe_ctx);
>> }
>> /* program audio for all pipes */
>> /* update watermarks */
>> }
>>
>> program_timing_sync(core_dc, context);
>> /* for all targets */
>> target_enable_memory_requests(...);
>>
>> /* Update ASIC power states */
>> pplib_apply_display_requirements(...);
>>
>> /* update surface or page flip */
>> }
>> }
>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
More information about the amd-gfx
mailing list