[Intel-gfx] [PATCH 08/12] drm/i915: Improve reliability for Displayport link training
Paulo Zanoni
przanoni at gmail.com
Wed Jul 30 16:07:34 CEST 2014
2014-07-14 16:10 GMT-03:00 Todd Previte <tprevite at gmail.com>:
> Link training for Displayport can fail in many ways and at multiple different points
> during the training process. Previously, errors were logged but no additional action
> was taken based on them. Consequently, training attempts could continue even after
> errors have occurred that would prevent successful link training. This patch updates
> the link training functions and where/how they're used to be more intelligent about
> failures and to stop trying to train the link when it's a lost cause.
I agree we need to do something about this problem, but I'm not sure
how this patch improves the situation. Can you please describe in more
detail how exactly your changes get us closer to solving the problem?
Of course, the points where you start signaling previously-unsignaled
errors are obviously an improvement.
Anyway, this patch should probably be split in 3:
- A patch to add the boolean return values and change
intel_dp_check_link_status() + intel_enable_dp() +
intel_ddi_pre_enable().
- A patch to signal dpcd error cases we were previously ignoring.
- A patch that changes how intel_dp_start_link_train() and
intel_dp_stop_link_train() currently behave (the "goto"s replacing
"break" statements).
See below for better explanations.
The big problem here is that these encoder callbacks never fail, so
there's not really much to do after we detect a sink failure.
In the current code (without your patch), we already clearly signal
link training failures with debug+error messages, so the new debug
messages at places like intel_enable_dp() are not much of an
improvement. Also, we already run intel_dp_set_idle_link_train() at
the end of intel_dp_complete_link_train(), and we do additional things
such as calling intel_dp_stop_link_train(). And I guess we do the
non-DDI equivalent steps at some point too, so I'm not sure how
jumping straight to intel_dp_set_idle_link_train() helps, since we do
it anyway as part of the normal sequence. Also, we currently follow
our mode set sequence to the end - even when the sink fails to
understand what we throw at it - and I'm always afraid of not
following the sequence exactly as described in the spec, since that
could lead to unexpected bugs (we had this problem dozens of times in
the past).
I think the really cool solution would be to retry link training with
different parameters (different clock and number of lanes), but I
imagine this would require a lot of code refactoring since we probably
need to go back to the compute_config stages of the modeset sequence.
Or maybe just finding a way to tell the user-space modesetting app
that it has a black screen would already be helpful.
Other people may think that the real-real long-term solution would be
to fix our code so it never fails link training or gives black screens
:)
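To make the retry idea above a bit more concrete, here is a minimal
standalone sketch (not i915 code). Everything in it is hypothetical:
struct link_cfg, train_with_fallback(), the fallback order (fewer lanes
first, then a lower link rate) and fake_sink_train(), which only stands
in for the start/complete/stop training sequence. A real implementation
would also have to check that the mode still fits in the reduced
bandwidth, which is why the compute_config stage would need to be
revisited.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link configuration: link symbol clock in kHz and lane count. */
struct link_cfg {
	int link_rate_khz;	/* 540000, 270000 or 162000 */
	int lane_count;		/* 4, 2 or 1 */
};

/* Hypothetical callback standing in for the whole training sequence. */
typedef bool (*train_fn)(const struct link_cfg *cfg, void *ctx);

/*
 * Step to the next lower configuration. Assumed order: halve the lane
 * count first; once at one lane, drop the link rate and go back to the
 * maximum lane count. Returns false when nothing is left to try.
 */
static bool link_cfg_fallback(struct link_cfg *cfg, int max_lanes)
{
	if (cfg->lane_count > 1) {
		cfg->lane_count /= 2;
		return true;
	}

	if (cfg->link_rate_khz == 540000)
		cfg->link_rate_khz = 270000;
	else if (cfg->link_rate_khz == 270000)
		cfg->link_rate_khz = 162000;
	else
		return false;

	cfg->lane_count = max_lanes;
	return true;
}

/* Try training, falling back until it works or the options run out. */
static bool train_with_fallback(struct link_cfg *cfg, train_fn train, void *ctx)
{
	const int max_lanes = cfg->lane_count;

	do {
		if (train(cfg, ctx))
			return true;
	} while (link_cfg_fallback(cfg, max_lanes));

	return false;
}

/*
 * Demo-only sink: training "succeeds" when the requested configuration
 * does not exceed the limits passed in through ctx.
 */
static bool fake_sink_train(const struct link_cfg *cfg, void *ctx)
{
	const struct link_cfg *limit = ctx;

	return cfg->link_rate_khz <= limit->link_rate_khz &&
	       cfg->lane_count <= limit->lane_count;
}
```

On success, cfg holds the configuration that finally trained, so the
caller can tell whether the link came up with reduced parameters.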
Some more below:
>
> Signed-off-by: Todd Previte <tprevite at gmail.com>
> ---
> drivers/gpu/drm/i915/intel_ddi.c | 23 +++++++++--
> drivers/gpu/drm/i915/intel_dp.c | 89 +++++++++++++++++++++++++++++-----------
> drivers/gpu/drm/i915/intel_drv.h | 7 ++--
> 3 files changed, 90 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c
> index ded6013..c0727b8 100644
> --- a/drivers/gpu/drm/i915/intel_ddi.c
> +++ b/drivers/gpu/drm/i915/intel_ddi.c
> @@ -1246,6 +1246,7 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder)
> struct intel_crtc *crtc = to_intel_crtc(encoder->crtc);
> enum port port = intel_ddi_get_encoder_port(intel_encoder);
> int type = intel_encoder->type;
> + uint8_t fail_code = 0;
>
> if (crtc->config.has_audio) {
> DRM_DEBUG_DRIVER("Audio on pipe %c on DDI\n",
> @@ -1274,10 +1275,19 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder)
> intel_dp->DP |= DDI_PORT_WIDTH(intel_dp->lane_count);
>
> intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON);
> - intel_dp_start_link_train(intel_dp);
> - intel_dp_complete_link_train(intel_dp);
> + if (!intel_dp_start_link_train(intel_dp)) {
> + fail_code = 1;
> + goto failed;
> + }
> + if (!intel_dp_complete_link_train(intel_dp)) {
> + fail_code = 2;
> + goto failed;
> + }
> if (port != PORT_A)
> - intel_dp_stop_link_train(intel_dp);
> + if (!intel_dp_stop_link_train(intel_dp)) {
> + fail_code = 3;
> + goto failed;
> + }
> } else if (type == INTEL_OUTPUT_HDMI) {
> struct intel_hdmi *intel_hdmi = enc_to_intel_hdmi(encoder);
>
> @@ -1285,6 +1295,13 @@ static void intel_ddi_pre_enable(struct intel_encoder *intel_encoder)
> crtc->config.has_hdmi_sink,
> &crtc->config.adjusted_mode);
> }
> +
> + return;
> +
> +failed:
> + /* Clear link training here */
> + intel_dp_set_idle_link_train(enc_to_intel_dp(encoder));
> + DRM_DEBUG_KMS("Failed to pre-enable DP, fail code %d\n", fail_code);
> }
>
> static void intel_ddi_post_disable(struct intel_encoder *intel_encoder)
> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
> index 88f1bbe..1c6ee34 100644
> --- a/drivers/gpu/drm/i915/intel_dp.c
> +++ b/drivers/gpu/drm/i915/intel_dp.c
> @@ -2018,23 +2018,42 @@ static void chv_post_disable_dp(struct intel_encoder *encoder)
> mutex_unlock(&dev_priv->dpio_lock);
> }
>
> -static void intel_enable_dp(struct intel_encoder *encoder)
> +static bool intel_enable_dp(struct intel_encoder *encoder)
> {
> struct intel_dp *intel_dp = enc_to_intel_dp(&encoder->base);
> struct drm_device *dev = encoder->base.dev;
> struct drm_i915_private *dev_priv = dev->dev_private;
> uint32_t dp_reg = I915_READ(intel_dp->output_reg);
> + uint8_t fail_code = 0;
>
> + /* FIXME: Not sure this needs to be a WARN() */
You can git-blame that line and see its history :)
Anyway, the point is that if you reach this point of the code, DP
_must_ be disabled, so "dp_reg & DP_PORT_EN" tells us there's a bug
somewhere else.
> if (WARN_ON(dp_reg & DP_PORT_EN))
> - return;
> + return false;
>
> intel_edp_panel_vdd_on(intel_dp);
> intel_dp_sink_dpms(intel_dp, DRM_MODE_DPMS_ON);
> - intel_dp_start_link_train(intel_dp);
> + if (!intel_dp_start_link_train(intel_dp)) {
> + fail_code = 1;
> + goto failed;
> + }
> intel_edp_panel_on(intel_dp);
> edp_panel_vdd_off(intel_dp, true);
> - intel_dp_complete_link_train(intel_dp);
> - intel_dp_stop_link_train(intel_dp);
> + if (!intel_dp_complete_link_train(intel_dp)) {
> + fail_code = 2;
> + goto failed;
> + }
> + if (!intel_dp_stop_link_train(intel_dp)) {
> + fail_code = 3;
> + goto failed;
> + }
> +
> + return true;
> +
> +failed:
> + /* Clear link training here */
> + intel_dp_set_idle_link_train(intel_dp);
Function intel_enable_dp() is called by all non-DDI gens, but
intel_dp_set_idle_link_train() has an early return for !HAS_DDI. So
this function call is basically doing nothing here. IMHO this is the
biggest problem with this patch.
> + DRM_DEBUG_KMS("Failed to enable DP with code %d\n", fail_code);
> + return false;
> }
>
> static void g4x_enable_dp(struct intel_encoder *encoder)
> @@ -2956,7 +2975,7 @@ intel_dp_update_link_train(struct intel_dp *intel_dp, uint32_t *DP,
> return ret == intel_dp->lane_count;
> }
>
> -static void intel_dp_set_idle_link_train(struct intel_dp *intel_dp)
> +void intel_dp_set_idle_link_train(struct intel_dp *intel_dp)
> {
> struct intel_digital_port *intel_dig_port = dp_to_dig_port(intel_dp);
> struct drm_device *dev = intel_dig_port->base.base.dev;
> @@ -2988,7 +3007,7 @@ static void intel_dp_set_idle_link_train(struct intel_dp *intel_dp)
> }
>
> /* Enable corresponding port and start training pattern 1 */
> -void
> +bool
> intel_dp_start_link_train(struct intel_dp *intel_dp)
> {
> struct drm_encoder *encoder = &dp_to_dig_port(intel_dp)->base.base;
> @@ -3007,11 +3026,17 @@ intel_dp_start_link_train(struct intel_dp *intel_dp)
> link_config[1] = intel_dp->lane_count;
> if (drm_dp_enhanced_frame_cap(intel_dp->dpcd))
> link_config[1] |= DP_LANE_COUNT_ENHANCED_FRAME_EN;
> - drm_dp_dpcd_write(&intel_dp->aux, DP_LINK_BW_SET, link_config, 2);
> + if (drm_dp_dpcd_write(&intel_dp->aux, DP_LINK_BW_SET, link_config, 2) != 2) {
> + DRM_DEBUG_KMS("Failed to write sink DPCD for link rate and lane count\n");
> + goto failed;
> + }
>
> link_config[0] = 0;
> link_config[1] = DP_SET_ANSI_8B10B;
> - drm_dp_dpcd_write(&intel_dp->aux, DP_DOWNSPREAD_CTRL, link_config, 2);
> + if (drm_dp_dpcd_write(&intel_dp->aux, DP_DOWNSPREAD_CTRL, link_config, 2) != 2) {
> + DRM_DEBUG_KMS("Failed to write sink DPCD for downspread control\n");
> + goto failed;
> + }
These two above are useful additions and could probably go into a
separate patch. But maybe I'd make them be DRM_ERRORs since we
probably want to easily notice them - and get the bug reports.
Maybe we could also create a new dp_dpcd_write_safe() function/macro
that would be responsible for printing error messages in case we don't
transfer all the bytes we want, then we could make the whole intel_dp.c
file use it. The nice thing about it being a macro would be that the
DRM_ERROR would print the name of the caller function.
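A rough standalone sketch of what I mean by dp_dpcd_write_safe(), just
to show the shape. This is not driver code: fake_dpcd_write() is a
made-up stand-in for drm_dp_dpcd_write(), fprintf() stands in for
DRM_ERROR, and dpcd_write_checked() is a hypothetical helper. The macro
passes __func__ explicitly so the message names the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Demo knob: writes to this offset come up one byte short. */
static unsigned int fake_short_write_offset = 0xffffffffu;

/*
 * Made-up stand-in for drm_dp_dpcd_write(): returns the number of
 * bytes transferred.
 */
static long fake_dpcd_write(unsigned int offset, const void *buf, size_t size)
{
	(void)buf;

	if (offset == fake_short_write_offset && size > 0)
		return (long)size - 1;

	return (long)size;
}

/* Hypothetical helper: do the write and complain if it was incomplete. */
static bool dpcd_write_checked(const char *caller, unsigned int offset,
			       const void *buf, size_t size)
{
	long ret = fake_dpcd_write(offset, buf, size);

	if (ret != (long)size) {
		fprintf(stderr, "[%s] DPCD write to 0x%03x failed: %ld of %zu bytes\n",
			caller, offset, ret, size);
		return false;
	}

	return true;
}

/* Expands at the call site, so __func__ is the name of the caller. */
#define dp_dpcd_write_safe(offset, buf, size) \
	dpcd_write_checked(__func__, (offset), (buf), (size))
```

A caller would then write something like
"if (!dp_dpcd_write_safe(0x100, link_config, 2)) goto failed;"
and get the error message for free.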
>
> DP |= DP_PORT_EN;
>
> @@ -3020,7 +3045,7 @@ intel_dp_start_link_train(struct intel_dp *intel_dp)
> DP_TRAINING_PATTERN_1 |
> DP_LINK_SCRAMBLING_DISABLE)) {
> DRM_ERROR("failed to enable link training\n");
> - return;
> + goto failed;
> }
>
> voltage = 0xff;
> @@ -3032,12 +3057,12 @@ intel_dp_start_link_train(struct intel_dp *intel_dp)
> drm_dp_link_train_clock_recovery_delay(intel_dp->dpcd);
> if (!intel_dp_get_link_status(intel_dp, link_status)) {
> DRM_ERROR("failed to get link status\n");
> - break;
> + goto failed;
> }
>
> if (drm_dp_clock_recovery_ok(link_status, intel_dp->lane_count)) {
> DRM_DEBUG_KMS("clock recovery OK\n");
> - break;
> + goto cr_done;
> }
>
> /* Check to see if we've tried the max voltage */
> @@ -3048,7 +3073,7 @@ intel_dp_start_link_train(struct intel_dp *intel_dp)
> ++loop_tries;
> if (loop_tries == 5) {
> DRM_ERROR("too many full retries, give up\n");
> - break;
> + goto failed;
> }
> intel_dp_reset_link_train(intel_dp, &DP,
> DP_TRAINING_PATTERN_1 |
> @@ -3062,7 +3087,7 @@ intel_dp_start_link_train(struct intel_dp *intel_dp)
> ++voltage_tries;
> if (voltage_tries == 5) {
> DRM_ERROR("too many voltage retries, give up\n");
> - break;
> + goto failed;
> }
> } else
> voltage_tries = 0;
> @@ -3071,14 +3096,20 @@ intel_dp_start_link_train(struct intel_dp *intel_dp)
> /* Update training set as requested by target */
> if (!intel_dp_update_link_train(intel_dp, &DP, link_status)) {
> DRM_ERROR("failed to update link training\n");
> - break;
> + goto failed;
> }
> }
>
> +cr_done:
> intel_dp->DP = DP;
> + return true;
> +
> +failed:
> + DRM_DEBUG_KMS("Failed to initiate link training\n");
> + return false;
This set of changes where you replace "break" with "goto"s should be
in a separate patch, with a nice explanation of what the consequences
are of not doing "intel_dp->DP = DP" in the cases where we just
"goto failed". If the link training failed, we should probably
disable DP_PORT_EN.
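One possible shape for that failure path, as a standalone sketch only.
It assumes that clearing DP_PORT_EN and keeping the cached value in sync
with the register is the right thing to do, which is exactly the part
that needs explaining in the commit message. struct fake_dp, hw_reg and
start_link_train_sketch() are made up for illustration; train_ok stands
in for the result of the clock recovery loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define DP_PORT_EN (1u << 31)	/* port enable bit, as in i915_reg.h */

/* Made-up stand-in for the parts of struct intel_dp that matter here. */
struct fake_dp {
	uint32_t DP;		/* cached register value, like intel_dp->DP */
	uint32_t hw_reg;	/* stand-in for the real output register */
};

static bool start_link_train_sketch(struct fake_dp *dp, bool train_ok)
{
	uint32_t DP = dp->DP;

	/* Enable the port and start sending training pattern 1. */
	DP |= DP_PORT_EN;
	dp->hw_reg = DP;

	if (!train_ok)
		goto failed;

	/* Commit the trained value only on success. */
	dp->DP = DP;
	return true;

failed:
	/* Assumption: do not leave the port enabled after a failure... */
	DP &= ~DP_PORT_EN;
	dp->hw_reg = DP;
	/* ...and keep the cached copy in sync with the register. */
	dp->DP = DP;
	return false;
}
```

With this shape, nothing that later reads intel_dp->DP sees a value
that disagrees with the hardware.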
> }
>
> -void
> +bool
> intel_dp_complete_link_train(struct intel_dp *intel_dp)
> {
> bool channel_eq = false;
> @@ -3095,7 +3126,7 @@ intel_dp_complete_link_train(struct intel_dp *intel_dp)
> training_pattern |
> DP_LINK_SCRAMBLING_DISABLE)) {
> DRM_ERROR("failed to start channel equalization\n");
> - return;
> + return false;
> }
>
> tries = 0;
> @@ -3154,14 +3185,17 @@ intel_dp_complete_link_train(struct intel_dp *intel_dp)
>
> intel_dp->DP = DP;
>
> - if (channel_eq)
> + if (channel_eq) {
> DRM_DEBUG_KMS("Channel EQ done. DP Training successful\n");
> + return true;
> + }
>
> + return false;
> }
Same here.
>
> -void intel_dp_stop_link_train(struct intel_dp *intel_dp)
> +bool intel_dp_stop_link_train(struct intel_dp *intel_dp)
> {
> - intel_dp_set_link_train(intel_dp, &intel_dp->DP,
> + return intel_dp_set_link_train(intel_dp, &intel_dp->DP,
> DP_TRAINING_PATTERN_DISABLE);
> }
>
> @@ -3600,9 +3634,18 @@ intel_dp_check_link_status(struct intel_dp *intel_dp)
> if (!drm_dp_channel_eq_ok(link_status, intel_dp->lane_count)) {
> DRM_DEBUG_KMS("%s: channel EQ not ok, retraining\n",
> intel_encoder->base.name);
> - intel_dp_start_link_train(intel_dp);
> - intel_dp_complete_link_train(intel_dp);
> - intel_dp_stop_link_train(intel_dp);
> + if (!intel_dp_start_link_train(intel_dp)) {
> + DRM_DEBUG_KMS("Start link training failed\n");
> + return;
> + }
> + if (!intel_dp_complete_link_train(intel_dp)) {
> + DRM_DEBUG_KMS("Complete link training failed\n");
> + return;
> + }
> + if (!intel_dp_stop_link_train(intel_dp)) {
> + DRM_DEBUG_KMS("Stop link training failed\n");
> + return;
> + }
> }
> }
>
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index d2ae54f..79876df 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -844,9 +844,10 @@ int intel_dp_set_config(struct drm_mode_set *set);
> void intel_dp_init(struct drm_device *dev, int output_reg, enum port port);
> bool intel_dp_init_connector(struct intel_digital_port *intel_dig_port,
> struct intel_connector *intel_connector);
> -void intel_dp_start_link_train(struct intel_dp *intel_dp);
> -void intel_dp_complete_link_train(struct intel_dp *intel_dp);
> -void intel_dp_stop_link_train(struct intel_dp *intel_dp);
> +bool intel_dp_start_link_train(struct intel_dp *intel_dp);
> +bool intel_dp_complete_link_train(struct intel_dp *intel_dp);
> +bool intel_dp_stop_link_train(struct intel_dp *intel_dp);
> +void intel_dp_set_idle_link_train(struct intel_dp *intel_dp);
> void intel_dp_sink_dpms(struct intel_dp *intel_dp, int mode);
> void intel_dp_encoder_destroy(struct drm_encoder *encoder);
> void intel_dp_check_link_status(struct intel_dp *intel_dp);
> --
> 1.9.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
--
Paulo Zanoni