[Intel-gfx] [PATCH 4/9] drm/i915: Add check for corrupt raw EDID header for Displayport compliance testing

Wed Apr 8 15:37:59 PDT 2015

2015-04-08 18:43 GMT-03:00 Todd Previte <tprevite at gmail.com>:
>
>
> On 4/8/2015 9:51 AM, Paulo Zanoni wrote:
>>
>> 2015-03-31 14:15 GMT-03:00 Todd Previte <tprevite at gmail.com>:
>>>
>>> Displayport compliance test 4.2.2.6 requires that a source device be
>>> capable of detecting a corrupt EDID. To do this, the test sets up an
>>> invalid EDID header to be read by the source device. Unfortunately, the
>>> DRM EDID reading and parsing functions are actually too good in this
>>> case and prevent the source from reading the corrupted EDID. The result
>>> is a failed compliance test.
>>>
>>> In order to successfully pass the test, the raw EDID header must be
>>> checked on each read to see if it has been "corrupted". If an invalid
>>> raw header is detected, a flag is set that allows the compliance testing
>>> code to acknowledge that fact and react appropriately. The flag is
>>> automatically cleared on read.
>>>
>>> This code is designed to expressly work for compliance testing without
>>> disrupting normal operations for EDID reading and parsing.
>>>
>>> Signed-off-by: Todd Previte <tprevite at gmail.com>
>>> Cc: dri-devel at lists.freedesktop.org
>>> ---
>>>   drivers/gpu/drm/drm_edid.c       | 33 +++++++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/intel_dp.c  | 17 +++++++++++++++++
>>>   drivers/gpu/drm/i915/intel_drv.h |  1 +
>>>   include/drm/drm_edid.h           |  5 +++++
>>>   4 files changed, 56 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
>>> index 53bc7a6..3d4f473 100644
>>> --- a/drivers/gpu/drm/drm_edid.c
>>> +++ b/drivers/gpu/drm/drm_edid.c
>>> @@ -990,6 +990,32 @@ static const u8 edid_header[] = {
>>>          0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
>>>   };
>>>
>>> +
>>> +/* Flag for EDID corruption testing
>>> + * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
>>> + */
>>> +static bool raw_edid_header_corrupted;
>>
>> A static variable like this is not a good design, especially for a
>> module like drm.ko. If you really need this, please store it inside
>> some struct. But see below first.
>
> Per our discussion this morning, I concur. This has been removed in favor of
> a different solution that uses a new boolean flag in the drm_connector
> struct.
>
> Capturing more of the discussion here, the static boolean was a bad idea to
> begin with and needed to be removed. One solution was to make the flag
> non-static and non-clear-on-read, then add a separate clear() function. But
> it still had the problem of potential misuse in other places in the code. The
> current solution (which will be posted with V5) modifies the is_valid()
> function and adds a flag in the drm_connector struct that can be used to
> detect this low-level header corruption.
>
>
>>
>>> +
>>> +/**
>>> + * drm_raw_edid_header_corrupt - check to see if the raw header is
>>> + * corrupt or not. Used solely for Displayport compliance
>>> + * testing and required by Link CTS Core 1.2 rev1.1 4.2.2.6.
>>> + * @raw_edid: pointer to raw base EDID block
>>> + *
>>> + * Indicates whether the original EDID header as read from the
>>> + * device was corrupt or not. Clears on read.
>>> + *
>>> + * Return: true if the raw header was corrupt, otherwise false
>>> + */
>>> +bool drm_raw_edid_header_corrupt(void)
>>> +{
>>> +       bool corrupted = raw_edid_header_corrupted;
>>> +
>>> +       raw_edid_header_corrupted = 0;
>>> +       return corrupted;
>>> +}
>>> +EXPORT_SYMBOL(drm_raw_edid_header_corrupt);
>>> +
>>>   /**
>>>    * drm_edid_header_is_valid - sanity check the header of the base EDID block
>>>    * @raw_edid: pointer to raw base EDID block
>>> @@ -1006,6 +1032,13 @@ int drm_edid_header_is_valid(const u8 *raw_edid)
>>>                  if (raw_edid[i] == edid_header[i])
>>>                          score++;
>>>
>>> +       if (score != 8) {
>>> +               /* Log and set flag here for EDID corruption testing
>>> +                * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
>>> +                */
>>> +               DRM_DEBUG_DRIVER("Raw EDID header invalid\n");
>>> +               raw_edid_header_corrupted = 1;
>>> +       }
>>
>> The problem is that here we're limiting ourselves to just a bad edid
>> header, not a bad edid in general, so there are many things which we
>> might not catch - such as a simple wrong EDID checksum value. I remember
>> that on the previous patch you calculated the whole checksum manually,
>> but I don't see that code anymore. What was the reason for the change?
>
> So this code is specifically for the 4.2.2.6 compliance test that is looking
> for nothing more than an invalid EDID header.

On the version of the spec I have (1.2 Core, Aug 22 2011), 4.2.2.6 is
"EDID Corruption Detection", and it mentions "EDID corruption" without
really getting into the details of header corruption. On the "Test
procedure" description, it mentions "Reference Sink sets up EDID with
incorrect checksum", which we don't check. Of course, changing the
header may produce an incorrect checksum, but maybe the wrong header
is just a particular detail of the compliance testing device you have,
while others could potentially have other forms of corruption, such as
just a bad checksum?
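
As a rough illustration of the difference between the two kinds of
corruption, here is a minimal userspace sketch in C. It is not the
drm_edid.c code: the helper names (edid_header_score, edid_checksum_ok,
edid_make_valid_block) are made up for this example, and it only assumes
the standard base-block layout (fixed 8-byte header pattern, 128 bytes
summing to 0 modulo 256).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128

/* Fixed 8-byte pattern that opens every base EDID block. */
static const uint8_t edid_header_pattern[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/* Hypothetical helper: count header bytes matching the pattern (0..8). */
static int edid_header_score(const uint8_t *block)
{
	int i, score = 0;

	for (i = 0; i < 8; i++)
		if (block[i] == edid_header_pattern[i])
			score++;
	return score;
}

/* Hypothetical helper: the block checksum is valid when all 128 bytes
 * sum to 0 modulo 256.
 */
static bool edid_checksum_ok(const uint8_t *block)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += block[i];
	return sum == 0;
}

/* Hypothetical helper: build a zero-filled block with a good header and
 * a matching checksum byte, for experimenting with the checks above.
 */
static void edid_make_valid_block(uint8_t *block)
{
	uint8_t sum = 0;
	size_t i;

	memset(block, 0, EDID_BLOCK_SIZE);
	memcpy(block, edid_header_pattern, sizeof(edid_header_pattern));
	for (i = 0; i < EDID_BLOCK_SIZE - 1; i++)
		sum += block[i];
	block[EDID_BLOCK_SIZE - 1] = (uint8_t)(0x100 - sum);
}
```

With a block built by edid_make_valid_block(), flipping a bit in byte 127
leaves the header score at 8 but makes edid_checksum_ok() return false,
which is the case a header-only check would miss. Overwriting a header
byte without recomputing the checksum fails both checks.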

In the paragraphs below you elaborate even more on the assumption of a
bad header instead of just a bad checksum, so maybe we have different
versions of the spec? (I still remember when I used version 1.0 of a
certain non-backwards-compatible spec to review a patch made against
version 0.8 of the same spec.)

> In fact, the test unit only
> sets that header as invalid once, so if you miss it on the first read, you
> can't go back and check it again later - the test will now fail. So catching
> the general case isn't really what this is about - it's about being able to
> detect a corrupt EDID header even if it only happens once.
>
> Honestly, the DRM EDID code is VERY good about catching corruption cases and
> in the case of corrupted headers, fixing them and moving on. I had to tie
> into it at a fairly low level in order to catch the invalid header before
> the code fixed it.
>
> With respect to the checksum code, for quite a while the checksum
> computation was incorrect in the DRM code. Sometime around November of
> last year or 2013 (I remember the month, not the year, go figure) someone
> came along and added a checksum computation that actually worked. So that
> rendered that original code I wrote unnecessary.
>
>> Also, while reviewing the patch I just discovered
>> connector->bad_edid_counter. Can't we just use it instead of this
>> patch? I mean: grab the current counter, check edid, see if the
>> counter moved.
>
> I think the above description highlights why using this counter really isn't
> an option. Since the code only gets one shot at catching that invalid
> header, it's essential to make sure it's captured specifically. Comparing
> before and after values of this counter doesn't specifically say that the
> header was invalid, only that SOMEthing in the EDID was invalid.

Which is, according to the way I read the spec, not a problem.
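
The counter idea quoted above could look roughly like the following.
This is a userspace mock, not i915 code: struct mock_connector stands in
for the single bad_edid_counter field of struct drm_connector, and
mock_read_edid() is a hypothetical placeholder for whatever EDID read
bumps that counter on a validation failure.

```c
#include <stdbool.h>

/* Stand-in for the one field of struct drm_connector used here. */
struct mock_connector {
	unsigned int bad_edid_counter;
};

/* Hypothetical placeholder for an EDID read: it bumps the counter
 * whenever the block it fetched fails validation for any reason.
 */
static void mock_read_edid(struct mock_connector *c, bool block_is_valid)
{
	if (!block_is_valid)
		c->bad_edid_counter++;
}

/* Snapshot the counter, do the read, and report whether it moved. */
static bool edid_read_saw_corruption(struct mock_connector *c,
				     bool block_is_valid)
{
	unsigned int before = c->bad_edid_counter;

	mock_read_edid(c, block_is_valid);
	return c->bad_edid_counter != before;
}
```

Note that this only reports that something in the EDID was rejected
during that one read, without saying whether it was the header or the
checksum.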

>
>>>          return score;
>>>   }
>>>   EXPORT_SYMBOL(drm_edid_header_is_valid);
>>> diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
>>> index dc87276..57f8e43 100644
>>> --- a/drivers/gpu/drm/i915/intel_dp.c
>>> +++ b/drivers/gpu/drm/i915/intel_dp.c
>>> @@ -3824,6 +3824,9 @@ update_status:
>>>                                     &response, 1);
>>>          if (status <= 0)
>>>                  DRM_DEBUG_KMS("Could not write test response to sink\n");
>>> +
>>> +       /* Clear flag here, after testing is complete*/
>>> +       intel_dp->compliance_edid_invalid = 0;
>>>   }
>>>
>>>   static int
>>> @@ -3896,6 +3899,10 @@ intel_dp_check_link_status(struct intel_dp *intel_dp)
>>>   {
>>>          struct drm_device *dev = intel_dp_to_dev(intel_dp);
>>>          struct intel_encoder *intel_encoder = &dp_to_dig_port(intel_dp)->base;
>>> +       struct drm_connector *connector = &intel_dp->attached_connector->base;
>>> +       struct i2c_adapter *adapter = &intel_dp->aux.ddc;
>>> +       struct edid *edid_read = NULL;
>>> +
>>>          u8 sink_irq_vector;
>>>          u8 link_status[DP_LINK_STATUS_SIZE];
>>>
>>> @@ -3912,6 +3919,16 @@ intel_dp_check_link_status(struct intel_dp *intel_dp)
>>>                  return;
>>>          }
>>>
>>> +       /* Compliance testing requires an EDID read for all HPD events
>>> +        * Link CTS Core 1.2 rev 1.1: Test 4.2.2.1
>>> +        * Flag set here will be handled in the EDID test function
>>> +        */
>>> +       edid_read = drm_get_edid(connector, adapter);
>>> +       if (!edid_read || drm_raw_edid_header_corrupt() == 1) {
>>> +               DRM_DEBUG_DRIVER("EDID invalid, setting flag\n");
>>> +               intel_dp->compliance_edid_invalid = 1;
>>> +       }
>>
>> I see that on the next patch you also add a drm_get_edid() call, so we
>> have apparently added 2 calls for the edid test. Do we really need
>> both? Why is this one needed? Why is that one needed?
>
> So there are two issues here - the first is the same one mentioned above, catching
> that single instance of a corrupted EDID header. The second is that the
> checksum from the test device differs between the two reads. If you remove
> either one of them, one test or the other will fail.

But then why not keep both at the same place? The one here is going to
affect a lot more than just compliance testing, while the other is
confined to the DP compliance code.

>
>> Also, some more ideas:
>>
>> I also thought that we already automatically issued get_edid() calls
>> on the normal hotplug code path, so it would be a "third" call on the
>> codepath for the test. Can't we just rely on this one?
>
> Same issue as above.
>>
>>
>> Another idea would be: instead of getting the edid from inside the
>> Kernel, we could try to get it from the user-space, using the
>> GetResources/GetConnector IOCTLs, and also maybe look at the EDID
>> properties to possibly validate the EDID (in case that edid did not
>> get "fixed" by the Kernel). The nice thing about this is that it would
>> make the test more like real driver usage. Do you see any
>> possible problems with this approach?
>
> I don't really see this as a valid option in light of the descriptions I've
> given above. This has a good chance of introducing latency problems which
> may adversely affect the tests as well.

We have 5 seconds, that's way more than enough.

>
>
>>> +
>>>          /* Try to read the source of the interrupt */
>>>          if (intel_dp->dpcd[DP_DPCD_REV] >= 0x11 &&
>>>              intel_dp_get_sink_irq(intel_dp, &sink_irq_vector)) {
>>> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
>>> index e7b62be..42e4251 100644
>>> --- a/drivers/gpu/drm/i915/intel_drv.h
>>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>>> @@ -651,6 +651,7 @@ struct intel_dp {
>>>          /* Displayport compliance testing */
>>>          unsigned long compliance_test_type;
>>>          bool compliance_testing_active;
>>> +       bool compliance_edid_invalid;
>>>   };
>>>
>>>   struct intel_digital_port {
>>> diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
>>> index 87d85e8..8a7eb22 100644
>>> --- a/include/drm/drm_edid.h
>>> +++ b/include/drm/drm_edid.h
>>> @@ -388,4 +388,9 @@ struct edid *drm_do_get_edid(struct drm_connector *connector,
>>>                                size_t len),
>>>          void *data);
>>>
>>> +/* Check for corruption in raw EDID header - Displayport compliance
>>> +  * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
>>> + */
>>> +bool drm_raw_edid_header_corrupt(void);
>>> +
>>>   #endif /* __DRM_EDID_H__ */
>>> --
>>> 1.9.1

-- 
Paulo Zanoni