[Intel-gfx] [PATCH 4/9] drm/i915: Add check for corrupt raw EDID header for Displayport compliance testing

Fri Apr 10 07:44:20 PDT 2015

On 4/8/2015 3:37 PM, Paulo Zanoni wrote:
> 2015-04-08 18:43 GMT-03:00 Todd Previte<tprevite at gmail.com>:
>> On 4/8/2015 9:51 AM, Paulo Zanoni wrote:
>>> 2015-03-31 14:15 GMT-03:00 Todd Previte<tprevite at gmail.com>:
>>>> Displayport compliance test 4.2.2.6 requires that a source device be
>>>> capable of detecting
>>>> a corrupt EDID. To do this, the test sets up an invalid EDID header to be
>>>> read by the source
>>>> device. Unfortunately, the DRM EDID reading and parsing functions are
>>>> actually too good in
>>>> this case and prevent the source from reading the corrupted EDID. The
>>>> result is a failed
>>>> compliance test.
>>>>
>>>> In order to successfully pass the test, the raw EDID header must be
>>>> checked on each read
>>>> to see if it has been "corrupted". If an invalid raw header is detected, a
>>>> flag is set that
>>>> allows the compliance testing code to acknowledge that fact and react
>>>> appropriately. The
>>>> flag is automatically cleared on read.
>>>>
>>>> This code is designed to expressly work for compliance testing without
>>>> disrupting normal
>>>> operations for EDID reading and parsing.
>>>>
>>>> Signed-off-by: Todd Previte<tprevite at gmail.com>
>>>> Cc:dri-devel at lists.freedesktop.org
>>>> ---
>>>>    drivers/gpu/drm/drm_edid.c       | 33 +++++++++++++++++++++++++++++++++
>>>>    drivers/gpu/drm/i915/intel_dp.c  | 17 +++++++++++++++++
>>>>    drivers/gpu/drm/i915/intel_drv.h |  1 +
>>>>    include/drm/drm_edid.h           |  5 +++++
>>>>    4 files changed, 56 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
>>>> index 53bc7a6..3d4f473 100644
>>>> --- a/drivers/gpu/drm/drm_edid.c
>>>> +++ b/drivers/gpu/drm/drm_edid.c
>>>> @@ -990,6 +990,32 @@ static const u8 edid_header[] = {
>>>>           0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
>>>>    };
>>>>
>>>> +
>>>> +/* Flag for EDID corruption testing
>>>> + * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
>>>> + */
>>>> +static bool raw_edid_header_corrupted;
>>> A static variable like this is not a good design, especially for a
>>> module like drm.ko. If you really need this, please store it inside
>>> some struct. But see below first.
>> Per our discussion this morning, I concur. This has been removed in favor of
>> a different solution that uses a new boolean flag in the drm_connector
>> struct.
>>
>> Capturing more of the discussion here, the static boolean was a bad idea to
>> begin with and needed to be removed. One solution was to make the flag
>> non-static and non-clear-on-read, then add a separate clear() function. But
>> it still had the problem of potential misuse in other places in the code. The
>> current solution (which will be posted with V5) modifies the is_valid()
>> function and adds a flag in the drm_connector struct that can be used to
>> detect this low-level header corruption.
>>
>>
>>>> +
>>>> +/**
>>>> + * drm_raw_edid_header_valid - check to see if the raw header is
>>>> + * corrupt or not. Used solely for Displayport compliance
>>>> + * testing and required by Link CTS Core 1.2 rev1.1 4.2.2.6.
>>>> + * @raw_edid: pointer to raw base EDID block
>>>> + *
>>>> + * Indicates whether the original EDID header as read from the
>>>> + * device was corrupt or not. Clears on read.
>>>> + *
>>>> + * Return: true if the raw header was corrupt, otherwise false
>>>> + */
>>>> +bool drm_raw_edid_header_corrupt(void)
>>>> +{
>>>> +       bool corrupted = raw_edid_header_corrupted;
>>>> +
>>>> +       raw_edid_header_corrupted = 0;
>>>> +       return corrupted;
>>>> +}
>>>> +EXPORT_SYMBOL(drm_raw_edid_header_corrupt);
>>>> +
>>>>    /**
>>>>     * drm_edid_header_is_valid - sanity check the header of the base EDID
>>>> block
>>>>     * @raw_edid: pointer to raw base EDID block
>>>> @@ -1006,6 +1032,13 @@ int drm_edid_header_is_valid(const u8 *raw_edid)
>>>>                   if (raw_edid[i] == edid_header[i])
>>>>                           score++;
>>>>
>>>> +       if (score != 8) {
>>>> +               /* Log and set flag here for EDID corruption testing
>>>> +                * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
>>>> +                */
>>>> +               DRM_DEBUG_DRIVER("Raw EDID header invalid\n");
>>>> +               raw_edid_header_corrupted = 1;
>>>> +       }
>>> The problem is that here we're limiting ourselves to just a bad edid
>>> header, not a bad edid in general, so there are many things which we
>>> might not get - such as a simple wrong checksum edid value. I remember
>>> that on the previous patch you calculated the whole checksum manually,
>>> but I don't see that code anymore. What was the reason for the change?
>> So this code is specifically for the 4.2.2.6 compliance test that is looking
>> for nothing more than an invalid EDID header.
> On the version of the spec I have (1.2 Core, Aug 22 2011), 4.2.2.6 is
> "EDID Corruption Detection", and it mentions "EDID corruption" without
> really getting into the details of header corruption. On the "Test
> procedure" description, it mentions "Reference Sink sets up EDID with
> incorrect checksum", which we don't check. Of course, changing the
> header may produce an incorrect checksum, but maybe the wrong header
> is just a particular detail of the compliance testing device you have,
> while others could potentially have other forms of corruption, such as
> just a bad checksum?
It could very well be particular to this unit. So with a different test
device, we might be able to get away with just checking the checksum.
For this one, however, we don't appear to have that option. I added the
checksum computation into the header fixup code just to make sure.
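
For reference, the checksum rule under discussion can be sketched as
below. This is a minimal standalone illustration, not the code from the
patch: the function names are made up for this example, and it relies
only on the EDID rule that the 128 bytes of a block sum to zero modulo
256.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_SIZE 128

/* Hypothetical helper: sum of all 128 bytes of one block, modulo 256. */
static uint8_t edid_block_sum(const uint8_t *block)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += block[i];

	return sum;
}

/* A block is consistent when its bytes sum to zero modulo 256. */
static bool edid_block_checksum_ok(const uint8_t *block)
{
	return edid_block_sum(block) == 0;
}

/*
 * Hypothetical helper: the value byte 127 must hold so that the whole
 * block sums to zero, computed from bytes 0..126 only.
 */
static uint8_t edid_block_expected_checksum(const uint8_t *block)
{
	uint8_t sum = 0;
	size_t i;

	for (i = 0; i < EDID_BLOCK_SIZE - 1; i++)
		sum += block[i];

	return (uint8_t)(0x100 - sum);
}
```

Note that a block whose header bytes were altered without adjusting
byte 127 also fails this check, which is the overlap between "bad
header" and "bad checksum" raised above.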

> In the paragraphs below you elaborate even more on the assumption of a
> bad header instead of just a bad checksum, so maybe we have different
> versions of the spec? (I still remember when I used version 1.0 of a
> certain non-backwards-compatible spec to review a patch made against
> version 0.8 of the same spec)
I do have a later version of the spec, but the description of this test
seems to be the same between the two.
>> In fact, the test unit only
>> sets that header as invalid once, so if you miss it on the first read, you
>> can't go back and check it again later - the test will now fail. So catching
>> the general case isn't really what this is about - it's about being able to
>> detect a corrupt EDID header even if it only happens once.
>>
>> Honestly, the DRM EDID code is VERY good about catching corruption cases and
>> in the case of corrupted headers, fixing them and moving on. I had to tie
>> into it at a fairly low level in order to catch the invalid header before
>> the code fixed it.
>>
>> With respect to the checksum code, for quite a while the checksum
>> computation was incorrect in the DRM code. Somewhere along in November of
>> last year or 2013 (I remember the month, not the year, go figure) someone
>> came along and added a checksum computation that actually worked. So that
>> rendered that original code I wrote unnecessary.
>>
>>> Also, while reviewing the patch I just discovered
>>> connector->bad_edid_counter. Can't we just use it instead of this
>>> patch? I mean: grab the current counter, check edid, see if the
>>> counter moved.
>> I think the above description highlights why using this counter really isn't
>> an option. Since the code only gets one shot at catching that invalid
>> header, it's essential to make sure it's captured specifically. Comparing
>> before and after values of this counter doesn't specifically say that the
>> header was invalid, only that SOMEthing in the EDID was invalid.
> Which is, according to the way I read the spec, not a problem.
I completely agree with you. Unfortunately, coding directly to the spec 
isn't enough in this case.
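
To make the ordering problem concrete, here is a rough standalone
sketch of recording header corruption before the fixup overwrites it,
with the flag kept in per-connector state instead of a static
variable. Everything named sketch_* is hypothetical and is not the V5
code; the fixup threshold of 6 is an assumption for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static const uint8_t edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/* Assumed threshold: headers scoring at least this get repaired. */
#define SKETCH_FIXUP_THRESHOLD 6

/* Hypothetical stand-in for per-connector state, not struct drm_connector. */
struct sketch_connector {
	bool edid_header_corrupt;
};

/* Count how many of the 8 header bytes match the expected pattern. */
static int sketch_header_score(const uint8_t *raw)
{
	int i, score = 0;

	for (i = 0; i < 8; i++)
		if (raw[i] == edid_header[i])
			score++;

	return score;
}

/*
 * Score the header, latch the corruption flag on the connector, and
 * only then repair the header. Once the repair has run, the raw bytes
 * no longer show that anything was wrong.
 */
static int sketch_check_header(struct sketch_connector *conn, uint8_t *raw)
{
	int score = sketch_header_score(raw);

	if (score != 8) {
		conn->edid_header_corrupt = true;
		if (score >= SKETCH_FIXUP_THRESHOLD)
			memcpy(raw, edid_header, sizeof(edid_header));
	}

	return score;
}
```

A before/after comparison of a generic bad-EDID counter would not
distinguish this case from any other corruption, whereas the latched
flag records specifically that the header was wrong on that read.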

>
>>>>           return score;
>>>>    }
>>>>    EXPORT_SYMBOL(drm_edid_header_is_valid);
>>>> diff --git a/drivers/gpu/drm/i915/intel_dp.c
>>>> b/drivers/gpu/drm/i915/intel_dp.c
>>>> index dc87276..57f8e43 100644
>>>> --- a/drivers/gpu/drm/i915/intel_dp.c
>>>> +++ b/drivers/gpu/drm/i915/intel_dp.c
>>>> @@ -3824,6 +3824,9 @@ update_status:
>>>>                                      &response, 1);
>>>>           if (status <= 0)
>>>>                   DRM_DEBUG_KMS("Could not write test response to
>>>> sink\n");
>>>> +
>>>> +       /* Clear flag here, after testing is complete*/
>>>> +       intel_dp->compliance_edid_invalid = 0;
>>>>    }
>>>>
>>>>    static int
>>>> @@ -3896,6 +3899,10 @@ intel_dp_check_link_status(struct intel_dp
>>>> *intel_dp)
>>>>    {
>>>>           struct drm_device *dev = intel_dp_to_dev(intel_dp);
>>>>           struct intel_encoder *intel_encoder =
>>>> &dp_to_dig_port(intel_dp)->base;
>>>> +       struct drm_connector *connector =
>>>> &intel_dp->attached_connector->base;
>>>> +       struct i2c_adapter *adapter = &intel_dp->aux.ddc;
>>>> +       struct edid *edid_read = NULL;
>>>> +
>>>>           u8 sink_irq_vector;
>>>>           u8 link_status[DP_LINK_STATUS_SIZE];
>>>>
>>>> @@ -3912,6 +3919,16 @@ intel_dp_check_link_status(struct intel_dp
>>>> *intel_dp)
>>>>                   return;
>>>>           }
>>>>
>>>> +       /* Compliance testing requires an EDID read for all HPD events
>>>> +        * Link CTS Core 1.2 rev 1.1: Test 4.2.2.1
>>>> +        * Flag set here will be handled in the EDID test function
>>>> +        */
>>>> +       edid_read = drm_get_edid(connector, adapter);
>>>> +       if (!edid_read || drm_raw_edid_header_corrupt() == 1) {
>>>> +               DRM_DEBUG_DRIVER("EDID invalid, setting flag\n");
>>>> +               intel_dp->compliance_edid_invalid = 1;
>>>> +       }
>>> I see that on the next patch you also add a drm_get_edid() call, so we
>>> have apparently added 2 calls for the edid test. Do we really need
>>> both? Why is this one needed? Why is that one needed?
>> So there's two issues here - first is the same one mentioned above, catching
>> that single instance of a corrupted EDID header. The second is that the
>> checksum from the test device differs between the two reads. If you remove
>> either one of them, one test or the other will fail.
> But then why not keep both at the same place? The one here is going to
> affect a lot more than just compliance testing, while the other is
> contained to DP compliance code.
I was able to find a solution that removed the duplicate EDID read. I 
had to add a checksum storage variable in the intel_dp struct, but 
that's infinitely better than having another EDID read.

Unfortunately though, the one that has to stay is the one in
check_link_status. There's just no way around it because test 4.2.2.1
requires an EDID read for every hot plug event. There's no test request
bit set for that, or any other indicator. It simply has to happen for
every HPD event.
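
Roughly, the single-read approach looks like the sketch below. This is
an illustration only: the sketch_* names and the
compliance_edid_checksum field are hypothetical, and it assumes the
checksum of interest is byte 127 of the block that was read.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_SIZE 128

/* Hypothetical stand-in for the compliance fields in struct intel_dp. */
struct sketch_dp {
	bool compliance_edid_invalid;
	uint8_t compliance_edid_checksum;
};

/*
 * Called once per HPD event with the block that was just read (NULL if
 * the read failed) and whether its raw header was found corrupt.
 * Stores what the EDID test needs so no second read is required.
 */
static void sketch_on_hpd(struct sketch_dp *dp, const uint8_t *block,
			  bool header_corrupt)
{
	if (block == NULL || header_corrupt)
		dp->compliance_edid_invalid = true;

	if (block != NULL)
		dp->compliance_edid_checksum = block[EDID_BLOCK_SIZE - 1];
}

/*
 * Called from the EDID test handler. Reports the stored state, then
 * clears the flag once testing is complete.
 */
static bool sketch_handle_edid_test(struct sketch_dp *dp, uint8_t *checksum)
{
	bool invalid = dp->compliance_edid_invalid;

	*checksum = dp->compliance_edid_checksum;
	dp->compliance_edid_invalid = false;

	return invalid;
}
```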

>>> Also, some more ideas:
>>>
>>> I also thought that we already automatically issued get_edid() calls
>>> on the normal hotplug code path, so it would be a "third" call on the
>>> codepath for the test. Can't we just rely on this one?
>> Same issue as above.
>>> Another idea would be: instead of getting the edid from inside the
>>> Kernel, we could try to get it from the user-space, using the
>>> GetResources/GetConnector IOCTLs, and also maybe look at the EDID
>>> properties to possibly validate the EDID (in case that edid did not
>>> get "fixed" by the Kernel). The nice thing about this is that it would
>>> make the test be more like a real driver usage. Do you see any
>>> possible problems with this approach?
>> I don't really see this as a valid option in light of the descriptions I've
>> given above. This has a good chance of introducing latency problems which
>> may adversely affect the tests as well.
> We have 5 seconds, that's way more than enough.
The test has a 5 second timeout for the entire operation. I'm less 
concerned with timing out and more concerned about not being able to 
catch things fast enough or react fast enough to parameter or value 
changes. It may or may not be an issue for processing the EDID (I'd lean 
more towards the not case) but it's something that has to be kept in 
mind here, as this has caused problems in the past when building out the 
test interfaces.

In any case, this sounds like a suggestion rather than a blocking
issue. My main concern with moving all this stuff into userspace is
that it's moving towards building a Displayport-compliant user app
versus a Displayport-compliant driver. But this is something that I can
look into sometime down the road.

>>>> +
>>>>           /* Try to read the source of the interrupt */
>>>>           if (intel_dp->dpcd[DP_DPCD_REV] >= 0x11 &&
>>>>               intel_dp_get_sink_irq(intel_dp, &sink_irq_vector)) {
>>>> diff --git a/drivers/gpu/drm/i915/intel_drv.h
>>>> b/drivers/gpu/drm/i915/intel_drv.h
>>>> index e7b62be..42e4251 100644
>>>> --- a/drivers/gpu/drm/i915/intel_drv.h
>>>> +++ b/drivers/gpu/drm/i915/intel_drv.h
>>>> @@ -651,6 +651,7 @@ struct intel_dp {
>>>>           /* Displayport compliance testing */
>>>>           unsigned long compliance_test_type;
>>>>           bool compliance_testing_active;
>>>> +       bool compliance_edid_invalid;
>>>>    };
>>>>
>>>>    struct intel_digital_port {
>>>> diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h
>>>> index 87d85e8..8a7eb22 100644
>>>> --- a/include/drm/drm_edid.h
>>>> +++ b/include/drm/drm_edid.h
>>>> @@ -388,4 +388,9 @@ struct edid *drm_do_get_edid(struct drm_connector
>>>> *connector,
>>>>                                 size_t len),
>>>>           void *data);
>>>>
>>>> +/* Check for corruption in raw EDID header - Displayport compliance
>>>> +  * Displayport Link CTS Core 1.2 rev1.1 - 4.2.2.6
>>>> + */
>>>> +bool drm_raw_edid_header_corrupt(void);
>>>> +
>>>>    #endif /* __DRM_EDID_H__ */
>>>> --
>>>> 1.9.1