[Mesa-dev] [Mesa-stable] [PATCH] clover: Check the return value of pipe_loader_probe() when probing for devices
Francisco Jerez
currojerez at riseup.net
Sat Nov 28 09:06:59 PST 2015
Emil Velikov <emil.l.velikov at gmail.com> writes:
> On 28 November 2015 at 15:10, Francisco Jerez <currojerez at riseup.net> wrote:
>> Emil Velikov <emil.l.velikov at gmail.com> writes:
>>
>>> Hi Francisco,
>>>
>>> In all due respect, I am surprised with the volume of justification
>>> you put into some changes.
>>>
>> And apparently it wasn't enough? :P
>>
>>> On 28 November 2015 at 11:52, Francisco Jerez <currojerez at riseup.net> wrote:
>>>> Francisco Jerez <currojerez at riseup.net> writes:
>>>>
>>>>> Tom Stellard <thomas.stellard at amd.com> writes:
>>>>>
>>>>>> When probing for devices, clover will call pipe_loader_probe() twice.
>>>>>> The first time to retrieve the number of devices, and then second time
>>>>>> to retrieve the device structures.
>>>>>>
>>>>>> We currently assume that the return value of both calls will be the
>>>>>> same, but this will not be the case if a device happens to disappear
>>>>>> between the two calls.
>>>>>>
>>>>>
>>>>> Assuming that the two sequential calls will give the same result is bad,
>>>>> but the likelihood of a device disappearing between the two calls is so
>>>>> small it's unlikely to happen more than once in a million years. If
>>> Err... have to love this estimate :-P With some recent chances that
>>> probability increases ever so slightly, as we push the "can we find
>>> the module, does it have the entry point" from .create_screen and
>>> .configure further up to probe screen.
>>>
>>
>> I was referring to the probability of a device actually disappearing
>> From the system between the two calls.
>>
>>>>> this problem happens deterministically to you there's almost certainly a
>>>>> more serious bug in pipe-loader that will cause it to return an
>>>>> incorrect device count during the first call.
>>>>>
>>> Care to give an example. I'm not sure where/how this is possible.
>>>
>> The immediately following paragraph explained how and where it's
>> possible: pipe_loader_sw_probe() returns one regardless of whether the
>> software device is actually available.
>>
> That's why we have the "two stage" approach
> 1) Query the total count, allocate memory and 2) populate the memory
> with information.
>
> The alternative is to build a separate pipe-loader for each target,
> which ... doesn't sound optimal. Neither is forcing every target to
> have be with the same set of drivers.
>
>>>>> Indeed, pipe_loader_sw_probe() seems to return 1 unconditionally if ndev
>>>>> is zero, so it will give a different device count depending on ndev if
>>>>> pipe_loader_sw_probe_null() happens to fail. Seems like a bug we don't
>>>>> want to simply paper over.
>>>>>
>>> Can you elaborate what exactly the problem here ? If we ask "how many
>>> we have" sure we can provide that [fixed] number. On the other hand,
>>> if we want more information about it (them), there is no guarantee
>>> that we'll succeed.
>>
>> The return value of pipe_loader_probe() is supposed to be the number of
>> devices actually available on the system, it doesn't depend on the devs
>> pointer or ndev parameter -- See the doxygen comment in pipe_loader.h
>> for the expected behaviour and pipe_loader_drm_probe() for a correct
>> implementation.
>>
> True, the documentation does say that. Yet the EGL Devices approach is
> what I perceive as better (clearer, more intuitive) approach - if your
> storage pointer is null return the total count. Otherwise upon
> information retrieval cap up-to the specified device count.
>
That's a valid yet somewhat inconsistent alternative approach. In
pipe-loader the return value of pipe_loader_probe() always has the same
semantics. I don't think I see why you consider it more intuitive for
the return value to have inconsistent semantics depending on what the
function arguments are, but you're free to propose such a change as a
separate series as long as you fix the documentation, all users and
back-ends of pipe-loader, if you consider it worth doing -- I personally
don't.
> Admittedly the EGL spec came long after the pipe-loader.
>
>>>
>>>>> Anyway I'm not saying that clover's assumption shouldn't be fixed too --
>>>>> See comment below.
>>>>>
>>>>>> This patch removes this assumption and checks the return value of the
>>>>>> second pipe_loader_probe() call to ensure it does not try to initialize
>>>>>> devices that no longer exits.
>>>>>>
>>>>>> CC: <mesa-stable at lists.freedesktop.org>
>>>>>> ---
>>>>>> src/gallium/state_trackers/clover/core/platform.cpp | 5 +++--
>>>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/src/gallium/state_trackers/clover/core/platform.cpp b/src/gallium/state_trackers/clover/core/platform.cpp
>>>>>> index 328b71c..689d692 100644
>>>>>> --- a/src/gallium/state_trackers/clover/core/platform.cpp
>>>>>> +++ b/src/gallium/state_trackers/clover/core/platform.cpp
>>>>>> @@ -28,9 +28,10 @@ platform::platform() : adaptor_range(evals(), devs) {
>>>>>> int n = pipe_loader_probe(NULL, 0);
>>>>>> std::vector<pipe_loader_device *> ldevs(n);
>>>>>>
>>>>>> - pipe_loader_probe(&ldevs.front(), n);
>>>>>> + n = pipe_loader_probe(&ldevs.front(), n);
>>>>>>
>>>>>> - for (pipe_loader_device *ldev : ldevs) {
>>>>>> + for (int i = 0; i < n; ++i) {
>>>>>> + pipe_loader_device *ldev = ldevs[i];
>>>>>
>>>>> There are many reasons to prefer range loops to induction variables
>>>>> (including iterators), and there's no need to change it here: Because
>>>>> ldevs is zero-initialized you can just wrap the push_back(create()) call
>>>>> around an 'if (ldev)' to make this a one-liner.
>>>>>
>>>> Let me be more explicit about what could go wrong with this change in
>>>> its current form: The return value of pipe_loader_probe() is the number
>>>> of devices available on the system, so basically the same race condition
>>>> is still possible if 'n' increases from the first call to the second,
>>>> and in that case you'll end up reading uninitialized memory past the end
>>>> of the array -- Worse than a null dereference. Range loops make this
>>>> kind of situation impossible.
>>>>
>>> If I understood you correctly, you're saying that on the second call
>>> can return count greater than the initial one.
>>>
>>> Errm... this looks like a bug in the pipe-loader. I don't think it's
>>> normal to claim that X devices are available only to return
>>> information for fewer devices. OK, calling it a bug might not be
>>> politically correct, although it is a misleading design decision, at
>>> the very least.
>>>
>> What I was saying is that if the rationale of this patch is correct and
>> clover may not make any assumptions about the number of available
>> devices remaining the same between calls to pipe_loader_probe(), this
>> patch doesn't fix the problem at all, and in fact introduces a worse
>> kind of bug (read from uninitialized memory). The alternative I
>> suggested would be simpler and more robust, so it's worth changing
>> regardless of whether or not it's actually likely that the device count
>> will increase or decrease between calls.
>>
> So the difference boils down to what is/should be the behaviour of
> pipe_loader_probe(). I take it that you're adamant that pipe-loader's
> approach is superior, or there is a sliver of hope that one can sway
> your view ?
>
I don't think it boils down to that. I consider the current idiom used
by pipe_loader_probe() satisfactory, but that doesn't mean I'm against
fixing this assumption in Clover in addition, since Tom is right that
there currently is a theoretical race condition, regardless of what the
semantics of the return value of pipe_loader_probe() are -- Even though
the race condition is extremely unlikely with the currently documented
semantics (as implemented by the DRM back-end). That's why I didn't NAK
the patch, I simply asked for an improvement.
> Thanks
> Emil
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20151128/d427c429/attachment.sig>
More information about the mesa-dev
mailing list