[Mesa-stable] [PATCH] clover: Check the return value of pipe_loader_probe() when probing for devices
Emil Velikov
emil.l.velikov at gmail.com
Sat Nov 28 08:32:48 PST 2015
On 28 November 2015 at 15:10, Francisco Jerez <currojerez at riseup.net> wrote:
> Emil Velikov <emil.l.velikov at gmail.com> writes:
>
>> Hi Francisco,
>>
>> In all due respect, I am surprised with the volume of justification
>> you put into some changes.
>>
> And apparently it wasn't enough? :P
>
>> On 28 November 2015 at 11:52, Francisco Jerez <currojerez at riseup.net> wrote:
>>> Francisco Jerez <currojerez at riseup.net> writes:
>>>
>>>> Tom Stellard <thomas.stellard at amd.com> writes:
>>>>
>>>>> When probing for devices, clover will call pipe_loader_probe() twice.
>>>>> The first time to retrieve the number of devices, and then second time
>>>>> to retrieve the device structures.
>>>>>
>>>>> We currently assume that the return value of both calls will be the
>>>>> same, but this will not be the case if a device happens to disappear
>>>>> between the two calls.
>>>>>
>>>>
>>>> Assuming that the two sequential calls will give the same result is bad,
>>>> but the likelihood of a device disappearing between the two calls is so
>>>> small it's unlikely to happen more than once in a million years. If
>> Err... have to love this estimate :-P With some recent chances that
>> probability increases ever so slightly, as we push the "can we find
>> the module, does it have the entry point" from .create_screen and
>> .configure further up to probe screen.
>>
>
> I was referring to the probability of a device actually disappearing
> From the system between the two calls.
>
>>>> this problem happens deterministically to you there's almost certainly a
>>>> more serious bug in pipe-loader that will cause it to return an
>>>> incorrect device count during the first call.
>>>>
>> Care to give an example. I'm not sure where/how this is possible.
>>
> The immediately following paragraph explained how and where it's
> possible: pipe_loader_sw_probe() returns one regardless of whether the
> software device is actually available.
>
That's why we have the "two stage" approach
1) Query the total count, allocate memory and 2) populate the memory
with information.
The alternative is to build a separate pipe-loader for each target,
which ... doesn't sound optimal. Neither is forcing every target to
have be with the same set of drivers.
>>>> Indeed, pipe_loader_sw_probe() seems to return 1 unconditionally if ndev
>>>> is zero, so it will give a different device count depending on ndev if
>>>> pipe_loader_sw_probe_null() happens to fail. Seems like a bug we don't
>>>> want to simply paper over.
>>>>
>> Can you elaborate what exactly the problem here ? If we ask "how many
>> we have" sure we can provide that [fixed] number. On the other hand,
>> if we want more information about it (them), there is no guarantee
>> that we'll succeed.
>
> The return value of pipe_loader_probe() is supposed to be the number of
> devices actually available on the system, it doesn't depend on the devs
> pointer or ndev parameter -- See the doxygen comment in pipe_loader.h
> for the expected behaviour and pipe_loader_drm_probe() for a correct
> implementation.
>
True, the documentation does say that. Yet the EGL Devices approach is
what I perceive as better (clearer, more intuitive) approach - if your
storage pointer is null return the total count. Otherwise upon
information retrieval cap up-to the specified device count.
Admittedly the EGL spec came long after the pipe-loader.
>>
>>>> Anyway I'm not saying that clover's assumption shouldn't be fixed too --
>>>> See comment below.
>>>>
>>>>> This patch removes this assumption and checks the return value of the
>>>>> second pipe_loader_probe() call to ensure it does not try to initialize
>>>>> devices that no longer exits.
>>>>>
>>>>> CC: <mesa-stable at lists.freedesktop.org>
>>>>> ---
>>>>> src/gallium/state_trackers/clover/core/platform.cpp | 5 +++--
>>>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/src/gallium/state_trackers/clover/core/platform.cpp b/src/gallium/state_trackers/clover/core/platform.cpp
>>>>> index 328b71c..689d692 100644
>>>>> --- a/src/gallium/state_trackers/clover/core/platform.cpp
>>>>> +++ b/src/gallium/state_trackers/clover/core/platform.cpp
>>>>> @@ -28,9 +28,10 @@ platform::platform() : adaptor_range(evals(), devs) {
>>>>> int n = pipe_loader_probe(NULL, 0);
>>>>> std::vector<pipe_loader_device *> ldevs(n);
>>>>>
>>>>> - pipe_loader_probe(&ldevs.front(), n);
>>>>> + n = pipe_loader_probe(&ldevs.front(), n);
>>>>>
>>>>> - for (pipe_loader_device *ldev : ldevs) {
>>>>> + for (int i = 0; i < n; ++i) {
>>>>> + pipe_loader_device *ldev = ldevs[i];
>>>>
>>>> There are many reasons to prefer range loops to induction variables
>>>> (including iterators), and there's no need to change it here: Because
>>>> ldevs is zero-initialized you can just wrap the push_back(create()) call
>>>> around an 'if (ldev)' to make this a one-liner.
>>>>
>>> Let me be more explicit about what could go wrong with this change in
>>> its current form: The return value of pipe_loader_probe() is the number
>>> of devices available on the system, so basically the same race condition
>>> is still possible if 'n' increases from the first call to the second,
>>> and in that case you'll end up reading uninitialized memory past the end
>>> of the array -- Worse than a null dereference. Range loops make this
>>> kind of situation impossible.
>>>
>> If I understood you correctly, you're saying that on the second call
>> can return count greater than the initial one.
>>
>> Errm... this looks like a bug in the pipe-loader. I don't think it's
>> normal to claim that X devices are available only to return
>> information for fewer devices. OK, calling it a bug might not be
>> politically correct, although it is a misleading design decision, at
>> the very least.
>>
> What I was saying is that if the rationale of this patch is correct and
> clover may not make any assumptions about the number of available
> devices remaining the same between calls to pipe_loader_probe(), this
> patch doesn't fix the problem at all, and in fact introduces a worse
> kind of bug (read from uninitialized memory). The alternative I
> suggested would be simpler and more robust, so it's worth changing
> regardless of whether or not it's actually likely that the device count
> will increase or decrease between calls.
>
So the difference boils down to what is/should be the behaviour of
pipe_loader_probe(). I take it that you're adamant that pipe-loader's
approach is superior, or there is a sliver of hope that one can sway
your view ?
Thanks
Emil
More information about the mesa-stable
mailing list