[Mesa-dev] [PATCH v3 10/25] panfrost: Make sure the BO is 'ready' when picked from the cache

Thu Sep 5 22:21:10 UTC 2019

> Will document that.
+1

> Evict won't help here as memory will only be released after the jobs
> are done using it. And madvise doesn't help either, for the same reason.

Ah-ha, I understand the distinction; thank you.

> The behavior hasn't changed regarding allocation failures: it's still
> an assert(), so the code is not more or less buggy than it was :p. What
> happens when assert()s are disabled? probably a segfault because of a
> NULL pointer dereference. So, adding the fprintf() is probably a good
> idea as a first step, and then we can see if we can handle the OOM case
> gracefully.

Haha, that's reasonable. I'm wondering if we should try some assert-less
stress test but maybe that doesn't matter until "productization".

> > In short, I'm not convinced this algorithm (specifically the last step)
> > is ideal.
> 
> It really depends on how robust you want to be when the system is under
> memory pressure vs how long you accept to wait. Note that, in the worst
> case scenario we wouldn't wait more than we currently do, as having each
> batch wait on BOs of the previous batch is just like the serialization
> we had in panfrost_flush(). I don't see it as a huge problem, but maybe
> I'm wrong.

Ya, I don't know; these seem like hard problems to say the least :-(

> > If there is no memory left for us, is it responsible to continue at all?
> 
> It's not exactly no memory, it's no immediately available memory.

'fraid I don't know enough about Linux allocators to grok the
distinction.

> > Should we just fail the allocation after step 2, and if the caller has a
> > problem with that, it's their issue? Or we abort here after step 2?
> 
> I think that one is a separate issue. I mean, that's something we have
> to handle even if we go through step 3 and step 3 fails. 
>
> > I
> > don't like the robustness implications but low memory behaviour is a
> > risky subject as it is; I don't want to add more unknowns into it --
> > aborting it with an assert(0) is something we can recognize immediately.
> > Strange crashes in random places with no explanation, less so.
> 
> And that hasn't changed. We still have an assert after step 3.

OK, let's hold off until after this series is merged :)