[PATCH 1/1] [RFC] drm/ttm: Don't init dma32_zone on 64-bit systems

Thomas Hellstrom thellstrom at vmware.com
Wed Feb 20 09:01:46 UTC 2019


On Wed, 2019-02-20 at 08:35 +0000, Koenig, Christian wrote:
> Am 20.02.19 um 09:14 schrieb Thomas Hellstrom:
> > On 2/20/19 9:07 AM, Christian König wrote:
> > > Am 20.02.19 um 07:41 schrieb Thomas Hellstrom:
> > > > On Tue, 2019-02-19 at 17:06 +0000, Kuehling, Felix wrote:
> > > > > On 2019-02-18 3:39 p.m., Thomas Hellstrom wrote:
> > > > > > On Mon, 2019-02-18 at 18:07 +0100, Christian König wrote:
> > > > > > > Am 18.02.19 um 10:47 schrieb Thomas Hellstrom:
> > > > > > > > On Mon, 2019-02-18 at 09:20 +0000, Koenig, Christian
> > > > > > > > wrote:
> > > > > > > > > Another good question is also why the heck the
> > > > > > > > > acc_size
> > > > > > > > > counts
> > > > > > > > > towards
> > > > > > > > > the DMA32 zone?
> > > > > > > > DMA32 TTM pages are accounted in the DMA32 zone. Other
> > > > > > > > pages
> > > > > > > > are
> > > > > > > > not.
> > > > > > > Yeah, I'm perfectly aware of this. But this is for the
> > > > > > > accounting
> > > > > > > size!
> > > > > > > 
> > > > > > > We have an accounting for the stuff needed additional to
> > > > > > > the
> > > > > > > pages
> > > > > > > backing the BO (e.g. the page and DMA addr array).
> > > > > > > 
> > > > > > > And from the bug description it sounds like we use the
> > > > > > > DMA32 zone
> > > > > > > for
> > > > > > > this accounting which of course is completely nonsense.
> > > > > > It's actually accounted in all available zones, since it
> > > > > > would be
> > > > > > pretty hard to determine exactly where that memory should
> > > > > > be
> > > > > > accounted.
> > > > > > In particular if it's vmalloced. It might be DMA32, it
> > > > > > might not.
> > > > > > Given
> > > > > > the objective of stopping malicious user-space from
> > > > > > exhausting the
> > > > > > DMA32 zone it was, at the time the code was written, a
> > > > > > reasonable
> > > > > > approximation. With ever increasing memory sizes, there
> > > > > > might be
> > > > > > better
> > > > > > solutions?
> > > > > As far as I can see, in TTM, ttm_mem_global_alloc is only
> > > > > used for
> > > > > the
> > > > > acc_size in ttm_bo_init_reserved. Other than that vmwgfx also
> > > > > seems
> > > > > to
> > > > > use it to account for a few things that are allocated with
> > > > > kmalloc.
> > > > > 
> > > > > So would a better solution be to change ttm_mem_global_alloc
> > > > > to use
> > > > > only
> > > > > the kernel zone?
> > > > > 
> > > > IMO we need to determine what functionality to keep and then
> > > > the best
> > > > solution. The current code does its job, but is obviously too
> > > > restrictive. Both of the solutions you suggest open up for
> > > > potential
> > > > DOS attacks (DMA32 and kernel zones are not mutually exclusive.
> > > > They
> > > > overlap).
> > > 
> > > Yeah and exactly because of this it doesn't make to much sense
> > > to 
> > > account the housekeeping stuff in both zones.
> > 
> > IMO it makes sense because (given the assumption that
> > kmalloc/vmalloc 
> > doesn't care) , accounting only in DMA32 in low memory
> > configurations 
> > will not guarantee that kernel is not exhausted. Accounting only
> > in 
> > kernel in huge memory configurations will not guarantee that DMA32
> > is 
> > not exhausted.
> 
> DMA32 is not exhausted by TTMs kmalloc/vmalloc because otherwise any 
> other kmalloc/vmalloc could exhaust it as well.
> 
> I mean its probably 20+ years ago that I last looked at stuff in
> this 
> part of the kernel, but I don't think we have ever changed the
> handling 
> of DMA/DMA32 or otherwise we would be running into massive
> exhaustion 
> problems already on modern systems.
> 
> > > IIRC the kernel takes perfectly care by itself that kmalloced
> > > memory 
> > > doesn't eat up everything in the DMA32 zone.
> > 
> > If we can somehow verify this, or at least illustrate that it's 
> > likely, I'm perfectly fine with either patch from the vmwgfx POW.
> 
> Well I actually think that the whole accounting of housekeeping
> towards 
> the memory zones is completely pointless in the first place. Even if
> we 
> swap things out we still have the same BO and TTM structures around
> anyway.

Well, the whole point of this is to stop malicious user-space apps from
exhausting memory, causing a DOS vulnerability, similaraly for example
to how processes are stopped from creating to many open file
descriptors or other kernel structures. IIRC, for example, you used to
be able to call gem open in an infinite loop with the same bo until you
brought a random big process down using the OOM killer. With vmwgfx you
can't do that. You'd end up swapping out all available BOs and then
you'd get an error.

> 
> What we should rather to is to make sure that all processes using a
> BO 
> have their RSS increased because they do so. So when a process tries
> to 
> exhaust a memory domain it is simply killed by the OOM killer.

Yes, if there is a way to reliably do that and also take care of other
kernel structures that are created by user-space, like in the example
above that indeed sounds like a better solution.

/Thomas



> 
> Regards,
> Christian.
> 
> > Thanks,
> > Thomas
> > 
> > 
> > 
> > > Christian.
> > > 
> > > > 
> > > > /Thomas
> > > > 
> > > > 
> > > > 
> > > > 
> > > > > Regards,
> > > > >     Felix
> > > > > 
> > > > > 
> > > > > > /Thomas
> > > > > > 
> > > > > > > Christian.
> > > > > > > 
> > > > > > > > For small persistent allocations using
> > > > > > > > ttm_mem_global_alloc(),
> > > > > > > > they
> > > > > > > > are
> > > > > > > > accounted also in the DMA32 zone, which may cause over-
> > > > > > > > accounting
> > > > > > > > of
> > > > > > > > that zone, but that's pretty unlikely to be a big
> > > > > > > > problem..
> > > > > > > > 
> > > > > > > > /Thomas
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > In other words why should the internal bookkeeping
> > > > > > > > > pages be
> > > > > > > > > allocated
> > > > > > > > > in
> > > > > > > > > the DMA32 zone?
> > > > > > > > > 
> > > > > > > > > That doesn't sounds valid to me in any way,
> > > > > > > > > Christian.
> > > > > > > > > 
> > > > > > > > > Am 18.02.19 um 09:02 schrieb Thomas Hellstrom:
> > > > > > > > > > Hmm,
> > > > > > > > > > 
> > > > > > > > > > This zone was intended to stop TTM page allocations
> > > > > > > > > > from
> > > > > > > > > > exhausting
> > > > > > > > > > the DMA32 zone. IIRC dma_alloc_coherent() uses
> > > > > > > > > > DMA32 by
> > > > > > > > > > default,
> > > > > > > > > > which
> > > > > > > > > > means if we drop this check, other devices may stop
> > > > > > > > > > functioning
> > > > > > > > > > unexpectedly?
> > > > > > > > > > 
> > > > > > > > > > However, in the end I'd expect the kernel page
> > > > > > > > > > allocation
> > > > > > > > > > system
> > > > > > > > > > to
> > > > > > > > > > make sure there are some pages left in the DMA32
> > > > > > > > > > zone,
> > > > > > > > > > otherwise
> > > > > > > > > > random non-IO page allocations would also
> > > > > > > > > > potentially
> > > > > > > > > > exhaust
> > > > > > > > > > the
> > > > > > > > > > DMA32 zone without anybody caring, which means
> > > > > > > > > > removing
> > > > > > > > > > this
> > > > > > > > > > zone
> > > > > > > > > > wouldn't be any worse than whatever other
> > > > > > > > > > subsystems may be
> > > > > > > > > > doing
> > > > > > > > > > already...
> > > > > > > > > > 
> > > > > > > > > > /Thomas
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > On 2/16/19 12:02 AM, Kuehling, Felix wrote:
> > > > > > > > > > > This is an RFC. I'm not sure this is the right
> > > > > > > > > > > solution,
> > > > > > > > > > > but
> > > > > > > > > > > it
> > > > > > > > > > > highlights the problem I'm trying to solve.
> > > > > > > > > > > 
> > > > > > > > > > > The dma32_zone limits the acc_size of all
> > > > > > > > > > > allocated BOs
> > > > > > > > > > > to
> > > > > > > > > > > 2GB.
> > > > > > > > > > > On a
> > > > > > > > > > > 64-bit system with hundreds of GB of system
> > > > > > > > > > > memory and
> > > > > > > > > > > GPU
> > > > > > > > > > > memory,
> > > > > > > > > > > this can become a bottle neck. We're seeing TTM
> > > > > > > > > > > memory
> > > > > > > > > > > allocation
> > > > > > > > > > > failures not because we're truly out of memory,
> > > > > > > > > > > but
> > > > > > > > > > > because
> > > > > > > > > > > we're
> > > > > > > > > > > out of space in the dma32_zone for the acc_size
> > > > > > > > > > > needed
> > > > > > > > > > > for
> > > > > > > > > > > our BO
> > > > > > > > > > > book-keeping.
> > > > > > > > > > > 
> > > > > > > > > > > Signed-off-by: Felix Kuehling <
> > > > > > > > > > > Felix.Kuehling at amd.com>
> > > > > > > > > > > CC: thellstrom at vmware.com
> > > > > > > > > > > CC: christian.koenig at amd.com
> > > > > > > > > > > ---
> > > > > > > > > > >      drivers/gpu/drm/ttm/ttm_memory.c | 4 ++--
> > > > > > > > > > >      1 file changed, 2 insertions(+), 2
> > > > > > > > > > > deletions(-)
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > > > > > > > b/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > > > > > > > index f1567c3..bb05365 100644
> > > > > > > > > > > --- a/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > > > > > > > +++ b/drivers/gpu/drm/ttm/ttm_memory.c
> > > > > > > > > > > @@ -363,7 +363,7 @@ static int
> > > > > > > > > > > ttm_mem_init_highmem_zone(struct
> > > > > > > > > > > ttm_mem_global *glob,
> > > > > > > > > > >          glob->zones[glob->num_zones++] = zone;
> > > > > > > > > > >          return 0;
> > > > > > > > > > >      }
> > > > > > > > > > > -#else
> > > > > > > > > > > +#elifndef CONFIG_64BIT
> > > > > > > > > > >      static int ttm_mem_init_dma32_zone(struct
> > > > > > > > > > > ttm_mem_global
> > > > > > > > > > > *glob,
> > > > > > > > > > >                         const struct sysinfo *si)
> > > > > > > > > > >      {
> > > > > > > > > > > @@ -441,7 +441,7 @@ int
> > > > > > > > > > > ttm_mem_global_init(struct
> > > > > > > > > > > ttm_mem_global
> > > > > > > > > > > *glob)
> > > > > > > > > > >          ret = ttm_mem_init_highmem_zone(glob,
> > > > > > > > > > > &si);
> > > > > > > > > > >          if (unlikely(ret != 0))
> > > > > > > > > > >              goto out_no_zone;
> > > > > > > > > > > -#else
> > > > > > > > > > > +#elifndef CONFIG_64BIT
> > > > > > > > > > >          ret = ttm_mem_init_dma32_zone(glob,
> > > > > > > > > > > &si);
> > > > > > > > > > >          if (unlikely(ret != 0))
> > > > > > > > > > >              goto out_no_zone;
> > > > > > > > _______________________________________________
> > > > > > > > amd-gfx mailing list
> > > > > > > > amd-gfx at lists.freedesktop.org
> > > > > > > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cthellstrom%40vmware.com%7C40edb735811c4c609d5f08d6970e52df%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636862485126215077&sdata=Q4%2Fga%2B8HbwMLRhePkp62uiUMtkpWfp44nMuNumiOskM%3D&reserved=0 
> > > > > > > > 
> > > > _______________________________________________
> > > > amd-gfx mailing list
> > > > amd-gfx at lists.freedesktop.org
> > > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7Cthellstrom%40vmware.com%7C40edb735811c4c609d5f08d6970e52df%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636862485126215077&sdata=Q4%2Fga%2B8HbwMLRhePkp62uiUMtkpWfp44nMuNumiOskM%3D&reserved=0


More information about the dri-devel mailing list