[PATCH 8/9] drm/i915/gvt: Manage shadow pages with radix tree

Du, Changbin changbin.du at intel.com
Tue Dec 26 03:01:21 UTC 2017


On Mon, Dec 25, 2017 at 09:40:15PM +0800, Zhi Wang wrote:
> Finally I got the paper I read at that time, since I did think about
> replacing hash table with radix or rb tree before.
> 
> https://www.kernel.org/doc/ols/2005/ols2005v2-pages-87-98.pdf
> 
> Radix is good in larger scalability.
> 
> On stackflow.com, there was also one guy who did a comprehensive comparison
> between linux radix/red black tree/hashtable and a patented search tree from
> HP in performance and memory consumption. But I cannot find it now. I give
> up radix/rb tree mostly based on his results.
>
We use Frame Number as key here. Considering a 4GB memory, there are only 20bits used
to index the tree. It means the tree height is 4 or 5 (depens on kernel radix config)
except common pre-fix (since kerel always use 64bit key).

It is fair enough to append some data for such change. I will collect some sample
data in commit message.

> Thanks,
> Zhi.
> 
> On 12/25/17 20:45, Zhi Wang wrote:
> > I did think this for a while after sending out previous email.  We can
> > not just imagine the algorithms will work well, because it's used on
> > page cache, because it's in kernel. Algorithms can behave quite
> > different with different scenarios and inputs/scalability. Usually, they
> > are results of trade-offs.
> > 
> > Besides, each node of radix tree in linux implementation requires much
> > more memory than rb/hash. In the page cache case, there must be some
> > trade-off between memory consumption and performance under specific
> > scenarios.
> > 
> > But the trade-off in our case might be quite different. We should decide
> > the trade-off by testing, then decide what's kinds of trade off we need.
> > For example, how much extra memory radix tree needs and how much benefit
> > it brings.
> > 
> > The reason why the hash table is there is because it was there since the
> > time earlier than GVT-g on Haswell. If you want to change it, do
> > convince people by data. Or you can request to add another task item.
> > 
> > Thanks,
> > Zhi.
> > 
> > 
> > On 12/25/17 18:30, Zhi Wang wrote:
> > > Yes. Mostly I think it's because page cache has to use radix tree.
> > > Just like I said you only get benefit in large scalability case.
> > > Jike did a research and comparison during writting IOVA map cache
> > > and I did see some FS moved to use rb tree from radix tree because
> > > of the memory consumption.
> > > 
> > > With those considerations, I suggest you attach some data in the
> > > patch comments. Because it might be good to page cache, but might
> > > not be good to us.
> > > 
> > > Thanks,
> > > Zhi.
> > > 
> > > On 12/25/17 18:16, Du, Changbin wrote:
> > > > On Mon, Dec 25, 2017 at 05:35:05PM +0800, Zhi Wang wrote:
> > > > > Better attach some investigation and performance data here.
> > > > > 
> > > > > Radix tree is quite heavy and requires more memory than hash
> > > > > and rb tree. A
> > > > > lot of Linux subsystems finally turn to rb tree from radix
> > > > > because of that.
> > > > > 
> > > > Sure? then why page cache still use radix tree? :
> > > > 
> > > > > It only shows better performance in a very large
> > > > > scalability, compared to
> > > > > hash and RB. So, usually, people tends to use RB tree
> > > > > instead of radix, if
> > > > > the hash table is not satisfying.
> > > > > 
> > > > Here we what we need is a map and radix is it. Most of the
> > > > rb-tree use cases are
> > > > to manage a data with range, like IOVA allocator, VMALLOC and so on.
> > > > 
> > > > Actually, here is not so sensitive. Bot rb-tree and radix tree
> > > > are very quick,
> > > > but a little slower than hasn table.
> > > > 
> > > > > Thanks,
> > > > > Zhi.
> > > > > 
> > > > > On 12/25/17 17:11, changbin.du at intel.com wrote:
> > > > > > From: Changbin Du <changbin.du at intel.com>
> > > > > > 
> > > > > > We don't know how many page tables will be shadowed. It varies
> > > > > > considerably corresponding to guest load. So radix tree is the
> > > > > > best choice for us.
> > > > > > 
> > > > > > Signed-off-by: Changbin Du <changbin.du at intel.com>
> > > > > > ---
> > > > > >    drivers/gpu/drm/i915/gvt/gtt.c | 48
> > > > > > ++++++++++++++++++++++--------------------
> > > > > >    drivers/gpu/drm/i915/gvt/gtt.h |  4 +---
> > > > > >    2 files changed, 26 insertions(+), 26 deletions(-)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/gvt/gtt.c
> > > > > > b/drivers/gpu/drm/i915/gvt/gtt.c
> > > > > > index 1866c2d..1ce6393 100644
> > > > > > --- a/drivers/gpu/drm/i915/gvt/gtt.c
> > > > > > +++ b/drivers/gpu/drm/i915/gvt/gtt.c
> > > > > > @@ -638,8 +638,8 @@ static void ppgtt_free_spt(struct
> > > > > > intel_vgpu_ppgtt_spt *spt)
> > > > > >        dma_unmap_page(kdev, spt->shadow_page.mfn <<
> > > > > > I915_GTT_PAGE_SHIFT, 4096,
> > > > > >                   PCI_DMA_BIDIRECTIONAL);
> > > > > > -    if (!hlist_unhashed(&spt->node))
> > > > > > -        hash_del(&spt->node);
> > > > > > +
> > > > > > +    radix_tree_delete(&spt->vgpu->gtt.spt_tree,
> > > > > > spt->shadow_page.mfn);
> > > > > >        if (spt->guest_page.oos_page)
> > > > > >            detach_oos_page(spt->vgpu, spt->guest_page.oos_page);
> > > > > > @@ -652,12 +652,14 @@ static void ppgtt_free_spt(struct
> > > > > > intel_vgpu_ppgtt_spt *spt)
> > > > > >    static void ppgtt_free_all_spt(struct intel_vgpu *vgpu)
> > > > > >    {
> > > > > > -    struct hlist_node *n;
> > > > > >        struct intel_vgpu_ppgtt_spt *spt;
> > > > > > -    int i;
> > > > > > +    struct radix_tree_iter iter;
> > > > > > +    void **slot;
> > > > > > -    hash_for_each_safe(vgpu->gtt.spt_hash_table, i, n, spt, node)
> > > > > > +    radix_tree_for_each_slot(slot, &vgpu->gtt.spt_tree, &iter, 0) {
> > > > > > +        spt = radix_tree_deref_slot(slot);
> > > > > >            ppgtt_free_spt(spt);
> > > > > > +    }
> > > > > >    }
> > > > > >    static int ppgtt_handle_guest_write_page_table_bytes(
> > > > > > @@ -695,16 +697,10 @@ static struct intel_vgpu_ppgtt_spt
> > > > > > *intel_vgpu_find_spt_by_gfn(
> > > > > >    }
> > > > > >    /* Find the spt by shadow page mfn. */
> > > > > > -static struct intel_vgpu_ppgtt_spt *intel_vgpu_find_spt_by_mfn(
> > > > > > +static inline struct intel_vgpu_ppgtt_spt
> > > > > > *intel_vgpu_find_spt_by_mfn(
> > > > > >            struct intel_vgpu *vgpu, unsigned long mfn)
> > > > > >    {
> > > > > > -    struct intel_vgpu_ppgtt_spt *spt;
> > > > > > -
> > > > > > -    hash_for_each_possible(vgpu->gtt.spt_hash_table,
> > > > > > spt, node, mfn) {
> > > > > > -        if (spt->shadow_page.mfn == mfn)
> > > > > > -            return spt;
> > > > > > -    }
> > > > > > -    return NULL;
> > > > > > +    return radix_tree_lookup(&vgpu->gtt.spt_tree, mfn);
> > > > > >    }
> > > > > >    static int reclaim_one_ppgtt_mm(struct intel_gvt *gvt);
> > > > > > @@ -739,8 +735,8 @@ static struct intel_vgpu_ppgtt_spt
> > > > > > *ppgtt_alloc_spt(
> > > > > >                     0, 4096, PCI_DMA_BIDIRECTIONAL);
> > > > > >        if (dma_mapping_error(kdev, daddr)) {
> > > > > >            gvt_vgpu_err("fail to map dma addr\n");
> > > > > > -        free_spt(spt);
> > > > > > -        return ERR_PTR(-EINVAL);
> > > > > > +        ret = -EINVAL;
> > > > > > +        goto err_free_spt;
> > > > > >        }
> > > > > >        spt->shadow_page.vaddr = page_address(spt->shadow_page.page);
> > > > > >        spt->shadow_page.mfn = daddr >> I915_GTT_PAGE_SHIFT;
> > > > > > @@ -753,17 +749,23 @@ static struct intel_vgpu_ppgtt_spt
> > > > > > *ppgtt_alloc_spt(
> > > > > >        ret = intel_vgpu_register_page_track(vgpu, spt->guest_page.gfn,
> > > > > >                        ppgtt_write_protection_handler, spt);
> > > > > > -    if (ret) {
> > > > > > -        free_spt(spt);
> > > > > > -        dma_unmap_page(kdev, daddr, PAGE_SIZE,
> > > > > > PCI_DMA_BIDIRECTIONAL);
> > > > > > -        return ERR_PTR(ret);
> > > > > > -    }
> > > > > > +    if (ret)
> > > > > > +        goto err_unmap_dma;
> > > > > > -    INIT_HLIST_NODE(&spt->node);
> > > > > > -    hash_add(vgpu->gtt.spt_hash_table, &spt->node,
> > > > > > spt->shadow_page.mfn);
> > > > > > +    ret = radix_tree_insert(&vgpu->gtt.spt_tree,
> > > > > > spt->shadow_page.mfn, spt);
> > > > > > +    if (ret)
> > > > > > +        goto err_unreg_page_track;
> > > > > >        trace_spt_alloc(vgpu->id, spt, type,
> > > > > > spt->shadow_page.mfn, gfn);
> > > > > >        return spt;
> > > > > > +
> > > > > > +err_unreg_page_track:
> > > > > > +    intel_vgpu_unregister_page_track(vgpu, spt->guest_page.gfn);
> > > > > > +err_unmap_dma:
> > > > > > +    dma_unmap_page(kdev, daddr, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
> > > > > > +err_free_spt:
> > > > > > +    free_spt(spt);
> > > > > > +    return ERR_PTR(ret);
> > > > > >    }
> > > > > >    #define pt_entry_size_shift(spt) \
> > > > > > @@ -1965,7 +1967,7 @@ int intel_vgpu_init_gtt(struct intel_vgpu *vgpu)
> > > > > >    {
> > > > > >        struct intel_vgpu_gtt *gtt = &vgpu->gtt;
> > > > > > -    hash_init(gtt->spt_hash_table);
> > > > > > +    INIT_RADIX_TREE(&gtt->spt_tree, GFP_KERNEL);
> > > > > >        INIT_LIST_HEAD(&gtt->ppgtt_mm_list_head);
> > > > > >        INIT_LIST_HEAD(&gtt->oos_page_list_head);
> > > > > > diff --git a/drivers/gpu/drm/i915/gvt/gtt.h
> > > > > > b/drivers/gpu/drm/i915/gvt/gtt.h
> > > > > > index 32504c0..c006cc9d 100644
> > > > > > --- a/drivers/gpu/drm/i915/gvt/gtt.h
> > > > > > +++ b/drivers/gpu/drm/i915/gvt/gtt.h
> > > > > > @@ -39,7 +39,6 @@
> > > > > >    struct intel_vgpu_mm;
> > > > > > -#define INTEL_GVT_GTT_HASH_BITS 8
> > > > > >    #define INTEL_GVT_INVALID_ADDR (~0UL)
> > > > > >    struct intel_gvt_gtt_entry {
> > > > > > @@ -179,7 +178,7 @@ struct intel_vgpu_gtt {
> > > > > >        struct intel_vgpu_mm *ggtt_mm;
> > > > > >        unsigned long active_ppgtt_mm_bitmap;
> > > > > >        struct list_head ppgtt_mm_list_head;
> > > > > > -    DECLARE_HASHTABLE(spt_hash_table, INTEL_GVT_GTT_HASH_BITS);
> > > > > > +    struct radix_tree_root spt_tree;
> > > > > >        struct list_head oos_page_list_head;
> > > > > >        struct list_head post_shadow_list_head;
> > > > > >        struct intel_vgpu_scratch_pt scratch_pt[GTT_TYPE_MAX];
> > > > > > @@ -212,7 +211,6 @@ struct intel_vgpu_oos_page {
> > > > > >    struct intel_vgpu_ppgtt_spt {
> > > > > >        atomic_t refcount;
> > > > > >        struct intel_vgpu *vgpu;
> > > > > > -    struct hlist_node node;
> > > > > >        struct {
> > > > > >            intel_gvt_gtt_type_t type;
> > > > > > 
> > > > > _______________________________________________
> > > > > intel-gvt-dev mailing list
> > > > > intel-gvt-dev at lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> > > > 
> > _______________________________________________
> > intel-gvt-dev mailing list
> > intel-gvt-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev

-- 
Thanks,
Changbin Du


More information about the intel-gvt-dev mailing list