[PATCH 8/9] drm/i915/gvt: Manage shadow pages with radix tree
Zhi Wang
zhi.a.wang at intel.com
Mon Dec 25 13:40:15 UTC 2017
Finally I got the paper I read at that time, since I did think about
replacing hash table with radix or rb tree before.
https://www.kernel.org/doc/ols/2005/ols2005v2-pages-87-98.pdf
Radix is good in larger scalability.
On stackflow.com, there was also one guy who did a comprehensive
comparison between linux radix/red black tree/hashtable and a patented
search tree from HP in performance and memory consumption. But I cannot
find it now. I give up radix/rb tree mostly based on his results.
Thanks,
Zhi.
On 12/25/17 20:45, Zhi Wang wrote:
> I did think this for a while after sending out previous email. We can
> not just imagine the algorithms will work well, because it's used on
> page cache, because it's in kernel. Algorithms can behave quite
> different with different scenarios and inputs/scalability. Usually, they
> are results of trade-offs.
>
> Besides, each node of radix tree in linux implementation requires much
> more memory than rb/hash. In the page cache case, there must be some
> trade-off between memory consumption and performance under specific
> scenarios.
>
> But the trade-off in our case might be quite different. We should decide
> the trade-off by testing, then decide what's kinds of trade off we need.
> For example, how much extra memory radix tree needs and how much benefit
> it brings.
>
> The reason why the hash table is there is because it was there since the
> time earlier than GVT-g on Haswell. If you want to change it, do
> convince people by data. Or you can request to add another task item.
>
> Thanks,
> Zhi.
>
>
> On 12/25/17 18:30, Zhi Wang wrote:
>> Yes. Mostly I think it's because page cache has to use radix tree.
>> Just like I said you only get benefit in large scalability case. Jike
>> did a research and comparison during writting IOVA map cache and I did
>> see some FS moved to use rb tree from radix tree because of the memory
>> consumption.
>>
>> With those considerations, I suggest you attach some data in the patch
>> comments. Because it might be good to page cache, but might not be
>> good to us.
>>
>> Thanks,
>> Zhi.
>>
>> On 12/25/17 18:16, Du, Changbin wrote:
>>> On Mon, Dec 25, 2017 at 05:35:05PM +0800, Zhi Wang wrote:
>>>> Better attach some investigation and performance data here.
>>>>
>>>> Radix tree is quite heavy and requires more memory than hash and rb
>>>> tree. A
>>>> lot of Linux subsystems finally turn to rb tree from radix because
>>>> of that.
>>>>
>>> Sure? then why page cache still use radix tree? :
>>>
>>>> It only shows better performance in a very large scalability,
>>>> compared to
>>>> hash and RB. So, usually, people tends to use RB tree instead of
>>>> radix, if
>>>> the hash table is not satisfying.
>>>>
>>> Here we what we need is a map and radix is it. Most of the rb-tree
>>> use cases are
>>> to manage a data with range, like IOVA allocator, VMALLOC and so on.
>>>
>>> Actually, here is not so sensitive. Bot rb-tree and radix tree are
>>> very quick,
>>> but a little slower than hasn table.
>>>
>>>> Thanks,
>>>> Zhi.
>>>>
>>>> On 12/25/17 17:11, changbin.du at intel.com wrote:
>>>>> From: Changbin Du <changbin.du at intel.com>
>>>>>
>>>>> We don't know how many page tables will be shadowed. It varies
>>>>> considerably corresponding to guest load. So radix tree is the
>>>>> best choice for us.
>>>>>
>>>>> Signed-off-by: Changbin Du <changbin.du at intel.com>
>>>>> ---
>>>>> drivers/gpu/drm/i915/gvt/gtt.c | 48
>>>>> ++++++++++++++++++++++--------------------
>>>>> drivers/gpu/drm/i915/gvt/gtt.h | 4 +---
>>>>> 2 files changed, 26 insertions(+), 26 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gvt/gtt.c
>>>>> b/drivers/gpu/drm/i915/gvt/gtt.c
>>>>> index 1866c2d..1ce6393 100644
>>>>> --- a/drivers/gpu/drm/i915/gvt/gtt.c
>>>>> +++ b/drivers/gpu/drm/i915/gvt/gtt.c
>>>>> @@ -638,8 +638,8 @@ static void ppgtt_free_spt(struct
>>>>> intel_vgpu_ppgtt_spt *spt)
>>>>> dma_unmap_page(kdev, spt->shadow_page.mfn <<
>>>>> I915_GTT_PAGE_SHIFT, 4096,
>>>>> PCI_DMA_BIDIRECTIONAL);
>>>>> - if (!hlist_unhashed(&spt->node))
>>>>> - hash_del(&spt->node);
>>>>> +
>>>>> + radix_tree_delete(&spt->vgpu->gtt.spt_tree,
>>>>> spt->shadow_page.mfn);
>>>>> if (spt->guest_page.oos_page)
>>>>> detach_oos_page(spt->vgpu, spt->guest_page.oos_page);
>>>>> @@ -652,12 +652,14 @@ static void ppgtt_free_spt(struct
>>>>> intel_vgpu_ppgtt_spt *spt)
>>>>> static void ppgtt_free_all_spt(struct intel_vgpu *vgpu)
>>>>> {
>>>>> - struct hlist_node *n;
>>>>> struct intel_vgpu_ppgtt_spt *spt;
>>>>> - int i;
>>>>> + struct radix_tree_iter iter;
>>>>> + void **slot;
>>>>> - hash_for_each_safe(vgpu->gtt.spt_hash_table, i, n, spt, node)
>>>>> + radix_tree_for_each_slot(slot, &vgpu->gtt.spt_tree, &iter, 0) {
>>>>> + spt = radix_tree_deref_slot(slot);
>>>>> ppgtt_free_spt(spt);
>>>>> + }
>>>>> }
>>>>> static int ppgtt_handle_guest_write_page_table_bytes(
>>>>> @@ -695,16 +697,10 @@ static struct intel_vgpu_ppgtt_spt
>>>>> *intel_vgpu_find_spt_by_gfn(
>>>>> }
>>>>> /* Find the spt by shadow page mfn. */
>>>>> -static struct intel_vgpu_ppgtt_spt *intel_vgpu_find_spt_by_mfn(
>>>>> +static inline struct intel_vgpu_ppgtt_spt
>>>>> *intel_vgpu_find_spt_by_mfn(
>>>>> struct intel_vgpu *vgpu, unsigned long mfn)
>>>>> {
>>>>> - struct intel_vgpu_ppgtt_spt *spt;
>>>>> -
>>>>> - hash_for_each_possible(vgpu->gtt.spt_hash_table, spt, node,
>>>>> mfn) {
>>>>> - if (spt->shadow_page.mfn == mfn)
>>>>> - return spt;
>>>>> - }
>>>>> - return NULL;
>>>>> + return radix_tree_lookup(&vgpu->gtt.spt_tree, mfn);
>>>>> }
>>>>> static int reclaim_one_ppgtt_mm(struct intel_gvt *gvt);
>>>>> @@ -739,8 +735,8 @@ static struct intel_vgpu_ppgtt_spt
>>>>> *ppgtt_alloc_spt(
>>>>> 0, 4096, PCI_DMA_BIDIRECTIONAL);
>>>>> if (dma_mapping_error(kdev, daddr)) {
>>>>> gvt_vgpu_err("fail to map dma addr\n");
>>>>> - free_spt(spt);
>>>>> - return ERR_PTR(-EINVAL);
>>>>> + ret = -EINVAL;
>>>>> + goto err_free_spt;
>>>>> }
>>>>> spt->shadow_page.vaddr = page_address(spt->shadow_page.page);
>>>>> spt->shadow_page.mfn = daddr >> I915_GTT_PAGE_SHIFT;
>>>>> @@ -753,17 +749,23 @@ static struct intel_vgpu_ppgtt_spt
>>>>> *ppgtt_alloc_spt(
>>>>> ret = intel_vgpu_register_page_track(vgpu, spt->guest_page.gfn,
>>>>> ppgtt_write_protection_handler, spt);
>>>>> - if (ret) {
>>>>> - free_spt(spt);
>>>>> - dma_unmap_page(kdev, daddr, PAGE_SIZE,
>>>>> PCI_DMA_BIDIRECTIONAL);
>>>>> - return ERR_PTR(ret);
>>>>> - }
>>>>> + if (ret)
>>>>> + goto err_unmap_dma;
>>>>> - INIT_HLIST_NODE(&spt->node);
>>>>> - hash_add(vgpu->gtt.spt_hash_table, &spt->node,
>>>>> spt->shadow_page.mfn);
>>>>> + ret = radix_tree_insert(&vgpu->gtt.spt_tree,
>>>>> spt->shadow_page.mfn, spt);
>>>>> + if (ret)
>>>>> + goto err_unreg_page_track;
>>>>> trace_spt_alloc(vgpu->id, spt, type, spt->shadow_page.mfn,
>>>>> gfn);
>>>>> return spt;
>>>>> +
>>>>> +err_unreg_page_track:
>>>>> + intel_vgpu_unregister_page_track(vgpu, spt->guest_page.gfn);
>>>>> +err_unmap_dma:
>>>>> + dma_unmap_page(kdev, daddr, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
>>>>> +err_free_spt:
>>>>> + free_spt(spt);
>>>>> + return ERR_PTR(ret);
>>>>> }
>>>>> #define pt_entry_size_shift(spt) \
>>>>> @@ -1965,7 +1967,7 @@ int intel_vgpu_init_gtt(struct intel_vgpu *vgpu)
>>>>> {
>>>>> struct intel_vgpu_gtt *gtt = &vgpu->gtt;
>>>>> - hash_init(gtt->spt_hash_table);
>>>>> + INIT_RADIX_TREE(>t->spt_tree, GFP_KERNEL);
>>>>> INIT_LIST_HEAD(>t->ppgtt_mm_list_head);
>>>>> INIT_LIST_HEAD(>t->oos_page_list_head);
>>>>> diff --git a/drivers/gpu/drm/i915/gvt/gtt.h
>>>>> b/drivers/gpu/drm/i915/gvt/gtt.h
>>>>> index 32504c0..c006cc9d 100644
>>>>> --- a/drivers/gpu/drm/i915/gvt/gtt.h
>>>>> +++ b/drivers/gpu/drm/i915/gvt/gtt.h
>>>>> @@ -39,7 +39,6 @@
>>>>> struct intel_vgpu_mm;
>>>>> -#define INTEL_GVT_GTT_HASH_BITS 8
>>>>> #define INTEL_GVT_INVALID_ADDR (~0UL)
>>>>> struct intel_gvt_gtt_entry {
>>>>> @@ -179,7 +178,7 @@ struct intel_vgpu_gtt {
>>>>> struct intel_vgpu_mm *ggtt_mm;
>>>>> unsigned long active_ppgtt_mm_bitmap;
>>>>> struct list_head ppgtt_mm_list_head;
>>>>> - DECLARE_HASHTABLE(spt_hash_table, INTEL_GVT_GTT_HASH_BITS);
>>>>> + struct radix_tree_root spt_tree;
>>>>> struct list_head oos_page_list_head;
>>>>> struct list_head post_shadow_list_head;
>>>>> struct intel_vgpu_scratch_pt scratch_pt[GTT_TYPE_MAX];
>>>>> @@ -212,7 +211,6 @@ struct intel_vgpu_oos_page {
>>>>> struct intel_vgpu_ppgtt_spt {
>>>>> atomic_t refcount;
>>>>> struct intel_vgpu *vgpu;
>>>>> - struct hlist_node node;
>>>>> struct {
>>>>> intel_gvt_gtt_type_t type;
>>>>>
>>>> _______________________________________________
>>>> intel-gvt-dev mailing list
>>>> intel-gvt-dev at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
>>>
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
More information about the intel-gvt-dev
mailing list