[RFC] ttm merge ttm_backend & ttm_tt

Jerome Glisse j.glisse at gmail.com
Wed Nov 2 09:48:21 PDT 2011


On Wed, Nov 02, 2011 at 11:04:43AM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Nov 01, 2011 at 09:11:37PM -0400, Jerome Glisse wrote:
> > Hi,
> > 
> > So attached is a patch series (sorry for that, I am away from my normal
> > mail config) which ultimately merges ttm_backend & ttm_tt. It allows
> > getting rid of the data duplication between the two, especially between
> > ttm_tt and the driver-specific backends. The net result is about 300
> > fewer lines of code across TTM and the drivers.
> > 
> > Konrad, given some of the upstream nouveau changes, your patch series
> > conflicts and would lead to reverting some nouveau fixes. I believe the
> > intention is to get the ttm-dma code into 3.3 (3.2 seems late, but I
> > don't know). If 3.3 is the aim then I will rebase your patches on top
> > of this series, which should lower the number of changes your series
> > needs.
> > 
> > Note that this is early code, only compile-tested; I just wanted to
> > get feedback and find out whether anyone objects to the approach.
> > 
> > For a quick review, the first 5 patches can be ignored; they are
> > cleanups, dead-code removal, or small fixes. The real action is in the
> > last patch.
> > 
> > The next step is to move page allocation through ttm_helper into the
> > driver, thus making the whole DMA business a driver concern. Drivers
> > that don't care about DMA (like vmw) could ignore it, while drivers
> > that do would deal with it through ttm_helper.
> 
> I took a look at them (1-3 are good; please put my Reviewed-by on them). The
> 
> [PATCH 4/6] drm/ttm: convert page allocation to use page ptr array instead of list
> 
> has an interesting side effect - previously a list was used, so on get_pages
> the order of pages was:
> 
>  a,b,c,d,e,f
> 
> and when one was putting pages back, the list was iterated like so:
> 
>  for (i = 0; i < ttm->num_pages; ++i) {
>    .. list_add(&cur_page->lru, &h);
>    /* list_add is like a stack, items are put in the front */
>  }
>  ttm_put_pages(&h,..);
> 
> which meant that the list would have the items in:
>  f,e,d,c,b,a
> 
> order. The TTM pool code would iterate over those in that order
> and call __free_page. Which means some performance drawback when the
> memory had to be iterated forward and then backward, though it probably
> was cached in the L3 cache, so probably no biggie.
> 
> I don't know why it was done that way, but I wrote the TTM DMA code
> to be optimized for that (and it has the code to reverse direction - b/c
> the testcases I used initially had the a,b,c,d,e,f order when doing get_pages
> and put_pages) - but I can alter the code to do it in the forward fashion
> instead of the reverse one. Better yet, that removes around 70 lines of
> code from the TTM DMA path.

The order in which we put them back on the list can easily be changed,
either by using list_add_tail or by iterating the array from the end to
the beginning. I am not sure how much this impacts things.
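To illustrate the reversal being discussed: with the kernel's list_add() every
element is pushed at the front, so walking the list yields items in reverse
insertion order, while list_add_tail() preserves insertion order. A standalone
sketch (a toy list mirroring list.h semantics, not the kernel code itself):

```c
#include <stddef.h>

/* Minimal doubly linked list mirroring the kernel's list_head semantics. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* list_add: insert right after head -- behaves like a stack (LIFO). */
static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

/* list_add_tail: insert right before head -- behaves like a queue (FIFO). */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->next = head;
	new->prev = head->prev;
	head->prev->next = new;
	head->prev = new;
}

struct item {
	int id;
	struct list_head lru;
};

/* Insert items 0,1,2 with either helper and return the id of the first
 * element seen when walking the list from the head. */
int demo_first_id(int use_tail)
{
	struct list_head h;
	struct item items[3];
	struct list_head *first;
	int i;

	INIT_LIST_HEAD(&h);
	for (i = 0; i < 3; i++) {
		items[i].id = i;
		if (use_tail)
			list_add_tail(&items[i].lru, &h);
		else
			list_add(&items[i].lru, &h);
	}
	first = h.next;
	return ((struct item *)((char *)first - offsetof(struct item, lru)))->id;
}
```

With list_add the walk starts at item 2 (reverse order); with list_add_tail it
starts at item 0 (insertion order), which is the forward iteration Konrad's
DMA code could then rely on.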

> Anyhow, it might be worth noting that in the commit description, and perhaps
> making the bug fix for put_pages being called in the error path instead of
> ttm_put_pages a separate patch, as you suggested.
> 
> 
> [PATCH 6/6] drm/ttm: merge ttm_backend and ttm_tt
> 
> That is going to be a bit tricky. You are using pci_map_page, which should not
> be used if the pages have been allocated via pci_alloc_coherent (they are
> already mapped). Perhaps a simple check such as:
> 
> 	if (!(ttm->be && ttm->be->func && ttm->be->func->get_pages)) {
> 		ttm->dma_address[i] = pci_map_page(dev->pdev, ...)
> 	}

Your dma changes are not supposed to go on top of that; the idea is to add
yet another pair of callbacks, populate + free_page, which will either do
pci_map_page or just use your code (i.e. use dma alloc and store the dma
address into the array). So there is a missing piece here before adding
your dma code. I just wanted to keep the ttm_tt & ttm_backend merge as
simple as possible, without major changes. Adding the populate + free_page
callbacks sounded like a good candidate for another patch.
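To sketch the shape of that missing piece: the names, struct layout, and the
malloc() stand-ins below are hypothetical, not the eventual TTM API. The
point is only that a driver-side callback would own both the page allocation
and the dma_address array, so a coherent-DMA driver could fill dma_address[]
from dma_alloc_coherent() while others use the page pool plus pci_map_page().

```c
#include <stdlib.h>

/* Toy stand-in for ttm_tt: just the fields the sketch needs. */
struct ttm_tt_sketch {
	unsigned long num_pages;
	void **pages;			/* stands in for struct page *pages[] */
	unsigned long long *dma_address;
};

/* Hypothetical callback pair a driver backend would provide. */
struct ttm_backend_func_sketch {
	int (*populate)(struct ttm_tt_sketch *ttm);	/* alloc pages + addrs */
	void (*unpopulate)(struct ttm_tt_sketch *ttm);	/* release them */
};

/* Toy populate: plain malloc standing in for the page-pool path. */
static int toy_populate(struct ttm_tt_sketch *ttm)
{
	unsigned long i;

	ttm->pages = calloc(ttm->num_pages, sizeof(*ttm->pages));
	ttm->dma_address = calloc(ttm->num_pages, sizeof(*ttm->dma_address));
	if (!ttm->pages || !ttm->dma_address)
		return -1;
	for (i = 0; i < ttm->num_pages; i++) {
		ttm->pages[i] = malloc(4096);		/* fake page */
		ttm->dma_address[i] = 0x1000ull * i;	/* fake bus address */
	}
	return 0;
}

static void toy_unpopulate(struct ttm_tt_sketch *ttm)
{
	unsigned long i;

	for (i = 0; i < ttm->num_pages; i++)
		free(ttm->pages[i]);
	free(ttm->pages);
	free(ttm->dma_address);
}

static const struct ttm_backend_func_sketch toy_backend = {
	.populate = toy_populate,
	.unpopulate = toy_unpopulate,
};

int toy_demo(void)
{
	struct ttm_tt_sketch ttm = { .num_pages = 4 };

	if (toy_backend.populate(&ttm))
		return -1;
	toy_backend.unpopulate(&ttm);
	return 0;
}
```

With that split, the core ttm_tt code never touches pci_map_page directly;
each driver picks the mapping strategy inside its own populate().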

> That long ttm->be && ttm->be->func chain can probably be wrapped in a nice
> inline helper and put in ttm_page_alloc.h as:
> 
> diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h
> index daf5db6..39d6076 100644
> --- a/include/drm/ttm/ttm_page_alloc.h
> +++ b/include/drm/ttm/ttm_page_alloc.h
> @@ -30,6 +30,12 @@
>  #include "ttm_memory.h"
>  
>  #ifdef CONFIG_SWIOTLB
> +static inline bool ttm_dma_inuse(struct ttm_tt *ttm)
> +{
> +	if (ttm && ttm->be && ttm->be->func && ttm->be->func->get_pages)
> +		return true;
> +	return false;
> +}
>  extern bool ttm_dma_override(struct ttm_backend_func *be);
>  
>  extern int ttm_dma_disable;
> @@ -47,6 +53,11 @@ void ttm_dma_page_alloc_fini(void);
>  extern int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data);
>  #else
>  #define ttm_dma_disable (1)
> +
> +static inline bool ttm_dma_inuse(struct ttm_tt *ttm)
> +{
> +	return false;
> +}
>  static inline bool ttm_dma_override(struct ttm_backend_func *be)
>  {
>  	return false;
> 
> 
> Hmm, I am also not clear on what you are using as a base. 3.1 does not
> seem to have a whole bunch of the things your code is doing, like the
> be->func->clear call from nouveau_sgdma.c.
> 
> What base should I be looking at? Is this the base that Dave has for 3.2?

It should apply on top of Linus' master kernel as of yesterday.

I will do some testing, add the populate + free_page callbacks,
and resend.

Cheers,
Jerome
