[PATCH v5 07/19] media: vsp1: dl: Support one-shot entries in the display list

Liviu Dudau Liviu.Dudau at arm.com
Thu Mar 7 16:31:40 UTC 2019


On Thu, Mar 07, 2019 at 03:48:23PM +0200, Laurent Pinchart wrote:
> Hi Liviu,
> 
> On Thu, Mar 07, 2019 at 11:52:18AM +0000, Liviu Dudau wrote:
> > On Wed, Mar 06, 2019 at 08:01:53PM +0200, Laurent Pinchart wrote:
> > > On Wed, Mar 06, 2019 at 02:20:51PM +0000, Liviu Dudau wrote:
> > >> On Wed, Mar 06, 2019 at 01:14:40AM +0200, Laurent Pinchart wrote:
> > >>> On Fri, Feb 22, 2019 at 03:06:19PM +0000, Brian Starkey wrote:
> > >>>> On Fri, Feb 22, 2019 at 04:46:29PM +0200, Laurent Pinchart wrote:
> > >>>>> On Fri, Feb 22, 2019 at 02:30:03PM +0000, Brian Starkey wrote:
> > >>>>>> On Thu, Feb 21, 2019 at 12:32:00PM +0200, Laurent Pinchart wrote:
> > >>>>>>> One-shot entries are used as an alternative to committing a complete new
> > >>>>>>> display list when a couple of registers need to be written for one frame
> > >>>>>>> and then reset to another value for all subsequent frames. This will be
> > >>>>>>> used to implement writeback support that will need to enable writeback
> > >>>>>>> for the duration of a single frame.
> > >>>>>>> 
> > >>>>>>> Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas at ideasonboard.com>
> > >>>>>>> ---
> > >>>>>>>  drivers/media/platform/vsp1/vsp1_dl.c | 78 +++++++++++++++++++++++++++
> > >>>>>>>  drivers/media/platform/vsp1/vsp1_dl.h |  3 ++
> > >>>>>>>  2 files changed, 81 insertions(+)
> > >>>>>>> 
> > >>>>>>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.c b/drivers/media/platform/vsp1/vsp1_dl.c
> > >>>>>>> index 886b3a69d329..7b4d252bfde7 100644
> > >>>>>>> --- a/drivers/media/platform/vsp1/vsp1_dl.c
> > >>>>>>> +++ b/drivers/media/platform/vsp1/vsp1_dl.c
> > >>>>>>> @@ -115,6 +115,12 @@ struct vsp1_dl_body {
> > >>>>>>>  
> > >>>>>>>  	unsigned int num_entries;
> > >>>>>>>  	unsigned int max_entries;
> > >>>>>>> +
> > >>>>>>> +	unsigned int num_patches;
> > >>>>>>> +	struct {
> > >>>>>>> +		struct vsp1_dl_entry *entry;
> > >>>>>>> +		u32 data;
> > >>>>>>> +	} patches[2];
> > >>>>>>>  };
> > >>>>>>>  
> > >>>>>>>  /**
> > >>>>>>> @@ -361,6 +367,7 @@ void vsp1_dl_body_put(struct vsp1_dl_body *dlb)
> > >>>>>>>  		return;
> > >>>>>>>  
> > >>>>>>>  	dlb->num_entries = 0;
> > >>>>>>> +	dlb->num_patches = 0;
> > >>>>>>>  
> > >>>>>>>  	spin_lock_irqsave(&dlb->pool->lock, flags);
> > >>>>>>>  	list_add_tail(&dlb->free, &dlb->pool->free);
> > >>>>>>> @@ -388,6 +395,47 @@ void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data)
> > >>>>>>>  	dlb->num_entries++;
> > >>>>>>>  }
> > >>>>>>>  
> > >>>>>>> +/**
> > >>>>>>> + * vsp1_dl_body_write_oneshot - Write a register to a display list body for a
> > >>>>>>> + *	single frame
> > >>>>>>> + * @dlb: The body
> > >>>>>>> + * @reg: The register address
> > >>>>>>> + * @value: The register value
> > >>>>>>> + * @reset_value: The value to reset the register to at the next vblank
> > >>>>>>> + *
> > >>>>>>> + * Display lists in continuous mode are re-used by the hardware for successive
> > >>>>>>> + * frames until a new display list is committed. Changing the VSP configuration
> > >>>>>>> + * normally requires creating and committing a new display list. This function
> > >>>>>>> + * offers an alternative race-free way by writing a @value to the @register in
> > >>>>>>> + * the display list body for a single frame, specifying in @reset_value the
> > >>>>>>> + * value to reset the register to one vblank after the display list is
> > >>>>>>> + * committed.
> > >>>>>>> + *
> > >>>>>>> + * The maximum number of one-shot entries is limited to 2 per display list body,
> > >>>>>>> + * and one-shot entries are counted in the total number of entries specified
> > >>>>>>> + * when the body is allocated by vsp1_dl_body_alloc().
> > >>>>>>> + */
> > >>>>>>> +void vsp1_dl_body_write_oneshot(struct vsp1_dl_body *dlb, u32 reg, u32 value,
> > >>>>>>> +				u32 reset_value)
> > >>>>>>> +{
> > >>>>>>> +	if (WARN_ONCE(dlb->num_entries >= dlb->max_entries,
> > >>>>>>> +		      "DLB size exceeded (max %u)", dlb->max_entries))
> > >>>>>>> +		return;
> > >>>>>>> +
> > >>>>>>> +	if (WARN_ONCE(dlb->num_patches >= ARRAY_SIZE(dlb->patches),
> > >>>>>>> +		      "DLB patches size exceeded (max %zu)",
> > >>>>>>> +		      ARRAY_SIZE(dlb->patches)))
> > >>>>>>> +		return;
> > >>>>>>> +
> > >>>>>>> +	dlb->patches[dlb->num_patches].entry = &dlb->entries[dlb->num_entries];
> > >>>>>>> +	dlb->patches[dlb->num_patches].data = reset_value;
> > >>>>>>> +	dlb->num_patches++;
> > >>>>>>> +
> > >>>>>>> +	dlb->entries[dlb->num_entries].addr = reg;
> > >>>>>>> +	dlb->entries[dlb->num_entries].data = value;
> > >>>>>>> +	dlb->num_entries++;
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>>>  /* -----------------------------------------------------------------------------
> > >>>>>>>   * Display List Extended Command Management
> > >>>>>>>   */
> > >>>>>>> @@ -652,6 +700,7 @@ static void __vsp1_dl_list_put(struct vsp1_dl_list *dl)
> > >>>>>>>  	 * has at least one body, thus we reinitialise the entries list.
> > >>>>>>>  	 */
> > >>>>>>>  	dl->body0->num_entries = 0;
> > >>>>>>> +	dl->body0->num_patches = 0;
> > >>>>>>>  
> > >>>>>>>  	list_add_tail(&dl->list, &dl->dlm->free);
> > >>>>>>>  }
> > >>>>>>> @@ -930,6 +979,35 @@ void vsp1_dl_list_commit(struct vsp1_dl_list *dl, unsigned int dl_flags)
> > >>>>>>>   * Display List Manager
> > >>>>>>>   */
> > >>>>>>>  
> > >>>>>>> +/**
> > >>>>>>> + * vsp1_dlm_irq_display_start - Display list handler for the display start
> > >>>>>>> + *	interrupt
> > >>>>>>> + * @dlm: the display list manager
> > >>>>>>> + *
> > >>>>>>> + * Apply all one-shot patches registered for the active display list.
> > >>>>>>> + */
> > >>>>>>> +void vsp1_dlm_irq_display_start(struct vsp1_dl_manager *dlm)
> > >>>>>>> +{
> > >>>>>>> +	struct vsp1_dl_body *dlb;
> > >>>>>>> +	struct vsp1_dl_list *dl;
> > >>>>>>> +	unsigned int i;
> > >>>>>>> +
> > >>>>>>> +	spin_lock(&dlm->lock);
> > >>>>>>> +
> > >>>>>>> +	dl = dlm->active;
> > >>>>>>> +	if (!dl)
> > >>>>>>> +		goto done;
> > >>>>>>> +
> > >>>>>>> +	list_for_each_entry(dlb, &dl->bodies, list) {
> > >>>>>>> +		for (i = 0; i < dlb->num_patches; ++i)
> > >>>>>>> +			dlb->patches[i].entry->data = dlb->patches[i].data;
> > >>>>>>> +		dlb->num_patches = 0;
> > >>>>>>> +	}
> > >>>>>>> +
> > >>>>>>> +done:
> > >>>>>>> +	spin_unlock(&dlm->lock);
> > >>>>>>> +}
> > >>>>>>> +
> > >>>>>> 
> > >>>>>> We've got some HW which doesn't support one-shot writeback, and use a
> > >>>>>> similar trick to try and disable writeback immediately after the flip.
> > >>>>>> 
> > >>>>>> We ran into issues where the "start" interrupt wouldn't run in time to
> > >>>>>> make sure the writeback disable was committed before the next frame.
> > >>>>>> We have to keep track of whether the disable really happened in time,
> > >>>>>> before we release the output buffer.
> > >>>>>> 
> > >>>>>> Might you have a similar problem here?
> > >>>>> 
> > >>>>> We may, but there's no provision at the hardware level to check if the
> > >>>>> configuration updated happened in time. I could add some safety checks
> > >>>>> but I believe they would be racy in the best case :-(
> > >>>> 
> > >>>> We managed to find (what I believe to be...) a non-racy way, but it
> > >>>> will of course depend a lot on the HW behaviour, so I'll leave it to
> > >>>> your best judgement.
> > >>>> 
> > >>>> We basically have a "configuration committed" interrupt which we can
> > >>>> use to set a flag indicating writeback was disabled.
> > >>> 
> > >>> The way my hardware is operated is, roughly, as follows:
> > >>> 
> > >>> - The driver prepares a list of values to write to registers and stores
> > >>>   them in DMA-able memory. This doesn't involve the hardware, and so is
> > >>>   completely asynchronous to hardware operation.
> > >>> 
> > >>> - The driver then sets a register with the pointer to the registers
> > >>>   list.
> > >>> 
> > >>> - At the end of the current frame, the hardware reads the memory address
> > >>>   of the registers list and DMAs values to register. This operation is
> > >>>   fast and occurs fully during vertical blanking.
> > >>> 
> > >>> - The hardware then waits for the start of the frame, and begins
> > >>>   processing the framebuffers to send the frame to the display.
> > >>> 
> > >>> - If no new registers list is provided for the next frame, the current
> > >>>   list is reused.
> > >> 
> > >> Does this mean the hardware goes back to step 3 and re-reads the memory
> > >> using DMA?
> > > 
> > > That's correct. At frame end the hardware will always read and apply a
> > > registers list, either a new one or the old one if no new list is
> > > programmed.
> > > 
> > >>> Two interrupts are provided, one at the start of the frame and one at
> > >>> the end of the frame. The driver uses the end of frame interrupt to
> > >>> signal vertical blanking and page flip completion to DRM.
> > >>> 
> > >>> The end of frame interrupt is also used to schedule atomic commits. The
> > >>> hardware queue depth for atomic commits is just one, so the driver has
> > >>> to wait until the previous commit has been processed before writing the
> > >>> new registers list address to the hardware. To solve the race between
> > >>> the end of frame interrupt and the address write, the hardware provides
> > >>> a status bit that is set to 1 when the address is written, and reset to
> > >>> 0 when the hardware starts processing the registers list.
> > >> 
> > >> For my understanding, the following simplication is then true: status = 0
> > >> means driver is allowed to update the pointer register, status = 1 means
> > >> currently programmed pointer is used to fetch the register values, right?
> > > 
> > > Almost. status = 1 means that the pointer has been programmed by the
> > > CPU, but not taken into account by the hardware yet. status = 0 means
> > > that the pointer hasn't been written to by the CPU since the last time
> > > the hardware read it.
> > 
> > Right, so CPU writes to the pointer register and that sets status to 1,
> > and when the HW has copied that value into internal DMA registers it clears
> > it, correct?
> 
> Yes, that's correct.
> 
> > >> What happens if you update the pointer while status = 1? DMA transfer gets
> > >> interrupted and a new one is scheduled? Does the DMA transfer get cancelled?
> > > 
> > > You should not do that, as it's racy. The hardware will use either the
> > > old value of the pointer or new new one, based on the timings of the
> > > reprogrammation, and you can't tell which one is used. Nothing can
> > > cancel the DMA, when it's time to start the DMA the hardware will
> > > transfer the value of the pointer to an internal register and use that
> > > for DMA. Reprogramming the pointer during the DMA will not affect the
> > > DMA, the new value will be taken into account for the next DMA.
> > 
> > It looks to me like you can do the in-place update of the writeback
> > framebuffer address at the end of the current frame (vblank time(?)). If
> > you wait until status = 0 then you know the next frame parameters are
> > being "internalised", so you can set in the commit you're about to queue
> > the disabling of the writeback if that commit doesn't have a fresh
> > writeback request (phew, that's a mouthful).
> 
> My problem is that there may not be a next commit, if userspace hasn't
> queued one. Otherwise this is what I would do. My initial implementation
> was queuing two commits for writeback, one with writeback enabled, and
> immediately after a second one copied from the first but with writeback
> disabled. This halved the frame rate, as the next commit from userspace
> had to first wait for completion of the writeback disabling commit. The
> real issue here is that I would need to queue the writeback disabling
> commit right after the writeback enabling commit to make sure it gets
> processed for the next frame, but once its queued any userspace commit
> would need to wait one extra frame as a queued commit can be processed
> at any time and thus can't be replaced. That's why I decided to modify
> the registers list in place instead of queueing a new commit to disable
> writeback.
> 
> > I don't think you need to wait until the next start of frame interrupt
> > to do that. Also, vblank time is probably the time you want to signal
> > the completion of the previous writeback anyway, right?
> 
> Correct, that's when I signal it.
> 
> > >>> We thus have three states for an atomic commit:
> > >>> 
> > >>> - active, where the corresponding registers list address has been
> > >>>   written to the hardware, and processed
> > >>> 
> > >>> - queued, where the corresponding registers list address has been
> > >>>   written to the hardware but not processed yet
> > >>> 
> > >>> - pending, where the corresponding registers list address hasn't been
> > >>>   written to the hardware yet
> > >>> 
> > >>> The status bit mentioned above allows us to tell if a list exists in the
> > >>> queued state.
> > >>> 
> > >>> At frame end time, if the status bit is set, we have potentially lost
> > >>> the race between writing the new registers list and the frame end
> > >>> interrupt, so we wait for one more vblank. Otherwise, if a list was
> > >>> queued, we move it to the active state, and retire the active list. If a
> > >>> list was pending, we write its address to the hardware, and move it to
> > >>> the queued state.
> > 
> > It looks to me like the moving of the pending state into queued state is your
> > opportunity to also in-place modify the writeback registers if the state does
> > not have its own writeback request.
> 
> Moving from pending to queued means the pointer has been given to the
> hardware, but not processed yet. I need to wait until the commit that
> enables writeback is fully processed before modifying it in-place to
> disable writeback, and that's at the frame start following the move from
> the queued state to the active state.

I'm not attempting to (re)write your driver, only to explain my thinking process in
a way that is easiest for me:

1. driver prepares a new commit that might have a writeback and sets the
pointer register to the new address. It then marks the commit as queued.

(optional) 2. driver receives a new commit that is marked as pending

3. end-of-frame interrupt arrives
     a. HW reads the new address and programs a DMA transfer to update registers.
     b. driver reads the status bit and waits until status == 0.
     c. driver marks queued commit as active and looks if it has any pending commits
          - if yes, it looks at the pending commit if it has a writeback request
	        - if no, it updates the pending commit to write the register(s) that disable writeback
		- moves pending commit to queued
	  - if no, then it copies active commit and disables writeback. (*)
     d. driver signals vblank and any previous writebacks that might have been programmed
        by the previously active commit

4. ....
5. profit!


(*) depending on how the HW behaves, it might be enough to create a stub
    commit that only disables the writeback, with no other update.

This way writeback commits will always be followed by another commit, even if it is a dummy one
that only disables writeback.

Best regards,
Liviu

> 
> > >>> To schedule writeback I ended up using the frame start interrupt.
> > >>> Writeback has a specific need in that it requires enabling the memory
> > >>> write interface for a single frame, and that is not something the
> > >>> hardware supports directly. I can't either queue a new registers list
> > >>> with the write interface disabled right after the list with the
> > >>> writeback request became active, as any atomic commit would then be
> > >>> delayed by an extra frame. I thus ended up modifying the registers list
> > >>> in place at frame start time, which is right after the active register
> > >>> list has been DMA'ed to hardware registers. We have the duration of a
> > >>> whole frame to do so, before the hardware transfers the same list again
> > >>> at the beginning of the next frame.
> > >> 
> > >> Yes, that is like to what we do for DP500 in mali-dp, where we have a
> > >> similar situation. HW will continue to stream to the given buffer until
> > >> stopped, but the writeback API is one-shot mode, so on vblank (when we
> > >> know the writeback will start at the beginning of the new frame) we
> > >> program the memwrite to stop, but that only takes effect when we also
> > >> program the GO bit (CONFIG_VALID in our case).
> > >> 
> > >> One problem we had to take care for DP500 was when the next commit (the
> > >> queued one in your case) also contains a writeback buffer. In that case,
> > >> we don't stop the writeback but "re-start" it.
> > >> 
> > >>> There is a race condition there, as the in-place modification of the
> > >>> active list could be done after the end of the frame if the interrupt
> > >>> latency gets too large. I don't see how I could solve that, as I could
> > >>> write the value after the hardware starts processing the active list at
> > >>> frame end time, but before the interrupt is notified.
> > >> 
> > >> I think that would be a better idea if the HW finishes to transfer
> > >> before the end of frame interrupt is signalled. Otherwise if your
> > >> interrupt handler is scheduled too fast, you might overwrite the active
> > >> frame values?
> > > 
> > > That can't happen, as I patch the active registers list in-place in the
> > > frame start interrupt handler, and frame start occurs after the hardware
> > > finishes the DMA of the registers list.
> > > 
> > >>>>> Note that we have the duration of a complete frame to disable writeback,
> > >>>>> as we receive an interrupt when the frame starts, and have until vblank
> > >>>>> to update the configuration. It's thus slightly better than having to
> > >>>>> disable writeback between vblank and the start of the next frame.
> > >>>> 
> > >>>> Yeah... we have a whole frame too. I'm struggling to find our wiki
> > >>>> page with the data, but anecdotally there's some (out-of-tree) drivers
> > >>>> which keep interrupts masked for a _really long_ time. It's nice if
> > >>>> you don't have to care about those :-)
> > >>> 
> > >>> I only provide two options: not my problem, or give me the source code
> > >>> and documentation and let me upstream a clean version of the out-of-tree
> > >>> drivers :-)
> > >>> 
> > >>>>>>>  /**
> > >>>>>>>   * vsp1_dlm_irq_frame_end - Display list handler for the frame end interrupt
> > >>>>>>>   * @dlm: the display list manager
> > >>>>>>> diff --git a/drivers/media/platform/vsp1/vsp1_dl.h b/drivers/media/platform/vsp1/vsp1_dl.h
> > >>>>>>> index e0fdb145e6ed..f845607abc4c 100644
> > >>>>>>> --- a/drivers/media/platform/vsp1/vsp1_dl.h
> > >>>>>>> +++ b/drivers/media/platform/vsp1/vsp1_dl.h
> > >>>>>>> @@ -54,6 +54,7 @@ struct vsp1_dl_manager *vsp1_dlm_create(struct vsp1_device *vsp1,
> > >>>>>>>  					unsigned int prealloc);
> > >>>>>>>  void vsp1_dlm_destroy(struct vsp1_dl_manager *dlm);
> > >>>>>>>  void vsp1_dlm_reset(struct vsp1_dl_manager *dlm);
> > >>>>>>> +void vsp1_dlm_irq_display_start(struct vsp1_dl_manager *dlm);
> > >>>>>>>  unsigned int vsp1_dlm_irq_frame_end(struct vsp1_dl_manager *dlm);
> > >>>>>>>  struct vsp1_dl_body *vsp1_dlm_dl_body_get(struct vsp1_dl_manager *dlm);
> > >>>>>>>  
> > >>>>>>> @@ -71,6 +72,8 @@ struct vsp1_dl_body *vsp1_dl_body_get(struct vsp1_dl_body_pool *pool);
> > >>>>>>>  void vsp1_dl_body_put(struct vsp1_dl_body *dlb);
> > >>>>>>>  
> > >>>>>>>  void vsp1_dl_body_write(struct vsp1_dl_body *dlb, u32 reg, u32 data);
> > >>>>>>> +void vsp1_dl_body_write_oneshot(struct vsp1_dl_body *dlb, u32 reg, u32 value,
> > >>>>>>> +				u32 reset_value);
> > >>>>>>>  int vsp1_dl_list_add_body(struct vsp1_dl_list *dl, struct vsp1_dl_body *dlb);
> > >>>>>>>  int vsp1_dl_list_add_chain(struct vsp1_dl_list *head, struct vsp1_dl_list *dl);
> > >>>>>>>  
> 
> -- 
> Regards,
> 
> Laurent Pinchart

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯


More information about the dri-devel mailing list