[Intel-gfx] [PATCH 4/4] drm/i915/bios: do not discard address space

Fri Nov 8 21:02:46 UTC 2019

On Fri, Nov 08, 2019 at 12:14:05PM -0800, Lucas De Marchi wrote:
> On Fri, Nov 08, 2019 at 09:19:15PM +0200, Ville Syrjälä wrote:
> >On Fri, Nov 08, 2019 at 10:18:52AM -0800, Lucas De Marchi wrote:
> >> On Fri, Nov 08, 2019 at 01:14:03PM +0200, Jani Nikula wrote:
> >> >On Thu, 07 Nov 2019, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
> >> >> When we are mapping the VBT through pci_map_rom() we may not be allowed
> >> >> to simply discard the address space and go on reading the memory. After
> >> >> checking on my test system that dumping the rom via sysfs I could
> >> >> actually get the correct vbt, I decided to change the implementation to
> >> >> use the same approach, by calling memcpy_fromio().
> >> >>
> >> >> In order to avoid copying the entire oprom this implements a simple
> >> >> memmem() searching for "$VBT". Contrary to the previous implementation
> >> >> this also takes care of not issuing unaligned PCI reads that would
> >> >> otherwise get translated into more even more reads. I also vaguely
> >> >> remember unaligned reads failing in the past with some devices.
> >> >>
> >> >> Also make sure we copy only the VBT and not the entire oprom that is
> >> >> usually much larger.
> >> >
> >> >So you have
> >> >
> >> >1. a fix to unaligned reads
> >>
> >> unaligned io reads, yes
> >>
> >> >
> >> >2. an optimization to avoid reading individual bytes four times
> >>
> >> it was by no means an optimization. Not reading the same byte 4 bytes is
> >> there actually to stop doing the unaligned IO reads. You can't have (2)
> >> without (1) unless you switch to ioreadb() and add a shift (which may
> >> not be a bad idea.
> >>
> >> >
> >> >3. respecting __iomem and copying (I guess these are tied together)
> >> >
> >> >Seems to me that really should be at least three patches. Not
> >> >necessarily in the above order.
> >>
> >> (3) is actually the most important I think, so I will start by that.
> >>
> >> >
> >> >Follow-up: store pointer to the oprom vbt somewhere under i915->vbt, and
> >> >have debugfs i915_vbt() handle that properly.
> >>
> >> I don't think this is needed. The thing I'm doing here is the same as
> >> what can be accomplished by reading the rom from sysfs:
> >>
> >> find /sys/bus/pci/devices/*/ -name rom
> >> ... choose one
> >>
> >> echo 1 > rom # to allow reading the rom
> >> hexdump -C rom
> >>
> >>
> >> >
> >> >>
> >> >> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> >> >> ---
> >> >>  drivers/gpu/drm/i915/display/intel_bios.c | 95 +++++++++++++++++++----
> >> >>  1 file changed, 79 insertions(+), 16 deletions(-)
> >> >>
> >> >> diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c
> >> >> index 671bbce6ba5b..c401e90b7cf1 100644
> >> >> --- a/drivers/gpu/drm/i915/display/intel_bios.c
> >> >> +++ b/drivers/gpu/drm/i915/display/intel_bios.c
> >> >> @@ -1806,31 +1806,88 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
> >> >>  	return vbt;
> >> >>  }
> >> >>
> >> >> -static const struct vbt_header *find_vbt(void __iomem *oprom, size_t size)
> >> >> +void __iomem *find_vbt(void __iomem *oprom, size_t size)
> >> >>  {
> >> >> -	size_t i;
> >> >> +	const u32 MAGIC = *((const u32 *)"$VBT");
> >> >> +	size_t done = 0, cur = 0;
> >> >> +	void __iomem *p;
> >> >> +	u8 buf[128];
> >> >> +	u32 val;
> >> >>
> >> >> -	/* Scour memory looking for the VBT signature. */
> >> >> -	for (i = 0; i + 4 < size; i++) {
> >> >> -		void *vbt;
> >> >> +	/*
> >> >> +	 * poor's man memmem() with sizeof(buf) window to avoid frequent
> >> >> +	 * wrap-arounds and using u32 for comparison. This gives us 4
> >> >> +	 * comparisons per ioread32() and avoids unaligned io reads (although it
> >> >> +	 * still does unaligned cpu access).
> >> >> +	 */
> >> >
> >> >If we're really worried about performance here, and use a local buffer
> >> >to optimize the wraparounds, would it actually be more efficient to use
> >> >memcpy_fromio() which has an arch specific implementation in asm?
> >>
> >> Not really worried about performance. I actually did 3 implementations
> >> that avoids the unaligned io reads.
> >>
> >> 1) this one
> >> 2) memcpy_fromio() to the local buffer + strnstr()
> >> 3) allocate a oprom buffer, memcpy_fromio() the entire rom and keep a
> >> pointer to it. Then free the oprom after the vbt is used
> >>
> >> (2) and (1) had basically the same complexity involved of requiring a
> >> wrap around local buffer, so I went with (1)
> >>
> >> I didn't feel confortable with (3) because it would allocate much more
> >> memory than really needed.
> >>
> >> >
> >> >In any case makes you think you should first have the patch that the
> >> >patch subject claims, fix unaligned reads and add optimizations
> >> >next. This one does too much.
> >>
> >> Again, it was not really meant to be an optimization.
> >>
> >> Ville told me that we may not really need to deal with the unaligned
> >> access and change the implementation to expect the VBT to be aligned.
> >> This I would be the simplest way to change it, but I'm not fond on
> >> changing this and breaking old systems usin it... anyway, we can give it
> >> a try and revert if it breaks.
> >
> >The current code already assumes 4 byte alignment. So nothing would
> >change and so nothing can get broken.
> 
> the code for reading the oprom via pci is not assuming 4-byte alignment.
> See above, it's doing
> 
> 	for (i = 0; i + 4 < size; i++)
> 
> It might as well be using a ioread8() + a shift and it would be the
> same, since it only advances 1 byte per loop.

Hmm, indeed you are correct. I guess the +4 tricked me into thinking
otherwise.

> 
> Saying that from acpi it needs to be aligned so the oprom should be
> aligned IMO is not valid because the pci method was there before the
> acpi one.  I don't know exactly what I may be breaking if I switch to
> 4-byte alignment.
> 
> Anyway, my new patch splits the changes, as requested by Jani.
> Just enforcing we don't ignore the address space already fixes my
> issue. So maybe we can leave the alignment alone and not touch it.

I suppose we can stick to the i++ if the code can be kept simple
enough. But I'm totally willing to put my name on a i+=4 patch
if the complexity starts to rise alarmingly.

-- 
Ville Syrjälä
Intel