[Intel-gfx] [PATCH 4/4] drm/i915/bios: do not discard address space

Lucas De Marchi lucas.demarchi at intel.com
Fri Nov 8 21:09:40 UTC 2019


On Fri, Nov 08, 2019 at 11:02:46PM +0200, Ville Syrjälä wrote:
>On Fri, Nov 08, 2019 at 12:14:05PM -0800, Lucas De Marchi wrote:
>> On Fri, Nov 08, 2019 at 09:19:15PM +0200, Ville Syrjälä wrote:
>> >On Fri, Nov 08, 2019 at 10:18:52AM -0800, Lucas De Marchi wrote:
>> >> On Fri, Nov 08, 2019 at 01:14:03PM +0200, Jani Nikula wrote:
>> >> >On Thu, 07 Nov 2019, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
>> >> >> When we are mapping the VBT through pci_map_rom() we may not be allowed
>> >> >> to simply discard the address space and go on reading the memory. After
>> >> >> checking on my test system that dumping the rom via sysfs I could
>> >> >> actually get the correct vbt, I decided to change the implementation to
>> >> >> use the same approach, by calling memcpy_fromio().
>> >> >>
>> >> >> In order to avoid copying the entire oprom this implements a simple
>> >> >> memmem() searching for "$VBT". Contrary to the previous implementation
>> >> >> this also takes care of not issuing unaligned PCI reads that would
>> >> >> otherwise get translated into more even more reads. I also vaguely
>> >> >> remember unaligned reads failing in the past with some devices.
>> >> >>
>> >> >> Also make sure we copy only the VBT and not the entire oprom that is
>> >> >> usually much larger.
>> >> >
>> >> >So you have
>> >> >
>> >> >1. a fix to unaligned reads
>> >>
>> >> unaligned io reads, yes
>> >>
>> >> >
>> >> >2. an optimization to avoid reading individual bytes four times
>> >>
>> >> it was by no means an optimization. Not reading the same byte 4 bytes is
>> >> there actually to stop doing the unaligned IO reads. You can't have (2)
>> >> without (1) unless you switch to ioreadb() and add a shift (which may
>> >> not be a bad idea.
>> >>
>> >> >
>> >> >3. respecting __iomem and copying (I guess these are tied together)
>> >> >
>> >> >Seems to me that really should be at least three patches. Not
>> >> >necessarily in the above order.
>> >>
>> >> (3) is actually the most important I think, so I will start by that.
>> >>
>> >> >
>> >> >Follow-up: store pointer to the oprom vbt somewhere under i915->vbt, and
>> >> >have debugfs i915_vbt() handle that properly.
>> >>
>> >> I don't think this is needed. The thing I'm doing here is the same as
>> >> what can be accomplished by reading the rom from sysfs:
>> >>
>> >> find /sys/bus/pci/devices/*/ -name rom
>> >> ... choose one
>> >>
>> >> echo 1 > rom # to allow reading the rom
>> >> hexdump -C rom
>> >>
>> >>
>> >> >
>> >> >>
>> >> >> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> >> >> ---
>> >> >>  drivers/gpu/drm/i915/display/intel_bios.c | 95 +++++++++++++++++++----
>> >> >>  1 file changed, 79 insertions(+), 16 deletions(-)
>> >> >>
>> >> >> diff --git a/drivers/gpu/drm/i915/display/intel_bios.c b/drivers/gpu/drm/i915/display/intel_bios.c
>> >> >> index 671bbce6ba5b..c401e90b7cf1 100644
>> >> >> --- a/drivers/gpu/drm/i915/display/intel_bios.c
>> >> >> +++ b/drivers/gpu/drm/i915/display/intel_bios.c
>> >> >> @@ -1806,31 +1806,88 @@ bool intel_bios_is_valid_vbt(const void *buf, size_t size)
>> >> >>  	return vbt;
>> >> >>  }
>> >> >>
>> >> >> -static const struct vbt_header *find_vbt(void __iomem *oprom, size_t size)
>> >> >> +void __iomem *find_vbt(void __iomem *oprom, size_t size)
>> >> >>  {
>> >> >> -	size_t i;
>> >> >> +	const u32 MAGIC = *((const u32 *)"$VBT");
>> >> >> +	size_t done = 0, cur = 0;
>> >> >> +	void __iomem *p;
>> >> >> +	u8 buf[128];
>> >> >> +	u32 val;
>> >> >>
>> >> >> -	/* Scour memory looking for the VBT signature. */
>> >> >> -	for (i = 0; i + 4 < size; i++) {
>> >> >> -		void *vbt;
>> >> >> +	/*
>> >> >> +	 * poor's man memmem() with sizeof(buf) window to avoid frequent
>> >> >> +	 * wrap-arounds and using u32 for comparison. This gives us 4
>> >> >> +	 * comparisons per ioread32() and avoids unaligned io reads (although it
>> >> >> +	 * still does unaligned cpu access).
>> >> >> +	 */
>> >> >
>> >> >If we're really worried about performance here, and use a local buffer
>> >> >to optimize the wraparounds, would it actually be more efficient to use
>> >> >memcpy_fromio() which has an arch specific implementation in asm?
>> >>
>> >> Not really worried about performance. I actually did 3 implementations
>> >> that avoids the unaligned io reads.
>> >>
>> >> 1) this one
>> >> 2) memcpy_fromio() to the local buffer + strnstr()
>> >> 3) allocate a oprom buffer, memcpy_fromio() the entire rom and keep a
>> >> pointer to it. Then free the oprom after the vbt is used
>> >>
>> >> (2) and (1) had basically the same complexity involved of requiring a
>> >> wrap around local buffer, so I went with (1)
>> >>
>> >> I didn't feel confortable with (3) because it would allocate much more
>> >> memory than really needed.
>> >>
>> >> >
>> >> >In any case makes you think you should first have the patch that the
>> >> >patch subject claims, fix unaligned reads and add optimizations
>> >> >next. This one does too much.
>> >>
>> >> Again, it was not really meant to be an optimization.
>> >>
>> >> Ville told me that we may not really need to deal with the unaligned
>> >> access and change the implementation to expect the VBT to be aligned.
>> >> This I would be the simplest way to change it, but I'm not fond on
>> >> changing this and breaking old systems usin it... anyway, we can give it
>> >> a try and revert if it breaks.
>> >
>> >The current code already assumes 4 byte alignment. So nothing would
>> >change and so nothing can get broken.
>>
>> the code for reading the oprom via pci is not assuming 4-byte alignment.
>> See above, it's doing
>>
>> 	for (i = 0; i + 4 < size; i++)
>>
>> It might as well be using a ioread8() + a shift and it would be the
>> same, since it only advances 1 byte per loop.
>
>Hmm, indeed you are correct. I guess the +4 tricked me into thinking
>otherwise.

it tricked me too, it took some time to realize that it was advancing
just one byte... maybe that was when I decided: ok, let me reimplement
this rather than focusing on the minimum viable fix.

>
>>
>> Saying that from acpi it needs to be aligned so the oprom should be
>> aligned IMO is not valid because the pci method was there before the
>> acpi one.  I don't know exactly what I may be breaking if I switch to
>> 4-byte alignment.
>>
>> Anyway, my new patch splits the changes, as requested by Jani.
>> Just enforcing we don't ignore the address space already fixes my
>> issue. So maybe we can leave the alignment alone and not touch it.
>
>I suppose we can stick to the i++ if the code can be kept simple
>enough. But I'm totally willing to put my name on a i+=4 patch
>if the complexity starts to rise alarmingly.

yep, we can try that on top.

thanks
Lucas De Marchi

>
>-- 
>Ville Syrjälä
>Intel


More information about the Intel-gfx mailing list