[poppler] Question on policy w.r.t. code modernization

Tue Dec 29 12:39:13 PST 2015

Hello again,

Am 29.12.2015 um 20:28 schrieb Carlos Garcia Campos:
> Albert Astals Cid <aacid at kde.org> writes:
> 
>> El Tuesday 29 December 2015, a les 12:28:28, Adam Reichold va escriure:
>>> Hello again,
>>>
>>> Am 29.12.2015 um 12:04 schrieb Albert Astals Cid:
>>>> El Monday 28 December 2015, a les 20:32:31, Adam Reichold va escriure:
>>>>> Hello,
>>>>>
>>>>> Am 28.12.2015 um 18:55 schrieb Albert Astals Cid:
>>>>>> El Friday 25 December 2015, a les 14:38:12, Adam Reichold va escriure:
>>>>>>> Hello,
>>>>>>>
>>>>>>> remembering how I tried to push some minor optimizations to GooString
>>>>>>> some time ago, I wanted to ask about the project's policy on how to
>>>>>>> handle code modernizations in the future.
>>>>>>>
>>>>>>> To me, it seems like stuff that is pervasive like GooList or GooString
>>>>>>> is unlikely to be changed with the currently available man power due to
>>>>>>> the risk of regressions. One way of resolving this is IMHO to try to
>>>>>>> remove the code completely and replace it with externally maintained
>>>>>>> and
>>>>>>> highly optimized components, e.g. using std::vector and std::string
>>>>>>> from
>>>>>>> the standard library instead of GooList and GooString. But this will
>>>>>>> probably make it rather hard to merge xpdf changes in the future. Then
>>>>>>> again, this seems to be hardly happening as is.
>>>>>>
>>>>>> Do we actually get any benefit for using std::string instead of
>>>>>> GooString?
>>>>>> I can only see it creating regressions and lots of work for what is
>>>>>> probably a very marginal speed improvement (happy to be proven wrong)
>>>>>> and
>>>>>> also the class does not give us any maintainance overhead since it's
>>>>>> "done".
>>>>>
>>>>> Basically, these are just examples to start this general discussion with
>>>>> a concrete narrative. And while these modules certainly do what they
>>>>> were intended to do, I would argue that their interfaces and
>>>>> implementations are inferior to what most of the standard libraries on
>>>>> various platforms provide. Also, better usage of the STL would improve
>>>>> type safety, e.g. std::vector<T> instead of GooList storing void*, which
>>>>> could also help with some of the issue mentioned below.
>>>>>
>>>>> What is considered marginal or done, can be debated IMHO. Poppler is
>>>>> used by a lot of people, so even saving a percent or two of runtime or
>>>>> memory would surely save quite a bit in absolute terms.
>>>>
>>>> Yeah, that very same lot of people will complain if you change the API
>>>> (even if we've told them not to use it lots of people do).
>>>>
>>>>> Also similar
>>>>> projects did make similar optimizations for a possibly smaller user
>>>>> base, e.g. Okular's text processing code does employ the more compact
>>>>> small buffer optimization I proposed for GooString. Of course, the same
>>>>> numbers imply opposite relationships w.r.t. regressions which is why I
>>>>> would like to know how the maintainers think about this.
>>>>>
>>>>> Phrased more generally, my question is whether this project sees room to
>>>>> change things that are not broken. Or an explicit statement that changes
>>>>> are restricted to adding features and fixing bugs if that is the case,
>>>>> i.e. if the fundamentals of Poppler are considered "done".
>>>>
>>>> Personally I think replacing GooString with std::string everywhere is a
>>>> hell lot of work and I will not do it in my spare time.
>>>
>>> This is definitely not about trying to tell you what to do in your spare
>>> time. It is about what kind of patches actually have a chance of being
>>> reviewed and accepted, e.g. if they significantly break away from
>>> Poppler's xpdf roots. (Also, the GooList and GooString examples are just
>>> examples. Fixing const methods and copy constructors everywhere is
>>> pretty the same category IMHO.)
>>>
>>>> If you want to do it, prove there's no regressions and prove there's a
>>>> measurable gain, we can talk about merging the patch.
>>>
>>> So can I interpret this together with Carlos' answer as: One can
>>> significantly break away from the current code organization, but the bar
>>> w.r.t. test cases and benchmarks will be pretty high? 
>>
>> Any patch needs to provide and improvement better than the risk it causes.
> 
> Indeed, code readability or less error prone code are also improvements.
> 
>> That goes with all the patches be it new features, bug fixes or internal 
>> rewrites.
>>
>> The thing is it is genearlly easier to match that condition for new features 
>> and bug fixes than for internal rewrites, but the conditions are the same.
>>
>>> (I am not sure if
>>> "prove" is to be taken literally? Because if so, I would indeed
>>> interpret this as a strict no.)
>>
>> If it doesn't cause regressions over the thousands of files i have here it is 
>> good enough ;)
> 
> Yes, my point was that merging xpdf changes shouldn't be a reason to not
> improve our code anymore. But of course any rewrite shouldn't introduce
> any regression. Unfortunately our regression tools don't measure
> performance at all.

At least for micro benchmarks, it seems relatively simple to integrate
this into the existing QTest-based unit tests, c.f. the patch I posted
on GooString's SBO which did include several GooString micro benchmarks.
They are generally of similar complexity as a unit test with the
framework handling most of the set up, measurement and statistics.

Also, I guess the Python-based regtest framework could be expanded to
generate performance reports about runtime and memory usage. Checking
them for significant relative changes seems possible as well. If there
is general interest in that, I would also volunteer to try to work
something out.

Best regards, Adam.

>> Cheers,
>>   Albert
>>
>>>
>>> Best regards, Adam.
>>>
>>>> Cheers,
>>>>
>>>>   Albert
>>>>   
>>>>>> If we're going to diverge for the sake of modernizing the source code i
>>>>>> think that makes more sense adding overrides to overriden functions,
>>>>>> adding const to const functions and making sure that classes that should
>>>>>> not be copied/ assigned by value have the copy constructor and
>>>>>> assignment
>>>>>> operators deleted.
>>>>>
>>>>> I completely agree. As stated above, the chosen examples were just what
>>>>> got me started thinking about this again recently.
>>>>>
>>>>>>> So I guess that basically, my question is on the opinions of the
>>>>>>> maintainers to detach the Poppler code base from the xpdf code base as
>>>>>>> a
>>>>>>> prerequisite to thoroughly modernizing it. Related to this is the
>>>>>>> question on whether raising the language standard to C++11 or C++14 is
>>>>>>> considered possible.
>>>>>
>>>>> Best regards, Adam.
>>>>
>>>> _______________________________________________
>>>> poppler mailing list
>>>> poppler at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/poppler
>>
>> _______________________________________________
>> poppler mailing list
>> poppler at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/poppler
> 
> 
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/poppler
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20151229/221fe5f1/attachment.sig>