[Libreoffice-bugs] [Bug 114721] Special char: find the char by drawing it

Thu Dec 28 19:12:21 UTC 2017

https://bugs.documentfoundation.org/show_bug.cgi?id=114721

V Stuart Foote <vstuart.foote at utsa.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |akshaydeepiitr at gmail.com,
                   |                            |fito at libreoffice.org,
                   |                            |kendy at collabora.com,
                   |                            |michael.meeks at collabora.com,
                   |                            |s.mehrbrodt at gmail.com,
                   |                            |vstuart.foote at utsa.edu

--- Comment #1 from V Stuart Foote <vstuart.foote at utsa.edu> ---
Hi Vincent, thank you for posting; you've obviously put a lot of work into it.
I would think arranging a branch on the LO project's Gerrit is the way to go:
you can test fully with the assistance of other project devs, and then migrate
when stable.

Not my area of expertise, but I would think that Artificial Neural Network (NN)
based recognition, matching the traced search glyph against a library of NNs
prepared from the fonts already installed on the system, might work for our
handwritten "drawing" search for a glyph. Is there a better data model for
holding the NN training of the fonts and matching the user input against it?
Tesseract-ocr or Caffe... [1][2]
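To make the idea concrete: at search time a recognizer of this kind boils down to a feed-forward pass, where the downsampled trace is the input vector and each output neuron scores one candidate glyph. Below is a minimal pure-Python sketch. It assumes one hidden layer of 32 neurons, logistic sigmoid activations and random placeholder weights. A real font network would be trained (e.g. with libfann), and FANN's own activation defaults may differ.

```python
import math
import random

def sigmoid(x: float) -> float:
    """Logistic activation (an assumption; other activations are possible)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """Fully connected layer: out[j] = sigmoid(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Placeholder weights standing in for a trained font network."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(pixels, layers):
    """Run the input vector through each layer in turn; returns one score per output neuron."""
    activations = pixels
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

if __name__ == "__main__":
    rng = random.Random(0)
    GRID = 15      # 15x15 input node, as discussed for the patch
    N_GLYPHS = 10  # hypothetical: one output neuron per candidate glyph
    layers = [random_layer(GRID * GRID, 32, rng), random_layer(32, N_GLYPHS, rng)]
    trace = [0.0] * (GRID * GRID)  # an empty drawing, flattened row by row
    print(len(forward(trace, layers)))
```

Training is where FANN (or another library) would do the real work; the sketch only shows the shape of the search step.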

Otherwise, I assume the FANN project's libfann 2.2.0+ [3][4] is license
compatible and can be compiled cross-platform for Windows, macOS, Linux, and
Android.

I notice you've set the GUI to include an 80x150 px character input pad, but
that might be made a bit wider. Would a square 150x150 px pad make resampling
to the NN node easier?
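Assuming the pad is 80 px wide by 150 px tall, a square canvas lets a single scale factor apply to both axes when shrinking the drawing to an N x N node, so strokes are not stretched. Here is a sketch that centers the drawn bitmap on a square canvas before any resampling. The list-of-rows representation with 0/1 pixels is an assumption made for illustration.

```python
def pad_to_square(bitmap, fill=0):
    """Center a rectangular bitmap (list of equal-length rows) on a square canvas."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    side = max(height, width)
    top = (side - height) // 2   # rows of padding above the drawing
    left = (side - width) // 2   # columns of padding left of the drawing
    square = [[fill] * side for _ in range(side)]
    for y, row in enumerate(bitmap):
        for x, value in enumerate(row):
            square[top + y][left + x] = value
    return square

if __name__ == "__main__":
    pad = [[0] * 80 for _ in range(150)]  # assumed 80 px wide x 150 px tall
    square = pad_to_square(pad)
    print(len(square), len(square[0]))
```

A pad that is already square (150x150) makes this step a no-op. That is one argument for the square layout.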

And since the key to implementing the NN search is the FANN "resampling" of
the input to an NN node for search: will working with 15x15 px nodes, and NN
trainings of the fonts at that size, provide sufficient resolution to
differentiate glyphs from the more complex Unicode blocks/scripts (think of CJK
"traditional" glyphs)?

I'd think an NN based on a 32x32, or possibly 64x64 px, matrix might be
required.
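One simple way to do the resampling is bin averaging. This is an assumption, not necessarily what the patch or FANN does. Each pixel of the square canvas falls into one of N x N cells, and each cell's value is the fraction of inked pixels in it. On a 150 px canvas a cell is 10 px wide at 15x15, about 4.7 px at 32x32 and about 2.3 px at 64x64. So at 15x15, strokes closer together than roughly 10 px can merge into one cell.

```python
def downsample(square, n):
    """Average a side x side 0/1 bitmap into an n x n grid of ink coverage (0.0..1.0)."""
    side = len(square)
    if side < n:
        raise ValueError("canvas smaller than target grid")
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    for y, row in enumerate(square):
        cy = y * n // side          # target row for this source row
        for x, value in enumerate(row):
            cx = x * n // side      # target column for this source column
            sums[cy][cx] += value
            counts[cy][cx] += 1
    # side >= n guarantees every cell received at least one pixel
    return [[s / c for s, c in zip(srow, crow)] for srow, crow in zip(sums, counts)]

if __name__ == "__main__":
    canvas = [[0] * 150 for _ in range(150)]
    for y in range(150):
        canvas[y][75] = 1  # a single 1 px vertical stroke
    for n in (15, 32, 64):
        grid = downsample(canvas, n)
        print(n, max(max(row) for row in grid))
```

The printed peak coverage for the thin stroke rises as the grid gets finer. That is one way to compare how well 15x15, 32x32 and 64x64 nodes preserve fine detail.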

Of course, the more complex NN would require a larger cache to match against,
and building it dynamically is out of the question. Also, some of the fonts
that would benefit from a handwritten "drawing" search hold tens of thousands
of glyphs; would a FANN-based search scale that large?
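Counting weights gives a rough feel for the scaling question. Assume a hypothetical design with one fully connected hidden layer of H = 256 neurons and one output neuron per glyph. The parameter count is then (N*N + 1)*H + (H + 1)*G for an N x N input and G glyphs. The 4 bytes per parameter (single-precision floats) is also an assumption; FANN's own saved network files would have a different size.

```python
def param_count(grid: int, hidden: int, glyphs: int) -> int:
    """Weights + biases of a (grid*grid) -> hidden -> glyphs fully connected net."""
    inputs = grid * grid
    input_to_hidden = (inputs + 1) * hidden   # +1 for the bias neuron
    hidden_to_output = (hidden + 1) * glyphs  # one output neuron per glyph
    return input_to_hidden + hidden_to_output

def approx_mib(params: int, bytes_per_param: int = 4) -> float:
    """In-memory size estimate, assuming 4-byte floats per parameter."""
    return params * bytes_per_param / (1024 * 1024)

if __name__ == "__main__":
    HIDDEN = 256  # hypothetical hidden-layer size
    for grid in (15, 32, 64):
        for glyphs in (1_000, 30_000):
            p = param_count(grid, HIDDEN, glyphs)
            print(f"{grid}x{grid}, {glyphs:,} glyphs: {p:,} params, ~{approx_mib(p):.1f} MiB")
```

Under these assumptions, the (H + 1)*G output-layer term dominates once a font has tens of thousands of glyphs, whatever the input grid. For example, 257 * 30,000 = 7,710,000 output parameters compare with 57,856 input-to-hidden parameters at 15x15.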

Another issue would be the UI: the NN for each target font would have to be
selected, trained, and held in a cache on the system. The project could not
host the NNs for the fonts (bandwidth and storage). So the project could deploy
a few FANN-based NN trainings with the installation packaging, but the
LibreOffice GUI would have to guide the user's selection of additional fonts to
parse and hold locally, including some estimate of the size of the NN cache.

Finally, results of the NN search must remain keyed to the Unicode code point
of the glyph in the source font, as that is what matters; it is not clear that
FANN would keep the Unicode details of the matching glyphs.
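If the trained network only yields an output index, as the comment suspects, one option is to store a parallel code point table next to each font's network. Results are then translated back to Unicode at search time. A minimal sketch follows; the table layout and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

def top_matches(scores: Sequence[float], codepoints: Sequence[int],
                k: int = 5) -> List[Tuple[str, str, float]]:
    """Return the k best (U+XXXX, character, score) triples, highest score first.

    scores[i] is the network's output for neuron i; codepoints[i] is the
    Unicode code point that neuron was trained on (hypothetical side table).
    """
    if len(scores) != len(codepoints):
        raise ValueError("network outputs and code point table are out of sync")
    ranked = sorted(zip(scores, codepoints), key=lambda pair: pair[0], reverse=True)
    return [(f"U+{cp:04X}", chr(cp), score) for score, cp in ranked[:k]]

if __name__ == "__main__":
    # Hypothetical code point table saved next to a trained CJK font network.
    table = [0x6C34, 0x6728, 0x706B]  # 水, 木, 火
    outputs = [0.20, 0.90, 0.40]      # placeholder network outputs
    for label, char, score in top_matches(outputs, table, k=2):
        print(label, char, score)
```

Keeping the table outside the network also lets the Special Character dialog show the code point and character name without depending on what FANN stores.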

=-ref-=
[1] https://github.com/tesseract-ocr/
[2] https://github.com/BVLC/caffe

[3] https://github.com/libfann/fann
[4] http://leenissen.dk/fann/html/files2/advancedusage-txt.html
