<div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, 9 Jan 2019 at 10:25, Jens Tröger &lt;<a href="mailto:jens.troeger@light-speed.de">jens.troeger@light-speed.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br> &gt; On Jan 9, 2019, at 16:06, Noel Grandin &lt;<a href="mailto:noelgrandin@gmail.com" target="_blank">noelgrandin@gmail.com</a>&gt; wrote:<br> &gt; <br> &gt; Nobody owns it, and you're welcome to file a bug, but there are already a ton of HTML import bugs, our support is really very basic.<br> <br> Well I’ve noticed in the past that bugs regarding the HTML filter received very little attention, unfortunately. If not the existing import filter, are there efforts to implement alternatives?<br> <br></blockquote><div><br></div><div><div class="gmail_default" style="font-family:tahoma,sans-serif">HTML is a fairly massive beast, so there are __always__ going to be bugs in our import filter.</div></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">I believe that Kohei started doing some parsing work over in the orcus library at</div><div class="gmail_default" style="font-family:tahoma,sans-serif"> <a href="https://gitlab.com/orcus/orcus">https://gitlab.com/orcus/orcus</a></div><div class="gmail_default" style="font-family:tahoma,sans-serif">and we use some of that (e.g.
very basic CSS parsing) somewhere in our code.</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">But for normal HTML we still use our own parser.</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">And the parsing is only a very small part anyhow; most of the work is in converting the HTML model to our own document model.</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br> Unfortunately, while I’m very comfortable with C++ and Linux etc, LO seems like a magnificent beast of code and I am unable to judge whether I’d be useful tackling these issues myself. It would be great to find whoever knows the code to discuss… <br></blockquote><div><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">Come hang out on the IRC channel and ask questions. Nobody really owns the HTML filter, but various people know different parts of it and can point you in the right direction.</div><div class="gmail_default" style="font-family:tahoma,sans-serif">The bulk of the logic lives in</div><div class="gmail_default" style="font-family:tahoma,sans-serif"> sw/source/filter/html/*</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"></div></div></div></div>