[Mesa-dev] [PATCH 02/16] docs: Add python script that converts html to rst.

Laura Ekstrand laura at jlekstrand.net
Tue May 29 23:11:31 UTC 2018


Never mind, Jason used a git revert trick to get it to work.  The v2 will
have the diff and preserve the git file history.
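
For context on the rename problem discussed in the quoted messages below: git
pairs a deleted file with an added one only when their contents are similar
enough (the default threshold for `git diff -M` is 50%), and an HTML page and
its rst conversion share very few identical lines. The sketch below is only a
rough illustration of that effect. It uses Python's difflib, which is NOT
git's similarity algorithm, and the function names are hypothetical.

```python
import difflib


def similarity(old_text, new_text):
    """Rough 0..1 line-based similarity (a stand-in, not git's own metric)."""
    matcher = difflib.SequenceMatcher(
        None, old_text.splitlines(), new_text.splitlines(), autojunk=False
    )
    return matcher.ratio()


def looks_like_rename(old_text, new_text, threshold=0.5):
    """True if the two texts are at least `threshold` similar.

    The 0.5 default mirrors git's default rename threshold, but the score
    itself is computed differently from git's, so treat this as an analogy.
    """
    return similarity(old_text, new_text) >= threshold
```

Renaming the file in one commit and converting it in the next, as described
below, sidesteps the threshold because the rename commit carries identical
content.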

On Fri, May 25, 2018 at 7:58 PM, Laura Ekstrand <laura at jlekstrand.net>
wrote:

> I specifically tried forcing a rename earlier, but it doesn't work.  Git
> sees too much change.  The only way I could get it to work was manually
> renaming the HTML files to rst first, then committing, then converting to
> rst.
>
> The problem with that strategy is that then the Pandoc command for
> converting to rst doesn't make sense.  (.rst to .rst? What?)
>
> Laura
>
> On Fri, May 25, 2018, 4:26 AM Eric Engestrom <eric.engestrom at intel.com>
> wrote:
>
>> On Thursday, 2018-05-24 17:27:05 -0700, Laura Ekstrand wrote:
>> > Use Beautiful Soup to fix bad html, then use pandoc for converting to
>> > rst.
>> > ---
>> >  docs/rstConverter.py | 23 +++++++++++++++++++++++
>> >  1 file changed, 23 insertions(+)
>> >  create mode 100755 docs/rstConverter.py
>> >
>> > diff --git a/docs/rstConverter.py b/docs/rstConverter.py
>> > new file mode 100755
>> > index 0000000000..5321fdde8b
>> > --- /dev/null
>> > +++ b/docs/rstConverter.py
>> > @@ -0,0 +1,23 @@
>> > +#!/usr/bin/python3
>> > +import glob
>> > +import subprocess
>> > +from bs4 import BeautifulSoup
>> > +
>> > +pages = glob.glob("*.html")
>> > +pages += glob.glob("relnotes/*.html")
>> > +for filename in pages:
>> > +    # Fix some annoyingly bad html.
>> > +    with open(filename) as f:
>> > +        soup = BeautifulSoup(f, 'html5lib')
>> > +    soup.find("div", "header").extract() # Get rid of old header
>> > +    soup.iframe.extract() # Get rid of old contents bar.
>> > +    soup.find("div", "content").unwrap() # Strip the content div.
>>
>> Good call on using beautifulsoup to clean the html before converting it!
>>
>> > +
>> > +    # Write out the better html.
>> > +    with open(filename, 'wt') as f:
>> > +        f.write(str(soup))
>> > +
>> > +    # Convert to rst with pandoc.
>> > +    name = filename.split(".html")[0]
>> > +    bashCmd = "pandoc " + filename + " -o " + name + ".rst"
>> > +    subprocess.run(bashCmd.split())
>>
>> Idea: remove the old html at the same time as we introduce the rst
>> (commit-wise), so that git picks it up as a rename with changes, which
>> hopefully would be easier to check as a 1:1 of any given conversion?
>>
>> (In case this is as unclear as I think it is, I'm thinking about how we
>> can review individual page conversions; say index.html -> index.rst, to
>> see that no release has been dropped in the process. If git shows this
>> as a rename with changes, I expect it will be easier to check than if
>> one commit creates all the rst files and another deletes all the html)
>>
>
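
The pandoc step in the quoted patch builds a command string and splits it on
whitespace, which would break on a filename containing a space. A minimal
sketch of the same step using an argument list is shown here. The helper names
(rst_path, pandoc_command, convert) are hypothetical and not part of the
patch; only the `pandoc <input> -o <output>` invocation comes from it.

```python
import os
import subprocess


def rst_path(html_path):
    """Return the .rst path next to html_path, e.g. a/b.html -> a/b.rst."""
    root, _ext = os.path.splitext(html_path)
    return root + ".rst"


def pandoc_command(html_path):
    """Build the pandoc argument list directly, with no string splitting."""
    return ["pandoc", html_path, "-o", rst_path(html_path)]


def convert(html_path):
    """Run pandoc on one page; raise CalledProcessError if pandoc fails."""
    subprocess.run(pandoc_command(html_path), check=True)
```

Passing check=True makes a failed conversion stop the script instead of
silently leaving a page without its rst counterpart.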