[packagekit] Character encoding woes

Thu Nov 8 07:07:00 PST 2007

On 11/8/07, Richard Hughes <hughsient at gmail.com> wrote:
> On Thu, 2007-11-08 at 00:38 -0500, Matthias Clasen wrote:
> > On Nov 7, 2007 11:54 PM, James Bowes <jbowes at dangerouslyinc.com> wrote:
> > > Hi all:
> > >
> > > The package iwl3945-firmware in Fedora contains a copyright symbol in
> > > the description. By default when writing to a pipe python will encode
> > > in ascii, raising an exception when it hits this character. We can get
> > > around this by using some hacks to wrap sys.stdout/stderr in
> > > codecs.getwriter('utf-8'), but the backend or frontend does not
> > > display the results.
> > >
> > > I'm guessing this will require changing some things from vanilla
> > > character pointers to unicode strings, but I'm not sure. Anyone have
> > > any ideas?
> >
> > Not being a python hacker, I don't have any ideas for a solution, but
> > just wanted to confirm that packagekit clearly needs to handle
> > summary, description and file lists containing non-ascii utf-8. That
> > is inevitable when dealing with translated descriptions.
>
> Totally agree. Can we add some unit tests for this in PkSpawn? I admit
> I'm a bit of a newbie with UTF8 and unicode, so an in-depth explanation
> would be terrific. Thanks.

I don't know of anyone who isn't a unicode newbie :)
I'll pick this up again tonight and start writing some failing tests,
then go from there.
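
For reference, here is a minimal sketch of the codecs.getwriter workaround
mentioned in the quoted text. It is an illustration only, not the actual
backend code: the description string, the write_description and ascii_fails
helpers, and the in-memory BytesIO standing in for the pipe are all
assumptions made for the example.

```python
import codecs
import io

# Hypothetical description text containing a copyright symbol (U+00A9).
DESCRIPTION = u"Firmware \u00a9 Example Corp"


def ascii_fails(text):
    """Return True if encoding the text as ASCII raises an error."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


def write_description(byte_stream, text):
    """Write text to a byte stream (such as a pipe), encoded as UTF-8."""
    # codecs.getwriter returns a StreamWriter class; instantiating it
    # around a byte stream encodes everything written through it.
    writer = codecs.getwriter('utf-8')(byte_stream)
    writer.write(text + u"\n")


# An in-memory buffer stands in for the pipe to the daemon (assumption).
pipe = io.BytesIO()
write_description(pipe, DESCRIPTION)
raw = pipe.getvalue()
```

The reader on the other end of the pipe then has to decode the bytes as
UTF-8 as well, which is the half that still seems to be missing.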

-James