Uhaw Pa Sa Camel

 
d'Doc
Alabang, Muntinlupa City, Philippines
Beer-loving Gunner extraordinaire, perennial vocalist, guitarist, dog person, and wet kisser in one neat li'l package.


 


 
Thursday, November 04, 2010

Python Short Hacking Tip #4: Don't mix encodings

 
As a C/C++ programmer, I have always found handling string encodings a pain. In Python, though, it is a breeze! Here are a few short tips for handling unicode and other encodings.
>>> #Create a unicode string
>>> s = u'Hello unicode world! Ü'
>>> s
u'Hello unicode world! \xdc'
>>>
>>> #Convert to an encoding using encode
>>> s = s.encode('utf-8')
>>> s
'Hello unicode world! \xc3\x9c'
>>>
>>> #Convert back to unicode using decode
>>> s = s.decode('utf-8')
>>> s
u'Hello unicode world! \xdc'
>>>
>>> #Convert to another encoding
>>> s = s.encode('iso-8859-1')
>>> s
'Hello unicode world! \xdc'
>>>
>>> #Convert back to unicode
>>> s = s.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xdc in position 21: unexpected end of data
. . .


This time, when the codec tried to convert the string back to unicode, it expected utf-8 but we supplied a string encoded in iso-8859-1, hence the exception: a lone 0xdc byte at the end of the string is not valid utf-8. Don't get your encodings mixed!
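For readers on Python 3, here is a minimal sketch of the same mistake. It assumes Python 3 semantics, where `str` is always unicode and `encode` returns `bytes`; the helper name `try_decode` is made up for illustration and is not part of any library.

```python
# Python 3 sketch (assumption: the post itself uses Python 2.6).
text = 'Hello unicode world! \u00dc'

utf8_bytes = text.encode('utf-8')          # ends with b'\xc3\x9c'
latin1_bytes = text.encode('iso-8859-1')   # ends with b'\xdc'

# Decoding with the same codec that produced the bytes round-trips cleanly.
assert utf8_bytes.decode('utf-8') == text
assert latin1_bytes.decode('iso-8859-1') == text


def try_decode(data, encoding):
    """Hypothetical helper: return (text, None) on success, (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Mixing encodings: iso-8859-1 bytes handed to the utf-8 codec.
decoded, error = try_decode(latin1_bytes, 'utf-8')
if error is not None:
    print('decode failed at byte offset', error.start, ':', error.reason)
```

Catching `UnicodeDecodeError` like this is only one way to surface the problem; the real fix is still to decode with the encoding the bytes were written in.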

Not every mismatch results in an exception, though. For example:

>>> #Create a unicode string
>>> s = u'Hello unicode world! Ü'
>>> s
u'Hello unicode world! \xdc'
>>>
>>> #Convert to an encoding using encode
>>> s = s.encode('utf-8')
>>> s
'Hello unicode world! \xc3\x9c'
>>>
>>> #Convert back to unicode using decode
>>> s = s.decode('iso-8859-1')
>>> s
u'Hello unicode world! \xc3\x9c'


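As you can see, we no longer get the original unicode string! Each of the two utf-8 bytes \xc3 and \x9c was decoded as a separate iso-8859-1 character, so the single Ü silently became two characters.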
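The silent case can be sketched in Python 3 as well. This is an illustration under the assumption that you know exactly which wrong codec was applied; the variable names `garbled` and `repaired` are mine, not from the session above.

```python
# Python 3 sketch (assumption: the post itself uses Python 2.6).
text = 'Hello unicode world! \u00dc'

# utf-8 bytes decoded with the wrong codec: no exception, just wrong text.
garbled = text.encode('utf-8').decode('iso-8859-1')

# iso-8859-1 maps every byte to exactly one character, so the decode can
# never fail, and the two utf-8 bytes become two separate characters.
print(len(text), len(garbled))

# Because that mapping is one-to-one, the damage is reversible here:
# re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode('iso-8859-1').decode('utf-8')
```

The repair step only works because no data was lost in the wrong decode; it is not a general cure, so decoding with the correct encoding in the first place remains the safer habit.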


 
 

1 Comments:

Anonymous Anonymous said...

This is awesome, i learned too much with this unicode strings codings, i tried it on my python script, and it's worked fined!

Thank you for this tip Sir!

Keep it up!

Goodbless you.

MasterJEO

10:40 AM  
