It is next to impossible to determine which encoding was used just by looking at a string of bytes. The next best thing is to check whether the bytes can be decoded using one specific encoding.
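def is_encoding(enc, s):
    # s is a byte string; enc is the name of the encoding to test
    try:
        s.decode(enc)
        return True
    except UnicodeDecodeError:
        # the bytes are not valid in this encoding
        return False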
Sample run:
>>> is_encoding('utf-8', u'Hello World \xdc'.encode('iso-8859-1'))
False
Note that if every character in the byte string is in the ASCII set, is_encoding returns True even for a call like the one above, because ASCII text is encoded to the same bytes in both ISO-8859-1 and UTF-8.
>>> is_encoding('utf-8', u'Hello World'.encode('iso-8859-1'))
True
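The same check can be run against several candidate encodings in turn. The sketch below is only an illustration: the function name first_matching_encoding and the default candidate list are assumptions, not part of the original snippet. Order matters here, because iso-8859-1 accepts every possible byte value and so should come last. A match only means the bytes are valid in that encoding, not that it is the one actually used.

```python
def first_matching_encoding(data, candidates=('ascii', 'utf-8', 'iso-8859-1')):
    """Return the first candidate encoding that decodes data without error.

    Hypothetical helper: the name and the default candidates are assumptions.
    Returns None if no candidate can decode the bytes.
    """
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    return None
```

For the two sample strings above, this would report 'iso-8859-1' for the one containing \xdc and 'ascii' for the plain 'Hello World'.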