In Python 2.x, the default encoding is ASCII. However, ASCII cannot represent most non-English text (for example, accented characters in French, or Asian characters), nor arbitrary symbols (for example, musical notes).
The Unicode standard assigns a unique number (a code point) to each character of every language. Code points can be stored using different character encodings, the most commonly used being UTF-8 and UTF-16. UTF-8 uses one byte for any ASCII character (ASCII characters have the same value in both UTF-8 and ASCII), and up to four bytes for other characters. UTF-16 uses two bytes for most characters, and four bytes (a surrogate pair) for characters outside the Basic Multilingual Plane.
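The byte counts can be checked with a short sketch like the one below. The helper name encoded_lengths and the sample characters are assumptions chosen for illustration; 'utf-16-le' is used so that the byte-order mark is not counted.

```python
# Illustrative helper (not part of any library): returns the number of
# bytes a single character occupies in UTF-8 and in UTF-16.
def encoded_lengths(ch):
    utf8_len = len(ch.encode('utf-8'))
    # 'utf-16-le' omits the 2-byte byte-order mark that 'utf-16' prepends.
    utf16_len = len(ch.encode('utf-16-le'))
    return utf8_len, utf16_len

ascii_letter = encoded_lengths(u'A')           # (1, 2)
accented_e = encoded_lengths(u'\u00e9')        # (2, 2)
cjk_char = encoded_lengths(u'\u4e2d')          # (3, 2)
music_clef = encoded_lengths(u'\U0001d11e')    # (4, 4), outside the BMP
```

Only the last sample, a musical symbol outside the Basic Multilingual Plane, needs four bytes in either encoding.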
You can create a Unicode literal as a string prefixed with a 'u' or 'U' character, for example: u'abcdefj'.
Byte strings (including plain ASCII literals) are of type 'str', while Unicode strings are of type 'unicode'.
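If a file path or file name contains non-ASCII characters, first make sure it is a Unicode string, either by converting it with the unicode() constructor or by prefixing the literal with 'u' or 'U'. Note that unicode() called without an encoding argument decodes with the default ASCII codec, so a byte string containing non-ASCII bytes needs an explicit encoding, as in unicode(s, 'utf-8'). For example: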
unicode('abcdef')
u'abcdef'
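As a rough sketch of converting a byte-string file name that contains non-ASCII characters, the following uses the decode() method with an explicit encoding. In Python 2.x this does the same job as unicode(raw_name, 'utf-8'), and it also runs under Python 3. The name raw_name and the sample bytes are assumptions made for illustration.

```python
# Hypothetical file name received as UTF-8 encoded bytes
# ("cafe.txt" with an accented e, which takes two bytes in UTF-8).
raw_name = b'caf\xc3\xa9.txt'

# Decode with an explicit encoding to obtain a Unicode string.
unicode_name = raw_name.decode('utf-8')

# The byte string is one unit longer than the decoded text.
byte_count = len(raw_name)
char_count = len(unicode_name)

# Decoding the same bytes as ASCII fails, because 0xC3 is not an ASCII byte.
try:
    raw_name.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```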
Use codecs.open to read/write a Unicode file. For example:
codecs.open(fname, encoding='utf-8', mode='w+')
In the above example, the file named fname is created (or truncated if it already exists) and opened for both writing and reading. Text written to it is encoded as UTF-8, so any Unicode character can be stored.
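A minimal round-trip sketch of writing and then reading a file with codecs.open follows. The helper name roundtrip and the use of a temporary file are assumptions made so that the example is self-contained; in practice fname would be your own path.

```python
import codecs
import os
import tempfile

# Illustrative helper: writes text to a temporary UTF-8 file and reads it back.
def roundtrip(text):
    fd, fname = tempfile.mkstemp(suffix='.txt')
    os.close(fd)  # codecs.open reopens the file by name
    try:
        # 'w+' truncates the file; the text is encoded to UTF-8 on write.
        with codecs.open(fname, encoding='utf-8', mode='w+') as f:
            f.write(text)
        # Reading with the same encoding returns a Unicode string.
        with codecs.open(fname, encoding='utf-8', mode='r') as f:
            return f.read()
    finally:
        os.remove(fname)

# Accented letter plus a musical note symbol.
original = u'caf\u00e9 \u266a'
restored = roundtrip(original)
```

Passing the same encoding on both the write and the read is what keeps the round trip lossless.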