Usage of unicode() and encode() functions in Python

python string sqlite unicode encoding

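I have a problem with the encoding of the path variable when inserting it into an SQLite database. I tried to solve it with the encode("utf-8") function, which didn't help. Then I used the unicode() function, which gave me the type unicode.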

print type(path)                  # <type 'unicode'>
path = path.replace("one", "two") # <type 'str'>
path = path.encode("utf-8")       # <type 'str'> strange
path = unicode(path)              # <type 'unicode'>

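In the end I got the unicode type, but I still get the same error that appeared when the path variable was of type str:

sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Could you help me solve this error and explain the correct usage of the encode("utf-8") and unicode() functions? I'm often fighting with them.

EDIT:

This execute() statement raised the error: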

cur.execute("update docs set path = :fullFilePath where path = :path", locals())

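I forgot to change the encoding of the fullFilePath variable, which suffers from the same problem, but I'm quite confused now. Should I use only unicode(), only encode("utf-8"), or both?

I can't use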

fullFilePath = unicode(fullFilePath.encode("utf-8"))

because it raises this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 32: ordinal not in range(128)

The Python version is 2.7.2.

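Where is the code that raised the error?

Your exact problem has already been answered: stackoverflow.com/questions/2392732/…

@newtover I edited the question.

Did you convert both of the variables you use to unicode?

Learning how Python 3 handles text and data really helped me understand everything. After that it was easy to apply the knowledge to Python 2.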

newtover

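str is a representation of text in bytes; unicode is a representation of text in characters.

You decode text from bytes to unicode, and you encode unicode into bytes using some encoding.

That is: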

>>> 'abc'.decode('utf-8')  # str to unicode
u'abc'
>>> u'abc'.encode('utf-8') # unicode to str
'abc'

UPD Sep 2020: This answer was written when Python 2 was mostly in use. In Python 3, str was renamed to bytes, and unicode was renamed to str.

>>> b'abc'.decode('utf-8') # bytes to str
'abc'
>>> 'abc'.encode('utf-8')  # str to bytes
b'abc'
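
To make the bytes-versus-characters distinction a bit more concrete, here is a small Python 3 sketch using the two characters mentioned elsewhere in this thread (\u0161 and \u0165). The variable names are only illustrative.

```python
# Python 3: str holds characters, bytes holds encoded data.
text = "\u0161\u0165"            # two characters
data = text.encode("utf-8")      # str -> bytes

# Each of these two characters takes two bytes in UTF-8,
# so the lengths differ.
char_count = len(text)
byte_count = len(data)

# Decoding with the same encoding restores the original text.
round_trip = data.decode("utf-8")

print(char_count, byte_count, round_trip == text)
```

The same text has a different length depending on whether you count characters or bytes, which is why the two types should not be mixed.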

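Great answer, straight to the point. I would add that unicode represents letters or symbols, or more generally runes, while str represents a byte string in a particular encoding, which you must decode (in the right encoding, obviously) to get the specific runes.

Python 3.8>> 'str' object has no attribute 'decode'

Do you have documentation for unicode being changed to str? I can't find it.

@cikatomo It's one of the key changes in Python 3: docs.python.org/3.0/whatsnew/…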

Andrew Clark

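You are using encode("utf-8") incorrectly. Python byte strings (str type) have an encoding; Unicode strings do not. You can convert a Unicode string to a Python byte string using uni.encode(encoding), and you can convert a byte string to a Unicode string using s.decode(encoding) (or equivalently, unicode(s, encoding)).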
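
As a rough illustration of why the ASCII codec shows up in the error messages: in Python 2, unicode(s) without an encoding argument decodes with the default ASCII codec, which fails on any byte above 0x7f. The sketch below reproduces the same kind of failure in Python 3 by decoding explicitly with ASCII. The sample path is made up.

```python
# Hypothetical sample path containing a non-ASCII character (U+0161).
path_bytes = "/tmp/\u0161.txt".encode("utf-8")

# Decoding with the wrong codec fails on the first non-ASCII byte.
try:
    path_bytes.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Naming the real encoding works; str(b, encoding) behaves like
# b.decode(encoding), much as unicode(s, encoding) does in Python 2.
path_text = str(path_bytes, "utf-8")

print(error)
print(path_text == path_bytes.decode("utf-8"))
```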

If fullFilePath and path are currently of type str, you should figure out how they are encoded. For example, if the current encoding is utf-8, you would use:

path = path.decode('utf-8')
fullFilePath = fullFilePath.decode('utf-8')

If this doesn't fix it, the actual issue may be that you are not using a Unicode string in your execute() call. Try changing it to the following:

cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())
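
For reference, a minimal end-to-end sketch of the update with text parameters, written for Python 3 and an in-memory database. The layout of the docs table and the sample paths are assumptions; only the SQL statement is taken from the question.

```python
import sqlite3

# Assumed schema: a single text column named path.
conn = sqlite3.connect(":memory:")
conn.execute("create table docs (path text)")
conn.execute("insert into docs (path) values (?)", ("/tmp/one/\u0161.txt",))

# Both parameters are text (str in Python 3, unicode in Python 2).
params = {
    "path": "/tmp/one/\u0161.txt",
    "fullFilePath": "/tmp/two/\u0161\u0165.txt",
}
cur = conn.execute(
    "update docs set path = :fullFilePath where path = :path", params
)
updated = cur.rowcount

rows = conn.execute("select path from docs").fetchall()
conn.close()

print(updated, len(rows))
```

Passing an explicit dictionary instead of locals() makes it easier to see exactly which values reach the database and what type they have.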

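The statement fullFilePath = fullFilePath.decode("utf-8") still raises the error UnicodeEncodeError: 'ascii' codec can't encode characters in position 32-34: ordinal not in range(128). fullFilePath is a combination of a string of type str and a string taken from a text column of the db table, which should be utf-8 encoded.

But according to this, it can be UTF-8, UTF-16BE or UTF-16LE. Can I find out which one somehow?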
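
One way to check, as far as I know, is SQLite's PRAGMA encoding, which reports the text encoding of the database. A sketch with Python's sqlite3 module, using an in-memory database as a stand-in for the real file:

```python
import sqlite3

# ":memory:" is a placeholder; pass the path of the real database file instead.
conn = sqlite3.connect(":memory:")
encoding = conn.execute("PRAGMA encoding").fetchone()[0]
conn.close()

print(encoding)  # one of UTF-8, UTF-16le, UTF-16be
```

As far as I can tell, with the default text_factory the sqlite3 module hands back TEXT values as already-decoded strings, so the on-disk encoding rarely needs manual handling.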

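@xralf, if you are combining different str objects, you may be mixing encodings. Can you show the result of print repr(fullFilePath)?

I can only show it before the call to decode(). The problematic characters are \u0161 and \u0165.

@xralf - So it is already unicode? Try changing the execute call to unicode: cur.execute(u"update docs set path = :fullFilePath where path = :path", locals())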
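
The error in this comment thread is consistent with calling decode() on a value that is already text. A small guard that decodes only byte strings avoids that. Note that to_text is a hypothetical helper name, and UTF-8 is assumed as the encoding. Written for Python 3:

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only if it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Mixed inputs: one is already text, the other is still encoded bytes.
from_db = "\u0161\u0165"
from_disk = "/tmp/one/".encode("utf-8")

full_file_path = to_text(from_disk) + to_text(from_db)

print(type(full_file_path).__name__, len(full_file_path))
```

Normalizing every piece to text before concatenating means the combined path has a single, known type by the time it reaches execute().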

kenorb

Make sure you've set your locale settings correctly before running the script from the shell, e.g.

$ locale -a | grep "^en_.\+UTF-8"
en_GB.UTF-8
en_US.UTF-8
$ export LC_ALL=en_GB.UTF-8
$ export LANG=en_GB.UTF-8

Documentation: man locale, man setlocale.
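
To see what Python itself picks up from those settings, something like this can be run from the same shell (a sketch for Python 3; the printed values depend entirely on the environment):

```python
import locale
import sys

# Encoding Python derives from the locale settings (LC_ALL, LANG, ...).
preferred = locale.getpreferredencoding(False)

# Encoding used when printing to the terminal; may be None if stdout
# has been replaced by an object without an encoding attribute.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print(preferred, stdout_encoding)
```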
