Best way to replace multiple characters in a string?

nitin3685

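Replacing two characters

I timed all the methods in the current answers along with one extra.

With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain the replacements together like this: text.replace('&', '\&').replace('#', '\#').

Timings for each function:

a) 1000000 loops, best of 3: 1.47 μs per loop

b) 1000000 loops, best of 3: 1.51 μs per loop

c) 100000 loops, best of 3: 12.3 μs per loop

d) 100000 loops, best of 3: 12 μs per loop

e) 100000 loops, best of 3: 3.27 μs per loop

f) 1000000 loops, best of 3: 0.817 μs per loop

g) 100000 loops, best of 3: 3.64 μs per loop

h) 1000000 loops, best of 3: 0.927 μs per loop

i) 1000000 loops, best of 3: 0.814 μs per loop

Here are the functions: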

def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)


def f(text):
    text = text.replace('&', '\&').replace('#', '\#')


def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')


def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')

Timed like this:

python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"
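The same kind of measurement can also be taken from inside Python with the timeit module rather than from the command line. The following is only a minimal sketch: it redefines two of the candidates locally (with return statements added so their results can be checked), and the helper name best_of is made up for illustration. Absolute numbers will differ from machine to machine.

```python
import timeit

def f(text):
    # chained replace calls
    return text.replace('&', '\\&').replace('#', '\\#')

def b(text):
    # loop with a membership check before each replace
    for ch in ['&', '#']:
        if ch in text:
            text = text.replace(ch, '\\' + ch)
    return text

def best_of(func, arg, repeat=3, number=100000):
    """Return the best per-call time in microseconds over `repeat` runs."""
    timer = timeit.Timer(lambda: func(arg))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

for func in (b, f):
    print(func.__name__, round(best_of(func, 'abc&def#ghi'), 3), 'us per loop')
```

Taking the minimum over several repeats mirrors the "best of 3" figure that python -mtimeit reports.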

Replacing 17 characters

Here's similar code to do the same but with more characters to escape (\`*_{}[]()>#+-.!$):

def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)


def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)


import re
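def c(text):
    # Still the two-character pattern from the first benchmark; it matches only & and #.
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)


# The unescaped ] right after [ closes the character class early, so this pattern
# does not match each of the 17 characters; r'([\\`*_{}\[\]()>#+\-.!$])' would.
RX = re.compile('([\\`*_{}[]()>#+-.!$])')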
def d(text):
    text = RX.sub(r'\\\1', text)


def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)


def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')


def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])


def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')


def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')

Here are the results for the same input string abc&def#ghi:

a) 100000 loops, best of 3: 6.72 μs per loop

b) 100000 loops, best of 3: 2.64 μs per loop

c) 100000 loops, best of 3: 11.9 μs per loop

d) 100000 loops, best of 3: 4.92 μs per loop

e) 100000 loops, best of 3: 2.96 μs per loop

f) 100000 loops, best of 3: 4.29 μs per loop

g) 100000 loops, best of 3: 4.68 μs per loop

h) 100000 loops, best of 3: 4.73 μs per loop

i) 100000 loops, best of 3: 4.24 μs per loop

And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):

a) 100000 loops, best of 3: 7.59 μs per loop

b) 100000 loops, best of 3: 6.54 μs per loop

c) 100000 loops, best of 3: 16.9 μs per loop

d) 100000 loops, best of 3: 7.29 μs per loop

e) 100000 loops, best of 3: 12.2 μs per loop

f) 100000 loops, best of 3: 5.38 μs per loop

g) 10000 loops, best of 3: 21.7 μs per loop

h) 100000 loops, best of 3: 5.7 μs per loop

i) 100000 loops, best of 3: 5.13 μs per loop

Adding a couple of variants:

def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)


def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)

With the shorter input:

ab) 100000 loops, best of 3: 7.05 μs per loop

ba) 100000 loops, best of 3: 2.4 μs per loop

With the longer input:

ab) 100000 loops, best of 3: 7.71 μs per loop

ba) 100000 loops, best of 3: 6.08 μs per loop

So I'm going to use ba for readability and speed.
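For use outside a benchmark, ba has to return its result, since the timed versions only rebind the local name. A sketch of what that might look like; the function name escape_chars and its default argument are chosen here purely for illustration:

```python
def escape_chars(text, chars="\\`*_{}[]()>#+-.!$"):
    """Prefix each character from `chars` that occurs in `text` with a backslash."""
    for c in chars:
        if c in text:  # skip the replace call when there is nothing to replace
            text = text.replace(c, "\\" + c)
    return text

print(escape_chars("abc&def#ghi"))        # & is not in the default set, only # is escaped
print(escape_chars("abc&def#ghi", "&#"))  # both characters are escaped
```

The backslash needs to stay first in chars; otherwise the backslashes inserted for earlier characters would themselves be escaped on a later pass.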

Addendum

Prompted by hacks in the comments, one difference between ab and ba is the if c in text: check. Let's test them against two more variants:

def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)

def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)

Times per loop in μs, on Python 2.7.14 and 3.6.3, and on a different machine from the earlier set, so they cannot be compared directly.

╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input  ║  ab  │ ab_with_check │  ba  │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │    4.22       │ 3.45 │    8.01          │
│ Py3, short ║ 5.54 │    1.34       │ 1.46 │    5.34          │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long  ║ 9.3  │    7.15       │ 6.85 │    8.55          │
│ Py3, long  ║ 7.43 │    4.38       │ 4.41 │    7.02          │
└────────────╨──────┴───────────────┴──────┴──────────────────┘

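We can conclude that:

Those with the check are up to 4x faster than those without

ab_with_check is slightly ahead on Python 3, but ba (with check) has a bigger lead on Python 2

However, the biggest lesson here is that Python 3 is up to 3x faster than Python 2! There is not a huge difference between the slowest on Python 3 and the fastest on Python 2!

Why is this not the accepted answer?

@Hugo; I think this difference in time is because replace is called only when c is found in text in the case of ba, while it is called on every iteration in ab.

@hacks Thanks, I've updated my answer with further timings: adding the check is better for both, but the biggest lesson is that Python 3 is up to 3x faster!

@Hugo: i.pinimg.com/originals/ed/55/82/…

You, sir, are a true hero.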

ghostdog74

>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
...   if ch in string:
...      string=string.replace(ch,"\\"+ch)
...
>>> print(string)
abc\&def\#ghi

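Why was a double backslash needed? Why doesn't just "\" work?

The double backslash escapes the backslash; otherwise Python would interpret "\" as a literal quotation mark inside a still-open string.

Why do you need string=string.replace(ch,"\\"+ch)? Isn't just string.replace(ch,"\\"+ch) enough?

@MattSom replace() doesn't modify the original string; it returns a copy. So you need the assignment for the code to have any effect.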
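A short illustration of that point, as a minimal sketch (the variable names are arbitrary): calling replace() without keeping the return value leaves the original string untouched.

```python
s = "abc&def#ghi"

s.replace("&", "\\&")      # the returned copy is discarded; s is unchanged
unchanged = s

t = s.replace("&", "\\&")  # keep the returned copy instead

print(unchanged)  # abc&def#ghi
print(t)          # abc\&def#ghi
```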

Do you really need the if? It looks like a duplication of what the replace will be doing anyway.

tommy.carstensen

Here is a Python 3 method using str.translate and str.maketrans:

s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': '\\&', '#': '\\#'})))

The printed string is abc\&def\#ghi.

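This is a good answer, but in practice doing one .translate() appears to be slower than three chained .replace() calls (using CPython 3.6.4).

@Changaco Thanks for timing it 👍 In practice I would use replace() myself, but I added this answer for the sake of completeness.

For large strings and many replacements this should be faster, though some testing would be nice...

Well, it's not on my machine (same for 2 and 17 replacements).

This method allows "clobbering replacements" that the chained versions cannot do, for example replacing "a" with "b" and "b" with "a".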
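The swap mentioned in the last comment can be sketched as follows; the sample text is made up for illustration. translate maps every character once against the original string, while chained replace calls see the output of the previous call.

```python
swap = str.maketrans({"a": "b", "b": "a"})
text = "abba cab"

translated = text.translate(swap)
chained = text.replace("a", "b").replace("b", "a")

print(translated)  # baab cba
print(chained)     # aaaa caa
```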


thefourtheye

Simply chain the replace function like this

strs = "abc&def#ghi"
print(strs.replace('&', '\\&').replace('#', '\\#'))
# abc\&def\#ghi

If there are going to be more replacements, you can do it in this generic way

strs, replacements = "abc&def#ghi", {"&": "\\&", "#": "\\#"}
print("".join([replacements.get(c, c) for c in strs]))
# abc\&def\#ghi

kennytm

Are you always going to prepend a backslash? If so, try

import re
strs = "abc&def#ghi"
rx = re.compile('([&#])')
#                  ^^ fill in the characters here.
strs = rx.sub(r'\\\1', strs)

It may not be the most efficient method, but I think it is the easiest.
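When the characters to escape include regex metacharacters such as ] or \, filling them into the character class by hand gets error-prone. One possible variation, sketched here with a hypothetical helper name make_escaper, lets re.escape build the class:

```python
import re

def make_escaper(chars):
    """Build a function that prefixes every character in `chars` with a backslash."""
    # re.escape keeps characters such as ], \ and - literal inside the class
    pattern = re.compile("[" + re.escape(chars) + "]")
    # a function replacement avoids backslash handling in a replacement template
    return lambda text: pattern.sub(lambda m: "\\" + m.group(0), text)

esc = make_escaper("\\`*_{}[]()>#+-.!$")
print(esc("## *Something* and [another] thing"))
```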


Sebastialonso

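Late to the party, but I lost a lot of time on this issue before I found my answer.

Short and sweet: translate is superior to replace. If you care more about functionality than about time optimization, do not use replace.

Also use translate if you don't know whether the set of characters to be replaced overlaps the set of characters used as replacements.

Case in point:

Using replace, you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4") to return "2344", but it in fact returns "4444".

Translation seems to do what the OP originally wanted.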
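A sketch of that comparison, using the two-argument form of str.maketrans (the variable names are arbitrary):

```python
chained = "1234".replace("1", "2").replace("2", "3").replace("3", "4")

# maps 1 -> 2, 2 -> 3, 3 -> 4 in a single pass over the original string
table = str.maketrans("123", "234")
translated = "1234".translate(table)

print(chained)     # 4444
print(translated)  # 2344
```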


Victor Olex

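You may consider writing a generic escape function:

def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])

>>> esc = mk_esc('&#')
>>> print(esc('Learn & be #1'))
Learn \& be \#1

This way you can make your function configurable with a list of characters that should be escaped.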

krm

For Python 3.8 and above, one can use assignment expressions

[text := text.replace(s, f"\\{s}") for s in "&#" if s in text];

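Although I am quite unsure whether this would be considered an "appropriate use" of assignment expressions as described in PEP 572, it looks clean and reads quite well (to my eyes). The semicolon at the end suppresses output if you run this in a REPL.

This would be "appropriate" if you wanted all the intermediate strings as well. For example (removing all lowercase vowels):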

text = "Lorem ipsum dolor sit amet"
intermediates = [text := text.replace(i, "") for i in "aeiou" if i in text]

['Lorem ipsum dolor sit met',
 'Lorm ipsum dolor sit mt',
 'Lorm psum dolor st mt',
 'Lrm psum dlr st mt',
 'Lrm psm dlr st mt']

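On the plus side, it does seem (unexpectedly?) faster than some of the faster methods in the accepted answer, and it seems to perform well both as the string length and as the number of replacements increase.

https://i.stack.imgur.com/Pq6dJ.png

The code for the above comparison is below. I am using random strings to make my life a bit simpler, and the characters to replace are chosen randomly from the string itself. (Note: I am using IPython's %timeit magic here, so run this in IPython/Jupyter.)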

import random, string

def make_txt(length):
    "makes a random string of a given length"
    return "".join(random.choices(string.printable, k=length))

def get_substring(s, num):
    "gets a substring"
    return "".join(random.choices(s, k=num))

def a(text, replace): # one of the better performing approaches from the accepted answer
    for i in replace:
        if i in text:
             text = text.replace(i, "")

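def b(text, replace):
    # NOTE: this builds a generator expression that is never consumed,
    # so none of the replace() calls inside it actually run.
    _ = (text := text.replace(i, "") for i in replace if i in text)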


def compare(strlen, replace_length):
    "use ipython / jupyter for the %timeit functionality"

    times_a, times_b = [], []

    for i in range(*strlen):
        el = make_txt(i)
        et = get_substring(el, replace_length)

        res_a = %timeit -n 1000 -o a(el, et) # ipython magic

        el = make_txt(i)
        et = get_substring(el, replace_length)
        
        res_b = %timeit -n 1000 -o b(el, et) # ipython magic

        times_a.append(res_a.average * 1e6)
        times_b.append(res_b.average * 1e6)
        
    return times_a, times_b

#----run
t2 = compare((2*2, 1000, 50), 2)
t10 = compare((2*10, 1000, 50), 10)


parity3

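FYI, this is of little or no use to the OP, but it may be of use to other readers (please do not downvote, I'm aware of this).

As a somewhat ridiculous but interesting exercise, I wanted to see if I could use Python functional programming to replace multiple characters. I'm pretty sure this does NOT beat just calling replace() twice. And if performance were an issue, you could easily beat this in Rust, C, Julia, Perl, Java, JavaScript and even awk. It uses an external helper package called pytoolz, accelerated via Cython (cytoolz, it's a PyPI package).

# Python 2 only: imap and ifilter do not exist in Python 3's itertools
from functools import partial  # used below but missing from the original imports
from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))

I'm not even going to explain this, because no one would bother using it to accomplish multiple replacements. Nevertheless, I felt somewhat accomplished in doing it and thought it might inspire other readers or win a code obfuscation contest.

You know, "functional programming" does not mean "use as many functions as possible".

Here is a rather nice pure functional multi-char replacer: gist.github.com/anonymous/4577424f586173fc6b91a215ea2ce89e No allocations, no mutations, no side effects. Readable, too.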

jewishmoses

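How about this?

def replace_all(replacements, text):
    # apply each key -> value replacement in turn
    for key in replacements:
        text = text.replace(key, replacements[key])
    return text

then

print(replace_all({"&": "\\&", "#": "\\#"}, "&#"))

output

\&\#

This is similar to another answer.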

CasualCoder3

Using reduce, which is available in Python 2.7 and Python 3.*, you can easily replace multiple substrings in a clean and Pythonic way.

from functools import reduce  # required on Python 3; also works on Python 2.6+

# Let's define a helper function to make it easy to use
def replacer(text, replacements):
    return reduce(
        lambda text, ptuple: text.replace(ptuple[0], ptuple[1]),
        replacements, text
    )

if __name__ == '__main__':
    uncleaned_str = "abc&def#ghi"
    cleaned_str = replacer(uncleaned_str, [("&", "\\&"), ("#", "\\#")])
    print(cleaned_str) # "abc\&def\#ghi"

In Python 2.7 you don't have to import reduce, but in Python 3.* you have to import it from the functools module.

Adding an "if" condition (variant ba mentioned by Hugo): lambda text, ptuple: text.replace(ptuple[0], ptuple[1]) if ptuple[0] in text else text
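Put together, the reduce helper with that check might look like the sketch below. The accumulator and pair names (acc, pair) are chosen here for readability and are not from the original answer.

```python
from functools import reduce

def replacer(text, replacements):
    """Apply (old, new) pairs in order, skipping pairs whose `old` is absent."""
    return reduce(
        lambda acc, pair: acc.replace(pair[0], pair[1]) if pair[0] in acc else acc,
        replacements,
        text,
    )

print(replacer("abc&def#ghi", [("&", "\\&"), ("#", "\\#")]))  # abc\&def\#ghi
```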


Ahmed4end

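An advanced approach using regular expressions

import re
text = "hello ,world!"
replaces = {"hello": "hi", "world": " 2020", "!": "."}
# re.escape keeps keys that contain regex metacharacters literal
pattern = "|".join(re.escape(key) for key in replaces)
result = re.sub(pattern, lambda match: replaces[match.group(0)], text)
print(result)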

Tiago Wutzke de Oliveira

Maybe a simple loop over the characters to replace:

a = '&#'

to_replace = ['&', '#']

for char in to_replace:
    a = a.replace(char, "\\"+char)

print(a)

>>> \&\#


jonesy

>>> a = '&#'
>>> print(a.replace('&', r'\&'))
\&#
>>> print(a.replace('#', r'\#'))
&\#
>>>

You want to use a 'raw' string (denoted by the 'r' prefixing the replacement string). Raw strings do not treat backslashes specially.
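To make the equivalence concrete, here is a small sketch (variable names are arbitrary) comparing a raw literal with an ordinary literal that escapes the backslash:

```python
raw = r'\&'      # raw literal: the backslash is kept exactly as written
escaped = '\\&'  # ordinary literal: \\ denotes a single backslash

print(raw == escaped)  # True
print(len(raw))        # 2
```

One caveat: a raw string literal cannot end in a single backslash, so r'\' is a syntax error and '\\' is needed for a lone backslash.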


Crawsome

This will help anyone looking for a simple solution.

def replacemany(our_str, to_be_replaced:tuple, replace_with:str):
    for nextchar in to_be_replaced:
        our_str = our_str.replace(nextchar, replace_with)
    return our_str

os = 'the rain in spain falls mainly on the plain ttttttttt sssssssssss nnnnnnnnnn'
tbr = ('a','t','s','n')
rw = ''

print(replacemany(os,tbr,rw))

Output:

he ri i pi fll mily o he pli
