
How to replace multiple substrings of a string?

I would like to use the .replace function to replace multiple strings.

I currently have

string.replace("condition1", "")

but would like to have something like

string.replace("condition1", "").replace("condition2", "text")

although that does not feel like good syntax.

What is the proper way to do this? Kind of like how in grep/regex you can use \1 and \2 to substitute captured fields into the replacement.

Did you try all of the solutions provided? Which one is faster?
I have taken the time to test all answers in different scenarios. See stackoverflow.com/questions/59072514/…
Honestly, I prefer your chained approach to all the others. I landed here while looking for a solution and used yours and it works just fine.
@frakman1 +1. No clue why this is not upvoted more. All the other methods make the code much harder to read. If there were a function that accepted arrays of replacements, that would work. But your chained method is the clearest (at least with a static number of replacements).
It seems the short answer is: there isn't a better way to do this.

Majid Ali Khan

Here is a short example that should do the trick with regular expressions:

import re

rep = {"condition1": "", "condition2": "text"} # define desired replacements here

# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())  # in Python 2, use rep.iteritems()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)

For example:

>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

The replacement happens in a single pass.
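
To make the single-pass point concrete, here is a minimal sketch comparing chained str.replace calls with the regex technique above. The helper name single_pass_replace is made up for this illustration, and the sample string is the spam/sha case raised in the comments on this answer.

```python
import re

def single_pass_replace(text, rep):
    # Hypothetical helper name; same technique as the snippet above.
    pattern = re.compile("|".join(re.escape(k) for k in rep))
    return pattern.sub(lambda m: rep[m.group(0)], text)

rep = {"spam": "eggs", "sha": "md5"}

# Chained: the second call also sees text produced by the first call.
chained = "spamham sha".replace("spam", "eggs").replace("sha", "md5")

# Single pass: every match is found in the original text only.
simultaneous = single_pass_replace("spamham sha", rep)

print(chained)       # eggmd5m md5
print(simultaneous)  # eggsham md5
```

In the chained version, "eggs" followed by "ham" creates a new "sha" that the second call then replaces, which is the clash the single pass avoids.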
dkamins: it’s not too clever, it’s not even as clever as it should be (we should regex-escape the keys before joining them with "|"). why isn’t that overengineered? because this way we do it in one pass (=fast), and we do all the replacements at the same time, avoiding clashes like "spamham sha".replace("spam", "eggs").replace("sha","md5") being "eggmd5m md5" instead of "eggsham md5"
@AndrewClark I would greatly appreciate if you could explain what is happening on the last line with lambda.
Hi there, I created a small gist with a clearer version of this snippet. It should be also slightly more efficient: gist.github.com/bgusach/a967e0587d6e01e889fd1d776c5f3729
For python 3, use items() instead of iteritems().
root

You could just make a nice little looping function.

def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text

where text is the complete string and dic is a dictionary in which each key is a term to search for and each value is the string that replaces it.

Note: in Python 2, use iteritems() instead of items().

Careful: Python dictionaries don't have a reliable order for iteration. This solution only solves your problem if:

order of replacements is irrelevant

it's ok for a replacement to change the results of previous replacements

Update: The above statement related to ordering of insertion does not apply to Python versions greater than or equal to 3.6, as standard dicts were changed to use insertion ordering for iteration.

For instance:

d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)  # strings are immutable, so keep the returned value
print(my_sentence)

Possible output #1:

"This is my pig and this is my pig."

Possible output #2

"This is my dog and this is my pig."

One possible fix is to use an OrderedDict.

from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, od)  # strings are immutable, so keep the returned value
print(my_sentence)

Output:

"This is my pig and this is my pig."

Careful #2: Inefficient if your text string is too big or there are many pairs in the dictionary.


The order in which you apply the different replacements will matter - so instead of using a standard dict, consider using an OrderedDict - or a list of 2-tuples.
This iterates over the string twice... not good for performance.
Performance-wise it's worse than what Valentin says - it'll traverse the text as many times as there are items in dic! Fine if 'text' is small but, terrible for large text.
Note that this may give unexpected results because the newly inserted text in the first iteration can be matched in the second iteration. For example, if we naively try to replace all 'A' with 'B' and all 'B' with 'C', the string 'AB' would be transformed into 'CC', and not 'BC'.
NOTE: As of Python 3.7, "the insertion-order preservation nature of dict objects has been declared to be an official part of the Python language spec." -- 3.7 Release Notes
Snehal Parmar

Why not one solution like this?

s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)

#output will be:  The quick red fox jumps over the quick dog

This is super useful, simple and portable.
Looked nice, but not replacing regex like in: for r in ((r'\s.', '.'), (r'\s,' , ',')):
to make it 1-liner: ss = [s.replace(*r) for r in (("brown", "red"), ("lazy", "quick"))][0]
This suffers from the ordering problem of any multiple-replace approach: if the string is "abc" and your replacements are (("a", "b"), ("b", "a")), you might expect "bac" but you get "aac". Also, there's the performance issue of scanning the entire string on every call, so the complexity is at least O(number of replacements * len(s)), plus whatever string pattern matching happens under the hood.
@MarkK this is clever but very expensive memory-wise because it makes a giant list of all the intermediate results only to throw it all away to the garbage collector. functools.reduce would be a bit more respectful: reduce(lambda a, e: a.replace(*e), ("ab",), "abac"). Either way, I don't recommend the approach fundamentally (see comment above).
Björn Lindqvist

Here is a variant of the first solution using reduce, in case you like being functional. :)

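from functools import reduce  # reduce is no longer a built-in in Python 3

repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)  # Python 2: repls.iteritems()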

martineau's even better version:

repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

Would be simpler to make repls a sequence of tuples and do away with the iteritems() call. i.e. repls = ('hello', 'goodbye'), ('world', 'earth') and reduce(lambda a, kv: a.replace(*kv), repls, s). Would also work unchanged in Python 3.
nice! if you use python3 use items instead of iteritems (now removed in dicts stuff).
@martineau: It's not true that this works unchanged in python3 since reduce has been removed.
@normanius: reduce still exists, however it was made a part of the functools module (see the docs) in Python 3, so when I said unchanged, I meant the same code could be run—although admittedly it would require that reduce has been imported if necessary since it's no longer a built-in.
Syntax aside, this is fundamentally the same as many other solutions on this page which suffer from poor time complexity as well as ordering issues and unexpected behavior in the replacement.
mmj

This is just a more concise recap of the great answers by F.J and MiniQuark, plus the last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function:

import re

def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

Usage:

>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'

If you wish, you can make your own dedicated replacement functions starting from this simpler one.
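
One way to build such a dedicated function is functools.partial, which fixes the replacement dictionary ahead of time. This is only a sketch: the name swap_drinks and its mapping are invented for the example, and multiple_replace is repeated so the snippet runs on its own.

```python
import re
from functools import partial

def multiple_replace(string, rep_dict):
    # Same function as above: longest keys first, single pass.
    keys = sorted(rep_dict, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)

# Hypothetical dedicated function: swaps the two drinks in any string.
swap_drinks = partial(multiple_replace, rep_dict={"cafe": "tea", "tea": "cafe"})

print(swap_drinks("Do you like cafe? No, I prefer tea."))
# Do you like tea? No, I prefer cafe.
```

Note that this still recompiles the pattern on every call; if that matters, compiling once inside a closure is the alternative.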


While this is a good solution, concurrent string replacements won't give precisely the same results as performing them sequentially (chaining them) would, although that may not matter.
Sure, with rep_dict = {"but": "mut", "mutton": "lamb"} the string "button" results in "mutton" with your code, but would give "lamb" if the replacements were chained, one after the other.
That is the main feature of this code, not a defect. With chained replacements it could not achieve the desired behaviour of substituting two words simultaneously and reciprocally like in my example.
It may not seem like a great feature if you don't need it. But here we are talking about simultaneous replacements, so it is indeed the main feature. With "chained" replacements, the output of the example would be Do you prefer cafe? No, I prefer cafe., which is not desirable at all.
Best answer. Most others do sequential, not concurrent, replacement.
MiniQuark

I built this upon F.J.'s excellent answer:

import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)

def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)

One shot usage:

>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.

Note that since replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".

If you need to do the same replacement many times, you can create a replacement function easily:

>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
                       u'Does this work?\tYes it does',
                       u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
... 
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"

Improvements:

turned code into a function

added multiline support

fixed a bug in escaping

easy to create a function for a specific multiple replacement

Enjoy! :-)


Could some one explain this step by step for python noobs like me?
Fellow python noob here, so I'm gonna take an incomplete shot at understanding it.. a. break down key_values into stuff-to-replace (keys joined by "|") and logic (if the match is a key, return value) b. make a regex parser ("pattern" that looks for keys, and uses given logic) - wrap this in a lambda function and return. Stuff I'm looking up now: re.M, and the necessity for lambda for replacement logic.
@Fox You got it. You could define a function instead of using a lambda, it's just to make the code shorter. But note that pattern.sub expects a function with just one parameter (the text to replace), so the function needs to have access to replace_dict. re.M allows Multiline replacements (it is well explained in the doc: docs.python.org/2/library/re.html#re.M).
This is a smart answer, as it handles overlapping and swapping by virtue of being a single scan through the string. So many of the other answers to this question are booby-trapped…
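
For readers who asked for a step-by-step version, here is a rough equivalent of multiple_replacer with the lambdas written as named inner functions. The inner names (replacement_function, replacer) are just illustrative. As far as I can tell, re.M only changes how ^ and $ match, and escaped literal keys contain neither, so the flag should make no difference here; it is kept only to mirror the answer.

```python
import re

def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    # One big alternation of the escaped keys: key1|key2|...
    pattern = re.compile("|".join(re.escape(k) for k, _ in key_values), re.M)

    def replacement_function(match):
        # match.group(0) is the exact text that matched, i.e. one of the keys.
        return replace_dict[match.group(0)]

    def replacer(string):
        # pattern.sub calls replacement_function once for every match.
        return pattern.sub(replacement_function, string)

    return replacer

my_escaper = multiple_replacer(('"', '\\"'), ('\t', '\\t'))
print(my_escaper('say "hi"\tnow'))  # say \"hi\"\tnow
```

Both inner functions can see replace_dict and pattern because they are closures over the enclosing call, which is the same thing the lambdas rely on.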
Xavier Guihot

Starting with Python 3.8, which introduced assignment expressions (PEP 572, the := operator), we can apply the replacements within a list comprehension:

# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

Do you know if this is more efficient than using replace in a loop? I am testing all answers for performance but I don't have 3.8 yet.
Why do I get the output in a list?
@johnrao07 Well, a list comprehension builds a list. That's why, in this case, you get ['The quick red fox jumps over the lazy dog', 'The quick red fox jumps over the quick dog']. But the assignment expression (text := text.replace) also rebinds text to each new version in turn (strings themselves are immutable). After the list comprehension, you can use the text variable, which contains the modified text.
If you want to return the new version of text as a one-liner, you can also use [text := text.replace(a, b) for a, b in replacements][-1] (note the [-1]), which extracts the last element of the list comprehension; i.e. the last version of text.
This is a huge waste of space if you only need the last element. Don't use list comprehensions as reducers, although the linked answer isn't particularly efficient or useful since it suffers from replacement ordering problems, as does this.
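
For comparison, a plain loop does the same rebinding without building the throwaway list. A minimal sketch using the same sample data as the answer:

```python
text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]

# Rebind text on each step; no list of intermediate results is kept.
for old, new in replacements:
    text = text.replace(old, new)

print(text)  # The quick red fox jumps over the quick dog
```

It has the same ordering behaviour as the comprehension: each replacement sees the output of the previous ones.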
Fredrik Pihl

I would like to propose the usage of string templates. Just place the string to be replaced in a dictionary and all is set! Example from docs.python.org

>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

Looks good, but a placeholder whose key is not provided to substitute raises an exception, so be careful when getting templates from users.
A drawback of this approach is that the template must contain all, and no more than all, $strings to be replaced, see here
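
A short sketch of the two usual ways around the errors shown in the session above, using only string.Template from the standard library. The values dict is sample data.

```python
from string import Template

values = {"who": "tim"}

# "$$" is the escape for a literal dollar sign, so "$100" no longer
# triggers the "Invalid placeholder" ValueError.
print(Template("Give $who $$100").substitute(values))  # Give tim $100

# safe_substitute leaves unknown placeholders in place instead of
# raising KeyError.
print(Template("$who likes $what").safe_substitute(values))  # tim likes $what
```

This only helps when you control the text being searched, since the things to replace must be written as $placeholders.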
James Koss

In my case, I needed a simple replacing of unique keys with names, so I thought this up:

a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

This works as long as you don't have a replacement clash. If you replaced i with s you would get a weird behaviour.
If order is significant, instead of the dict above you can use an array: b = [ ['i', 'Z'], ['s', 'Y'] ]; for x,y in (b): a = a.replace(x, y) Then if you're careful to order your array pairs you can ensure you don't replace() recursively.
It seems that dicts now maintain order, from Python 3.7.0. I tested it and it works in order on my machine with latest stable Python 3.
How is this any different from most of the other answers on this page?
bgusach

Here is my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case when a string to replace is a substring of another string to replace (the longer string wins).

import re

def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.

    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str

    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against 
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)

    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))

    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)

It is in this gist; feel free to modify it if you have any proposals.


This should have been the accepted answer instead because the regex is constructed from all the keys by sorting them in descending order of length and joining them with the | regex alternation operator. And the sorting is necessary so that the longest of all possible choices is selected if there are any alternatives.
I agree that this is the best solution, thanks to the sorting. Apart from the sorting, it is identical to my original answer, so I borrowed the sorting for my solution too, to make sure nobody misses such an important feature.
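
To see what the sorting buys you, here is a small sketch that builds the alternation with and without it, using the {'ab': 'AB', 'abc': 'ABC'} case from the code comments. build_pattern and lookup are names made up for this example.

```python
import re

replacements = {"ab": "AB", "abc": "ABC"}

def build_pattern(keys):
    # Regex alternation tries alternatives left to right and takes the
    # first one that matches, not the longest one.
    return re.compile("|".join(map(re.escape, keys)))

def lookup(match):
    return replacements[match.group(0)]

unsorted_pattern = build_pattern(["ab", "abc"])
sorted_pattern = build_pattern(sorted(replacements, key=len, reverse=True))

print(unsorted_pattern.sub(lookup, "hey abc"))  # hey ABc
print(sorted_pattern.sub(lookup, "hey abc"))    # hey ABC
```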
Anonymous

I needed a solution where the strings to be replaced can be regular expressions, for example to help in normalizing a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:

import re

def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = reps.items()
    # list() makes reps indexable; in Python 3, dict.items() is not subscriptable
    reps = list(reps)
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)

It works for the examples given in other answers, for example:

>>> multiple_replace("(condition1) and --condition2--",
...                  {"condition1": "", "condition2": "text"})
'() and --text--'

>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'

>>> multiple_replace("Do you like cafe? No, I prefer tea.",
...                  {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'

The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:

>>> s = "I don't want to change this name:\n  Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"

If you want to use the dictionary keys as normal strings, you can escape those before calling multiple_replace using e.g. this function:

def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())

>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n  Philip II of Spain"

The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):

def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))

>>> check_re_list(re_str_dict.keys())

Note that it does not chain the replacements, instead performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:

>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
...                             ("but", "mut"), ("mutton", "lamb")])
'lamb'

This is nice, thanks. Could it be improved to also allow backreferences to be used in the substitutions? I haven't immediately figured out how to add that.
The answer to my question above is stackoverflow.com/questions/45630940/…
Hi, I receive an error with this script TypeError: 'dict_items' object is not subscriptable. Can anyone help?
9000

Note: Test your case, see comments.

Here's a sample which is more efficient on long strings with many small replacements.

import re

source = "Here is foo, it does moo!"

replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}

def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)

print(replace(source, replacements))

The point is in avoiding many concatenations of long strings. We chop the source string to fragments, replacing some of the fragments as we form the list, and then join the whole thing back into a string.


Do you have benchmarks to support the performance assertions here?
@ggorlen: Actually the opposite: on strings within first few kilobytes, long string replacement and concatenation is faster, according to my tests.
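
If you want to check this on your own data, a rough timeit harness along these lines could be a starting point. It is only a sketch: the input size and repeat count are arbitrary, the function names chained and single_pass are made up, and no timings are claimed here because they depend heavily on the text and the number of replacements.

```python
import re
import timeit

replacements = {"is": "was", "does": "did", "!": "?"}
source = "Here is foo, it does moo! " * 1000  # arbitrary test size

def chained(text):
    # One full scan of the text per replacement pair.
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

finder = re.compile("|".join(re.escape(k) for k in replacements))

def single_pass(text):
    # One scan of the text, looking up each match in the dict.
    return finder.sub(lambda m: replacements[m.group(0)], text)

# Both give the same result for this data, since no replacement
# produces text that contains another key.
assert chained(source) == single_pass(source)

for fn in (chained, single_pass):
    seconds = timeit.timeit(lambda: fn(source), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s")
```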
George Pipis

You can use the pandas library and the replace function which supports both exact matches as well as regex replacements. For example:

import pandas as pd

df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})

# raw strings keep the regex backslashes intact
to_replace = ['Billy', 'Rome', 'January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with = ['name', 'city', 'month', 'time', 'date']

print(df.text.replace(to_replace, replace_with, regex=True))

And the modified text is:

0    name is going to visit city in month
1                      I was born in date
2                 I will be there at time

You can find an example here. Notice that the replacements on the text are done in the order they appear in the lists.


Pablo

I was struggling with this problem as well. With many substitutions regular expressions struggle, and are about four times slower than looping string.replace (in my experiment conditions).

You should absolutely try using the Flashtext library (blog post here, Github here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s (regular expressions took 7.7 s) for each document.

It is easy to find use examples in the links above, but this is a working example:

from flashtext import KeywordProcessor

# my_dict maps each keyword to its replacement; string is the text to process
processor = KeywordProcessor(case_sensitive=False)
for k, v in my_dict.items():
    processor.add_keyword(k, v)
new_string = processor.replace_keywords(string)

Note that Flashtext makes substitutions in a single pass (to avoid a --> b and b --> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your target is several words (replacing 'This is' by 'Hello').


How does this work if you need to replace HTML tags? E.g. replace <p> with \n. I tried your approach, but flashtext doesn't seem to parse tags.
I'm not sure why it is not working as you expect. One possibility is that these tags are not separated by spaces, and remember Flashtext looks for whole words. A way around this is to use a simple replace first, so that "Hi<p>there" becomes "Hi <p> there". You would need to be careful to remove unwanted spaces when you are done (also with a simple replace?). Hope that helps.
Thanks, can you set < and > to mark end of a word (but be included in the replace)?
I believe that "words" are marked by spaces only. Perhaps there are some optional parameters you can set in "KeywordProcessor". Otherwise consider the approach above: substitute "<" with " <", apply Flashtext, then substitute back (in your case, for example, " <" to "<" and " \n" to "\n" might work).
Thanks for mentioning this project. It perfectly solves a couple of requirements of mine.
mcsoini

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

Usage:

>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'

Notes:

This consumes the input dictionary.

Python dicts preserve key order as of 3.6; corresponding caveats in other answers are not relevant anymore. For backward compatibility one could resort to a tuple-based version:

>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])

Note: As with all recursive functions in python, too large recursion depth (i.e. too large replacement dictionaries) will result in an error. See e.g. here.


I run into RecursionError when using a large dictionary!
@Pablo Interesting. How large? Note that this happens for all recursive functions. See for example here: stackoverflow.com/questions/3323001/…
My dictionary of substitutions is close to 100k terms... so far using string.replace is by far the best approach.
@Pablo In that case you can't use recursive functions. In general, sys.getrecursionlimit() is a couple of thousand at most. Use a loop or something like that, or try to simplify the substitutions.
Yeah, I'm afraid there's really no shortcut here.
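
For large dictionaries, a loop version along the lines suggested above might look like this. It is a sketch: mrep_iter is a made-up name, and unlike the recursive mrep it applies the pairs in insertion order (popitem takes them from the end) and does not consume the dictionary.

```python
def mrep_iter(s, d):
    # Iterate over the items in insertion order; d is left untouched.
    for old, new in d.items():
        s = s.replace(old, new)
    return s

print(mrep_iter('abcabc', {'a': '1', 'c': '2'}))  # 1b21b2

# A mapping far larger than the default recursion limit works fine.
big = {"<%d>" % i: str(i) for i in range(5000)}
print(mrep_iter("<0> <4999>", big))  # 0 4999
```

It still scans the string once per entry, so with very many entries the cost grows accordingly.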
Tanvir Ahmed

I faced a similar problem today, where I had to use the .replace() method multiple times, but it didn't feel good to me. So I did something like this:

REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}

event_title = ''.join([REPLACEMENTS.get(c,c) for c in event['summary']])
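
Note that this per-character lookup only works because every key is a single character. Here is a runnable sketch with a made-up sample string; for this particular mapping the standard library's html.escape with quote=False should give the same result.

```python
import html

REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}
summary = 'Q&A: is 1 < 2 > 0?'  # made-up sample text

# Each character is looked up once, so an inserted '&' is never re-escaped.
per_char = ''.join(REPLACEMENTS.get(c, c) for c in summary)
print(per_char)  # Q&amp;A: is 1 &lt; 2 &gt; 0?

# quote=False limits html.escape to &, < and >.
assert per_char == html.escape(summary, quote=False)
```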

Miroslav Kašpar

I did a similar exercise for one of my school homework assignments. This was my solution:

dictionary = {1: ['hate', 'love'],
              2: ['salad', 'burger'],
              3: ['vegetables', 'pizza']}

def normalize(text):
    for i in dictionary:
        text = text.replace(dictionary[i][0], dictionary[i][1])
    return text

See the result for yourself on a test string:

string_to_change = 'I hate salad and vegetables'
print(normalize(string_to_change))

inspectorG4dget

You should really not do it this way, but I just find it way too cool:

>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k,v in replacements.items():
...     cmd += ".replace(%r, %r)" %(k,v)  # %r quotes the strings in the generated code
...
>>> exec(cmd)

Now answer is the result of all the replacements applied in turn (s is the string you started with).

Again, this is very hacky and is not something that you should be using regularly. But it's just nice to know that you can do something like this if you ever need to.


Carson

For replacing single characters only, translate combined with str.maketrans is my favorite method.

tl;dr > result_string = your_string.translate(str.maketrans(dict_mapping))

demo

my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)   # ThSS SS a teSt StrSng.

I love maketrans/translate too! Unfortunately not useful for word replacements as it can only replace single chars
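
A small sketch of where that limit sits: in the dict form of str.maketrans the keys must be single characters, but the values may be longer strings (or None to delete the character). The sample strings are made up.

```python
# Values can be any length; None deletes the character.
table = str.maketrans({'<': '&lt;', '>': '&gt;', '!': None})
print('<b>hi!</b>'.translate(table))  # &lt;b&gt;hi&lt;/b&gt;

# Multi-character keys are rejected, so whole words cannot be mapped.
try:
    str.maketrans({'cat': 'dog'})
except ValueError as exc:
    print('ValueError:', exc)
```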
del_hol

I don't know about speed but this is my workaday quick fix:

from functools import reduce  # needed in Python 3

reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )

... but I like the #1 regex answer above. Note - if one new value is a substring of another one then the operation is not commutative.
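
To illustrate the order dependence, here is a tiny sketch in which the same two pairs give different results depending on their order. replace_chain is a made-up wrapper around the reduce call above.

```python
from functools import reduce

def replace_chain(text, pairs):
    # Apply each (old, new) pair in turn to the running result.
    return reduce(lambda acc, pair: acc.replace(*pair), pairs, text)

print(replace_chain('ab', [('a', 'b'), ('b', 'c')]))  # cc
print(replace_chain('ab', [('b', 'c'), ('a', 'b')]))  # bc
```

In the first call the 'b' produced from 'a' is replaced again by the second pair; in the second call it is not.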


Tommy Sandi

Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all matching files in the current folder to do the replacements. The script loads the mappings from an external file, for which you can set the column separator. I'm a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.

import glob
import re

mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")

rep = {} # creation of empty dictionary

with open(mapfile) as temprep: # loading of definitions in the dictionary using input file, separator is prompted
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val

# keys are deliberately not passed through re.escape, so the mapping may use regex syntax;
# note that the rep[m.group(0)] lookup below only works for keys that match themselves literally
pattern = re.compile("|".join(rep.keys()))

for filename in glob.iglob(mask): # iteration over all the files matching the prompted mask

    with open(filename, "r") as textfile: # load each file in the variable text
        text = textfile.read()

    # start replacement
    text = pattern.sub(lambda m: rep[m.group(0)], text)

    # write the output file with the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)

emorjon2

This is my solution to the problem. I used it in a chatbot to replace several different words at once.

def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if sk:  # test the matched key, so empty replacement strings also work
            new_string+=s
            old_string = old_string[len(sk):]
        else:
            new_string+=old_string[0]
            old_string = old_string[1:]
    return new_string

print(mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"}))

This prints: The cat hunts the dog


Akhil Thayyil

Another example. Input lists:

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']

The desired output would be

words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']

Code :

[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]] 

information_interchange

My approach would be to first tokenize the string, then decide for each token whether to include it or not.

This is potentially more performant, assuming O(1) lookup in a hash map or set:

remove_words = {"we", "this"}
target_sent = "we should modify this string"
target_sent_words = target_sent.split()
filtered_sent = " ".join(list(filter(lambda word: word not in remove_words, target_sent_words)))

filtered_sent is now 'should modify string'
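
The same tokenize-then-look-up idea can replace words rather than drop them, by swapping the set for a dict. A minimal sketch; word_map is invented sample data.

```python
word_map = {"we": "you", "this": "that"}  # made-up sample mapping
target_sent = "we should modify this string"

# dict.get returns the word itself when it has no replacement.
replaced_sent = " ".join(word_map.get(word, word) for word in target_sent.split())

print(replaced_sent)  # you should modify that string
```

Keep in mind that split() collapses runs of whitespace and that a word with punctuation attached (such as "this,") will not match its key.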


mnesarco

Here is a version with support for basic regex replacement. The main restriction is that expressions must not contain subgroups, and there may be some edge cases:

Code based on @bgusach and others

import re

class StringReplacer:

    def __init__(self, replacements, ignore_case=False):
        patterns = sorted(replacements, key=len, reverse=True)
        self.replacements = [replacements[k] for k in patterns]
        re_mode = re.IGNORECASE if ignore_case else 0
        self.pattern = re.compile('|'.join(("({})".format(p) for p in patterns)), re_mode)
        def tr(matcher):
            index = next((index for index,value in enumerate(matcher.groups()) if value), None)
            return self.replacements[index]
        self.tr = tr

    def __call__(self, string):
        return self.pattern.sub(self.tr, string)

Tests

table = {
    "aaa"    : "[This is three a]",
    "b+"     : "[This is one or more b]",
    r"<\w+>" : "[This is a tag]"
}

replacer = StringReplacer(table, True)

sample1 = "whatever bb, aaa, <star> BBB <end>"

print(replacer(sample1))

# output: 
# whatever [This is one or more b], [This is three a], [This is a tag] [This is one or more b] [This is a tag]

The trick is to identify the matched group by its position. It is not super efficient (O(n)), but it works.

index = next((index for index,value in enumerate(matcher.groups()) if value), None)

Replacement is done in one pass.
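
Under the same restriction (no subgroups inside the expressions), the scan over matcher.groups() could probably be avoided with match.lastindex, which holds the 1-based index of the last group that matched. With one top-level group per alternative, that is the alternative that matched. A sketch of that variant, written as plain functions rather than the class above:

```python
import re

table = {
    "aaa"    : "[This is three a]",
    "b+"     : "[This is one or more b]",
    r"<\w+>" : "[This is a tag]"
}

patterns = sorted(table, key=len, reverse=True)
values = [table[p] for p in patterns]
pattern = re.compile("|".join("({})".format(p) for p in patterns), re.IGNORECASE)

def tr(match):
    # lastindex is 1-based; it only identifies the alternative correctly
    # when the expressions contain no capturing groups of their own.
    return values[match.lastindex - 1]

sample1 = "whatever bb, aaa, <star> BBB <end>"
print(pattern.sub(tr, sample1))
# whatever [This is one or more b], [This is three a], [This is a tag] [This is one or more b] [This is a tag]
```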


Brandon H

Or just for a fast hack:

for line in to_read:
    read_buffer = line              
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Stefan Gruenwald

Here is another way of doing it with a dictionary:

listA="The cat jumped over the house".split()
modify = {word: word for word in listA}  # start with every word mapped to itself
modify["cat"],modify["jumped"]="dog","walked"
print(" ".join(modify[x] for x in listA))

Anonymous

sentence='its some sentence with a something text'

def replaceAll(f,Array1,Array2):
    if len(Array1)==len(Array2):
        for x in range(len(Array1)):
            # accumulate the result instead of returning on the first pass
            f = f.replace(Array1[x],Array2[x])
    return f

newSentence=replaceAll(sentence,['a','sentence','something'],['another','sentence','something something'])

print(newSentence)

A return inside the loop exits the function on the first iteration, so the result has to be accumulated and returned once after the loop.