Split by comma and strip whitespace in Python

python whitespace strip

I have some python code that splits on comma, but doesn't strip the whitespace:

>>> string = "blah, lots  ,  of ,  spaces, here "
>>> mylist = string.split(',')
>>> print(mylist)
['blah', ' lots  ', '  of ', '  spaces', ' here ']

I would rather end up with whitespace removed like this:

['blah', 'lots', 'of', 'spaces', 'here']

I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.

Sean Vieira

Use a list comprehension. It is simpler than a for loop and just as easy to read.

my_string = "blah, lots  ,  of ,  spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]

See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.

Super good! I added one condition, as follows, to get rid of the blank list entries: text = [x.strip() for x in text.split('.') if x != '']
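A minimal sketch of that idea, with one adjustment that is an assumption on my part rather than the commenter's code: it tests the stripped value instead of the raw piece, so entries that contain only whitespace are dropped as well. The function name split_and_strip is hypothetical.

```python
def split_and_strip(text, sep=","):
    """Split text on sep, strip each piece, and drop pieces that end up empty."""
    stripped = (piece.strip() for piece in text.split(sep))
    return [piece for piece in stripped if piece]


print(split_and_strip("blah, lots  ,  , of ,  spaces, here ,"))
# ['blah', 'lots', 'of', 'spaces', 'here']
```

Filtering after stripping matters here because a piece such as '  ' is not equal to '' until it has been stripped.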

@Sean: was invalid/incomplete python code your "original intent of the post"? According to the review wankers it was: stackoverflow.com/review/suggested-edits/21504253. Can you please tell them otherwise by making the correction if they are wrong (again)?

The original was copy-pasted from a REPL (if I remember correctly) and the goal was understanding of the underlying concept (using list comprehension to perform an operation) - but you're right, it makes more sense if you see that list comprehension produces a new list.

wjandrea

I came to add:

map(str.strip, string.split(','))

but saw it had already been mentioned by Jason Orendorff in a comment.

Reading Glenn Maynard's comment on the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).

So a quick (possibly flawed?) test on my box (Python 2.6.5 on Ubuntu 10.04) applying the three methods in a loop revealed:

$ time ./list_comprehension.py  # [word.strip() for word in string.split(',')]
real    0m22.876s

$ time ./map_with_lambda.py     # map(lambda s: s.strip(), string.split(','))
real    0m25.736s

$ time ./map_with_str.strip.py  # map(str.strip, string.split(','))
real    0m19.428s

making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.

Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.
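For anyone who wants to repeat the comparison, here is a rough sketch using the standard timeit module instead of separate scripts. The iteration count and the names text and candidates are my own choices, not part of the original test, and the map calls are wrapped in list() so that the work is actually done on Python 3. No timings are shown because they depend on the machine and the Python version.

```python
import timeit

text = "blah, lots  ,  of ,  spaces, here "

# Each candidate builds the same stripped list in a different way.
candidates = {
    "list comprehension": lambda: [word.strip() for word in text.split(",")],
    "map with lambda": lambda: list(map(lambda s: s.strip(), text.split(","))),
    "map with str.strip": lambda: list(map(str.strip, text.split(","))),
}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=100_000)
    print(f"{name}: {seconds:.3f}s")
```

Every candidate pays the same extra function-call cost from the wrapping lambda, so the relative order should still be meaningful, but treat the numbers as indicative only.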

A better answer for Python 3, where map returns an iterator rather than a list: list(map(str.strip, string.split(',')))

tbc0

Split using a regular expression. Note I made the case more general with leading spaces. The list comprehension is to remove the null strings at the front and back.

>>> import re
>>> string = "  blah, lots  ,  of ,  spaces, here "
>>> pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']

This works even if ^\s+ doesn't match:

>>> string = "foo,   bar  "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']

Here's why you need ^\s+:

>>> pattern = re.compile(r"\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['  blah', 'lots', 'of', 'spaces', 'here']

See the leading spaces in blah?

Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.

I believe [x.strip() for x in my_string.split(',')] is more pythonic for the question asked. Maybe there are cases where my solution is necessary. I'll update this content if I run across one.
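As a possible middle ground, and not something taken from the answer above, the two anchored alternatives can be replaced by a plain strip() on the whole string, leaving the regular expression to handle only the commas and the whitespace around them. The function name split_on_commas is hypothetical.

```python
import re


def split_on_commas(text):
    """Trim both ends, then split on commas plus any whitespace around them."""
    return re.split(r"\s*,\s*", text.strip())


print(split_on_commas("  blah, lots  ,  of ,  spaces, here "))
# ['blah', 'lots', 'of', 'spaces', 'here']
```

Because the ends are trimmed before splitting, no empty strings appear at the front or back for this input, so no filtering step is needed. Spaces inside an item are preserved.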

Why is ^\s+ necessary? I've tested your code without it and it doesn't work, but I don't know why.

If I use re.compile("^\s*,\s*$"), result is [' blah, lots , of , spaces, here '].

@laike9m, I updated my answer to show you the difference ^\s+ makes. As you can see for yourself, ^\s*,\s*$ doesn't return the desired results either. So if you want to split with a regexp, use ^\s+|\s*,\s*|\s+$.

The first match is empty if the leading pattern (^\s+) doesn't match so you get something like [ '', 'foo', 'bar' ] for the string "foo, bar".

user489041

Just remove the white space from the string before you split it.

mylist = my_string.replace(' ','').split(',')

Kind of a problem if the items separated by commas contain embedded spaces, e.g. "you just, broke this".

Geez, a -1 for this. You guys are tough. It solved his problem, provided his sample data was only single words, and there was no specification that the data would be phrases. But w/e, I guess that's how you guys roll around here.

Well thanks anyway, user. To be fair though I specifically asked for split and then strip() and strip removes leading and trailing whitespace and doesn't touch anything in between. A slight change and your answer would work perfectly, though: mylist = mystring.strip().split(',') although I don't know if this is particularly efficient.
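To make the difference between the variants in this exchange concrete, here is a small sketch run against the phrase from the first comment, with a trailing space added for illustration. The variable name text is my own.

```python
text = "you just, broke this "

# Removing every space also destroys the spaces inside each item.
print(text.replace(" ", "").split(","))
# ['youjust', 'brokethis']

# Stripping the whole string first only trims its two ends.
print(text.strip().split(","))
# ['you just', ' broke this']

# Stripping each piece after the split trims every item.
print([item.strip() for item in text.split(",")])
# ['you just', 'broke this']
```

Only the last form removes the whitespace next to the commas while keeping the spaces inside each item.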

Brad Montgomery

I know this has already been answered, but if you end up doing this a lot, regular expressions may be a better way to go:

>>> import re
>>> re.sub(r'\s', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']

The \s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub

Your example would not work on strings containing spaces. "for, example this, one" would become "for", "examplethis", "one". Not saying it's a BAD solution (it works perfectly on my example) it just depends on the task in hand!

Yep, that's very correct! You could probably adjust the regexp so it can handle strings with spaces, but if the list comprehension works, I'd say stick with it ;)

user470379

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or for the whole thing at once: map(lambda s:s.strip(), string.split(','))

Tip: any time you find yourself using map, particularly if you're using lambda with it, double-check to see if you should be using a list comprehension.

You can avoid the lambda with map(str.strip, s.split(',')).
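A short sketch of the lambda-free form, assuming Python 3, where map returns a lazy iterator and has to be passed to list() to get an actual list. The names text, lazy and result are illustrative.

```python
text = "blah, lots  ,  of ,  spaces, here "

lazy = map(str.strip, text.split(","))  # an iterator in Python 3, not a list
result = list(lazy)                     # consuming it builds the list

print(result)
# ['blah', 'lots', 'of', 'spaces', 'here']
```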

Zieng

import re
result = [x for x in re.split(',| ', your_string) if x != '']

This works fine for me.

Dannid

re (as in regular expressions) allows splitting on multiple characters at once:

>>> import re
>>> string = "blah, lots  ,  of ,  spaces, here "
>>> re.split(', ', string)
['blah', 'lots  ', ' of ', ' spaces', 'here ']

This doesn't work well for your example string, but works nicely for a comma-space separated list. For your example string, you can combine the re.split power to split on regex patterns to get a "split-on-this-or-that" effect.

>>> re.split('[, ]', string)
['blah',
 '',
 'lots',
 '',
 '',
 '',
 '',
 'of',
 '',
 '',
 '',
 'spaces',
 '',
 'here',
 '']

Unfortunately, that's ugly, but a filter will do the trick:

>>> list(filter(None, re.split('[, ]', string)))
['blah', 'lots', 'of', 'spaces', 'here']

Voila!

Why not just re.split(' *, *', string)?

@PaulTomblin good idea. One could also have done this: re.split('[, ]*', string) for the same effect.

@Dannid, I realized after writing that comment that it doesn't strip whitespace at the beginning and the end, as @tbc0's answer does.

@PaulTomblin heh, and my rebuttal [, ]* leaves an empty string at the end of the list. I think filter is still a nice thing to throw in there, or stick to a list comprehension as the top answer does.

Pang

s = 'bla, buu, jii'

sp = s.split(',')
for st in sp:
    print(st.strip())  # strip() drops the whitespace around each item

Hrvoje

import re
mylist = [x for x in re.split(r'\s*(?:,|\s+)\s*', string) if x]

Simply put: split on a comma or on one or more whitespace characters, with or without surrounding whitespace. The trailing "if x" drops any empty string left at either end of the list.

Please try!

crazysra

Instead of splitting the string first and then worrying about whitespace, you can deal with the whitespace first and then split:

string.replace(" ", "").split(",")

What about valid values like ABC CDE, AB C, AM BH N? Here, stripping means removing leading or trailing spaces, not spaces in the middle.
