I have some Python code that splits on commas but doesn't strip the whitespace:
>>> string = "blah, lots , of , spaces, here "
>>> mylist = string.split(',')
>>> print mylist
['blah', ' lots ', ' of ', ' spaces', ' here ']
I would rather end up with whitespace removed like this:
['blah', 'lots', 'of', 'spaces', 'here']
I am aware that I could loop through the list and strip() each item but, as this is Python, I'm guessing there's a quicker, easier and more elegant way of doing it.
Use a list comprehension. It is simpler than a for loop and just as easy to read.
my_string = "blah, lots , of , spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]
See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.
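If you need this in several places, the comprehension can be wrapped in a small helper. This is only a sketch: the name split_and_strip and the decision to drop empty items are assumptions made for illustration, not part of the answer above.

```python
def split_and_strip(text, sep=","):
    """Split text on sep and strip whitespace from each piece.

    Hypothetical helper: the name and the choice to drop empty
    pieces are illustrative, not taken from the original answer.
    """
    stripped = (piece.strip() for piece in text.split(sep))
    # Keep only the non-empty pieces.
    return [piece for piece in stripped if piece]


print(split_and_strip("blah, lots , of , spaces, here "))
# ['blah', 'lots', 'of', 'spaces', 'here']
```

Remove the `if piece` filter if empty fields (as in "a,,b") should be kept.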
I came to add:
map(str.strip, string.split(','))
but saw it had already been mentioned by Jason Orendorff in a comment.
Reading Glenn Maynard's comment on the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).
So a quick (possibly flawed?) test on my box (Python 2.6.5 on Ubuntu 10.04) applying the three methods in a loop revealed:
$ time ./list_comprehension.py # [word.strip() for word in string.split(',')]
real 0m22.876s
$ time ./map_with_lambda.py # map(lambda s: s.strip(), string.split(','))
real 0m25.736s
$ time ./map_with_str.strip.py # map(str.strip, string.split(','))
real 0m19.428s
making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.
Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.
Note that in Python 3, map returns an iterator, so wrap it in list() to get a list: list(map(str.strip, string.split(',')))
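For anyone who wants to repeat the comparison, here is a rough sketch using the standard timeit module instead of timing whole scripts. The iteration count and the dictionary of candidates are my own choices, and the numbers will differ by machine and Python version, so no timings are shown.

```python
import timeit

string = "blah, lots , of , spaces, here "

# The three approaches compared above, wrapped so timeit can call them.
# list() is applied to map so the work is actually done on Python 3.
candidates = {
    "list comprehension": lambda: [word.strip() for word in string.split(",")],
    "map with lambda": lambda: list(map(lambda s: s.strip(), string.split(","))),
    "map with str.strip": lambda: list(map(str.strip, string.split(","))),
}

# Sanity check: every approach should produce the same list.
results = {name: func() for name, func in candidates.items()}

for name, func in candidates.items():
    seconds = timeit.timeit(func, number=100000)  # arbitrary repeat count
    print("%-20s %.3f s" % (name, seconds))
```

Each candidate pays the same overhead for the wrapping lambda call, so the relative ordering is still meaningful, but treat small differences with caution.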
Split using a regular expression. Note I made the case more general with leading spaces. The list comprehension is to remove the null strings at the front and back.
>>> import re
>>> string = " blah, lots , of , spaces, here "
>>> pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']
This works even if ^\s+ doesn't match:
>>> string = "foo, bar "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']
Here's why you need ^\s+:
>>> string = " blah, lots , of , spaces, here "
>>> pattern = re.compile(r"\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
[' blah', 'lots', 'of', 'spaces', 'here']
See the leading spaces in blah?
Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.
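As a point of comparison, the anchored alternatives can be avoided by stripping the ends of the string first and then splitting on the comma together with the whitespace around it. This is a sketch of that variation, not the answer's own method; the variable names are mine.

```python
import re

string = " blah, lots , of , spaces, here "

# Pattern from the answer: it also matches leading and trailing whitespace,
# which produces empty strings that then have to be filtered out.
full_pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
filtered = [x for x in full_pattern.split(string) if x]

# Variation: strip the ends first, then split on a comma plus any
# whitespace around it. No empty strings appear for this input.
stripped_first = re.split(r"\s*,\s*", string.strip())

print(filtered)        # ['blah', 'lots', 'of', 'spaces', 'here']
print(stripped_first)  # ['blah', 'lots', 'of', 'spaces', 'here']
```

One difference to be aware of: the filtered version drops empty fields such as the one in "a,,b", while the strip-first version keeps them.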
[x.strip() for x in my_string.split(',')] is more Pythonic for the question asked. Maybe there are cases where my solution is necessary. I'll update this content if I run across one.
Why is ^\s+ necessary? I've tested your code without it and it doesn't work, but I don't know why.
With re.compile("^\s*,\s*$"), the result is [' blah, lots , of , spaces, here '].
^\s+ makes. As you can see for yourself, ^\s*,\s*$ doesn't return the desired results, either. So if you want to split with a regexp, use ^\s+|\s*,\s*|\s+$.
Just remove the white space from the string before you split it.
mylist = my_string.replace(' ','').split(',')
"you just, broke this".
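The counterexample in the comment can be checked directly. A minimal sketch (variable names are mine) comparing the replace approach with stripping after the split:

```python
text = "you just, broke this"

# Removing every space also removes the spaces inside each item.
no_spaces = text.replace(" ", "").split(",")

# Stripping after the split only touches the ends of each item.
stripped = [item.strip() for item in text.split(",")]

print(no_spaces)  # ['youjust', 'brokethis']
print(stripped)   # ['you just', 'broke this']
```

So the replace approach is only safe when the items themselves never contain spaces.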
I know this has already been answered, but if you end up doing this a lot, regular expressions may be a better way to go:
>>> import re
>>> re.sub(r'\s', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']
The \s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub
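To make the trade-offs concrete, here is a small sketch on an input I made up that contains a tab, a newline and a two-word item. It compares str.replace, the re.sub call above, and a narrower substitution that only removes whitespace next to commas; that last variant is my own addition, not part of the answer.

```python
import re

# Made-up input with a tab, a newline and an item containing a space.
text = "blah,\tlots ,\n of , two words , here "

# str.replace(' ', '') only removes the space character itself.
only_spaces = text.replace(" ", "").split(",")

# re.sub(r'\s', '', ...) removes every whitespace character,
# including the space inside "two words".
all_whitespace = re.sub(r"\s", "", text).split(",")

# Narrower variant: strip the ends, then remove whitespace only
# where it touches a comma, so inner spaces survive.
around_commas = re.sub(r"\s*,\s*", ",", text.strip()).split(",")

print(only_spaces)     # ['blah', '\tlots', '\nof', 'twowords', 'here']
print(all_whitespace)  # ['blah', 'lots', 'of', 'twowords', 'here']
print(around_commas)   # ['blah', 'lots', 'of', 'two words', 'here']
```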
map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or for the whole thing at once: map(lambda s: s.strip(), string.split(','))
map, particularly if you're using lambda with it, double-check to see if you should be using a list comprehension.
map(str.strip, s.split(',')).
import re
result = [x for x in re.split(',| ', your_string) if x != '']
This works fine for me.
re (as in regular expressions) allows splitting on multiple characters at once:
>>> import re
>>> string = "blah, lots  ,  of ,  spaces, here "
>>> re.split(', ', string)
['blah', 'lots  ', ' of ', ' spaces', 'here ']
This doesn't work well for your example string, but works nicely for a comma-space separated list. For your example string, you can combine the re.split power to split on regex patterns to get a "split-on-this-or-that" effect.
>>> re.split('[, ]', string)
['blah', '', 'lots', '', '', '', '', 'of', '', '', '', 'spaces', '', 'here', '']
Unfortunately, that's ugly, but a filter will do the trick:
>>> list(filter(None, re.split('[, ]', string)))
['blah', 'lots', 'of', 'spaces', 'here']
Voila!
re.split(' *, *', string)?
re.split('[, ]*',string) for the same effect.
[, ]* leaves an empty string at the end of the list. I think filter is still a nice thing to throw in there, or stick to list comprehension like the top answer does.
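A caveat on the `[, ]*` suggestion: that pattern can match an empty string, and since Python 3.7 re.split also splits on empty matches, so the result is not the same on every Python version. A sketch that sidesteps this by using `+` (my substitution, not what the comments proposed), combined with the filter mentioned above:

```python
import re

string = "blah, lots , of , spaces, here "

# '+' requires at least one separator character, so this pattern
# can never match an empty string.
parts = re.split(r"[, ]+", string)

# The trailing space still produces one empty string at the end,
# which filter(None, ...) drops.
cleaned = list(filter(None, parts))

print(parts)    # ['blah', 'lots', 'of', 'spaces', 'here', '']
print(cleaned)  # ['blah', 'lots', 'of', 'spaces', 'here']
```

Like the other character-class approaches, this also splits items that contain spaces, so it only suits single-word items.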
s = 'bla, buu, jii'
sp = s.split(',')
for st in sp:
    # strip() removes the whitespace left over around each item
    print(st.strip())
import re
mylist = [x for x in re.compile(r'\s*[,\s]\s*').split(string) if x]
Simply put, this splits on a comma or on at least one whitespace character, with or without preceding or succeeding whitespace.
Please try!
Instead of splitting the string first and then worrying about whitespace, you can deal with the whitespace first and then split:
string.replace(" ", "").split(",")