How to search for occurrences of more than one space between words in a line
1. this is a line containing 2 spaces
2. this is a line containing 3 spaces
3. this is a line containing multiple spaces first second three four
All of the above should be valid matches. What regex should I use?
[ ]{2,}
SPACE (2 or more)
You could also check that a word character comes directly before and after those spaces (and not other whitespace such as tabs or newlines):
\w[ ]{2,}\w
The same, but capturing only the spaces, which is useful for tasks like replacement:
\w([ ]{2,})\w
Or require any non-whitespace character, not only a word character, before and after the spaces:
[^\s]([ ]{2,})[^\s]
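As a rough illustration of how these three patterns behave, here is a minimal Python sketch. The sample strings and variable names are made up for the example, and the lookaround form used for the replacement is a variation chosen here (it leaves the neighbouring characters out of the match); it is not part of the answer above.

```python
import re

# Hypothetical sample: two spaces after "line", three after "containing".
line = "this is a line  containing   extra spaces"

# Two or more literal spaces, anywhere in the line.
runs = re.findall(r"[ ]{2,}", line)

# Same runs, but only when a word character sits on both sides;
# the group captures just the spaces.
between_words = re.findall(r"\w([ ]{2,})\w", line)

# Collapse each run to one space. Lookarounds check the neighbours
# without consuming them, so adjacent runs are all handled.
collapsed = re.sub(r"(?<=\w)[ ]{2,}(?=\w)", " ", line)

# \w rejects punctuation next to the spaces; [^\s] accepts it.
punct = "word.  more"
word_only = re.search(r"\w[ ]{2,}\w", punct)
non_ws = re.search(r"[^\s]([ ]{2,})[^\s]", punct)

print(runs, between_words, collapsed)
print(word_only, repr(non_ws.group(1)))
```

The last two searches show the practical difference between the \w and [^\s] variants: only the second one accepts the period before the spaces.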
Simple solution:
/\s{2,}/
This matches all occurrences of two or more consecutive whitespace characters. If you need to match the entire line, but only if it contains two or more consecutive whitespace characters:
/^.*\s{2,}.*$/
If the whitespace characters don't need to be consecutive:
/^(.*\s.*){2,}$/
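To see the difference between the three patterns, here is a small Python sketch. The sample lines are invented for the example, and the surrounding slashes are dropped because Python's re module takes the bare pattern.

```python
import re

# Hypothetical sample lines.
lines = [
    "nospace",
    "one space",
    "two separate spaces",
    "two  adjacent spaces",
]

# Whole lines that contain a run of two or more whitespace characters.
consecutive = [s for s in lines if re.search(r"^.*\s{2,}.*$", s)]

# Whole lines that contain at least two whitespace characters anywhere.
any_two = [s for s in lines if re.search(r"^(.*\s.*){2,}$", s)]

# The runs themselves: single whitespace characters are skipped.
found = re.findall(r"\s{2,}", "a  b \t c d")

print(consecutive)
print(any_two)
print(found)
```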
.* is usually greedy, meaning that it will run to the end of the tested string, so anything mandatory that follows it won't match. In that case it's usually good practice to add ?, like this: .*?. It happened to me using PHP's PCRE.
/^.*b.*$/ does in fact match "foobar", even though you'd expect the first greedy .* to have consumed the entire string already.
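The behaviour discussed in these two comments can be checked with a short Python sketch. It uses Python's re module rather than PHP's PCRE, on the assumption that both are backtracking engines and treat this case the same way; the variable names are illustrative.

```python
import re

# Greedy .* first takes the whole string, then gives characters back
# one at a time until the rest of the pattern (b.*$) can match.
matched = re.match(r"^.*b.*$", "foobar") is not None

# Greedy vs. lazy changes which split is chosen when several would work,
# not whether the pattern matches at all.
greedy = re.match(r"(.*)b(.*)", "foobarbaz").groups()
lazy = re.match(r"(.*?)b(.*)", "foobarbaz").groups()

print(matched)  # True
print(greedy)   # ('foobar', 'az')
print(lazy)     # ('foo', 'arbaz')
```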
This regex matches every run of whitespace; you can replace each match with a single space:
\s+
Example in Python:
import re
result = re.sub(r'\s+', ' ', data)
Search for [ ]{2,}. This will find two or more adjacent spaces anywhere within the line. It will also match leading and trailing spaces as well as lines that consist entirely of spaces. If you don't want that, check out Alexander's answer.
Actually, you can leave out the brackets; they are just for clarity (otherwise the space character that is being repeated isn't that well visible :)).
The problem with \s{2,} is that it will also match newlines in Windows files, where newlines are denoted by CRLF (\r\n), which is matched by \s{2}.
If you also want to find multiple tabs and spaces, use [ \t]{2,}.
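The CRLF issue is easy to reproduce. The following Python sketch uses an invented three-line string with Windows line endings and compares the three character classes mentioned above.

```python
import re

# Hypothetical text with CRLF line endings, one double space, one double tab.
text = "first line\r\nsecond  line\r\nthird\t\tline"

# \s also matches \r and \n, so every CRLF counts as a two-character run.
whitespace_runs = re.findall(r"\s{2,}", text)

# A literal space class only finds the real double space.
space_runs = re.findall(r"[ ]{2,}", text)

# Spaces and tabs, but still no line endings.
blank_runs = re.findall(r"[ \t]{2,}", text)

print(whitespace_runs)
print(space_runs)
print(blank_runs)
```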
"more than one space between words in a line". How is [ ]{2,} between words? Have you even read the question?
Here is my solution:
[^0-9A-Z,\n]
This matches any single character that is not a digit, an uppercase letter, a comma, or a newline, so it selects the space in the middle of each line in a data set such as:
20171106,16632 ESCG0000018SB
20171107,280 ESCG0000018SB
20171106,70476 ESCG0000018SB
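A quick Python check of what this character class selects is sketched below. The exact spacing of the sample data is not recoverable from the post, so the doubled space in the first row is an assumption made for the example. Note that the class matches one character at a time; adding + (a variation, not part of the answer) groups each run into one match.

```python
import re

# Rows modelled on the sample above; the double space in the first row
# is an assumption, not taken from the original data.
data = "20171106,16632  ESCG0000018SB\n20171107,280 ESCG0000018SB\n"

# Anything that is not a digit, uppercase letter, comma, or newline:
# here only the spaces qualify, one match per character.
single = re.findall(r"[^0-9A-Z,\n]", data)

# With + each run of such characters becomes a single match.
grouped = re.findall(r"[^0-9A-Z,\n]+", data)

print(single)
print(grouped)
```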
\w means 'word characters', that is, alphanumeric and underscore, but not other non-space characters. To check for non-whitespace, use \S (capital S). Also, the first one will only match lines that contain two or more spaces and nothing else.
\S, I just prefer not to rely on character case for such functionality, it's easier to read.
\w[ ]{2,}\w will fail to match word.<2 spaces>more words or a string that consists entirely of spaces.
[^\s]([ ]{2,})[^\s]\w will fail on lines that start with spaces or strings like bla<2 spaces>.
...{min,max} operator is the general repetition quantifier and 2) Omitting max but leaving the comma means unlimited repetitions.
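The {min,max} forms mentioned in the last comment can be compared side by side. This Python sketch uses a made-up string whose gaps are 1, 2, 3, and 4 spaces wide and records the length of each match.

```python
import re

# Hypothetical string with gaps of 1, 2, 3 and 4 spaces.
s = "a b  c   d    e"

# {2,}: two or more, no upper limit.
two_or_more = [len(m) for m in re.findall(r" {2,}", s)]

# {2,3}: between two and three; the 4-space gap yields a 3-space match
# and the leftover single space does not match.
two_to_three = [len(m) for m in re.findall(r" {2,3}", s)]

# {2}: exactly two per match, so longer gaps are split into pairs.
exactly_two = [len(m) for m in re.findall(r" {2}", s)]

print(two_or_more)   # [2, 3, 4]
print(two_to_three)  # [2, 3, 3]
print(exactly_two)   # [2, 2, 2, 2]
```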