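I am trying to output a string that contains everything between two words of a string:
Input:
"Here is a String"
Output:
"is a"
Using:
sed -n '/Here/,/String/p'
includes the endpoints, but I do not want to include them.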
What should the result be for the input Here is a Here String? Or for I Hereby Dub Thee Sir Stringy?
The sed FAQ is "how to extract text between specific lines"; that one is stackoverflow.com/questions/16643288/…
GNU grep can also support positive and negative look-ahead and look-behind. For your case, the command would be:
echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'
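If Here and string occur more than once, you can choose whether to match from the first Here to the last string or to match each pair separately. In regex terms, these are called a greedy match (first case) and a non-greedy match (second case).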
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*(?=string)' # Greedy match
is a string, and Here is another
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*?(?=string)' # Non-greedy match (Notice the '?' after '*' in .*)
is a
is another
sed -e 's/Here\(.*\)String/\1/'
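echo "Here is a one is a String" | sed -e 's/one is\(.*\)String/\1/'
If you only want the part between "one is" and "String", you need to make the regex match the whole line: sed -e 's/.*one is\(.*\)String.*/\1/'
In sed, s/pattern/replacement/ means "on each line, replace 'pattern' with 'replacement'". It changes only the text that matches 'pattern', so if you want it to replace the whole line, you need 'pattern' to match the whole line.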
This breaks when the input is Here is a String Here is a String
The accepted answer does not remove text that may come before Here or after String. This will:
sed -e 's/.*Here\(.*\)String.*/\1/'
The main difference is the addition of .* before Here and after String.
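The command above still prints lines that contain no match, because sed echoes every line by default. A minimal sketch of one way to suppress them, assuming you only want output for lines that contain both markers, is to combine -n with the p flag of s. The function name extract_between is made up for this illustration.

```shell
# Print only the text between Here and String.
# -n turns off automatic printing; the p flag prints a line only when the
# substitution actually happened, so lines without both markers are skipped.
extract_between() {
  sed -n 's/.*Here\(.*\)String.*/\1/p'
}

printf 'no markers on this line\nxx Here is a String yy\n' | extract_between
# prints " is a " (including the surrounding spaces)
```

Because both .* parts are greedy, a line with several Here/String pairs still yields the text between the last Here and the last String.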
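The * quantifier between Here and String would need to be non-greedy (lazy). However, according to this Stack Overflow question, the regex flavor used by sed does not support lazy quantifiers (a ? immediately after .*). The usual way to emulate a lazy quantifier is to match everything except the token you do not want to cross, but in this case the delimiter is not a single token; it is the whole string String.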
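One possible workaround, sketched here under the assumption of GNU sed (which accepts \n in the replacement) and of input lines that contain both markers, is to turn the first Here into a newline. A newline cannot already be present in the line, so each later step only has to find a single character or the leftmost String. The function name first_between is hypothetical.

```shell
# Emulate a lazy match between the first Here and the next String (GNU sed).
first_between() {
  # 1. replace the first Here with a newline
  # 2. delete everything up to and including that newline
  # 3. delete from the leftmost remaining String to the end of the line
  sed 's/Here/\n/; s/.*\n//; s/String.*//'
}

echo "Here is a String Here is a String" | first_between
# prints " is a " (including the surrounding spaces)
```

A line without Here is not filtered out by this sketch; it is only truncated at its first String, so filter such lines beforehand if they can occur.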
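. does not match a newline. If you want to match newlines as well, you can replace . with [\s\S] (in regex flavors such as PCRE that allow \s inside brackets).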
You can strip the string in Bash alone:
$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
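The choice between the single and doubled operators matters once the markers repeat. The following sketch, using a made-up input with two Here/String pairs, shows what each form removes; the helper name first_pair is hypothetical.

```shell
foo="Here is a String Here is a String"

# '#' removes the shortest matching prefix, '##' the longest one
echo "${foo#*Here }"      # is a String Here is a String
echo "${foo##*Here }"     # is a String

# '%' removes the shortest matching suffix, '%%' the longest one
echo "${foo% String*}"    # Here is a String Here is a
echo "${foo%% String*}"   # Here is a

# Text between the first "Here " and the first " String" that follows it
first_pair() {
  rest=${1#*Here }
  printf '%s\n' "${rest%% String*}"
}

first_pair "$foo"         # is a
```

Using the shortest prefix and the longest suffix gives the first pair; swapping both operators gives the last pair instead.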
If you have a GNU grep that includes PCRE support, you can use zero-width assertions:
$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a
If you have a long file with many multi-line occurrences, it is useful to print the line numbers first:
cat -n file | sed -n '/Here/,/String/p'
... the -n option of cat.
cat can be omitted entirely; sed knows how to read a file or standard input.
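For multi-line input, a single awk call can number the lines and leave out the two endpoint lines at the same time. This is only a sketch of one alternative, not what the answer above does; the function name between_lines and the flag variable inside are made up for the example.

```shell
# Print the lines strictly between a line matching Here and the next line
# matching String, each prefixed with its line number.
between_lines() {
  awk '
    /String/ { inside = 0 }          # end marker: stop before printing this line
    inside   { print NR ": " $0 }    # lines between the two markers
    /Here/   { inside = 1 }          # start marker: begin with the next line
  ' "$@"
}

printf 'a\nHere\nx\ny\nString\nb\n' | between_lines
# 3: x
# 4: y
```

With a file name as argument (between_lines file) it reads the file directly, so no cat is needed.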
With GNU awk:
$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
is a
grep with the -P (perl-regexp) option supports \K, which discards the characters matched so far. In our case the previously matched string is Here, so it is dropped from the final output.
$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
is a
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
is a
If you want the output to be is a without the surrounding spaces, you could try the following:
$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a
This does not work for echo "Here is a string dfdsf Here is a string" | awk -v FS="(Here|string)" '{print $2}': it returns only is a instead of is a is a. @Avinash Raj
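With Here and string both acting as field separators, every even-numbered field lies between a Here and the following string, so looping over those fields returns all occurrences. This is a sketch that assumes the two markers strictly alternate on the line, starting with Here; the function name all_between is hypothetical.

```shell
# Concatenate every piece of text found between Here and the next string.
all_between() {
  awk -F'Here|string' '{
    out = ""
    for (i = 2; i <= NF; i += 2)   # even fields sit between Here and string
      out = out $i
    print out
  }'
}

echo "Here is a string dfdsf Here is a string" | all_between
# prints " is a  is a " (each piece keeps its own surrounding spaces)
```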
You can use two s commands:
$ echo "Here is a String" | sed 's/.*Here//; s/String.*//'
is a
It also works for:
$ echo "Here is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
is a
$ echo "Here is a StringHere is a StringHere is a StringHere is a String" | sed 's/.*Here//; s/String.*//'
is a
This might work for you (GNU sed):
sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file
This prints each piece of text found between the two markers (in this case Here and String) on a line of its own, and preserves newlines within the text.
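The one-liner is dense, so here is the same script spread over several lines with a comment per step. The comments are my reading of the GNU sed commands rather than the answer author's own explanation, and the wrapper name between_markers is made up.

```shell
# Same GNU sed script as above, one command per line.
between_markers() {
  sed '
    # drop lines that do not contain the start marker
    /Here/!d
    # an empty regex reuses the last one (Here): put a newline after the
    # first Here, then delete everything up to and including that newline
    s//&\n/
    s/.*\n//
    :a
    # if the end marker is in the pattern space, jump to label b
    /String/bb
    # otherwise, unless this is the last line, print the pattern space,
    # read the next input line and check again
    $!{
      n
      ba
    }
    :b
    # the empty regex now reuses String: put a newline in front of it
    s//\n&/
    # print up to that newline, delete the printed part, and rerun the
    # script on the remainder without reading new input
    P
    D
  ' "$@"
}

printf 'x Here a String y\n' | between_markers
# prints " a "
```

Because D restarts the script on whatever follows the first String, a second Here/String pair on the same line is handled by the next pass.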
To understand the sed command, we have to build it step by step.
Here is your original text:
user@linux:~$ echo "Here is a String"
Here is a String
user@linux:~$
Let's try to remove the Here string with sed's s (substitute) command:
user@linux:~$ echo "Here is a String" | sed 's/Here //'
is a String
user@linux:~$
At this point, I believe you can remove String as well:
user@linux:~$ echo "Here is a String" | sed 's/String//'
Here is a
user@linux:~$
But this is not your desired output.
To combine the two sed commands, use the -e option:
user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
is a
user@linux:~$
Hope this helps.
All of the solutions above fall short when the last search string is repeated elsewhere in the string. I found it best to write a bash function.
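# Print the text between the first occurrence of $2 and the next occurrence of $3 in $1.
function str_str {
  local str
  str="${1#*"$2"}"      # drop everything up to and including the first $2
  str="${str%%"$3"*}"   # drop the first remaining $3 and everything after it
  printf '%s' "$str"    # no trailing newline, like echo -n
}
# test it ...
mystr="this is a string"
str_str "$mystr" "this " " string"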
You can use \1 (see http://www.grymoire.com/Unix/Sed.html#uh-4):
echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'
Whatever is matched inside the parentheses is stored as \1.
Problem. My stored Claws Mail messages are wrapped as follows, and I am trying to extract the Subject lines:
Subject: [SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular
link in major cell growth pathway: Findings point to new potential
therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is
Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as
a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway
identified [Lysosomal amino acid transporter SLC38A9 signals arginine
sufficiency to mTORC1]]
Message-ID: <20171019190902.18741771@VictoriasJourney.com>
Per A2 in this thread, "How to use sed/grep to extract text between two words?", the first expression below "works" as long as the matched text does not include a newline:
grep -o -P '(?<=Subject: ).*(?=molecular)' corpus/01
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key
However, despite trying numerous variants (.+?; /s; ...), I could not get these to work:
grep -o -P '(?<=Subject: ).*(?=link)' corpus/01
grep -o -P '(?<=Subject: ).*(?=therapeutic)' corpus/01
etc.
Solution 1.
Per "Extract text between two strings on different lines":
sed -n '/Subject: /{:a;N;/Message-ID:/!ba; s/\n/ /g; s/\s\s*/ /g; s/.*Subject: \|Message-ID:.*//g;p}' corpus/01
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
Solution 2.
Per "How can I replace a newline (\n) using sed?":
sed ':a;N;$!ba;s/\n/ /g' corpus/01
replaces the newlines with spaces.
Chaining that with A2 in "How to use sed/grep to extract text between two words?", we get:
sed ':a;N;$!ba;s/\n/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
which gives
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
This variant removes the double spaces:
sed ':a;N;$!ba;s/\n/ /g; s/\s\s*/ /g' corpus/01 | grep -o -P '(?<=Subject: ).*(?=Message-ID:)'
giving
[SLC38A9 lysosomal arginine sensor; mTORC1 pathway] Key molecular link in major cell growth pathway: Findings point to new potential therapeutic target in pancreatic cancer [mTORC1 Activator SLC38A9 Is Required to Efflux Essential Amino Acids from Lysosomes and Use Protein as a Nutrient] [Re: Nutrient sensor in key growth-regulating metabolic pathway identified [Lysosomal amino acid transporter SLC38A9 signals arginine sufficiency to mTORC1]]
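Both solutions depend on Message-ID: being the header that follows the subject. A sketch that avoids this, assuming the continuation lines of the wrapped header begin with a space or tab (the usual way mail headers are folded), joins those lines directly in awk. The function name subject_of and the variable names are made up for this example.

```shell
# Print the Subject header of a mail file with its continuation lines joined.
subject_of() {
  awk '
    /^Subject: / {                   # header start: keep the text after "Subject: "
      subject = substr($0, 10)
      reading = 1
      next
    }
    reading && /^[ \t]/ {            # continuation line: join it with one space
      sub(/^[ \t]+/, " ")
      subject = subject $0
      next
    }
    reading { print subject; reading = 0 }   # any other line ends the header
    END { if (reading) print subject }       # header was the last thing in the file
  ' "$@"
}

printf 'Subject: Key molecular\n link in major\n\tcell growth pathway\nMessage-ID: <x>\n' | subject_of
# Key molecular link in major cell growth pathway
```

It can be run on the file from the question as subject_of corpus/01.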
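The -P option is not present in the grep included in *BSD, or in the ones that ship with any SVR4 (Solaris, etc.). On FreeBSD, you can install the devel/pcre port, which includes pcregrep, which supports PCRE (and look-ahead/look-behind). Older versions of OS X used GNU grep, but in OS X Mavericks grep is derived from FreeBSD's version, which does not include the option.
For Here is a string a string, both " is a " and " is a string a " are valid answers according to the question's requirements (ignoring the quotes). Which one you want is up to you, and the answer differs accordingly. In any case, for your requirement this will work: echo "Here is a string a string" | grep -o -P '(?<=Here).*?(?=string)'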
echo $'Here is \na string' | grep -zoP '(?<=Here)(?s).*(?=string)'