
Looping through the content of a file in Bash

How do I iterate through each line of a text file with Bash?

With this script:

echo "Start!"
for p in (peptides.txt)
do
    echo "${p}"
done

I get this output on the screen:

Start!
./runPep.sh: line 3: syntax error near unexpected token `('
./runPep.sh: line 3: `for p in (peptides.txt)'

(Later I want to do something more complicated with $p than just output to the screen.)

The environment variable SHELL is (from env):

SHELL=/bin/bash

/bin/bash --version output:

GNU bash, version 3.1.17(1)-release (x86_64-suse-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.

cat /proc/version output:

Linux version 2.6.18.2-34-default (geeko@buildhost) (gcc version 4.1.2 20061115 (prerelease) (SUSE Linux)) #1 SMP Mon Nov 27 11:46:27 UTC 2006

The file peptides.txt contains:

RKEKNVQ
IPKKLLQK
QYFHQLEKMNVK
IPKKLLQK
GDLSTALEVAIDCYEK
QYFHQLEKMNVKIPENIYR
RKEKNVQ
VLAKHGKLQDAIN
ILGFMK
LEDVALQILL
Oh, I see many things have happened here: all the comments were deleted and the question was reopened. Just for reference, the accepted answer in Read a file line by line assigning the value to a variable addresses the problem in a canonical way and should be preferred over the accepted one here.

rogerdpack

One way to do it is:

while read p; do
  echo "$p"
done <peptides.txt

As pointed out in the comments, this has the side effects of trimming leading and trailing whitespace, interpreting backslash sequences, and skipping the last line if it's missing a terminating linefeed. If these are concerns, you can do:

while IFS="" read -r p || [ -n "$p" ]
do
  printf '%s\n' "$p"
done < peptides.txt

If the loop body itself may read from standard input, you can open the file on a different file descriptor:

while read -u 10 p; do
  # loop body: commands here can read stdin without consuming the file
  printf '%s\n' "$p"
done 10<peptides.txt

Here, 10 is just an arbitrary number (different from 0, 1, 2).
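A minimal self-contained sketch of the file-descriptor variant, assuming bash (read -u is a bash extension) and mktemp. A temporary file stands in for peptides.txt, and a process substitution stands in for the terminal on stdin, so the body's own read takes its answers from stdin while the loop keeps reading the file on fd 10:

```shell
#!/usr/bin/env bash
# Temporary stand-in for peptides.txt
infile=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK > "$infile"

pairs=()
# The loop reads the file on fd 10; stdin (fd 0) is left free for the body
while IFS= read -r -u 10 p; do
  IFS= read -r answer          # consumes stdin, not the file
  pairs+=("$p=$answer")
done 10<"$infile" < <(printf '%s\n' yes no)

rm -f "$infile"
printf '%s\n' "${pairs[@]}"    # RKEKNVQ=yes, then IPKKLLQK=no
```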


How should I interpret the last line? File peptides.txt is redirected to standard input and somehow to the whole of the while block?
"Slurp peptides.txt into this while loop, so the 'read' command has something to consume." My "cat" method is similar, sending the output of a command into the while block for consumption by 'read', too, only it launches another program to get the work done.
This method seems to skip the last line of a file.
Double-quote the variables, as in echo "$p", and the file name too. Trust me, it will bite you if you don't! I KNOW! lol
Both versions fail to read a final line if it is not terminated with a newline. Always use while read p || [[ -n $p ]]; do ...
rogerdpack
cat peptides.txt | while read line
do
   # do something with $line here, for example:
   echo "$line"
done

and the one-liner variant:

cat peptides.txt | while read line; do echo "$line"; done    # replace echo "$line" with your own commands

These options will skip the last line of the file if there is no trailing line feed.

You can avoid this as follows:

cat peptides.txt | while read line || [[ -n $line ]];
do
   # do something with $line here, for example:
   echo "$line"
done

In general, if you're using "cat" with only one argument, you're doing something wrong (or suboptimal).
Yes, it's just not as efficient as Bruno's, because it launches another program, unnecessarily. If efficiency matters, do it Bruno's way. I remember my way because you can use it with other commands, where the "redirect in from" syntax doesn't work.
There's another, more serious problem with this: because the while loop is part of a pipeline, it runs in a subshell, and hence any variables set inside the loop are lost when it exits (see bash-hackers.org/wiki/doku.php/mirroring/bashfaq/024). This can be very annoying (depending on what you're trying to do in the loop).
I use "cat file | " as the start of a lot of my commands purely because I often prototype with "head file |"
This may be not that efficient, but it's much more readable than other answers.
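To see the subshell problem described in the comments, here is a small sketch (assuming bash with its default settings, i.e. lastpipe off, and mktemp for a throwaway sample file) that counts lines once through a pipe and once through input redirection:

```shell
#!/usr/bin/env bash
infile=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK QYFHQLEKMNVK > "$infile"

piped=0
cat "$infile" | while IFS= read -r line; do
  piped=$((piped + 1))            # increments a copy inside the pipeline's subshell
done

redirected=0
while IFS= read -r line; do
  redirected=$((redirected + 1))  # runs in the current shell
done < "$infile"

rm -f "$infile"
echo "piped=$piped redirected=$redirected"   # piped=0 redirected=3
```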
tripleee

Option 1a: While loop: Single line at a time: Input redirection

#!/bin/bash
filename='peptides.txt'
echo Start
while read p; do 
    echo "$p"
done < "$filename"

Option 1b: While loop: Single line at a time: Open the file, read from a file descriptor (in this case file descriptor #4).

#!/bin/bash
filename='peptides.txt'
exec 4<"$filename"
echo Start
while read -u4 p ; do
    echo "$p"
done

For option 1b: does the file descriptor need to be closed again? E.g. the loop could be an inner loop.
The file descriptor will be cleaned up when the process exits. An explicit close can be done to reuse the fd number. To close an fd, use another exec with the &- syntax, like this: exec 4<&-
Thank you for Option 2. I ran into huge problems with Option 1 because I needed to read from stdin within the loop; in such a case Option 1 will not work.
You should point out more clearly that Option 2 is strongly discouraged. @masgo Option 1b should work in that case, and can be combined with the input redirection syntax from Option 1a by replacing done < $filename with done 4<$filename (which is useful if you want to read the file name from a command parameter, in which case you can just replace $filename by $1).
I need to loop over file contents such as tail -n +2 myfile.txt | grep 'somepattern' | cut -f3, while running ssh commands inside the loop (consumes stdin); option 2 here appears to be the only way?
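A sketch of Option 1b with the explicit close from the comments (exec 4<&-), assuming bash and mktemp; the check at the end only confirms that fd 4 is no longer open:

```shell
#!/usr/bin/env bash
infile=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK > "$infile"

exec 4<"$infile"                # open the file on fd 4
lines=()
while IFS= read -r -u 4 p; do
  lines+=("$p")
done
exec 4<&-                       # close fd 4 so the number can be reused

rm -f "$infile"
# Redirecting from a closed fd fails, so this tells us whether fd 4 is still open
if { true <&4; } 2>/dev/null; then fd4=open; else fd4=closed; fi
echo "read ${#lines[@]} lines, fd 4 is $fd4"   # read 2 lines, fd 4 is closed
```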
mightypile

This is no better than other answers, but is one more way to get the job done in a file without spaces (see comments). I find that I often need one-liners to dig through lists in text files without the extra step of using separate script files.

for word in $(cat peptides.txt); do echo $word; done

This format allows me to put it all in one command-line. Change the "echo $word" portion to whatever you want and you can issue multiple commands separated by semicolons. The following example uses the file's contents as arguments to two other scripts you may have written.

for word in $(cat peptides.txt); do cmd_a.sh $word; cmd_b.py $word; done

Or if you intend to use this like a stream editor (learn sed) you can dump the output to another file as follows.

for word in $(cat peptides.txt); do cmd_a.sh $word; cmd_b.py $word; done > outfile.txt

I've used these as written above because I have used text files where I've created them with one word per line. (See comments) If you have spaces that you don't want splitting your words/lines, it gets a little uglier, but the same command still works as follows:

OLDIFS=$IFS; IFS=$'\n'; for line in $(cat peptides.txt); do cmd_a.sh $line; cmd_b.py $line; done > outfile.txt; IFS=$OLDIFS

This just tells the shell to split on newlines only, not spaces, then returns the environment back to what it was previously. At this point, you may want to consider putting it all into a shell script rather than squeezing it all into a single line, though.

Best of luck!


The bash $(
@JoaoCosta,maxpolk : Good points that I hadn't considered. I've edited the original post to reflect them. Thanks!
Using for makes the input tokens/lines subject to shell expansions, which is usually undesirable; try this: for l in $(echo '* b c'); do echo "[$l]"; done - as you'll see, the * - even though originally a quoted literal - expands to the files in the current directory.
@dblanchard: The last example, using $IFS, should ignore spaces. Have you tried that version?
The way this command gets a lot more complex as crucial issues are fixed shows very well why using for to iterate over file lines is a bad idea. Plus, there is the expansion aspect mentioned by @mklement0 (even though that can probably be circumvented by bringing in escaped quotes, which again makes things more complex and less readable).
codeforester

A few more things not covered by other answers:

Reading from a delimited file

# ':' is the delimiter here, and there are three fields on each line in the file
# IFS set below is restricted to the context of `read`, it doesn't affect any other code
while IFS=: read -r field1 field2 field3; do
  # process the fields
  # if the line has fewer than three fields, the missing fields will be set to an empty string
  # if the line has more than three fields, `field3` will get the rest of the line: the third field plus the remaining delimiter(s) and fields
  printf '%s|%s|%s\n' "$field1" "$field2" "$field3"
done < input.txt
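A small runnable sketch of the delimited-file loop, using a here-string with three hypothetical records instead of input.txt, to show how short and long lines are split:

```shell
#!/usr/bin/env bash
# Hypothetical sample records: the second has too few fields, the third too many
data=$'alpha:1:x\nbeta:2\ngamma:3:y:z'

names=() counts=() rest=()
while IFS=: read -r field1 field2 field3; do
  names+=("$field1")
  counts+=("$field2")
  rest+=("$field3")   # "x", then "" (missing field), then "y:z" (rest of the line)
done <<< "$data"

printf '[%s]\n' "${rest[@]}"   # [x], [], [y:z], one per line
```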

Reading from the output of another command, using process substitution

while read -r line; do
  # process the line
done < <(command ...)

This approach is better than command ... | while read -r line; do ... because the while loop here runs in the current shell rather than a subshell as in the case of the latter. See the related post A variable modified inside a while loop is not remembered.
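A sketch of the process-substitution form with a concrete command standing in for command ... (here printf piped to sort -u, chosen only for illustration); unlike the piped version, count and the array are still set after the loop:

```shell
#!/usr/bin/env bash
uniq=()
count=0
while IFS= read -r line; do
  uniq+=("$line")
  count=$((count + 1))          # survives the loop: no subshell involved
done < <(printf '%s\n' RKEKNVQ IPKKLLQK RKEKNVQ | sort -u)

printf 'count=%s first=%s\n' "$count" "${uniq[0]}"   # count=2 first=IPKKLLQK
```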

Reading from a null delimited input, for example find ... -print0

while IFS= read -r -d '' line; do
  # logic
  # use a second 'read ... <<< "$line"' if we need to tokenize the line
  printf '%s\n' "$line"
done < <(find /path/to/dir -print0)

Related read: BashFAQ/020 - How can I find and safely handle file names containing newlines, spaces or both?
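A self-contained sketch of the null-delimited loop, assuming bash, mktemp -d, and a find that supports -print0 (GNU and BSD find both do). It creates three hypothetical files in a throwaway directory, including one whose name contains a newline, and still counts them correctly:

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt" "$dir/with"$'\n'"newline.txt"

files=()
# IFS= keeps leading/trailing spaces in names; -d '' splits on NUL bytes
while IFS= read -r -d '' f; do
  files+=("$f")
done < <(find "$dir" -type f -print0)

rm -rf "$dir"
echo "found ${#files[@]} files"   # found 3 files
```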

Reading from more than one file at a time

while read -u 3 -r line1 && read -u 4 -r line2; do
  # process the lines
  # note that the loop will end when we reach EOF on either of the files, because of the `&&`
  printf '%s\t%s\n' "$line1" "$line2"
done 3< input1.txt 4< input2.txt

Based on @chepner's answer here:

-u is a bash extension. For POSIX compatibility, each call would look something like read -r X <&3.
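A POSIX sh sketch of the two-file loop using the <&3 form instead of read -u; the temporary files (created with mktemp, which is common but not itself POSIX) and their contents are made up for the example:

```shell
#!/bin/sh
a=$(mktemp); b=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK QYFHQLEKMNVK > "$a"
printf '%s\n' 7 8 > "$b"

out=''
# fd 3 and fd 4 stay open for the whole loop, so each read continues where the last one stopped
while IFS= read -r line1 <&3 && IFS= read -r line2 <&4; do
  out="$out$line1 $line2;"
done 3<"$a" 4<"$b"

rm -f "$a" "$b"
printf '%s\n' "$out"   # RKEKNVQ 7;IPKKLLQK 8;  (stops at EOF of the shorter file)
```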

Reading a whole file into an array (Bash versions earlier than 4)

while read -r line; do
    my_array+=("$line")
done < my_file

If the file ends with an incomplete line (newline missing at the end), then:

while read -r line || [[ $line ]]; do
    my_array+=("$line")
done < my_file

Reading a whole file into an array (Bash 4.x and later)

readarray -t my_array < my_file

or

mapfile -t my_array < my_file

And then

for line in "${my_array[@]}"; do
  # process the lines
  printf '%s\n' "$line"
done
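A quick sketch comparing mapfile (Bash 4+ only, so not the bash 3.2 that macOS ships) with the pre-4 loop on a sample file whose last line has no trailing newline. IFS= is added to the loop here (an addition to the version shown above) so that both keep the leading spaces:

```shell
#!/usr/bin/env bash
infile=$(mktemp)
printf 'RKEKNVQ\n  IPKKLLQK\nILGFMK' > "$infile"   # last line has no newline

mapfile -t my_array < "$infile"      # -t strips the trailing newline of each line

legacy=()
while IFS= read -r line || [[ $line ]]; do
  legacy+=("$line")
done < "$infile"

rm -f "$infile"
printf '[%s]\n' "${my_array[@]}"     # [RKEKNVQ], [  IPKKLLQK], [ILGFMK]
```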

More about the shell builtins read and readarray commands - GNU

More about IFS - Wikipedia

BashFAQ/001 - How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?

Related posts:

Creating an array from a text file in Bash

What is the difference between these approaches to reading a file that has just one line?

Bash while read loop extremely slow compared to cat, why?


note that instead of command < input_filename.txt you can always do input_generating_command | command or command < <(input_generating_command)
Thanks for reading file into array. Exactly what I need, because I need each line to parse twice, add to new variables, do some validations etc.
this is by far the most useful version I think
read -r -d '' works for null-delimited input in combination with while, not standalone (read -r -d '' foo bar). See here.
Jahid

Use a while loop, like this:

while IFS= read -r line; do
   echo "$line"
done <file

Notes:

1. If you don't set IFS properly, you will lose indentation.
2. You should almost always use the -r option with read.
3. Don't read lines with for.
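A tiny sketch of why IFS= and -r matter, assuming bash for the here-string: the same made-up input read both ways:

```shell
#!/usr/bin/env bash
input='  C:\temp\new  '

IFS= read -r raw <<< "$input"   # keeps whitespace and backslashes
read cooked <<< "$input"        # trims whitespace, treats \ as an escape

printf '[%s]\n' "$raw" "$cooked"
# [  C:\temp\new  ]
# [C:tempnew]
```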


@DavidC.Rankin The -r option prevents backslash interpretation. Note #2 is a link where it is described in detail...
Combine this with the "read -u" option in another answer and then it's perfect.
@FlorinAndrei : The above example doesn't need the -u option, are you talking about another example with -u?
Looked through your links, and was surprised there's no answer that simply links your link in Note 2. That page provides everything you need to know about that subject. Or are link-only answers discouraged or something?
@EgorHans : link only answers are generally deleted.
dawg

Suppose you have this file:

$ cat /tmp/test.txt
Line 1
    Line 2 has leading space
Line 3 followed by blank line

Line 5 (follows a blank line) and has trailing space    
Line 6 has no ending CR

There are four elements that will alter the meaning of the file output read by many Bash solutions:

1. the blank line 4;
2. leading or trailing spaces on two lines;
3. maintaining the meaning of individual lines (i.e., each line is a record);
4. line 6, which is not terminated with a newline.

If you want to read the text file line by line, including blank lines and a final line without a terminating newline, you must use a while loop, and you must have an alternate test for the final line.

Here are the methods that may change the file (in comparison to what cat returns):

1) Lose the last line and leading and trailing spaces:

$ while read -r p; do printf "%s\n" "'$p'"; done </tmp/test.txt
'Line 1'
'Line 2 has leading space'
'Line 3 followed by blank line'
''
'Line 5 (follows a blank line) and has trailing space'

(If you do while IFS= read -r p; do printf "%s\n" "'$p'"; done </tmp/test.txt instead, you preserve the leading and trailing spaces but still lose the last line if it is not terminated with a newline.)

2) Using command substitution with cat reads the entire file in one gulp and loses the meaning of individual lines:

$ for p in "$(cat /tmp/test.txt)"; do printf "%s\n" "'$p'"; done
'Line 1
    Line 2 has leading space
Line 3 followed by blank line

Line 5 (follows a blank line) and has trailing space    
Line 6 has no ending CR'

(If you remove the " from $(cat /tmp/test.txt) you read the file word by word rather than one gulp. Also probably not what is intended...)

The most robust and simplest way to read a file line-by-line and preserve all spacing is:

$ while IFS= read -r line || [[ -n $line ]]; do printf "'%s'\n" "$line"; done </tmp/test.txt
'Line 1'
'    Line 2 has leading space'
'Line 3 followed by blank line'
''
'Line 5 (follows a blank line) and has trailing space    '
'Line 6 has no ending CR'

If you want to strip leading and trailing spaces, remove the IFS= part:

$ while read -r line || [[ -n $line ]]; do printf "'%s'\n" "$line"; done </tmp/test.txt
'Line 1'
'Line 2 has leading space'
'Line 3 followed by blank line'
''
'Line 5 (follows a blank line) and has trailing space'
'Line 6 has no ending CR'

(A text file without a terminating \n, while fairly common, is considered broken under POSIX. If you can count on the trailing \n you do not need || [[ -n $line ]] in the while loop.)
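If you want to know in advance whether a file has that terminating \n, one possible check (a sketch, assuming tail -c, which POSIX specifies) is to look at its last byte; missing_final_newline is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file is non-empty and its last byte
# is not a newline. $(...) strips a trailing newline, so a properly
# terminated file yields an empty string here.
missing_final_newline() {
  [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]
}

f=$(mktemp)
printf 'Line 5\nLine 6 has no ending CR' > "$f"
if missing_final_newline "$f"; then before=missing; else before=ok; fi
printf '\n' >> "$f"             # append the missing newline
if missing_final_newline "$f"; then after=missing; else after=ok; fi
rm -f "$f"
echo "$before $after"           # missing ok
```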

More at the BASH FAQ


Anjul Sharma

If you don't want read to trim whitespace or to drop a final line that has no trailing newline character, use:

#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "$line"
done < "$1"

Then run the script with the file name as a parameter.


Jieiku

This might be the simplest answer, and maybe it doesn't work in all cases, but it is working great for me:

while read line;do echo "$line";done<peptides.txt

If you need to enclose each line in quotes (for example, to make spaces visible):

while read line;do echo \"$line\";done<peptides.txt

Ahhh, this is pretty much the same as the most upvoted answer, but it's all on one line.


hamou92

I like to use xargs instead of while. xargs is powerful and command-line friendly.

cat peptides.txt | xargs -I % sh -c "echo %"

With xargs, you can also add verbosity with -t and interactive confirmation with -p.


There are serious security problems with this approach. What if your peptides.txt contains something that unescapes to $(rm -rf ~), or even worse, $(rm -rf ~)'$(rm -rf ~)'?
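One way to avoid the injection problem raised in that comment is to keep the data out of any shell code: a sketch that turns newlines into NUL bytes and lets xargs -0 pass each line to printf as a plain argument. This assumes an xargs with -0 (GNU and BSD xargs have it) and uses a made-up hostile line in a temporary file:

```shell
#!/usr/bin/env bash
infile=$(mktemp)
printf '%s\n' 'RKEKNVQ' 'a b $(echo pwned)' > "$infile"

# tr makes each line a NUL-terminated item; xargs -0 does no quote or
# backslash processing, and -n 1 runs printf once per line. No shell ever
# parses the data, so $(...) stays literal text.
out=$(tr '\n' '\0' < "$infile" | xargs -0 -n 1 printf '[%s]\n')

rm -f "$infile"
printf '%s\n' "$out"
# [RKEKNVQ]
# [a b $(echo pwned)]
```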
0zkr PM
#!/bin/bash
#
# Change the file name from "test" to desired input file 
# (The comments in bash are prefixed with #'s)
for x in $(cat test.txt)
do
    echo $x
done

This answer needs the caveats mentioned in mightypile's answer, and it can fail badly if any line contains shell metacharacters (due to the unquoted "$x").
I'm actually surprised people didn't yet come up with the usual Don't read lines with for...
This really doesn't work in any general way. Bash splits each line on spaces, which is very unlikely to be the desired outcome.
Whome

Here is my real-life example of how to loop over the lines of another program's output, check for substrings, drop double quotes from a variable, and use that variable outside of the loop. I guess quite a few people ask these questions sooner or later.

##Parse FPS from first video stream, drop quotes from fps variable
## streams.stream.0.codec_type="video"
## streams.stream.0.r_frame_rate="24000/1001"
## streams.stream.0.avg_frame_rate="24000/1001"
FPS=unknown
while read -r line; do
  if [[ $FPS == "unknown" ]] && [[ $line == *".codec_type=\"video\""* ]]; then
    echo ParseFPS $line
    FPS=parse
  fi
  if [[ $FPS == "parse" ]] && [[ $line == *".r_frame_rate="* ]]; then
    echo ParseFPS $line
    FPS=${line##*=}
    FPS="${FPS%\"}"
    FPS="${FPS#\"}"
  fi
done <<< "$(ffprobe -v quiet -print_format flat -show_format -show_streams -i "$input")"
if [ "$FPS" == "unknown" ] || [ "$FPS" == "parse" ]; then 
  echo ParseFPS Unknown frame rate
fi
echo Found $FPS

To declare a variable outside the loop, set its value inside, and use it after the loop, the loop must run in the current shell rather than in a subshell, which is why it ends with done <<< "$(...)" instead of being fed by a pipe. The quotes around the command substitution keep the newlines of the output stream.

Inside the loop, the substring matches find the video stream and then its r_frame_rate line; the name=value pair is split at the last = character, the trailing and leading quotes are dropped, and we have a clean value to use elsewhere.
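The three parameter expansions can be tried in isolation on one of the sample lines listed in the script's comments:

```shell
#!/usr/bin/env bash
line='streams.stream.0.r_frame_rate="24000/1001"'
fps=${line##*=}   # drop everything up to the last '=': "24000/1001" with quotes
fps=${fps%\"}     # drop the trailing double quote
fps=${fps#\"}     # drop the leading double quote
echo "$fps"       # 24000/1001
```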


While the answer is correct, I do understand how it ended up down here. The essential method is the same as proposed by many other answers. Plus, it completely drowns in your FPS example.
Alan Jebakumar

@Peter: This could work for you:

echo "Start!";for p in $(cat ./pep); do
echo $p
done

This would return the output:

Start!
RKEKNVQ
IPKKLLQK
QYFHQLEKMNVK
IPKKLLQK
GDLSTALEVAIDCYEK
QYFHQLEKMNVKIPENIYR
RKEKNVQ
VLAKHGKLQDAIN
ILGFMK
LEDVALQILL

This answer is defeating all the principles set by the good answers above!
Please delete this answer.
Now guys, don't exaggerate. The answer is bad, but it seems to work, at least for simple use cases. As long as that's provided, being a bad answer doesn't take away the answer's right to exist.
@EgorHans, I disagree strongly: The point of answers is to teach people how to write software. Teaching people to do things in a way that you know is harmful to them and the people who use their software (introducing bugs / unexpected behaviors / etc) is knowingly harming others. An answer known to be harmful has no "right to exist" in a well-curated teaching resource (and curating it is exactly what we, the folks who are voting and flagging, are supposed to be doing here).
madD7

This is coming rather late, but I am adding this answer in case it helps someone. It may not be the best way. The head command can be used with the -n argument to read the first n lines of a file, and likewise the tail command can read from the bottom. To fetch the nth line of a file, we take the first n lines with head and pipe them to tail to keep only the last of those lines.

   TOTAL_LINES=$(wc -l < "$USER_FILE")
   echo "$TOTAL_LINES"       # To validate total lines in the file

   for (( i=1 ; i <= TOTAL_LINES; i++ ))
   do
      LINE=$(head -n "$i" "$USER_FILE" | tail -n 1)
      echo "$LINE"
   done

Don't do this. Looping over line numbers and fetching each individual line by way of sed or head + tail is incredibly inefficient, and of course begs the question why you don't simply use one of the other solutions here. If you need to know the line number, add a counter to your while read -r loop, or use nl -ba to add a line number prefix to each line before the loop.
@tripleee I have clearly mentioned "this may not be the best way". I have not limited the discussion to "the best or the most efficient solution".
Iterating over the lines of a file with a for loop can be useful in some situations. For example some commands can make a while loop break. See stackoverflow.com/a/64049584/2761700
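A sketch of the counter approach suggested in the comments, using a temporary file as a stand-in for $USER_FILE and assuming bash; it reads the file once instead of once per line:

```shell
#!/usr/bin/env bash
infile=$(mktemp)
printf '%s\n' RKEKNVQ IPKKLLQK QYFHQLEKMNVK > "$infile"

n=0
numbered=()
while IFS= read -r line || [[ -n $line ]]; do
  n=$((n + 1))                  # line number, tracked by the loop itself
  numbered+=("$n: $line")
done < "$infile"

rm -f "$infile"
printf '%s\n' "${numbered[@]}"  # 1: RKEKNVQ, 2: IPKKLLQK, 3: QYFHQLEKMNVK
```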
abhishek nair

Another way to use xargs:

xargs -I {} echo {} < file_name

echo can be replaced with other commands or piped further.


Chris

Per Ed Morton, Why is using a shell loop to process text considered bad practice?

The answer is: don't process text with bash, process text in bash with the tool designed for this task, awk.

https://www.gnu.org/software/gawk/manual/gawk.html

#!/usr/bin/awk -f
# adjust the path above if awk lives elsewhere on your system

BEGIN { print("do anything you want here!"); }
{
   print("processing line: ", $0);
}
END { print("and anything else here!") };

And invoke with:

./awk-script.awk peptides.txt

And in a bash script:

#!/usr/bin/env bash

echo "foo" | awk "{print}"
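For a slightly more useful one-liner than echo "foo", here is a sketch of awk doing per-line work on the peptide list from the question: the common !seen[$0]++ idiom prints each line only the first time it appears, keeping the original order (the sample is a subset of peptides.txt):

```shell
#!/usr/bin/env bash
uniq_peps=$(printf '%s\n' RKEKNVQ IPKKLLQK RKEKNVQ ILGFMK IPKKLLQK | awk '!seen[$0]++')
printf '%s\n' "$uniq_peps"
# RKEKNVQ
# IPKKLLQK
# ILGFMK
```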

The question asks specifically for how to do it with bash
@Matt I am interpreting the intent here as a "how do I do it in bash" rather than "how do I do it with bash". And I've been frustrated enough with overly literal interpretations of my questions that I'm happy to wait for the OP to weigh in.
Does not parse. Missing closing curly brace on the last line.
@rsaxvc corrected.