Ora

How to read an input file in Bash?

Published in Bash File Operations

Reading an input file in Bash is a fundamental operation, most often done with a while read loop that processes the file one line at a time. This method is flexible: each line can be inspected, transformed, or split into fields before the loop moves on to the next.

The Robust while read Loop

The most common and recommended method for reading a file line by line in Bash is a while read loop. Written as while IFS= read -r line (explained in the advanced section), it preserves spaces, backslashes, and empty lines exactly as they appear in the file.

How it works:

The while read line; do ... done < filename construct redirects the contents of filename to the standard input of the while loop. On each iteration, read takes the next line from this input and assigns it to the line variable. When no more input is available, read returns a non-zero exit status, which ends the loop.
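A minimal sketch of how read's exit status drives the loop, assuming a temporary file (created with mktemp) in place of read_file.txt and using the IFS= read -r form recommended later in this article. It also shows one edge case: if the last line has no trailing newline, read still fills the variable but returns non-zero, so a plain loop skips that line. Adding || [[ -n $line ]] is a common guard. The array names plain and guarded are illustrative only.

```shell
#!/bin/bash
# Sample file whose last line has NO trailing newline
# (a temporary file stands in for read_file.txt).
tmp=$(mktemp)
printf 'Apple\nBanana\nCherry' > "$tmp"

# Plain loop: read returns non-zero on the final, unterminated line,
# so the loop body never runs for "Cherry".
plain=()
while IFS= read -r line; do
  plain+=("$line")
done < "$tmp"

# Guarded loop: also run the body when read failed but still set $line.
guarded=()
while IFS= read -r line || [[ -n $line ]]; do
  guarded+=("$line")
done < "$tmp"

echo "plain:   ${#plain[@]} lines"    # → plain:   2 lines
echo "guarded: ${#guarded[@]} lines"  # → guarded: 3 lines
rm -f "$tmp"
```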

Basic Example

Let's say you have a file named read_file.txt with the following content:

Apple
Banana
Cherry Blossom
Date fruit

To read and display each line along with its line number, you would use the following script:

#!/bin/bash

# Define the input file name
file='read_file.txt'

# Initialize a line counter
i=1

# Loop through each line of the file
while read line; do
  # Process each line: print line number and content
  echo "Line No. $i : $line"
  # Increment the line counter
  i=$((i+1))
done < "$file"

Output of the script:

Line No. 1 : Apple
Line No. 2 : Banana
Line No. 3 : Cherry Blossom
Line No. 4 : Date fruit

Understanding the Components

  • #!/bin/bash: This is the shebang, specifying the interpreter for the script.
  • file='read_file.txt': A variable is defined to hold the name of the file to be read. Using a variable makes the script more maintainable.
  • i=1: An integer variable i is initialized to 1 to act as a line counter.
  • while read line; do ... done: This is the core loop structure.
    • read line: Reads a single line from the standard input and stores it in the variable line.
    • do ... done: Encloses the commands to be executed for each line.
  • echo "Line No. $i : $line": Inside the loop, this command prints the current line number and the content of the line variable.
  • i=$((i+1)): Increments the line counter i by 1 using arithmetic expansion.
  • < "$file": This input redirection is crucial. It connects the file (read_file.txt) to the loop's standard input, so each read inside the loop consumes the next line of the file instead of waiting for input from the terminal.
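Redirecting with < also matters because of where the loop runs. The sketch below assumes a temporary file in place of read_file.txt and uses the hypothetical counter names count_redirect and count_pipe. With <, the loop runs in the current shell; when cat is piped into the loop, bash by default runs the loop in a subshell, so the counter update is lost once the loop ends.

```shell
#!/bin/bash
tmp=$(mktemp)
printf 'Apple\nBanana\nCherry Blossom\nDate fruit\n' > "$tmp"

# Redirection: the loop runs in the current shell, so the counter survives.
count_redirect=0
while IFS= read -r line; do
  count_redirect=$((count_redirect+1))
done < "$tmp"

# Pipe: by default bash runs each part of a pipeline in a subshell,
# so the increments happen in a copy of count_pipe that is then discarded.
count_pipe=0
cat "$tmp" | while IFS= read -r line; do
  count_pipe=$((count_pipe+1))
done

echo "redirect: $count_redirect"  # → redirect: 4
echo "pipe: $count_pipe"          # → pipe: 0
rm -f "$tmp"
```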

Advanced Considerations for while read

While the basic while read loop is powerful, understanding some advanced options can prevent common pitfalls:

1. Handling Backslashes and Special Characters: read -r

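By default, read treats a backslash as an escape character: the backslash is removed and the character after it is kept literally (so \n in the file becomes n, not a newline), and a backslash at the end of a line joins that line with the next one. To keep backslashes exactly as written, use the -r option: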

while read -r line; do
  echo "$line"
done < "$file"

This is generally recommended for reliable file processing.
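A quick comparison of the two behaviours. To keep the sketch short it feeds read from a here-string (<<<) instead of a file; the Windows-style path is just sample input.

```shell
#!/bin/bash
input='C:\temp\new'

# With -r, backslashes are kept exactly as written.
read -r with_r <<< "$input"
# Without -r, each backslash escapes the next character and is removed.
read without_r <<< "$input"

printf '%s\n' "$with_r"     # → C:\temp\new
printf '%s\n' "$without_r"  # → C:tempnew
```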

2. Controlling Word Splitting: IFS

The read command splits its input into fields using the characters in the IFS (Internal Field Separator) variable, which defaults to space, tab, and newline. When read is given several variable names, each receives one field and the last receives the remainder of the line. With a single name such as line, the whole line goes into that variable, but leading and trailing spaces and tabs are stripped. To store each line exactly as written, set IFS to an empty string for the read command:

# Example 1: Read a single line from standard input exactly as written
IFS= read -r line

# Example 2: Read every line of the file exactly as written
while IFS= read -r line; do
  echo "$line"
done < "$file"

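Writing IFS= immediately before read sets IFS to empty for that read command only, so no field splitting or whitespace trimming takes place and the entire line is stored as a single field. IFS keeps its normal value for the rest of the script.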
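The three cases can be compared side by side. This sketch uses a here-string with a made-up padded line; the variable names are illustrative.

```shell
#!/bin/bash
line='  Cherry   Blossom  '

# Default IFS, two names: each name gets one field, the last gets the rest.
read -r first rest <<< "$line"
# Default IFS, one name: whole line kept, but outer whitespace stripped.
read -r whole_default <<< "$line"
# Empty IFS, one name: the line is stored exactly as written.
IFS= read -r whole_exact <<< "$line"

printf '[%s] [%s]\n' "$first" "$rest"  # → [Cherry] [Blossom]
printf '[%s]\n' "$whole_default"       # → [Cherry   Blossom]
printf '[%s]\n' "$whole_exact"         # → [  Cherry   Blossom  ]
```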

3. Preserving Leading/Trailing Whitespace

With the default IFS, read strips leading and trailing spaces and tabs from each line, so indentation is lost. Combining an empty IFS with -r, as in while IFS= read -r line, preserves this whitespace exactly; no more complex approach is required.
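One way to check this is to copy a file line by line and compare the result with the original. The sketch below assumes temporary files and a made-up sample containing indentation, tabs, a backslash, and an empty line; it also assumes the input ends with a newline, since an unterminated last line would be skipped by this plain loop.

```shell
#!/bin/bash
src=$(mktemp); dst=$(mktemp)
printf '  indented\n\ttabbed\ttext  \nback\\slash\n\nlast\n' > "$src"

# Copy line by line, preserving whitespace, backslashes and empty lines.
while IFS= read -r line; do
  printf '%s\n' "$line"
done < "$src" > "$dst"

# cmp -s is silent and succeeds only if the two files are byte-identical.
if cmp -s "$src" "$dst"; then copy_ok=yes; else copy_ok=no; fi
echo "identical: $copy_ok"  # → identical: yes
rm -f "$src" "$dst"
```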

Alternative Methods (Less Common for Line-by-Line Processing)

While while read is preferred for reliability, other methods exist for specific scenarios:

1. Using cat and a for Loop (Discouraged for Line-by-Line)

For very simple files in which each line is a single word or item without spaces, a for loop combined with cat can be used. It is generally not recommended for arbitrary file content, however: the unquoted $(cat "$file") is split into words on whitespace (not into lines), each word is also subject to filename expansion (a line containing * is replaced by matching file names), and the command substitution holds the entire file in memory.

# DO NOT USE for files with spaces or special characters in lines
for word in $(cat "$file"); do
  echo "$word"
done

This will split lines like "Cherry Blossom" into "Cherry" and "Blossom", processing them as separate "words."
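Counting iterations makes the difference visible. A small sketch, assuming a temporary file with two of the sample lines from read_file.txt; the counter names are illustrative.

```shell
#!/bin/bash
tmp=$(mktemp)
printf 'Cherry Blossom\nDate fruit\n' > "$tmp"

# for loop: iterates over whitespace-separated words.
words=0
for word in $(cat "$tmp"); do words=$((words+1)); done

# while read loop: iterates over lines.
lines=0
while IFS= read -r line; do lines=$((lines+1)); done < "$tmp"

echo "for loop iterations: $words"    # → for loop iterations: 4
echo "while loop iterations: $lines"  # → while loop iterations: 2
rm -f "$tmp"
```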

2. Using mapfile or readarray (Bash 4+ Specific)

For reading an entire file into an array where each array element is a line from the file, mapfile (or its alias readarray) is an efficient option in Bash version 4 and later.

#!/bin/bash

file='read_file.txt'

# Read the entire file into an array, one line per element
mapfile -t lines < "$file"

# Iterate over the array
for i in "${!lines[@]}"; do
  echo "Line No. $((i+1)) : ${lines[$i]}"
done

  • -t: Removes the trailing newline from each line read.
  • "${!lines[@]}": Expands to the indices of the array.

This method is useful when you need to access lines randomly or process the file multiple times without re-reading it from disk.
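A sketch of both uses, assuming Bash 4+ and a temporary copy of the sample file: it reads a line by index, then makes a second pass in reverse order after the file has already been deleted. The names total, third, and reversed are illustrative.

```shell
#!/bin/bash
tmp=$(mktemp)
printf 'Apple\nBanana\nCherry Blossom\nDate fruit\n' > "$tmp"

mapfile -t lines < "$tmp"
rm -f "$tmp"  # the array now holds the contents; the file is not needed

total=${#lines[@]}  # number of lines
third=${lines[2]}   # random access by index (0-based)

# Second pass, in reverse order, without reading the file again.
reversed=()
for (( i = total - 1; i >= 0; i-- )); do
  reversed+=("${lines[i]}")
done

echo "total: $total"  # → total: 4
echo "third: $third"  # → third: Cherry Blossom
printf '%s\n' "${reversed[@]}"
```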

Choosing the Right Method

The choice of method depends on your specific needs:

| Method | Use case | Advantages | Disadvantages |
|---|---|---|---|
| while read -r line | Line-by-line processing of text files | Keeps backslashes intact; processes one line at a time. | Strips leading/trailing whitespace; slightly more verbose than a for loop over $(cat file). |
| while IFS= read -r line | Preserving exact line content, including spaces | Keeps leading/trailing whitespace; no field splitting. | Can be slightly confusing without knowledge of IFS. |
| mapfile -t array | Loading the entire file into an array | Random access to all lines; the file can be re-processed without re-reading it. | Requires Bash 4+; holds the whole file in memory, which matters for large files. |
| for word in $(cat file) | Simple lists of single words/items | Concise for very simple cases. | Splits on whitespace and expands glob characters; not recommended for general file processing. |

For most scenarios, while read -r line (or while IFS= read -r line if whitespace must be preserved) is the most reliable way to read an input file in Bash.