
A Beginner's Tutorial for AWK

awk is a program for processing text files, and almost every Linux system ships with it.

It processes a file line by line and splits each line into fields. awk is probably the most convenient tool for text files in which every line has the same format, such as logs and simple CSV files.

awk is not only a tool but also a programming language. In this article, however, I'll only cover its command-line usage, which I think is sufficient for most purposes.

1. Basic Usage

The basic usage of awk is as follows.

# format
$ awk action file_name

# example
$ awk '{print $0}' demo.txt

In the above example, demo.txt is the text file that awk processes. Inside the single quotes is a pair of braces, and inside the braces is the action to run on each line, print $0. print is the print command and $0 represents the current line, so the command prints every line unchanged.

awk can also read from standard input (stdin) instead of a file. Here is the same example using stdin.

$ echo 'this is a test' | awk '{print $0}'
this is a test

In the above code, print $0 prints the standard input, this is a test, unchanged.

By default, awk splits each line into fields separated by spaces and tabs, and refers to them as $1, $2, $3, and so on: the first field, the second field, the third field, etc.

$ echo 'this is a test' | awk '{print $3}'
a

In the above code, $3 is the third field of this is a test, which is a.
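The splitting rule is worth a quick check: with the default settings, awk treats any run of spaces and tabs as a single separator and ignores leading and trailing whitespace. The sketch below assumes a POSIX-compatible awk (gawk and mawk both behave this way); the sample line and the file name fields.txt are made up for illustration.

```shell
#!/bin/sh
# A made-up line with extra spaces and a tab (\t) between "is" and "a".
printf '   this   is\ta test  \n' > fields.txt

# Runs of blanks count as one separator, so the fields are still
# "this", "is", "a" and "test".
awk '{print $1}' fields.txt    # prints: this
awk '{print $3}' fields.txt    # prints: a
awk '{print $4}' fields.txt    # prints: test
```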

For the following examples, save these lines (taken from /etc/passwd) as demo.txt.

root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync

The fields in this file are separated by colons (:), so you need the -F option to set the field separator to a colon. Only then can you extract the first field.

$ awk -F ':' '{ print $1 }' demo.txt
root
daemon
bin
sys
sync
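To follow along, you can recreate demo.txt with a here-document instead of copying your own /etc/passwd, whose contents will differ from machine to machine. The sketch below also extracts two fields at once, the user name ($1) and the login shell ($7), and shows that assigning the FS variable with -v has the same effect as -F.

```shell
#!/bin/sh
# Recreate the five sample lines used throughout this article.
cat > demo.txt <<'EOF'
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
EOF

# User name and login shell (first and seventh fields).
awk -F ':' '{ print $1, $7 }' demo.txt
# root /usr/bin/zsh
# daemon /usr/sbin/nologin
# ...

# Same separator, set through the FS variable instead of -F.
awk -v FS=':' '{ print $1 }' demo.txt
```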

2. Variables

Besides $ followed by a number, which refers to a field, awk provides several other variables.

The variable NF holds the number of fields in the current line, so $NF represents the last field.

$ echo 'this is a test' | awk '{print $NF}'
test
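NF is recomputed for every line, so it changes with each line's contents. In the made-up input below, the lines have two fields, three fields, and none (an empty line).

```shell
#!/bin/sh
# Three input lines: "a b", "a b c", and an empty line.
printf 'a b\na b c\n\n' | awk '{ print NF }'
# 2
# 3
# 0
```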

Similarly, $(NF-1) represents the second-to-last field.

$ awk -F ':' '{print $1, $(NF-1)}' demo.txt
root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin
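The parentheses in $(NF-1) matter. Without them, $NF-1 is parsed as the value of the last field minus one, a numeric subtraction. The made-up line below ends in a number so that the difference is visible.

```shell
#!/bin/sh
# $(NF-1): the field just before the last one.
echo 'disk usage 10' | awk '{ print $(NF-1) }'   # prints: usage

# $NF-1: the last field (10) minus 1.
echo 'disk usage 10' | awk '{ print $NF-1 }'     # prints: 9
```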

In the above code, the comma in the print command tells awk to separate the two values with a space in the output.
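Leaving the comma out makes a visible difference: awk then concatenates the values with nothing in between. The space inserted by the comma is the output field separator, OFS.

```shell
#!/bin/sh
# With a comma: the values are joined by OFS (a space by default).
echo 'this is a test' | awk '{ print $1, $2 }'   # prints: this is

# Without a comma: the values are concatenated directly.
echo 'this is a test' | awk '{ print $1 $2 }'    # prints: thisis
```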

The variable NR holds the number of the line currently being processed.

$ awk -F ':' '{print NR ") " $1}' demo.txt
1) root
2) daemon
3) bin
4) sys
5) sync

In the above code, the literal text ") " is placed in double quotes. Any text that print should output as-is must be quoted this way.
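Forgetting the quotes does not cause an error, which makes the mistake easy to miss: an unquoted word is read as a variable name, and a variable that was never assigned evaluates to an empty string. The word user below is an arbitrary example.

```shell
#!/bin/sh
# Quoted text is printed literally.
echo 'alice' | awk '{ print "user: " $1 }'   # prints: user: alice

# Unquoted, user is an unset variable (empty string), so only the field is printed.
echo 'alice' | awk '{ print user $1 }'       # prints: alice
```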

Other built-in variables of awk are as follows.

  • FILENAME: The name of the current input file.
  • FS: The field separator. By default, fields are separated by spaces and tabs.
  • RS: The record separator, which splits the input into records (lines). The default is a newline.
  • OFS: The output field separator, placed between fields when printing. The default is a space.
  • ORS: The output record separator, placed after each record when printing. The default is a newline.
  • OFMT: The output format for numbers. The default is %.6g.
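A few of these variables can be tried directly from the command line with -v, which assigns a variable before any input is read. The sketch below sets OFS to | so the separator is easy to see, uses a comma as RS to split one line into several records, and changes OFMT to round a non-integer number. It assumes a POSIX-compatible awk; the sample values are made up.

```shell
#!/bin/sh
# Recreate demo.txt so this sketch runs on its own.
cat > demo.txt <<'EOF'
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
EOF

# FILENAME and NR, joined by a custom OFS.
awk -F ':' -v OFS='|' '{ print FILENAME, NR, $1 }' demo.txt
# demo.txt|1|root
# demo.txt|2|daemon
# ...

# RS set to a comma: each comma-separated item becomes a record.
printf 'a,b,c' | awk -v RS=',' '{ print NR ": " $0 }'
# 1: a
# 2: b
# 3: c

# OFMT controls how print formats non-integer numbers.
echo 3.14159265 | awk '{ print $1 + 0 }'                  # prints: 3.14159
echo 3.14159265 | awk -v OFMT='%.2f' '{ print $1 + 0 }'   # prints: 3.14
```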

3. Functions

awk also provides built-in functions that make it easier to process raw data.

The function toupper() converts characters to uppercase.

$ awk -F ':' '{ print toupper($1) }' demo.txt
ROOT
DAEMON
BIN
SYS
SYNC

In the above code, every first field is converted to uppercase before it is printed.

Other common functions are as follows.

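  • tolower(): Convert characters to lowercase.
  • length(): Return the length of a string.
  • substr(s, m, n): Return the substring of s that starts at position m (counting from 1) and is n characters long.
  • sin(): Sine.
  • cos(): Cosine.
  • sqrt(): Square root.
  • rand(): Return a random number greater than or equal to 0 and less than 1.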
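Several of these functions can be combined in one action. The sketch below prints each user name from demo.txt together with its length and its first three characters; note that substr() counts positions from 1. The last line applies tolower() and sqrt() to a made-up input.

```shell
#!/bin/sh
# Recreate demo.txt so this sketch runs on its own.
cat > demo.txt <<'EOF'
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
EOF

# Name, its length, and its first three characters.
awk -F ':' '{ print $1, length($1), substr($1, 1, 3) }' demo.txt
# root 4 roo
# daemon 6 dae
# bin 3 bin
# sys 3 sys
# sync 4 syn

# String and math functions on a made-up line.
echo 'HELLO 16' | awk '{ print tolower($1), sqrt($2) }'   # prints: hello 4
```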

See the awk manual for the complete list of built-in functions.

4. Conditions

awk lets you specify a condition so that the action runs only on the lines that match it.

The condition is written before the action.

$ awk 'condition action' filename

Take a look at the example below.

$ awk -F ':' '/usr/ {print $1}' demo.txt
root
daemon
bin
sys

In the above code, the regular expression /usr/ is the condition, so the print command runs only on the lines that contain usr.
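Two common variations are sketched below: prefixing a regular expression with ! selects the lines that do not match, and the ~ operator tests a single field, rather than the whole line, against a regular expression. Here the field is $NF, the login shell.

```shell
#!/bin/sh
# Recreate demo.txt so this sketch runs on its own.
cat > demo.txt <<'EOF'
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
EOF

# Lines that do NOT contain "usr".
awk -F ':' '!/usr/ { print $1 }' demo.txt   # prints: sync

# Users whose last field (the shell) ends in "nologin".
awk -F ':' '$NF ~ /nologin$/ { print $1 }' demo.txt
# daemon
# bin
# sys
```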

Conditions can also use NR. In the following examples, the first command outputs only the odd-numbered lines, and the second outputs only the lines after the third line.

# Output the odd lines.
$ awk -F ':' 'NR % 2 == 1 {print $1}' demo.txt
root
bin
sync

# Output the lines after the third line.
$ awk -F ':' 'NR > 3 {print $1}' demo.txt
sys
sync
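Comparisons on NR can be combined with && (and), for example to select a range of lines. The five-word input below is made up for illustration.

```shell
#!/bin/sh
# Keep only lines 2 through 4, printing each line number with its text.
printf '%s\n' one two three four five | awk 'NR >= 2 && NR <= 4 { print NR, $0 }'
# 2 two
# 3 three
# 4 four
```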

The following examples output only the lines whose first field equals a specified value.

$ awk -F ':' '$1 == "root" {print $1}' demo.txt
root

$ awk -F ':' '$1 == "root" || $1 == "bin" {print $1}' demo.txt
root
bin
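Fields that look like numbers are compared numerically, so the same pattern style works for ranges of values. The sketch below selects users whose UID (the third field) is 2 or greater, and uses != to select every user except root.

```shell
#!/bin/sh
# Recreate demo.txt so this sketch runs on its own.
cat > demo.txt <<'EOF'
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
EOF

# Numeric comparison on the third field.
awk -F ':' '$3 >= 2 { print $1, $3 }' demo.txt
# bin 2
# sys 3
# sync 4

# Every line whose first field is not "root".
awk -F ':' '$1 != "root" { print $1 }' demo.txt
```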

5. The if Statement

awk provides an if statement for writing more complex conditions.

$ awk -F ':' '{if ($1 > "m") print $1}' demo.txt
root
sys
sync

The above code compares $1 with "m" as strings and outputs the first fields that sort after m (root, sys, and sync).

An if statement can also have an else branch.

$ awk -F ':' '{if ($1 > "m") print $1; else print "---"}' demo.txt
root
---
---
sys
sync
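For more than two cases, branches can be chained with else if, and braces group the statements of each branch. In the sketch below the UID categories ("zero", "small", "large") are arbitrary labels chosen for illustration.

```shell
#!/bin/sh
# Recreate demo.txt so this sketch runs on its own.
cat > demo.txt <<'EOF'
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
EOF

# Classify each user by UID (third field).
awk -F ':' '{
  if ($3 == 0) {
    print $1, "zero"
  } else if ($3 < 3) {
    print $1, "small"
  } else {
    print $1, "large"
  }
}' demo.txt
# root zero
# daemon small
# bin small
# sys large
# sync large
```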
