Calculate Word Frequency of Files in Bash

I was reading Ryan Tomayko’s blog post AWK-ward Ruby, which explains how the Unix tool AWK is among the ancestors of Ruby and Perl. He wrote a few examples showing some of AWK’s advanced features, one of which listed the word frequencies of any file it was given. I found this example quite useful and extracted it as a function into my dotfiles.

#!/bin/bash

function word_frequency() {
  awk '
    # Treat any run of non-letter characters as a field
    # separator, so punctuation and digits never count as words.
    BEGIN { FS="[^a-zA-Z]+" }

    # Tally every word in lowercase, skipping the empty fields
    # that leading or trailing punctuation produces.
    {
        for (i = 1; i <= NF; i++) {
            word = tolower($i)
            if (word != "") words[word]++
        }
    }

    # Print "count word" pairs for the numeric sort below.
    END {
        for (w in words)
            printf("%3d %s\n", words[w], w)
    }
  ' |
  sort -rn
}
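
Punctuation and letter case are both normalized away before counting. A quick sanity check (the sample sentence here is just made up):

echo "Hello, hello... World!" | word_frequency
#   2 hello
#   1 world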

Now you can pipe the contents of any file to this function and it will list every word with its frequency, in descending order.

# Examples:

# Pipe the contents of a text file to the function using `cat`
cat my_text_file.txt | word_frequency

# Get the word frequency of a file on the internet
curl -s https://github.com/humans.txt | word_frequency
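
For longer files the list gets noisy, so one small tweak I like (my own addition, not from the original post) is to pipe through `head` to keep only the most frequent words:

# Show only the ten most frequent words
curl -s https://github.com/humans.txt | word_frequency | head -n 10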

Looking forward to using AWK more and more.