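[leetcode-shell] 192: Word Frequency

Source: LeetCode (力扣)

Copyright belongs to LeetCode (领扣网络). For commercial reprints, contact the official site for authorization; for non-commercial reprints, cite the source.

1. Problem Description

Write a bash script to count the frequency of each word in a text file, words.txt.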

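For simplicity, you may assume:

words.txt contains only lowercase characters and space characters.
Each word consists of lowercase characters only.
Words are separated by one or more whitespace characters.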

Example:

Assume words.txt contains the following:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by frequency in descending order:

the 4
is 3
sunny 2
day 1

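Notes:

Use the NF variable to loop over every field of each line and count each word in a hash table (an awk associative array). In the END block, print every key-value pair, then pipe the result through sort to order it by count.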

awk '{for (i = 1; i <= NF; i++) {m[$i]++;}} END {for (i in m) {print i, m[i]}}' words.txt | sort -nr -k 2
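
The one-liner above can be unpacked into a commented sketch. The function name word_freq and the temporary file are assumptions added for illustration, and the sort key is written as -k2,2nr so that only the count column is compared; this is a minimal sketch under those assumptions, not the only way to write it.

```shell
#!/usr/bin/env bash
# word_freq is a hypothetical helper name, not part of the original solution.
word_freq() {
  awk '
    # For every line, visit each whitespace-separated field (1..NF)
    # and increment the counter for that word in the associative array m.
    { for (i = 1; i <= NF; i++) m[$i]++ }
    # After the last line, print "word count" for every key in m.
    # The iteration order of "for (w in m)" is unspecified, hence the sort.
    END { for (w in m) print w, m[w] }
  ' "$1" | sort -k2,2nr   # numeric, descending, on the count column only
}

# Sample input from the problem statement, written to a temporary file.
tmp=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmp"
word_freq "$tmp"
rm -f "$tmp"
```

Because the problem's sample gives every word a distinct count, the sorted order here is fully determined; with tied counts the relative order of the tied words depends on sort's fallback comparison.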

Alternatively, use the -n option of xargs to print each field on its own line, then count and order the words with sort and uniq:

cat words.txt | xargs -n 1 | sort | uniq -c | sort -nr -k 1 | awk '{print $2" "$1}'

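The -c option of uniq prefixes each distinct line with its number of occurrences, which gives the word counts. uniq only collapses adjacent duplicate lines, which is why the words are sorted first.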
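
To make the role of each stage concrete, the pipeline can be run step by step on the sample input. The helper name word_freq_uniq and the temporary file are assumptions for illustration. Note also that xargs treats quotes and backslashes specially, so this sketch relies on the problem's guarantee that the input holds only lowercase letters and spaces.

```shell
#!/usr/bin/env bash
# Sample input from the problem statement, written to a temporary file.
tmp=$(mktemp)
printf 'the day is sunny the the\nthe sunny is is\n' > "$tmp"

# Stage 1: xargs -n 1 runs its default command (echo) once per word,
# so every word ends up on its own line.
xargs -n 1 < "$tmp"

# Stages 2-3: sort groups identical words together so that uniq -c
# can collapse each group into a "count word" line.
xargs -n 1 < "$tmp" | sort | uniq -c

# Full pipeline: order by the count column (field 1), descending,
# then swap the columns to get "word count".
word_freq_uniq() {   # hypothetical helper name
  xargs -n 1 < "$1" | sort | uniq -c | sort -k1,1nr | awk '{print $2, $1}'
}
word_freq_uniq "$tmp"
rm -f "$tmp"
```

Reading the file with a redirect instead of cat saves one process; the result is the same.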