zsh:funcs:finddup: Support filenames with spaces

Previously, when filenames contained spaces, the function would break
because `awk '{print $2,$1}'` splits on whitespace and so printed only
part of the filename.
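
A minimal sketch of the breakage, assuming `du`-style input of the form size, tab, path. The sample path and the helper names `old_swap` and `tab_fields` are made up for illustration and are not part of the function:

```shell
# Hypothetical du-style line: size, a tab, then a path containing a space.
line=$(printf '8\t./my file.txt')

# Old approach: awk splits on any whitespace, so $2 is only "./my".
old_swap() { awk '{print $2,$1}'; }

# Splitting on the tab alone keeps the whole path in one field.
tab_fields() { awk -F'\t' '{print $2}'; }

printf '%s\n' "$line" | old_swap      # prints "./my 8"
printf '%s\n' "$line" | tab_fields    # prints "./my file.txt"
```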

The field swap was a workaround so that `uniq` only compares the sizes:
`uniq` unfortunately only has a flag to **skip** leading fields, not one
to compare just the first field.

Fix this issue by using a short awk script that mimics `uniq` but
compares only the first field (i.e. the size).

My awk-fu is unfortunately not very good, which is why the one-liner
prints the first line of each duplicate group multiple times. The
`sort -u` pipe afterwards gets rid of those repeats.
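
One possible way to avoid the repeated lines, sketched under the assumption that the input is `du` output (size, tab, path). The helper name `dup_sizes` is hypothetical, and this has not been tried inside `finddup` itself:

```shell
# Print every line whose size (first field) occurs more than once,
# each line exactly once. Input does not need to be sorted.
dup_sizes() {
    awk -F'\t' '
        { n = ++count[$1] }                # how often this size was seen
        n == 1 { first[$1] = $0; next }    # remember the first line, hold it back
        n == 2 { print first[$1] }         # second hit: emit the held line once
        { print }                          # emit the current duplicate
    '
}

printf '4\ta b\n8\tc\n4\td\n4\te f\n' | dup_sizes
```

If this behaves as intended, it could stand in for the `awk ... | sort -u` stages, since the first line of each group is printed only when its second member shows up.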
2022-12-28 01:43:01 +01:00
parent 609d6d0dda
commit b38e01c72a

@@ -554,15 +554,17 @@ suffix() {
 finddup() {
 # find all files, filter the ones out with unique size, calculate md5 and
 # print duplicates
+# TODO: Fix duplicate lines output in the awk script that currently `sort
+# -u` handles
 find "$@" -type f -exec du '{}' '+' \
-| awk '{print $2,$1}' \
-| sort -k2 \
-| uniq -f1 -D \
-| awk '{print $1}' \
+| sort \
+| awk '{ if (!_[$1]) { _[$1] = $0 } else { print _[$1]; print $0; } }' \
+| sort -u \
+| cut -d$'\t' -f2- \
 | xargs -d'\n' md5sum \
 | sort \
 | uniq -w32 --all-repeated=separate \
-| awk '{print $2}'
+| cut -d' ' -f3-
 }
 # Wrapper around tmsu that searches for .tmsu/db in all parent directories and