zsh:funcs:finddup: Support filenames with spaces

Previously, when filenames contained spaces, the function would break
because `awk '{print $2,$1}'` printed only the part of the filename before
the first space.
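
A minimal sketch of the failure, assuming a `du`-style input line of the form
size, tab, name. The helper name `swap_fields` and the file name
`my file.txt` are made up for illustration:

```sh
# Reproduces the old field swap on its own.
swap_fields() {
    awk '{print $2,$1}'
}

# Simulated `du` output: size, a tab, then a name containing a space.
# awk splits on any run of blanks or tabs by default, so the name is cut
# into two fields and $2 holds only its first word.
printf '8\tmy file.txt\n' | swap_fields
# prints "my 8"
```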

The field swap was a workaround to make `uniq` compare only the sizes:
`uniq` unfortunately only has a flag to **skip** leading fields, not one to
restrict the comparison to the first field.

Fix this issue by using a short awk script that mimics `uniq` but compares
only the first field (i.e. the size).

My awk-fu is unfortunately not very good, which is why the one-liner
prints the first line of a duplicate group again for every further
duplicate. The `sort -u` that follows in the pipeline removes those repeats.
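
One possible way to avoid the repeated lines, shown only as a sketch and not
as part of this commit: remember the first line seen for each size and clear
it once it has been printed. The function name `dup_by_size` and the array
name `first` are hypothetical.

```sh
# Reads lines of the form "<size><TAB><name>" and prints every line whose
# size occurs more than once, each line exactly once.
dup_by_size() {
    awk '
    {
        if ($1 in first) {
            # Second or later line with this size: emit the stored first
            # line once, then blank it so it is never printed again.
            if (first[$1] != "") { print first[$1]; first[$1] = "" }
            print
        } else {
            # First line with this size: hold it back until a duplicate
            # size shows up.
            first[$1] = $0
        }
    }'
}

printf '4\ta\n8\tb c\n8\td\n8\te\n16\tf\n' | dup_by_size
```

With the sample input above, only the three lines with size 8 are printed,
and the name containing a space stays intact because whole lines are
stored and printed.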
2022-12-28 01:43:01 +01:00
parent 609d6d0dda
commit b38e01c72a

@@ -554,15 +554,17 @@ suffix() {
 finddup() {
     # find all files, filter the ones out with unique size, calculate md5 and
     # print duplicates
+    # TODO: Fix duplicate lines output in the awk script that currently `sort
+    # -u` handles
     find "$@" -type f -exec du '{}' '+' \
-        | awk '{print $2,$1}' \
-        | sort -k2 \
-        | uniq -f1 -D \
-        | awk '{print $1}' \
+        | sort \
+        | awk '{ if (!_[$1]) { _[$1] = $0 } else { print _[$1]; print $0; } }' \
+        | sort -u \
+        | cut -d$'\t' -f2- \
         | xargs -d'\n' md5sum \
         | sort \
         | uniq -w32 --all-repeated=separate \
-        | awk '{print $2}'
+        | cut -d' ' -f3-
 }
 # Wrapper around tmsu that searches for .tmsu/db in all parent directories and