zsh:funcs:finddup: Support filenames with spaces

Previously, when filenames contained spaces, the function would break
because `awk '{print $2,$1}'` only prints the part of the filename up to
the first space.

The field swap was a workaround so that `uniq` only compares the sizes:
`uniq` unfortunately only has a flag to **skip** leading fields (`-f`),
not one to compare just the first field.

Fix this issue by using a short awk script that mimics `uniq` but
compares only the first field (i.e. the size).

My awk-fu is unfortunately not very good, which is why the one-liner
prints the first line of a group of same-sized files once for every
additional file in that group. The `sort -u` pipe afterwards gets rid of
the repeats.
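
The repeated output described above could probably be avoided inside awk itself, which would make the `sort -u` step unnecessary. The following is only a minimal sketch, assuming tab-separated `size<TAB>path` lines as produced by `du`; the function name `dups_by_first_field` is hypothetical and not part of this commit.

```shell
# Hypothetical helper, not part of the commit: print every line whose first
# field (the size) occurs more than once, each line exactly once.
dups_by_first_field() {
    awk -F'\t' '
        # First line seen with this size: remember it and print nothing yet.
        !($1 in first) { first[$1] = $0; next }
        # Second line with this size: emit the remembered first line once.
        !printed[$1]++ { print first[$1] }
        # Every later line with a known size is itself a duplicate.
        { print }
    '
}
```

With this in place, a group of three same-sized files yields three lines instead of four, so the pipeline would not need to deduplicate afterwards. Unlike `uniq`, this does not require sorted input, since the sizes are tracked in an associative array.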
@@ -554,15 +554,17 @@ suffix() {
 finddup() {
     # find all files, filter the ones out with unique size, calculate md5 and
     # print duplicates
+    # TODO: Fix duplicate lines output in the awk script that currently `sort
+    # -u` handles
     find "$@" -type f -exec du '{}' '+' \
-        | awk '{print $2,$1}' \
-        | sort -k2 \
-        | uniq -f1 -D \
-        | awk '{print $1}' \
+        | sort \
+        | awk '{ if (!_[$1]) { _[$1] = $0 } else { print _[$1]; print $0; } }' \
+        | sort -u \
+        | cut -d$'\t' -f2- \
         | xargs -d'\n' md5sum \
         | sort \
         | uniq -w32 --all-repeated=separate \
-        | awk '{print $2}'
+        | cut -d' ' -f3-
 }

 # Wrapper around tmsu that searches for .tmsu/db in all parent directories and
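
The last changed line of the diff swaps `awk '{print $2}'` for `cut -d' ' -f3-`. The sketch below illustrates why that matters for paths with spaces. It uses a hard-coded sample line in the `<hash><two spaces><path>` layout that `md5sum` prints in text mode instead of running `md5sum`; the sample path is made up for illustration.

```shell
# A sample line in md5sum's "<32 hex digits><two spaces><path>" layout.
line='d41d8cd98f00b204e9800998ecf8427e  ./my file.txt'

# Old approach: awk splits on runs of whitespace, so the path is cut off
# at its first space.
old=$(printf '%s\n' "$line" | awk '{print $2}')

# New approach: with a single space as delimiter, field 2 is the empty
# string between the two spaces, so fields 3 onward are the whole path.
new=$(printf '%s\n' "$line" | cut -d' ' -f3-)

printf 'old: %s\nnew: %s\n' "$old" "$new"
```

Here `old` ends up as `./my`, while `new` keeps the full `./my file.txt`.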