Copy a folder overwriting ONLY smaller files in destination
I have tons of PDFs in multiple sub-folders in /home/user/original that I have compressed using ghostscript pdfwrite in /home/user/compressed.
ghostscript has done a great job at compressing about 90% of the files; however, the rest of them ended up bigger than the originals.
I would like to cp /home/user/compressed to /home/user/original, overwriting only the files that are smaller than the ones in the destination and skipping the bigger ones.
Any ideas?
Perl’s -s operator to the rescue!
Create an executable Perl script overwrite-smaller:
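#!/usr/bin/perl
use warnings;
use strict;
use File::Copy;
my $file = shift;    # path of one file under the original directory
# Derive the matching path under the compressed directory.
(my $compressed = $file) =~ s/original/compressed/;
# -s gives the size in bytes; copy only if a compressed copy exists and is smaller.
copy($compressed, $file) if -e $compressed && -s $compressed < -s $file;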
And run it for each file in the original directory (this assumes overwrite-smaller is on your PATH; otherwise give find the script's full path):
find /home/user/original -type f -exec overwrite-smaller {} \;
Or, once in Perl, write the subtree walking there as well:
#!/usr/bin/perl
use warnings;
use strict;
use File::Copy;
use File::Find;
# Run from /home/user so that the relative path 'original' resolves.
find({ no_chdir => 1,
       wanted   => sub {
           my $file = $File::Find::name;
           -f $file or return;    # skip directories and other non-files
           (my $compressed = $file) =~ s/original/compressed/;
           copy($compressed, $file) if -e $compressed && -s $compressed < -s $file;
       } }, 'original');
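If you want to see what would be overwritten before running either script, a dry run along these lines might help. It is only a sketch in bash: list_smaller is a made-up helper name, and it assumes GNU stat (stat -c %s) and that both trees have the same layout.

```shell
#!/usr/bin/env bash
# Dry run: print the files whose compressed copy is smaller than the original.
# list_smaller is a hypothetical helper; it assumes GNU stat (stat -c %s).
list_smaller() {
    local orig=${1%/} comp=${2%/} file rel
    # -print0 with read -d '' keeps names containing spaces or newlines intact.
    find "$orig" -type f -print0 |
    while IFS= read -r -d '' file; do
        rel=${file#"$orig"/}                 # path relative to the original tree
        [ -f "$comp/$rel" ] || continue      # no compressed counterpart: skip
        if [ "$(stat -c %s "$comp/$rel")" -lt "$(stat -c %s "$file")" ]; then
            printf '%s\n' "$rel"
        fi
    done
}
```

For example, list_smaller /home/user/original /home/user/compressed prints the relative path of every file whose compressed copy is smaller, and changes nothing on disk.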
The following find command should work for this:
cd /home/user/original
find . -type f -exec bash -c 'file="$1"; rsync --max-size="$(stat -c %s "$file")" "/home/user/compressed/$file" "/home/user/original/$file"' _ {} \;
The key part of this solution is the --max-size option provided by rsync. From the rsync manual:
--max-size=SIZE
This tells rsync to avoid transferring any file that is larger than the specified SIZE.
So the find command operates on the destination directory (/home/user/original) and returns a list of files. For each file, it spawns a bash shell that runs the rsync command. The SIZE parameter for the --max-size option is set by running a stat command against the destination file.
In effect, the rsync processing logic becomes this:
If the source file is larger than the destination file, the --max-size parameter will prevent the source file from being transferred.
If the source file is smaller than the destination file, the transfer will proceed as expected.
This logic will result in only the smaller files being transferred from the source directory to the destination directory.
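The same decision can be written out by hand for a single pair of files, which may make the logic easier to follow. This is a sketch, not what rsync does internally: copy_if_smaller is a hypothetical helper, it uses cp instead of rsync, assumes GNU stat, and copies only when the source is strictly smaller (with --max-size, a source of exactly the same size is not excluded).

```shell
#!/usr/bin/env bash
# copy_if_smaller SRC DST (hypothetical helper)
# Copies SRC over DST only when both exist and SRC is strictly smaller.
# Returns 0 if the copy happened, 1 if the file was skipped.
copy_if_smaller() {
    local src=$1 dst=$2
    [ -f "$src" ] && [ -f "$dst" ] || return 1
    if [ "$(stat -c %s "$src")" -lt "$(stat -c %s "$dst")" ]; then
        cp -- "$src" "$dst"
    else
        return 1    # source is the same size or larger: leave DST alone
    fi
}
```

Called as copy_if_smaller /home/user/compressed/a.pdf /home/user/original/a.pdf, it should replace the original only when the compressed copy is the smaller of the two.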
I have tested this in a few different ways, and it works for me as expected. However, you may want to create a backup of the destination directory before you try it out on your system.
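For the backup, something like the following should do. Again just a sketch: backup_tree is a made-up helper name and the .bak destination in the usage note is only an example. It uses cp -a to preserve timestamps and permissions, and refuses to run if the backup location already exists.

```shell
#!/usr/bin/env bash
# backup_tree SRC DST (hypothetical helper)
# Makes an archive-mode copy of SRC at DST, refusing to touch an existing DST.
backup_tree() {
    local src=${1%/} dst=$2
    if [ -e "$dst" ]; then
        echo "backup_tree: $dst already exists, not overwriting" >&2
        return 1
    fi
    cp -a -- "$src" "$dst"
}
```

For example: backup_tree /home/user/original /home/user/original.bak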