How To Use Rsync To Copy Only Specific Subdirectories?

Published August 11, 2024

Problem: Copying Specific Subdirectories with Rsync

Rsync synchronizes files efficiently, but by default it copies the whole directory tree you point it at. When you need only certain subdirectories from a larger structure, copying everything transfers unnecessary data and lengthens synchronization.

Step-by-Step Guide

Generating the Include File

To copy specific subdirectories with rsync, create an include file that lists the directories you want to copy. On server 1 (the source), generate this file with the ls and awk commands:

ls /path/to/old/data | awk '{print "+ " $0 "/\n+ " $0 "/unique_folder1/***"}' > include.txt

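This command creates an include file in the format below. Each top-level directory needs its own rule so that rsync descends into it, and the trailing *** matches the subdirectory together with everything inside it: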

+ company1/
+ company1/unique_folder1/***
+ company2/
+ company2/unique_folder1/***
...

Tip: Customizing the Include File

You can modify the awk command to include additional subdirectories or files. For example, to include two specific folders, you can use:

ls /path/to/old/data | awk '{print "+ " $0 "/\n+ " $0 "/unique_folder1/***\n+ " $0 "/unique_folder2/***"}' > include.txt

This generates an include file that lists two subdirectories for each company.
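
If the list of subfolders changes often, editing the awk string by hand gets error-prone. As a rough sketch, the same rules could come from a small shell function that takes the folder names as arguments. The function name write_include_rules is hypothetical, and the glob matches directories only, so plain files in the source root are skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of rsync): the first argument is the source
# root; the remaining arguments are the subfolder names to include under
# each top-level directory.
write_include_rules() {
    src_root=$1
    shift
    for dir in "$src_root"/*/; do
        # Skip the unexpanded pattern when the glob matches nothing.
        [ -d "$dir" ] || continue
        name=$(basename "$dir")
        # Rule for the top-level directory itself, so rsync descends into it.
        printf '+ %s/\n' "$name"
        for sub in "$@"; do
            # Rule for the subfolder and everything below it.
            printf '+ %s/%s/***\n' "$name" "$sub"
        done
    done
}

# Example with the placeholder path and folder names used in this guide:
# write_include_rules /path/to/old/data unique_folder1 unique_folder2 > include.txt
```

Names are written into the file as-is, so a directory name containing rsync wildcard characters such as * or [ would be read as a pattern rather than a literal name.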

Copying the Include File

After creating the include file on server 1, transfer it to server 2:

scp include.txt user@server2:/path/to/new/

This command copies the include file to the destination server where you'll run the rsync command.

Executing the Rsync Command

Now, run the rsync command on server 2 using the include file:

rsync -avz --include-from='/path/to/new/include.txt' --exclude='*' -e ssh user@server1:/path/to/old/data/ /path/to/new/data/

This command does the following:

  • -avz: Archive mode, verbose output, and compression
  • --include-from: Specifies the file with inclusion patterns
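  • --exclude='*': Excludes everything that no include rule matched. Rsync applies the first rule that matches, so this option must come after --include-from
  • -e ssh: Uses SSH for the transfer (already the default remote shell in current rsync versions, so stating it is optional)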
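
To check the rules before touching real data, you can try the same filter options on a throwaway local tree. The sketch below assumes rsync is installed locally; the function name demo_filter and the directory names are made up for illustration. On the real command, adding -n (--dry-run) similarly lists what would be transferred without copying anything.

```shell
#!/bin/sh
# Hypothetical demo: build a small tree, then copy only unique_folder1
# from the top-level directory using the same include/exclude options.
demo_filter() {
    work=$(mktemp -d) || return 1
    mkdir -p "$work/src/company1/unique_folder1" \
             "$work/src/company1/other_folder" \
             "$work/dst"
    echo keep > "$work/src/company1/unique_folder1/a.txt"
    echo skip > "$work/src/company1/other_folder/b.txt"

    printf '+ company1/\n+ company1/unique_folder1/***\n' > "$work/include.txt"

    # The include rules come first, so the catch-all exclude only applies
    # to paths that no include rule matched.
    rsync -a --include-from="$work/include.txt" --exclude='*' \
        "$work/src/" "$work/dst/" || return 1

    # Print the working directory so the caller can inspect the result.
    echo "$work"
}

# Example:
# find "$(demo_filter)/dst"
```

After a run, dst should contain company1/unique_folder1/a.txt but no other_folder directory, because other_folder matches no include rule and falls through to the exclude.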

By using this method, you can copy only the specific subdirectories you need, saving time and bandwidth.