Problem: Copying Specific Subdirectories with Rsync
Rsync is a file synchronization tool, but copying only specific subdirectories out of a larger directory tree is not obvious. Without filter rules, rsync transfers the whole tree, which wastes bandwidth and lengthens synchronization when you only need a few subdirectories.
Step-by-Step Guide
Generating the Include File
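To copy specific subdirectories using rsync, create an include file on the source server (server 1). This file lists the directories you want to copy, one filter rule per line. Use the ls and awk commands to generate it. Note that awk's $1 is the first whitespace-separated field, so this command assumes the directory names contain no spaces: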
ls /path/to/old/data | awk '{print "+ " $1 "/\n+ " $1 "/unique_folder1/***"}' > include.txt
This command creates an include file with this format:
+ company1/
+ company1/unique_folder1/***
+ company2/
+ company2/unique_folder1/***
...
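Because $1 stops at the first space, a name such as "company 2" would be cut short in the include file. As a minimal sketch of one alternative, the rules can be generated from a shell glob instead of parsing ls output. The function name make_include and the throwaway demo tree are assumptions for illustration, not part of the original procedure, and names containing rsync wildcard characters (*, ?, [) would still be read as patterns.

```shell
#!/bin/sh
# make_include is a hypothetical helper: it prints include rules for every
# directory directly under $1, keeping only the subfolder named in $2.
make_include() {
    src=$1
    sub=$2
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue   # the glob stays literal when nothing matches
        name=${dir%/}               # drop the trailing slash
        name=${name##*/}            # keep only the last path component
        printf '+ %s/\n+ %s/%s/***\n' "$name" "$name" "$sub"
    done
}

# Demo on a throwaway tree; use /path/to/old/data instead of "$demo" in real use.
demo=$(mktemp -d)
include_file=$(mktemp)
mkdir -p "$demo/company1/unique_folder1" "$demo/company 2/unique_folder1"
touch "$demo/notes.txt"
make_include "$demo" unique_folder1 > "$include_file"
cat "$include_file"
```

The glob ends in a slash, so regular files such as notes.txt never produce a rule, and the whole directory name is kept even when it contains spaces.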
Tip: Customizing the Include File
You can modify the awk command to include additional subdirectories or files. For example, to include two specific folders, you can use:
ls /path/to/old/data | awk '{print "+ " $1 "/\n+ " $1 "/unique_folder1/***\n+ " $1 "/unique_folder2/***"}' > include.txt
This generates an include file that selects two specific subdirectories for each company.
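When the list of folders grows, repeating the pattern inside the print statement becomes hard to read. A possible variant, sketched under the assumption that the folder names contain no spaces, passes the list to awk as a variable and loops over it. The variable names subfolders and rules are hypothetical, and printf stands in for the ls output so the example runs anywhere.

```shell
# Hypothetical list of subfolders to keep under every top-level directory.
subfolders="unique_folder1 unique_folder2"

# Sample input standing in for `ls /path/to/old/data`.
rules=$(printf 'company1\ncompany2\n' | awk -v subs="$subfolders" '
{
    n = split(subs, s, " ")
    print "+ " $0 "/"                    # include the top-level directory itself
    for (i = 1; i <= n; i++)
        print "+ " $0 "/" s[i] "/***"    # include each listed subfolder and its contents
}')
printf '%s\n' "$rules"
```

Using $0 rather than $1 keeps the whole input line, so a top-level name with spaces is not truncated.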
Copying the Include File
After creating the include file on server 1, transfer it to server 2:
scp include.txt user@server2:/path/to/new/
This command copies the include file to the destination server where you'll run the rsync command.
Executing the Rsync Command
Now, run the rsync command on server 2 using the include file:
rsync -avz --include-from='/path/to/new/include.txt' --exclude='*' -e ssh user@server1:/path/to/old/data/ /path/to/new/data/
This command does the following:
-avz: Archive mode, verbose output, and compression
--include-from: Specifies the file with inclusion patterns
--exclude='*': Excludes all files and directories not included
-e ssh: Uses SSH for the transfer
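Before running the transfer between servers, the filter rules can be checked on a local throwaway tree. The sketch below assumes rsync is installed locally; every path is a temporary placeholder, and other_folder is a made-up name for a directory that should be skipped. The -n (--dry-run) option lists what would be transferred without copying anything.

```shell
#!/bin/sh
# Local check of the include/exclude rules; all paths are throwaway placeholders.
src=$(mktemp -d)
dst=$(mktemp -d)
rules_file=$(mktemp)

mkdir -p "$src/company1/unique_folder1" "$src/company1/other_folder"
echo wanted   > "$src/company1/unique_folder1/a.txt"
echo unwanted > "$src/company1/other_folder/b.txt"

printf '+ company1/\n+ company1/unique_folder1/***\n' > "$rules_file"

# Dry run: prints the files that would be sent, copies nothing.
rsync -avn --include-from="$rules_file" --exclude='*' "$src/" "$dst/"

# Same rules without -n perform the copy.
rsync -a --include-from="$rules_file" --exclude='*' "$src/" "$dst/"
find "$dst" -type f
```

Rsync applies the first rule that matches, so the include rules must come before --exclude='*' on the command line, as they do in the command above.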
By using this method, you can copy only the specific subdirectories you need, saving time and bandwidth.