dd is a very handy shell command for writing raw data blocks from one place to another. Since it can read directly from raw device files, it is very useful for copying entire partitions or drives from one location to another. One traditional way to get this drive data from one location to another is to pipe DD's output stream over SSH to a shell on a remote machine which in turn uses dd to pipe it to a given output file/device. This is commonly invoked as:
dd if=/dev/sda | ssh email@example.com "dd of=/dev/sdb"
This would make an exact copy of the local /dev/sda device on /dev/sdb attached to servername.net. This method generally suffices for fast speed transfers and smaller amounts of data, and also has the advantage of using SSH's built in encryption for secure transfers.
The problem, however, is that because of this encryption SSH has a lot of overhead which sacrifices transfer speed. When data encryption is not a concern (i.e. for internal network transfers), then there is another option. Netcat (or nc on some systems) is a handy utility for setting up quick and dirty TCP or UDP sockets for the transmission of, well really anything. I will get into the details of how to perform a transfer akin to the one referenced above, however let's first take a look at some of the speed comparisons of dd over netcat vs dd over SSH.
Tests below were conducted with an empty 10GB partition. Since bzip encoding includes pooling similar data to achieve compression, partitions with actual data are most definitely going to see slower transfer rates and longer times. For instance, a partition with 2.5GB worth of data slows bzipped results by 50%. The results included here are to showcase he performance improvements by using netcat, which has a much lower overhead that SSH.
Each transfer method was repeated three times with the same empty 10GB partition and then averaged here. The tests were conducted within our datacenter, from two servers within the same facility, in different VLANs, each uplinked at 100Mbit:
|Time Elapsed (Sec)||Speed (MB/s)|
|Over Netcat (no compression)||1622.4||6.6|
|Over Netcat (bzip compression)||889.3||12.1|
|Over Netcat (16M block size + bzip)||490.0||21.9|
|Time Savings (Seconds)||Percentage Savings|
|vs Netcat (no compression)||165.0||9%|
|vs Netcat (bzip compression)||898.1||50%|
|vs Netcat (16M block size + bzip)||1297.3||73%|
Further on the subject of the warning above: Because netcat does not use any sort of authentication mechanism, it is possible for someone who knows your netcat port (e.g. from a trivial portscan) to inject arbitrary data into the stream thereby corrupting your
dd operation. You will probably want to implement a firewall rule on the server to restrict traffic sent on the netcat port to only be permitted from the address of your remote transfer host.
We will assume for the purposes of this tutorial that you have the
nc version of netcat. If you have the the other, then the command line options will be slightly different but the idea is the same. You set up a listening server on the destination, and then you send data to the port you've specified form the source. Let's assume we are transferring a full disk image from serverA (/dev/sda) to serverB (dev/sdb). We are going to assume block size of incoming data for
dd will be 16MB and that it will be bzip compressed. On serverB we would run the following:
nc -l 19000|bzip2 -d|dd bs=16M of=/dev/sdb
This tells netcat to listen on port 19000 for incoming data, then pipe that data to bzip for decompression, and then finally pipe the decompressed data to dd to be written to /dev/sdb.
Once we have this listening (you won't see any output after you hit Enter), we can move on to starting the data transfer on serverA:
dd bs=16M if=/dev/sda|bzip2 -c|nc serverB.example.net 19000
You again will not see any output after you've hit Enter, but do not fret! You can start another session (or launch the netcat in a screen session and back out), and run a tcpdump on port 19000 on serverB to ensure that traffic is indeed flowing. You can also send a USR1 signal to dd and it will output it's current statistics. You'll get a DD output on both ends summarizing the read/write time and bytes transferred when the process is complete. In this case no additional configuration is needed. /dev/sdb is a mountable and readable block device that's ready for use!