Using DD Over Netcat vs SSH

dd is a handy shell command for copying raw blocks of data from one place to another. Because it can read directly from raw device files, it is useful for copying entire partitions or drives. One traditional way to move drive data between machines is to pipe dd's output over SSH to a shell on the remote machine, which in turn uses dd to write it to a given output file or device. This is commonly invoked as:

dd if=/dev/sda | ssh username@servername.net "dd of=/dev/sdb"

This makes an exact copy of the local /dev/sda device on /dev/sdb attached to servername.net. This method generally suffices for transfers over fast links and for smaller amounts of data, and it has the advantage of using SSH's built-in encryption for secure transfers.

The problem, however, is that SSH's encryption adds considerable overhead, which costs transfer speed. When data encryption is not a concern (e.g. for internal network transfers), there is another option. Netcat (nc on some systems) is a handy utility for setting up quick and dirty TCP or UDP sockets to transmit just about anything. I will get into the details of how to perform a transfer like the one above, but first let's look at some speed comparisons of dd over netcat vs dd over SSH.

Performance Benchmarking

The tests below were conducted with an empty 10GB partition. Since bzip compression works by pooling similar data, partitions with actual data will see slower transfer rates and longer times. For instance, a partition with 2.5GB worth of data slows bzipped results by 50%. The results included here are meant to showcase the performance improvement from using netcat, which has much lower overhead than SSH.
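Because the benchmark partition was empty, it can help to gauge how well your own data compresses before committing to bzip. The following is a minimal sketch and was not part of the original tests: the function name sample_ratio and the 64 MiB default sample size are assumptions chosen for illustration. It compresses the first part of a device or file and prints the compressed size as a percentage of the original.

```shell
#!/bin/sh
# Hypothetical helper (not from the original tests): estimate how well the
# start of a device or file compresses with bzip2.
# Usage: sample_ratio SOURCE [BYTES]
sample_ratio() {
    src=$1
    bytes=${2:-67108864}   # assumed default sample size: 64 MiB
    # Bytes actually read (the source may be shorter than the sample size).
    raw=$(head -c "$bytes" "$src" | wc -c)
    # Bytes left after bzip2 compresses the same sample.
    packed=$(head -c "$bytes" "$src" | bzip2 -c | wc -c)
    # Print the compressed size as a percentage of the raw sample.
    awk -v r="$raw" -v p="$packed" \
        'BEGIN { if (r > 0) printf "%.1f%%\n", 100 * p / r }'
}
```

For example, sample_ratio /dev/sda would sample the first 64 MiB of the disk. The start of a partition is not necessarily representative of the rest, so treat the figure as a rough guide only.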

Average of Methods

Each transfer method was run three times against the same empty 10GB partition, and the results were averaged. The tests were conducted within our datacenter, between two servers in the same facility, in different VLANs, each uplinked at 100Mbit:

Method                                Time Elapsed (Sec)   Speed (MB/s)
Over SSH                              1787.4               6.1
Over Netcat (no compression)          1622.4               6.6
Over Netcat (bzip compression)        889.3                12.1
Over Netcat (16M block size + bzip)   490.0                21.9

Average Time Savings

Method                                Time Savings (Seconds)   Percentage Savings
Over SSH                              -                        0%
vs Netcat (no compression)            165.0                    9%
vs Netcat (bzip compression)          898.1                    50%
vs Netcat (16M block size + bzip)     1297.3                   73%
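
The savings figures follow directly from the averaged times: each row is the SSH time minus the method's time, also expressed as a share of the SSH time. A small sketch that recomputes them (the function name savings is an assumption for illustration):

```shell
#!/bin/sh
# Recompute the savings rows from the averaged times in the first table.
# Usage: savings BASELINE_SECONDS METHOD_SECONDS
# Prints: seconds saved, then percentage saved relative to the baseline.
savings() {
    awk -v base="$1" -v t="$2" \
        'BEGIN { printf "%.1f %.0f%%\n", base - t, 100 * (base - t) / base }'
}

savings 1787.4 1622.4   # netcat, no compression
savings 1787.4 889.3    # netcat, bzip compression
savings 1787.4 490.0    # netcat, 16M block size + bzip
```

The first two lines match the table. The last comes out as 1297.4 seconds rather than 1297.3, presumably because the table was computed from the unrounded per-run times.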

Test Notes

  • Bzip may or may not be ideal for your transfer, and this should be judged on a case-by-case basis. Bzip compression of mostly textual data is certainly going to be more effective than compression of MP3 or JPEG data, which is already compressed. Bzip compression also comes with increased CPU overhead, which may increase transfer time.
  • Although a block size of 16M was used in the test, you may have more luck with smaller or larger block sizes depending on the structure of your network. 16M was arrived at by trying different values from 1M to 64M, although those results are not included here. A larger block size can also be used on servers whose hard drives have a larger on-disk cache (some as high as 64MB).
  • A full empirical test would also have included bzip compression and a block size setting with dd over SSH. We felt this was unnecessary, as the test with no compression and the default block size clearly shows netcat is quicker. Netcat would very likely retain a similar margin if these tests were included.
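
For reference, the untested SSH variant mentioned in the last note could look something like the sketch below. This was not benchmarked in the tests above; the function name dd_over_ssh and its argument order are assumptions for illustration, and the 16M block size is simply carried over from the netcat test.

```shell
#!/bin/sh
# Sketch of dd over SSH with a 16M block size and bzip2 compression on the
# wire. Not benchmarked in this article; names are illustrative only.
# Usage: dd_over_ssh SOURCE USER@HOST DESTINATION
dd_over_ssh() {
    src=$1
    remote=$2
    dest=$3
    # Read locally and compress, then decompress and write on the remote side.
    dd bs=16M if="$src" | bzip2 -c | ssh "$remote" "bzip2 -d | dd bs=16M of='$dest'"
}
```

With the hosts from the earlier example, this would be called as dd_over_ssh /dev/sda username@servername.net /dev/sdb.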

Using DD over Netcat

Netcat opens an unencrypted connection from one host to another, which is why it outperforms SSH. If you use the netcat method, take a moment to consider the implications of sending raw, unencrypted data over your network. We strongly recommend against using this method for WAN data transfers unless you are doing so over an encrypted tunnel (e.g. a VPN).

Further on the subject of the warning above: because netcat does not use any sort of authentication mechanism, someone who knows your netcat port (e.g. from a trivial portscan) can inject arbitrary data into the stream, thereby corrupting your dd operation. You will probably want to add a firewall rule on the server so that traffic to the netcat port is only permitted from the address of your remote transfer host.
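
As a rough illustration of such a rule, the sketch below prints (but does not apply) a pair of iptables rules. It assumes the receiving server uses iptables; the function name netcat_fw_rules and the address 192.0.2.10 are placeholders, and port 19000 is the port used in the example that follows.

```shell
#!/bin/sh
# Print (do not apply) iptables rules that allow the netcat port only from
# the sending host. The address and function name are placeholders.
# Usage: netcat_fw_rules SOURCE_IP [PORT]
netcat_fw_rules() {
    src_ip=$1
    port=${2:-19000}
    # Accept the transfer port from the sender, then drop everyone else.
    echo "iptables -A INPUT -p tcp -s $src_ip --dport $port -j ACCEPT"
    echo "iptables -A INPUT -p tcp --dport $port -j DROP"
}

netcat_fw_rules 192.0.2.10
```

Review the printed rules against your existing ruleset before running them as root, since an earlier rule in the INPUT chain that already accepts the traffic would take precedence over these appended ones.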

We will assume for the purposes of this tutorial that you have the nc version of netcat. If you have the other, the command line options will be slightly different, but the idea is the same: you set up a listening server on the destination, and then you send data from the source to the port you've specified. Let's assume we are transferring a full disk image from serverA (/dev/sda) to serverB (/dev/sdb). We will use a 16MB block size for dd, and the data will be bzip compressed. On serverB we would run the following:

nc -l 19000|bzip2 -d|dd bs=16M of=/dev/sdb

This tells netcat to listen on port 19000 for incoming data, then pipe that data to bzip for decompression, and then finally pipe the decompressed data to dd to be written to /dev/sdb.

Once we have this listening (you won't see any output after you hit Enter), we can move on to starting the data transfer on serverA:

dd bs=16M if=/dev/sda|bzip2 -c|nc serverB.example.net 19000

You again will not see any output after you've hit Enter, but do not fret! You can start another session (or launch netcat in a screen session and detach), and run tcpdump on port 19000 on serverB to confirm that traffic is indeed flowing. You can also send a USR1 signal to dd and it will output its current statistics. When the process is complete, dd prints a summary on both ends with the read/write time and bytes transferred. In this case no additional configuration is needed. /dev/sdb is a mountable and readable block device that's ready for use!
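
A small sketch of the USR1 check described above. The function name dd_stats is an assumption for illustration, and the behavior relies on GNU dd, which prints its statistics to stderr when it receives SIGUSR1; other dd implementations may use a different signal.

```shell
#!/bin/sh
# Ask one or more running dd processes to print their current statistics.
# Assumes GNU dd, which reports to its own stderr on SIGUSR1.
# Usage: dd_stats PID [PID ...]
dd_stats() {
    for pid in "$@"; do
        kill -USR1 "$pid"
    done
}

# Example (run from a second session on either server):
#   dd_stats $(pgrep -x dd)
```

The statistics appear in the terminal where dd is running, not in the session that sent the signal.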