That doesn't seem too bad, does it? Well, let's explore some of the pitfalls we experienced along the way.
First, the original data set was more than twice as large as the subset we decided to process. We took a subset because it was still scientifically interesting but more manageable. The data were downloaded from the public repository onto one workstation. We attempted to process much of the data on the one server where they resided, but this failed. So we had to transfer the data to various shares and external drives, then redistribute them to their final destinations for processing.