Cloud

Accelerated Ingest to Cloud Services

By Megan Cater, Senior Manager of Digital Content at Signiant

We are in an era of data inundation. According to a report by IDC, by 2020 there will be as many digital bits in existence on the Internet and in storage devices as there are known stars in the universe. Much of that data has a neglected life, created only to be forgotten moments later. But a subset is highly valued, from personal account information to business assets and intellectual property. Those are the files that we want to protect, analyze, edit, share, back up, and store.

However, as we move into a future where data will only continue to grow in amount and importance, we will need more storage for the valuable bits we want to keep as well as much improved digital mechanisms to move them around.

Many businesses want to use cloud object storage services such as Amazon S3 and Microsoft Azure to solve the storage issue, and to gain access to business-specific cloud applications. But moving large amounts of data to and from the cloud, especially when multiple locations or long distances are involved, remains an obstacle to incorporating cloud services effectively.

Cloud Services and the Network Bottleneck Dilemma

Cloud services such as object storage fit the needs of many companies, giving them scalability and global access to their business-critical files. For many applications, it’s important to get files into and out of the cloud very quickly — and that’s where cloud services fall short, because the standard mechanisms for cloud I/O (input/output) have significant limitations, especially when dealing with large files, latency and congestion.

Data is typically moved over the Internet, including into and out of cloud services, via HTTP (hypertext transfer protocol) and the underlying TCP (transmission control protocol). While adequate for most Internet traffic, HTTP with TCP is optimized for loading web resources in a browser, not for sending large blocks of data to a destination, and it does not perform well when transferring large files over long distances, where latency and packet loss come into play. Once high-bandwidth pipes and real distance are involved, HTTP with TCP slows down quite dramatically.
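To put rough numbers on this, a widely cited rule of thumb (the Mathis model) says that steady-state TCP throughput falls as round-trip time and packet loss grow. The sketch below is illustrative only; the segment size, round-trip times, and loss rate are assumptions, not measurements:

# Back-of-the-envelope TCP throughput estimate (Mathis model):
#   throughput ~ (MSS / RTT) * (C / sqrt(loss)), with C ~ 1.22
# All figures below are illustrative assumptions.
from math import sqrt

def tcp_throughput_mbps(mss_bytes=1460, rtt_s=0.1, loss=0.0001, c=1.22):
    bytes_per_second = (mss_bytes / rtt_s) * (c / sqrt(loss))
    return bytes_per_second * 8 / 1e6

# Same link quality (0.01% packet loss), very different round trips:
print(tcp_throughput_mbps(rtt_s=0.001))  # ~1,425 Mbps on a 1 ms LAN round trip
print(tcp_throughput_mbps(rtt_s=0.100))  # ~14 Mbps on a 100 ms intercontinental round trip

Under this model, a hundredfold increase in round-trip time costs roughly a hundredfold in single-stream throughput, regardless of how much bandwidth is available.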

It’s important to note that bandwidth and latency are independent variables, so simply getting a higher-bandwidth connection doesn’t solve the problem. Here’s the difference: bandwidth is the theoretical maximum rate data can be transferred over a given network or network segment (usually measured in bits per second).

Latency is the amount of time it takes for data to travel across the network or network segment (usually measured in thousandths of a second, or milliseconds).

If you think of a network as a garden hose, the bandwidth is the width of the hose and the latency is the time it takes for water to travel the full length of the hose. Latency is not reduced by increased bandwidth; no matter how big your hose is, the data still has to travel the distance. In theory, data travels at the speed of light. But, in practice, switching and other overhead in the network increase latency and slow it down.

The speed of light is pretty fast: it takes light approximately 1/10th of a second (or 100ms) to travel around the world; but that’s without considering any network switching overhead — and there are typically more switches involved for longer distances. The latency when sending data halfway around the world and back can be more than 1/3 of a second.
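Round-trip time also caps how much a single classic TCP connection can carry, because at most one receive window of data can be in flight per round trip. Here is a quick illustration, assuming a 64 KB window (the maximum without TCP window scaling); the round-trip times are illustrative:

# Ceiling for one TCP stream: at most one window of data sent per round trip.
def single_stream_ceiling_mbps(window_bytes, rtt_s):
    return window_bytes * 8 / rtt_s / 1e6

print(single_stream_ceiling_mbps(64 * 1024, 0.333))  # ~1.6 Mbps at a 333 ms round trip
print(single_stream_ceiling_mbps(64 * 1024, 0.020))  # ~26 Mbps at a 20 ms round trip

No matter how big the pipe is, that single stream never gets past a couple of megabits per second on the long path.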

There are two ways to look at how far a data packet has to travel: physical distance and network distance. Even when sending data between two locations in the same building, the data can travel the equivalent of crossing the continent if the locations are on different service providers' networks; so it's the network distance, rather than the physical distance, that matters.

Network distance is what causes throughput bottlenecks in most systems, slowing down content transmission even if you’ve got a high-bandwidth pipe. These network bottlenecks can disrupt a successful business model if workflows depend on moving data into and out of cloud object storage or other cloud services in a fast, reliable manner.

Getting Data to the Cloud

Public cloud platforms such as Amazon Web Services and Microsoft Azure offer a few solutions for ingesting data, including free upload utilities, shipping physical media, and dedicated network connections. Some of these address latency better than others, and each has pros and cons depending on the use case.

Physical Media

For storing large amounts of data, one of the most basic approaches is to stick your content on data tapes or disk drives and ship them to a provider, who then loads it into their cloud object storage. Offerings such as AWS Snowball provide a storage appliance bundled with shipping and ingest services. Shipping physical media works, and you can't beat the bandwidth of a truck, but when you have to do it over and over again it starts to present serious logistical challenges, and it's not easy to automate. So, for ongoing workflows, most of us prefer an electronic, network-based approach.

Dedicated Connectivity

Dedicated network connections such as AWS Direct Connect and Azure ExpressRoute provide ways to connect directly into cloud infrastructure rather than going through the public Internet. A direct, fixed connection from your data center to your cloud provider's data center will generally minimize latency. But to take advantage of a dedicated connection, you first have to go to your friendly neighborhood telco and order a fixed circuit that runs from your premises to the cloud provider's point of presence (POP). You can then buy connectivity from there to the cloud, but it isn't cheap and it isn't available in every location.

If it is available, this approach can be a good choice if you have a predictable amount of data being transported every day, and it’s always coming from the same location. But it doesn’t work well if data is being moved from multiple locations into the cloud. And, because you’re deploying fixed network infrastructure to address the problem, it is especially inefficient if you have peaks and valleys in the amount of data you are sending. A dedicated network is not agile or elastic and cannot adapt to changing demands — you’re paying for that connection even when you’re not using it.

Upload Utilities

Cloud provider upload utilities are a useful and cost-effective way to ingest data. However, without acceleration built in, they can be slow and unreliable when transferring large amounts of data over a long distance. Some upload utilities attempt to address the latency issues with HTTP by breaking objects into chunks and uploading the chunks via multiple parallel HTTP streams. While these “multipart upload utilities” (such as Microsoft AzCopy) do not include acceleration, they can be used with Signiant Flight to optimize each stream for the most efficient data movement possible.
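As an illustration of the chunk-and-parallelize idea (not of Flight itself), here is a minimal sketch of a multipart upload to Amazon S3 using the AWS SDK for Python (boto3); the bucket name, file name, and tuning values are placeholder assumptions:

# Multipart upload sketch: boto3 splits the file into parts and sends them
# over parallel HTTP streams. Bucket, key, and sizes are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MB
    multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts
    max_concurrency=8,                     # 8 parallel upload streams
    use_threads=True,
)

s3 = boto3.client("s3")
s3.upload_file("camera-master.mov", "example-media-bucket",
               "ingest/camera-master.mov", Config=config)

Running several streams in parallel hides some of the per-stream window and round-trip ceiling, but each stream is still ordinary TCP, which is where purpose-built acceleration picks up.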

Acceleration Software

Modern acceleration software improves on standard Internet protocols such as HTTP, transferring data up to 200 times faster, and it is especially impactful when latency is present. Signiant Flight combines Signiant's core acceleration technology with a patented scale-out architecture, which enables the use of multiple cloud compute instances for a single transfer, to move data into and out of cloud services at unprecedented multi-Gbps speeds.

Flight is implemented as a secure, fully managed cloud service and works with any client-side software or upload utility. Transfers can be accelerated through familiar interfaces such as web browsers, or through the CLIs and SDKs from cloud vendors, and Flight improves the performance of any cloud workflow where AWS or Microsoft Azure client software is already embedded.

To learn more about Flight, visit signiant.com/flight.