Using AWS Step Functions and Lambda for Fanout
Feb 2, 2018 · 7 min read
I have a love for FaaS, and in particular AWS Lambda for breaking so much ground in this space. Many of the most valuable uses I’ve found for Lambda involve cost and performance as core requirements — that is, if the service can be 10x faster or cheaper it will provide disruptive benefits to the customer.
Fanout is a key mechanism for achieving that kind of cost-efficient performance with Lambda. Fanout is a category of patterns for spreading work among multiple Function invocations to get more done sooner. This is, of course, horizontal scaling (also known as “scaling out”) and works by using many resources to side-step limitations associated with a single resource. Specifically, this might mean getting more CPU cycles in less time, more bytes over the network in less time, more memory, etc.
An example I like to use here is moving a large file into S3, where there will be a limit on the bandwidth available to the Function *and* a limit on the time the function can run (5 minutes). I’ve done some experiments to demonstrate the effective size of file that can be moved through a Lambda in this way. The image below shows the result of a recent one where a Step Function state machine is used to measure the time to download increasingly large files.
The bottom line here is that files larger than several GB won’t reliably download in a single Lambda invocation. The effective bandwidth over this range of file sizes varied from 400 to 700 million bits per second. Good, but not enough for moving some interesting things (e.g. the NSRL hashsets, videos, ML training sets, etc.).
Note that AWS will very likely improve these numbers — they have a great track record of continuously delivering on such things. Nonetheless, there will always be a limit, and that limit is small enough now to cause problems. Lambda executions can only run for 5 minutes (300,000ms) so extrapolating the data above indicates that downloading anything above about 15GB will consistently fail. Extrapolating further, it looks like the Lambda execution time limit would need to be increased to over 30 minutes for a 100GB file to have a chance of downloading in a single execution.
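That extrapolation is easy to sanity-check with back-of-envelope arithmetic, treating the measured 400–700 Mbps as the effective bandwidth (decimal units assumed throughout):

```python
# Back-of-envelope check of the extrapolation, using the measured
# effective bandwidth range (400-700 Mbps) from the experiment above.

LAMBDA_LIMIT_S = 5 * 60  # 300 s / 300,000 ms execution cap

def download_seconds(size_gb: float, mbps: float) -> float:
    """Seconds to move size_gb (decimal) gigabytes at mbps megabits/second."""
    return (size_gb * 1e9 * 8) / (mbps * 1e6)

# At the low end of observed bandwidth, 15 GB already hits the cap:
assert download_seconds(15, 400) >= LAMBDA_LIMIT_S
# ...and a 100 GB file would need more than 30 minutes:
assert download_seconds(100, 400) > 30 * 60
```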
At this point we could throw up our hands and go back to long-running transfers on EC2 or an ECS Container, but that would be silly. Fanout is the obvious answer, because:
- We’re moving the file from a website that supports HTTP Range requests (i.e. we can request a specific sub-section of the file rather than the entire thing).
- S3 supports Multi-part Uploads (i.e. we can upload different sections of the file into parts, and combine them once completed)
- Lambda Function executions run as isolated environments with their own CPU and network capabilities
By using multiple executions we can download different ranges of the source file in parallel, with each creating a “part” in S3, and then combine the parts once all ranges are complete. To demonstrate the idea, consider this simple prototype with AWS Step Functions.
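Stripped of the state-machine plumbing, the whole flow has three phases: start a multipart upload, create one part per byte range, then combine the parts. A minimal sequential sketch of that flow, assuming a boto3-style S3 client is passed in (function and parameter names are mine, not from the prototype):

```python
def fanout_download(s3, bucket, key, url, ranges, fetch_range):
    """Outline of the fanout: one multipart upload, one part per byte range.

    In the real prototype each loop iteration is a separate Lambda
    invocation running concurrently; here they run sequentially so the
    shape of the API calls is easy to see.
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    parts = []
    for part_number, (start, end) in enumerate(ranges, start=1):
        body = fetch_range(url, start, end)  # HTTP Range request for this slice
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=part_number, Body=body)
        # CompleteMultipartUpload needs each part's number and ETag.
        parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
    return s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": parts})
```

The Step Functions state machine replaces the loop with concurrent branches.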
If you aren’t familiar with Step Functions (you might want to be, it’s an excellent tool to have in your kit), the important thing to know here is that each node in the diagram is either a link to a Lambda function to be run (aka. a Task state), or a flow-control node such as a Choice, Pass or Parallel state. Choice states allow control to be passed to one of many subsequent nodes based on conditions on the output of the preceding node. Pass states allow simple transformations to be applied to the input before passing it to the next node (without having to do so in a Lambda).
The core of this state machine is the Parallel state (represented by the dashed border region), which provides concurrency by executing its child state machines (aka branches) asynchronously, waiting for them all to complete, and then proceeding to the following node. The output of a Parallel state is an array containing the output of the last node in each child branch.
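As an illustration, a skeleton of one conditional branch in Amazon States Language might look like the following (state names and the Lambda ARN are made up; the real machine also has Pass states for input/output shaping):

```json
{
  "StartAt": "LargeEnoughForPart2?",
  "States": {
    "LargeEnoughForPart2?": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.contentLength",
          "NumericGreaterThan": 5242880,
          "Next": "DownloadPart2"
        }
      ],
      "Default": "SkipPart2"
    },
    "DownloadPart2": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:download-part",
      "End": true
    },
    "SkipPart2": {
      "Type": "Pass",
      "Result": { "skipped": true },
      "End": true
    }
  }
}
```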
In the diagram above the left-most branch contains a single Task that downloads the first part of the file (the other two nodes are Pass states that exist only to format input or output). The other branches contain conditional logic based on the size of the file:
- If the file is larger than the minimum needed by the part, download the appropriate 1/5th of the file. For example the second branch will download and create a part only if the file is larger than 5MB, the third 10MB, etc.
- But if the file is smaller than 5MB (or 10, 15, etc. for the other branches), the download is skipped.
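One way to realize the branch logic just described is a pure function mapping a branch number to a byte range, or to a skip. This is my reading of the scheme, not the prototype’s actual code:

```python
MIN_PART = 5 * 1024 * 1024  # S3's minimum size for all parts except the last

def branch_range(file_size: int, branch: int, branches: int = 5):
    """Inclusive byte range for a 1-based branch number, or None to skip.

    Each branch takes an equal share of the file, floored at the 5 MB
    multipart minimum; branches whose share starts beyond the end of the
    file are skipped, mirroring the Choice states in the diagram.
    """
    share = max(-(-file_size // branches), MIN_PART)  # ceil division, floored
    start = (branch - 1) * share
    if start >= file_size:
        return None  # this branch's Choice state routes to the skip Pass
    return (start, min(start + share, file_size) - 1)
```

For a 12 MB file, branches 1–3 each get a range (the third a short final part) and branches 4–5 skip.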
As you can see, this idea can be scaled-out to allow the download of very large files and with broad concurrency. What are some of the details here?
The first step is to determine whether the source URL supports Ranges, which would normally be done with an OPTIONS request. AWS S3 endpoints support Ranges, but because OPTIONS is used for CORS it doesn’t work for simple queries like ours (basically it requires a couple of extra headers). So we instead make this check using a HEAD request, which achieves the same result.
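A sketch of that check using only the Python standard library (the helper names are mine, not from the original code):

```python
import urllib.request

def head_headers(url: str) -> dict:
    """Issue a HEAD request and return the response headers as a dict."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return dict(resp.headers)

def accepts_byte_ranges(headers: dict) -> bool:
    """True if the server advertises byte-range support via Accept-Ranges."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("accept-ranges", "").lower() == "bytes"
```

Conveniently, the same HEAD response’s Content-Length header gives the file size that the Choice states compare against.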
Another HTTP-related detail is how to make a request for a subset of the content once we know ranges are supported. Not all servers/domains will support ranges. If they don’t, asking for a range may (or may not, depending on the server software) cause an error response. In some cases the range request will simply be ignored and the entire content will be returned.
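A minimal sketch of a range fetch that guards against that silent-ignore case (helper names are mine; note that both byte indices in the Range header are inclusive):

```python
import urllib.request

def range_header(start: int, end: int) -> dict:
    """Range header for bytes [start, end] -- both indices inclusive."""
    return {"Range": f"bytes={start}-{end}"}

def fetch_range(url: str, start: int, end: int) -> bytes:
    """Download one sub-section of the file at url."""
    req = urllib.request.Request(url, headers=range_header(start, end))
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the range was honored; a 200 here
        # means the server ignored the header and sent the whole file.
        if resp.status != 206:
            raise RuntimeError(f"Range not honored (HTTP {resp.status})")
        return resp.read()
```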
To create S3 upload parts from specific ranges we need to obey some rules for multi-part uploads. Primarily, only the last part can be smaller than 5MB. It’s also notable that we can have no more than 10,000 parts in all. This StepFunction based prototype works well within those bounds.
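Those two rules are easy to encode as a pre-flight check on a planned set of part sizes (a small sketch, not from the prototype):

```python
MIN_PART = 5 * 1024 * 1024   # every part except the last must be at least this
MAX_PARTS = 10_000           # hard cap on parts per multipart upload

def parts_are_valid(part_sizes: list) -> bool:
    """Check a planned list of part sizes against S3's multipart rules."""
    if not part_sizes or len(part_sizes) > MAX_PARTS:
        return False
    # Only the final part may be smaller than the 5 MB minimum.
    return all(size >= MIN_PART for size in part_sizes[:-1])
```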
One more implementation detail. The payload passed to the function for downloading and creating each part must include the:
- Source URL
- Multi-part Upload ID
- Part Number
The part number and upload ID are required by S3’s UploadPart API. The part number is also used to determine the range of bytes to copy (remember, the end byte index is inclusive). With all parts created, the final step is to combine them by calling S3’s CompleteMultipartUpload API:
- Multi-part Upload ID
- List of Part Numbers and associated ETags returned by the S3 UploadPart API
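The shape of those two S3 calls, sketched with a boto3-style client passed in (function names are mine; the keyword arguments mirror the UploadPart and CompleteMultipartUpload APIs):

```python
def upload_one_part(s3, bucket, key, upload_id, part_number, data):
    """Upload one part; returns the entry needed later to complete the upload."""
    resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                          PartNumber=part_number, Body=data)
    return {"PartNumber": part_number, "ETag": resp["ETag"]}

def complete_upload(s3, bucket, key, upload_id, parts):
    """Combine the parts; S3 expects them ordered by part number."""
    ordered = sorted(parts, key=lambda p: p["PartNumber"])
    return s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": ordered})
```

Sorting matters here because the Parallel state’s branches can finish in any order.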
And that’s it.
Here’s what the timings looked like for downloading the same large files mentioned at the start of this article:
Except for the smallest file, where the overhead of transitions in the state machine dominates, we’ve delivered a pretty nice speed-up. For the largest file (10GB) the speed-up is a near-linear 5x. That’s what I wanted to see in a prototype.
With only 5 branches each limited to 5GB (the maximum size of a part) the maximum download is 25GB. To test the 100GB file I expanded the number of branches to 20 and found the download time to be 93,128ms (that’s an effective download speed of ~1GB/s or 8Gbps). Since each branch in the 10GB file case downloaded only 2GB vs. 5GB in the 100GB case, this again represents near-linear scaling — the best that can be hoped for with concurrency.
How far will this go? To support the full potential of S3 would require 10,000 branches — perhaps that would work, but I think other things would start going sideways at that scale. Maybe I’ll find out by looking into dynamically generating the AWS Step Functions state machine (with retry and error handling, of course)…
This prototype has taken us from “it can’t do this” to “rocking the download world” with Lambda and a clear and obvious application of the Fanout concept. In a subsequent article I’ll look at a different fanout pattern and scaling out with recursive Lambda executions — mind the guardrails.
Want to go further with this? Improve robustness by making the part creation restart-able. S3 has an API to list incomplete multi-part uploads and the parts created so far. So when the state machine is restarted the parts that completed on the previous try can be no-op’d. Caution, though. Think about all the ways corruption of the file might happen and what kind of verification is needed to make sure a set of parts are safe to use to complete the upload.
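One way to sketch that restart logic, again with the S3 client passed in (the helper name is mine; the keyword arguments and pagination fields mirror the ListParts API):

```python
def already_uploaded(s3, bucket, key, upload_id):
    """Map of part number -> ETag for parts that survived a previous attempt."""
    found, marker = {}, 0
    while True:
        resp = s3.list_parts(Bucket=bucket, Key=key, UploadId=upload_id,
                             PartNumberMarker=marker)
        for part in resp.get("Parts", []):
            found[part["PartNumber"]] = part["ETag"]
        if not resp.get("IsTruncated"):
            return found
        marker = resp["NextPartNumberMarker"]
```

Branches whose part number appears in the returned map can no-op instead of re-downloading; as a cheap sanity check, each listed part’s Size can also be compared against the byte range that part was supposed to cover.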