October 27, 2024
When uploading large files to AWS S3, there are two main techniques: splitting the file into chunks on the frontend and uploading them with the S3 multipart upload API, or sending the complete file from the frontend and letting the backend split it into chunks and upload them in parallel using AWS SDK v3.
GitHub Repository: https://github.com/Huzaifa-Asif/aws-s3-fast-upload
In the first technique, the frontend splits the file into chunks and sends them to the backend, which uploads them to the S3 bucket using the S3 multipart upload API.
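The chunking step can be sketched as a small function that computes the byte range of every chunk. This is a minimal sketch, not the repository's code: `planChunks` is a hypothetical helper, and the 5 MiB value for `CHUNK_SIZE` is an assumption chosen because S3 rejects multipart parts smaller than 5 MiB (the final part is exempt).

```javascript
// Assumption: 5 MiB chunks, the smallest part size S3 accepts
// for every part except the last one.
const CHUNK_SIZE = 5 * 1024 * 1024;

// Hypothetical helper: returns the byte range of every chunk of a file.
function planChunks(fileSize, chunkSize = CHUNK_SIZE) {
  if (!Number.isInteger(fileSize) || fileSize <= 0) {
    throw new RangeError('fileSize must be a positive integer');
  }
  const totalChunks = Math.ceil(fileSize / chunkSize);
  const chunks = [];
  for (let index = 0; index < totalChunks; index++) {
    const start = index * chunkSize;
    // Clamp the last chunk to the end of the file
    const end = Math.min(start + chunkSize, fileSize);
    chunks.push({ index, start, end });
  }
  return chunks;
}
```

In the browser, each range would typically be passed to `file.slice(start, end)`, with `file.size` as the `fileSize` argument.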
The frontend code is in JavaScript. It first calls the /initiateUpload endpoint to get the UploadId, then sends each chunk to the /upload endpoint along with that UploadId. Once all the chunks are uploaded, it calls the /completeUpload endpoint, which combines the chunks and returns the final URL.
// Set up AWS S3 bucket configuration (server side: process.env is not available in the browser)
const s3 = new AWS.S3({
accessKeyId: process.env.YOUR_ACCESS_KEY,
secretAccessKey: process.env.YOUR_SECRET_KEY,
region: process.env.YOUR_BUCKET_REGION,
useAccelerateEndpoint: true,
});
const bucketName = process.env.YOUR_BUCKET_NAME;
// Listen for upload button click event
uploadBtn.addEventListener('click', async () => {
if (!file) {
return alert('Please select a file');
}
uploadBtn.disabled = true;
try {
// Start the timer
const startTime = new Date();
// Initiate multipart upload
const requestBody = { fileName };
console.log("requestBody ", requestBody);
const res = await fetch(`${baseUrl}/initiateUpload`, {
method: 'POST',
body: JSON.stringify(requestBody),
headers: {
'Content-Type': 'application/json',
},
});
const { uploadId } = await res.json();
console.log("uploadId ", uploadId);
// Send file chunks
const uploadPromises = [];
let uploadedChunks = 0;
let start = 0, end;
for (let i = 0; i < totalChunks; i++) {
end = start + CHUNK_SIZE;
const chunk = file.slice(start, end);
const formData = new FormData();
formData.append('index', i);
formData.append('totalChunks', totalChunks);
formData.append('fileName', fileName);
formData.append('file', chunk);
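const uploadPromise = fetch(`${baseUrl}/upload?uploadId=${encodeURIComponent(uploadId)}`, {
method: "POST",
body: formData,
}).then((chunkRes) => {
// fetch only rejects on network errors, so check the HTTP status explicitly
if (!chunkRes.ok) {
throw new Error(`Chunk ${i + 1} of ${totalChunks} failed with status ${chunkRes.status}`);
}
uploadedChunks++;
const progress = Math.floor((uploadedChunks / totalChunks) * 100);
updateProgressBar(progress);
});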
uploadPromises.push(uploadPromise);
start = end;
}
await Promise.all(uploadPromises);
// Complete multipart upload
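// Encode query values so file names with spaces, '&' or '#' survive the URL
const completeRes = await fetch(`${baseUrl}/completeUpload?fileName=${encodeURIComponent(fileName)}&uploadId=${encodeURIComponent(uploadId)}`, { method: 'POST' });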
const { success, data } = await completeRes.json();
console.log("file link: ", data);
if (!success) {
throw new Error('Error completing upload');
}
// End the timer and calculate the time elapsed
const endTime = new Date();
const timeElapsed = (endTime - startTime) / 1000;
console.log('Time elapsed:', timeElapsed, 'seconds');
alert('File uploaded successfully');
resetProgressBar();
} catch (err) {
console.log(err);
alert('Error uploading file');
}
uploadBtn.disabled = false;
});
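The loop above starts the request for every chunk at once, which can mean hundreds of simultaneous requests for a very large file. One possible refinement is to cap the number of requests in flight. The following is a minimal sketch under that assumption; `runWithLimit` is a hypothetical helper, not part of the original code, and it expects an array of functions that each return a promise.

```javascript
// Hypothetical helper: runs async task functions with at most `limit`
// of them in flight. Resolves with the results in task order and
// rejects as soon as any task fails.
async function runWithLimit(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(tasks.length);
  let next = 0;        // index of the next task to start
  let failed = false;  // set once any task rejects

  async function worker() {
    // Each worker pulls the next unstarted task until none remain
    while (next < tasks.length && !failed) {
      const current = next++;
      try {
        results[current] = await tasks[current]();
      } catch (err) {
        failed = true; // stop the other workers from starting new tasks
        throw err;
      }
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

To use it with the upload loop, each iteration would push a function such as `() => fetch(...)` rather than the promise itself, and `await Promise.all(uploadPromises)` would become something like `await runWithLimit(uploadTasks, 4)`, where `uploadTasks` and the limit of 4 are illustrative.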
The backend code is in Node.js and uses AWS SDK version 2. The /initiateUpload endpoint creates the UploadId, the /upload endpoint uploads each chunk as a part, and the /completeUpload endpoint combines the parts and returns the URL.
// Initiate multipart upload and return uploadId
app.post('/initiateUpload', async (req, res) => {
try {
const { fileName } = req.body;
const params = {
Bucket: bucketName,
Key: fileName,
};
const upload = await s3.createMultipartUpload(params).promise();
res.json({ uploadId: upload.UploadId });
} catch (error) {
console.error(error);
res.status(500).json({ success: false, message: 'Error initializing upload' });
}
});
// Receive chunk and write it to S3 bucket
app.post('/upload', upload.single("file"), (req, res) => {
const { index, fileName } = req.body;
const file = req.file;
const s3Params = {
Bucket: bucketName,
Key: fileName,
Body: file.buffer,
PartNumber: Number(index) + 1,
UploadId: req.query.uploadId
};
s3.uploadPart(s3Params, (err, data) => {
if (err) {
console.log(err);
return res.status(500).json({ success: false, message: 'Error uploading chunk' });
}
return res.json({ success: true, message: 'Chunk uploaded successfully' });
});
});
// Complete multipart upload
app.post('/completeUpload', (req, res) => {
const { fileName } = req.query;
const s3Params = {
Bucket: bucketName,
Key: fileName,
UploadId: req.query.uploadId,
};
s3.listParts(s3Params, (err, data) => {
if (err) {
console.log(err);
return res.status(500).json({ success: false, message: 'Error listing parts' });
}
const parts = [];
data.Parts.forEach(part => {
parts.push({
ETag: part.ETag,
PartNumber: part.PartNumber
});
});
s3Params.MultipartUpload = {
Parts: parts
};
s3.completeMultipartUpload(s3Params, (err, data) => {
if (err) {
console.log(err);
return res.status(500).json({ success: false, message: 'Error completing upload' });
}
console.log("data: ", data)
return res.json({ success: true, message: 'Upload complete', data: data.Location});
});
});
});
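One detail worth keeping in mind: ListParts returns at most 1,000 parts per call, so the /completeUpload handler above sees only the first page when an upload has more parts than that. A minimal sketch of following the pagination markers is shown below. It assumes the SDK v2 `.promise()` style already used in /initiateUpload; `listAllParts` is a hypothetical helper, not part of the original code.

```javascript
// Hypothetical helper: collects every part of a multipart upload by
// following the pagination markers that ListParts returns.
async function listAllParts(s3, { Bucket, Key, UploadId }) {
  const parts = [];
  let marker;           // undefined requests the first page
  let truncated = true;

  while (truncated) {
    const params = { Bucket, Key, UploadId };
    if (marker !== undefined) {
      params.PartNumberMarker = marker;
    }
    const page = await s3.listParts(params).promise();
    for (const part of page.Parts || []) {
      // completeMultipartUpload only needs the ETag and PartNumber
      parts.push({ ETag: part.ETag, PartNumber: part.PartNumber });
    }
    truncated = Boolean(page.IsTruncated);
    marker = page.NextPartNumberMarker;
  }
  return parts;
}
```

The returned array could then be passed as `MultipartUpload: { Parts: parts }` to `completeMultipartUpload`, in the same way the handler above does.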
In the second technique, the frontend uploads the complete file, and the backend splits it into chunks and uploads them to the S3 bucket in parallel using AWS SDK v3.
The frontend code is in JavaScript. It calls the /upload_parallel endpoint to upload the file.
// Listen for upload button click event
uploadBtn.addEventListener('click', async () => {
if (!file) {
return alert('Please select a file');
}
uploadBtn.disabled = true;
try {
// Start the timer
const startTime = new Date();
const formData = new FormData();
formData.append('file', file);
const uploadRes = await fetch(`${baseUrl}/upload_parallel`, {
method: "POST",
body: formData,
});
const { success, data } = await uploadRes.json();
console.log("file link: ", data);
if (!success) {
throw new Error('Error completing upload');
}
// End the timer and calculate the time elapsed
const endTime = new Date();
const timeElapsed = (endTime - startTime) / 1000;
console.log('Time elapsed:', timeElapsed, 'seconds');
alert('File uploaded successfully');
} catch (err) {
console.log(err);
alert('Error uploading file');
}
uploadBtn.disabled = false;
});
The backend code is in Node.js and uses AWS SDK version 3 (@aws-sdk/lib-storage and @aws-sdk/client-s3). The /upload_parallel endpoint uploads a large file by splitting it into small chunks and uploading them in parallel.
// Receive Large file and write in chunks to S3 bucket
app.post('/upload_parallel', upload.single("file"), async (req, res) => {
const file = req.file;
// params for s3 upload
const params = {
Bucket: bucketName,
Key: `${Date.now().toString()}_${file.originalname}`,
Body: file.buffer,
};
try {
// upload the file to S3 in parallel chunks
// every part except the last must be at least 5 MB
const uploadParallel = new Upload({
client: s3,
queueSize: 4, // optional concurrency configuration
partSize: 5542880, // optional size of each part, in bytes
leavePartsOnError: false, // optional manually handle dropped parts
params,
});
// checking progress of upload
uploadParallel.on("httpUploadProgress", progress => {
console.log(progress);
});
// await completion so that upload failures reach the catch block
const data = await uploadParallel.done();
console.log("upload completed!", { data });
return res.json({ success: true, data: data.Location });
} catch (error) {
console.error(error);
res.status(500).send({
success: false,
message: error.message,
});
}
});
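A fixed `partSize` also caps the largest file that can be uploaded, because S3 allows at most 10,000 parts per multipart upload; at the part size used above that is roughly 55 GB. A minimal sketch of deriving the part size from the file size follows. `choosePartSize` is a hypothetical helper, not part of the original code.

```javascript
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last
const MAX_PARTS = 10000;               // S3 maximum number of parts per upload

// Hypothetical helper: the smallest allowed part size that still fits
// the whole file into MAX_PARTS parts.
function choosePartSize(fileSize) {
  if (!Number.isFinite(fileSize) || fileSize < 0) {
    throw new RangeError('fileSize must be a non-negative number');
  }
  return Math.max(MIN_PART_SIZE, Math.ceil(fileSize / MAX_PARTS));
}
```

In the handler above this could be used as `partSize: choosePartSize(file.size)`, since multer exposes the byte length of the uploaded file as `file.size`.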
The bucket must be in a region that supports Transfer Acceleration, because the feature is not available in all regions. For the current list, see: https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html
Once you have created the bucket, open the Properties tab, scroll down to the Transfer Acceleration section, click Edit, and enable it.
Transfer Acceleration routes uploads through the nearest AWS edge location, which can make large file uploads to the bucket faster.
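With `useAccelerateEndpoint: true`, requests go to a dedicated endpoint of the form `bucketname.s3-accelerate.amazonaws.com`, and AWS requires the bucket name to be DNS-compliant and free of periods for this to work. A small pre-check can catch an unsuitable name early. The sketch below is illustrative only: `accelerateHost` is a hypothetical helper, and its pattern is a simplified check that does not cover every S3 naming rule.

```javascript
// Hypothetical helper: returns the Transfer Acceleration hostname for a
// bucket, or throws if the name cannot be used with acceleration.
function accelerateHost(bucketName) {
  // Simplified rule: 3-63 characters, lowercase letters, digits and hyphens,
  // starting and ending with a letter or digit, and no periods.
  const valid = /^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$/.test(bucketName);
  if (!valid) {
    throw new Error(`Bucket name "${bucketName}" is not usable with Transfer Acceleration`);
  }
  return `${bucketName}.s3-accelerate.amazonaws.com`;
}
```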
Select the bucket, open the Permissions tab, scroll down to the Bucket policy section, click Edit, then paste the policy below and replace the placeholder with your bucket ARN.
S3 bucket policy (replace the placeholder with your bucket ARN; this policy grants public read access to every object in the bucket):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "PublicReadGetObject",
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:GetObject"
],
"Resource": [
"paste your bucket ARN/*"
]
}
]
}
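The same policy can be generated from the bucket name, since S3 bucket ARNs have the form `arn:aws:s3:::bucket-name`. This is a minimal sketch: `publicReadPolicy` is a hypothetical helper and `my-example-bucket` is a placeholder name.

```javascript
// Hypothetical helper: builds the public-read bucket policy shown above
// for a given bucket name.
function publicReadPolicy(bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'PublicReadGetObject',
        Effect: 'Allow',
        Principal: '*',
        Action: ['s3:GetObject'],
        // The trailing /* applies the statement to every object in the bucket
        Resource: [`arn:aws:s3:::${bucketName}/*`],
      },
    ],
  };
}

// Print the policy as JSON, ready to paste into the bucket policy editor
console.log(JSON.stringify(publicReadPolicy('my-example-bucket'), null, 2));
```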
Both techniques have their pros and cons. Technique 1 allows finer-grained control over the upload process, such as per-chunk progress reporting, but requires more work on the frontend to create the chunks. Technique 2 simplifies the frontend code but requires more processing on the backend, which must receive the whole file into memory before it can split it into chunks and upload them in parallel.
Ultimately, the choice of technique depends on the specific use case and requirements. By understanding the strengths and weaknesses of each technique, you can make an informed decision on which one to use for your particular situation.