October 27, 2024

1. Introduction

When it comes to uploading large files to AWS S3, there are two main techniques: splitting the file into chunks on the frontend and uploading them through the AWS multipart upload API, or uploading the complete file from the frontend and letting the backend split it into chunks and upload them in parallel using the AWS SDK v3.

GitHub Repository: https://github.com/Huzaifa-Asif/aws-s3-fast-upload

2. Technique 1: Creating Chunks on Frontend and Uploading with AWS Multipart API

In this technique, the frontend splits the file into chunks and sends them to the backend, which uploads each chunk to the S3 bucket using the AWS multipart upload API.

2.1. The frontend code for this technique has the following logic:

The frontend code is written in JavaScript. It first calls the /initiateUpload endpoint to obtain an UploadId, then sends each chunk to the /upload endpoint along with that UploadId. Once all chunks are uploaded, it calls the /completeUpload endpoint, which combines the chunks and returns the final URL of the file.

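// Set up AWS S3 bucket configuration (AWS SDK v2).
// Note: process.env is only available in Node.js, so this client belongs in the backend;
// access keys must never be shipped to the browser.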
const s3 = new AWS.S3({
  accessKeyId: process.env.YOUR_ACCESS_KEY,
  secretAccessKey: process.env.YOUR_SECRET_KEY,
  region: process.env.YOUR_BUCKET_REGION,
  useAccelerateEndpoint: true,
});
const bucketName = process.env.YOUR_BUCKET_NAME;

// Listen for upload button click event
uploadBtn.addEventListener('click', async () => {
  if (!file) {
    return alert('Please select a file');
  }

  uploadBtn.disabled = true;

  try {
    // Start the timer
    const startTime = new Date();

    // Initiate multipart upload
    const requestBody = { fileName };
    console.log("requestBody ", requestBody);
    const res = await fetch(`${baseUrl}/initiateUpload`, {
      method: 'POST',
      body: JSON.stringify(requestBody),
      headers: {
        'Content-Type': 'application/json',
      },
    });
    const { uploadId } = await res.json();
    console.log("uploadId ", uploadId);
    // Send file chunks
    const uploadPromises = [];
    let uploadedChunks = 0;
    let start = 0, end;
    for (let i = 0; i < totalChunks; i++) {
      end = start + CHUNK_SIZE;
      const chunk = file.slice(start, end);
      const formData = new FormData();
      formData.append('index', i);
      formData.append('totalChunks', totalChunks);
      formData.append('fileName', fileName);
      formData.append('file', chunk);
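      const uploadPromise = fetch(`${baseUrl}/upload?uploadId=${encodeURIComponent(uploadId)}`, {
        method: "POST",
        body: formData,
      }).then((chunkRes) => {
        // fetch only rejects on network failure, so check the HTTP status as well
        if (!chunkRes.ok) {
          throw new Error(`Chunk ${i} failed with status ${chunkRes.status}`);
        }
        uploadedChunks++;
        const progress = Math.floor((uploadedChunks / totalChunks) * 100);
        updateProgressBar(progress);
      });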
      uploadPromises.push(uploadPromise);
      start = end;
    }

    await Promise.all(uploadPromises);

    // Complete multipart upload
    // Encode the query values so file names with spaces or '&' do not break the URL
    const completeRes = await fetch(`${baseUrl}/completeUpload?fileName=${encodeURIComponent(fileName)}&uploadId=${encodeURIComponent(uploadId)}`, { method: 'POST' });
    const { success, data } = await completeRes.json();
    if (!success) {
      throw new Error('Error completing upload');
    }
    console.log("file link: ", data);

    // End the timer and calculate the time elapsed
    const endTime = new Date();
    const timeElapsed = (endTime - startTime) / 1000;
    console.log('Time elapsed:', timeElapsed, 'seconds');
    alert('File uploaded successfully');
    resetProgressBar();
  } catch (err) {
    console.log(err);
    alert('Error uploading file');
  }

  uploadBtn.disabled = false;
});
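
The handler above relies on CHUNK_SIZE and totalChunks, which are defined elsewhere in the page script. As a minimal sketch, assuming a hypothetical helper named planChunks (it is not part of the original repository), the chunk boundaries could be derived as follows. The 5 MiB minimum part size (for every part except the last) and the 10,000-part maximum are S3 multipart upload limits.

```javascript
// Hypothetical helper (not from the original repository): plan the byte
// ranges of each chunk for a multipart upload.
const MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum for every part except the last
const MAX_PARTS = 10000;               // S3 maximum number of parts per upload

function planChunks(fileSize, chunkSize = MIN_PART_SIZE) {
  if (!Number.isInteger(fileSize) || fileSize <= 0) {
    throw new RangeError('fileSize must be a positive integer');
  }
  if (chunkSize < MIN_PART_SIZE) {
    throw new RangeError('chunkSize must be at least 5 MiB');
  }
  const totalChunks = Math.ceil(fileSize / chunkSize);
  if (totalChunks > MAX_PARTS) {
    throw new RangeError('too many parts; increase chunkSize');
  }
  const ranges = [];
  for (let i = 0; i < totalChunks; i++) {
    const start = i * chunkSize;
    // The last chunk may be shorter than chunkSize
    const end = Math.min(start + chunkSize, fileSize);
    ranges.push({ index: i, start, end });
  }
  return { totalChunks, ranges };
}
```

In the click handler, this could be used as `const { totalChunks, ranges } = planChunks(file.size, CHUNK_SIZE);`, with each chunk obtained through `file.slice(start, end)`.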

2.2. The backend code for this technique has the following API endpoints:

The backend code is written in Node.js and uses version 2 of the AWS SDK. The /initiateUpload endpoint creates the UploadId, the /upload endpoint uploads individual chunks, and the /completeUpload endpoint combines all the chunks and returns the URL.

// Initiate multipart upload and return uploadId
app.post('/initiateUpload', async (req, res) => {
  try {
    const { fileName } = req.body;
    const params = {
      Bucket: bucketName,
      Key: fileName,
    };
    const upload = await s3.createMultipartUpload(params).promise();
    res.json({ uploadId: upload.UploadId });
  } catch (error) {
    console.error(error);
    res.status(500).json({ success: false, message: 'Error initializing upload' });
  }
});

// Receive chunk and write it to S3 bucket
app.post('/upload', upload.single("file"), (req, res) => {
  const { index, fileName } = req.body;
  const file = req.file;
 
  const s3Params = {
    Bucket: bucketName,
    Key: fileName,
    Body: file.buffer,
    PartNumber: Number(index) + 1,
    UploadId: req.query.uploadId
  };

  s3.uploadPart(s3Params, (err, data) => {
    if (err) {
      console.log(err);
      return res.status(500).json({ success: false, message: 'Error uploading chunk' });
    }

    return res.json({ success: true, message: 'Chunk uploaded successfully' });
  });
});

// Complete multipart upload
app.post('/completeUpload', (req, res) => {
  const { fileName } = req.query;
  const s3Params = {
    Bucket: bucketName,
    Key: fileName,
    UploadId: req.query.uploadId,
  };

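  // List the uploaded parts to collect their ETags.
  // Note: listParts returns at most 1,000 parts per call; larger uploads must paginate with PartNumberMarker.
  s3.listParts(s3Params, (err, data) => {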
    if (err) {
      console.log(err);
      return res.status(500).json({ success: false, message: 'Error listing parts' });
    }

    const parts = [];
    data.Parts.forEach(part => {
      parts.push({
        ETag: part.ETag,
        PartNumber: part.PartNumber
      });
    });

    s3Params.MultipartUpload = {
      Parts: parts
    };

    s3.completeMultipartUpload(s3Params, (err, data) => {
      if (err) {
        console.log(err);
        return res.status(500).json({ success: false, message: 'Error completing upload' });
      }

      console.log("data: ", data)
      return res.json({ success: true, message: 'Upload complete', data: data.Location});
    });
  });
});
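
completeMultipartUpload expects the parts in ascending PartNumber order, each paired with the ETag that S3 returned for it. A minimal sketch of a defensive helper is shown below, assuming a hypothetical function name toCompletedParts and input shaped like the Parts array returned by listParts. The check that parts 1..n are all present is an assumption based on how this application numbers its chunks; the expected count would have to be passed to /completeUpload by the frontend, which the code above does not currently do.

```javascript
// Hypothetical helper (not from the original repository): turn the output of
// listParts into the Parts array expected by completeMultipartUpload.
function toCompletedParts(listedParts, expectedCount) {
  const parts = listedParts
    // Keep only the two fields that completeMultipartUpload needs
    .map(({ ETag, PartNumber }) => ({ ETag, PartNumber }))
    // S3 rejects the request if the parts are not in ascending order
    .sort((a, b) => a.PartNumber - b.PartNumber);

  if (parts.length !== expectedCount) {
    throw new Error(`expected ${expectedCount} parts, got ${parts.length}`);
  }
  // Assumption: this application numbers its chunks 1..n without gaps
  parts.forEach((part, i) => {
    if (part.PartNumber !== i + 1) {
      throw new Error(`missing part ${i + 1}`);
    }
  });
  return parts;
}
```

Failing early here gives the client a clearer error than a rejected completeMultipartUpload call would when a chunk request was lost.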

3. Technique 2: Uploading Complete File from Frontend and Creating Chunks on the Backend

In this technique, the complete file is uploaded from the frontend in a single request, and the backend then splits it into chunks and uploads them to the S3 bucket in parallel using the AWS SDK v3.

3.1. The frontend code for this technique has the following logic:

The frontend code is written in JavaScript. It makes a single call to the /upload_parallel endpoint to upload the whole file.

// Listen for upload button click event
uploadBtn.addEventListener('click', async () => {
  if (!file) {
    return alert('Please select a file');
  }

  uploadBtn.disabled = true;

  try {
    // Start the timer
    const startTime = new Date();

    const formData = new FormData();
    formData.append('file', file);
    const uploadRes = await fetch(`${baseUrl}/upload_parallel`, {
      method: "POST",
      body: formData,
    });
    const { success, data } = await uploadRes.json();
    console.log("file link: ", data);
    if (!success) {
      throw new Error('Error completing upload');
    }

    // End the timer and calculate the time elapsed
    const endTime = new Date();
    const timeElapsed = (endTime - startTime) / 1000;
    console.log('Time elapsed:', timeElapsed, 'seconds');
    alert('File uploaded successfully');
  } catch (err) {
    console.log(err);
    alert('Error uploading file');
  }

  uploadBtn.disabled = false;
});

3.2. The backend code for this technique has the following API endpoints:

The backend code is written in Node.js and uses version 3 of the AWS SDK (the @aws-sdk/lib-storage and @aws-sdk/client-s3 packages). The /upload_parallel endpoint uploads a large file by splitting it into small chunks and uploading them in parallel.

// Receive a large file and write it to the S3 bucket in chunks
app.post('/upload_parallel', upload.single("file"), async (req, res) => {
  const file = req.file;
  if (!file) {
    return res.status(400).json({ success: false, message: 'No file received' });
  }

  // Params for the S3 upload
  const params = {
    Bucket: bucketName,
    Key: `${Date.now().toString()}_${file.originalname}`,
    Body: file.buffer,
  };

  try {
    // Upload the file to S3 in parallel chunks.
    // Each part except the last must be at least 5 MiB (5,242,880 bytes).
    const uploadParallel = new Upload({
      client: s3, // must be an S3 client from @aws-sdk/client-s3 (SDK v3), not the v2 client
      queueSize: 4, // optional: number of parts uploaded concurrently
      partSize: 5542880, // optional: size of each part in bytes
      leavePartsOnError: false, // optional: set to true to handle dropped parts manually
      params,
    });

    // Log upload progress
    uploadParallel.on("httpUploadProgress", (progress) => {
      console.log(progress);
    });

    // Wait for completion; awaiting here lets the catch block handle upload failures
    const data = await uploadParallel.done();
    console.log("upload completed!", { data });
    return res.json({ success: true, data: data.Location });
  } catch (error) {
    console.error(error);
    return res.status(500).json({ success: false, message: error.message });
  }
});
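
The queueSize option above caps how many parts are in flight at the same time. The same idea can be sketched in plain JavaScript, for instance to limit the Technique 1 frontend, where Promise.all starts every chunk request at once. The helper name runWithConcurrency is hypothetical, and this is an illustration of the general pattern rather than how the AWS SDK implements it internally.

```javascript
// Hypothetical helper (not from the original repository): run async task
// functions with at most `limit` of them in flight at any time.
async function runWithConcurrency(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(tasks.length);
  let next = 0; // index of the next task that has not been started yet

  // Each worker repeatedly claims the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results keep the order of the input tasks
}
```

Each task is a function that returns a promise, for example `() => fetch(url, { method: 'POST', body: formData })`, so that the request only starts when a worker picks it up.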

4. S3 Bucket Configuration

The bucket should be in one of the regions listed below, because Transfer Acceleration is not available in every region. For the current list, see: https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration.html

  1. Asia Pacific (Tokyo) (ap-northeast-1)
  2. Asia Pacific (Seoul) (ap-northeast-2)
  3. Asia Pacific (Mumbai) (ap-south-1)
  4. Asia Pacific (Singapore) (ap-southeast-1)
  5. Asia Pacific (Sydney) (ap-southeast-2)
  6. Canada (Central) (ca-central-1)
  7. Europe (Frankfurt) (eu-central-1)
  8. Europe (Ireland) (eu-west-1)
  9. Europe (London) (eu-west-2)
  10. Europe (Paris) (eu-west-3)
  11. South America (São Paulo) (sa-east-1)
  12. US East (N. Virginia) (us-east-1)
  13. US East (Ohio) (us-east-2)
  14. US West (N. California) (us-west-1)
  15. US West (Oregon) (us-west-2)

4.1. Transfer Acceleration

Once you have created the bucket, open the Properties tab, scroll down to the Transfer Acceleration section, click Edit, and enable it.

Enabling Transfer Acceleration on the bucket speeds up large uploads, particularly from clients far from the bucket's region, by routing traffic through Amazon CloudFront edge locations.
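
With useAccelerateEndpoint set to true, as in the S3 client configuration shown earlier, the SDK sends requests to the bucket's accelerate endpoint instead of the regional one. As a rough sketch, the host name follows the pattern below. The function name accelerateEndpoint is hypothetical and only illustrates the naming scheme; in practice the SDK builds this URL for you.

```javascript
// Hypothetical helper (not from the original repository): build the
// Transfer Acceleration endpoint URL for a bucket.
function accelerateEndpoint(bucketName, { dualStack = false } = {}) {
  // Transfer Acceleration requires a DNS-compliant bucket name without periods
  if (bucketName.includes('.')) {
    throw new Error('Transfer Acceleration does not support bucket names containing periods');
  }
  const host = dualStack
    ? 's3-accelerate.dualstack.amazonaws.com' // IPv4 and IPv6
    : 's3-accelerate.amazonaws.com';
  return `https://${bucketName}.${host}`;
}
```

Checking the bucket name up front is useful because a bucket such as `my.bucket` cannot have acceleration enabled at all.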

4.2. Bucket Permission

Select the bucket, open the Permissions tab, scroll down to the Bucket policy section, and click Edit. Then paste the policy shown below, replacing the placeholder with your bucket's ARN. Note that this policy makes every object in the bucket publicly readable.

4.3. S3 Bucket policy - Change arn with your bucket ARN

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "paste your bucket ARN/*"
            ]
        }
    ]
}
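
If the policy needs to be generated for several buckets, the same document can be built programmatically. The sketch below assumes a hypothetical function name publicReadPolicy and the standard `aws` partition, where a bucket ARN has the form `arn:aws:s3:::bucket-name`; other partitions use a different ARN prefix.

```javascript
// Hypothetical helper (not from the original repository): build the
// public-read bucket policy shown above for a given bucket name.
function publicReadPolicy(bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Sid: 'PublicReadGetObject',
        Effect: 'Allow',
        Principal: '*',
        Action: ['s3:GetObject'],
        // "/*" applies the statement to every object in the bucket
        Resource: [`arn:aws:s3:::${bucketName}/*`],
      },
    ],
  };
}

// The JSON text to paste into the console's bucket policy editor
const policyJson = JSON.stringify(publicReadPolicy('my-example-bucket'), null, 4);
```

Here `my-example-bucket` is a placeholder name used only for illustration.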

5. Conclusion

Both techniques have their pros and cons. Technique 1 allows finer-grained control over the upload process, such as per-chunk progress reporting, but requires more work on the frontend to create the chunks. Technique 2 simplifies the frontend code but requires more processing on the backend, which has to receive the whole file, split it into chunks, and upload them in parallel.

Ultimately, the choice of technique depends on the specific use case and requirements. By understanding the strengths and weaknesses of each technique, you can make an informed decision on which one to use for your particular situation.
