SOLVED

Data transfer in S3 bucket

Level 2

Hi all, 

 

We are seeing increased data transfer on our S3 bucket, which is driving up cost. When I checked the logs, they show the information below. I tried to find a solution but could not find anything. Initially the data store garbage collection failed, but later it ran successfully. Also, there are no missing blobs; I ran the consistency check as well.

 

GET /bin/acs-commons/jcr-compare.dump.json.servlet.css/1.ico HTTP/1.1] org.apache.jackrabbit.oak.plugins.blob.DataStoreCacheUtils Deleted file [/crx/aem/author/crx-quickstart/repository/datastore/download/88/e2/76/8

 

How can we stop this activity? Please provide your suggestions.

1 Accepted Solution

Correct answer by
Employee

Hi @user65294,

You mentioned that the garbage collection initially failed but later ran successfully. This process is crucial for cleaning up unused blobs in the Data Store, which helps manage storage costs. The log entry you provided indicates a request to a servlet, possibly related to the ACS Commons JCR Compare tool, which might generate data traffic.

To identify the root cause, please check the details below:

  1. Review your access logs to identify which requests are generating the most data transfer. Look for patterns such as specific IP addresses, user agents, or request paths (see the log-parsing sketch after this list). Check whether any application features or integrations might be causing excessive data usage, such as backup processes, analytics tools, or automated scripts.
  2. Ensure that the Data Store garbage collection process is configured correctly and runs regularly to remove unused blobs; this helps reduce storage costs. Review your Data Store cache settings so that frequently accessed blobs are cached effectively, reducing the need to fetch them from S3 repeatedly.
  3. The log entry indicates requests to a servlet. Investigate whether these requests are necessary and whether they can be optimized or reduced. Consider caching responses or using lazy loading to minimize data transfer. If your application frequently reads or writes large files, consider optimizing these operations, for example by compressing data before transfer or using incremental updates instead of full file transfers.
  4. Limit access to your S3 bucket to only the necessary IP addresses or users. Implement security groups and IAM policies to control who can access and transfer data. Set up monitoring and alerting for unusual data transfer patterns; AWS CloudWatch can track S3 bucket usage and alert you to potential issues (see the bucket-policy and CloudWatch alarm sketches after this list).
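For point 1, here is a rough sketch (not an official tool) of how you could summarize an access log by request path and client IP to see what moves the most bytes. The Apache/NCSA "combined" log format, the field positions, and the file name access.log are assumptions; adjust the parsing to match your actual AEM access.log format.

#!/usr/bin/env python3
"""Rough sketch: summarize which paths/IPs transfer the most bytes.

Assumes an Apache/NCSA "combined"-style access log; the regex and field
positions are assumptions -- adapt them to your actual AEM access.log format.
"""
import re
import sys
from collections import Counter

# client-ip ident user [timestamp] "METHOD /path HTTP/1.1" status bytes ...
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def main(logfile: str) -> None:
    bytes_by_path = Counter()
    bytes_by_ip = Counter()
    with open(logfile, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = LINE_RE.match(line)
            if not m or m.group("bytes") == "-":
                continue
            size = int(m.group("bytes"))
            bytes_by_path[m.group("path")] += size
            bytes_by_ip[m.group("ip")] += size

    print("Top paths by bytes transferred:")
    for path, total in bytes_by_path.most_common(10):
        print(f"  {total:>15,}  {path}")
    print("\nTop client IPs by bytes transferred:")
    for ip, total in bytes_by_ip.most_common(10):
        print(f"  {total:>15,}  {ip}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "access.log")

If the jcr-compare servlet path from your log entry dominates the output, that would point to those requests as the source of the repeated blob downloads from S3.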
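For point 4, here is a minimal boto3 sketch of a bucket policy that denies requests coming from outside an allowed IP range. The bucket name and CIDR are placeholders, and whether you restrict by source IP, VPC endpoint, or IAM principal depends on your setup; test carefully on a non-production bucket, since a Deny statement with the wrong condition can also lock out your AEM instances.

"""Sketch: restrict S3 access to known source IPs (values are placeholders)."""
import json
import boto3

BUCKET = "my-aem-datastore-bucket"   # placeholder bucket name
ALLOWED_CIDRS = ["203.0.113.0/24"]   # placeholder: egress range of your AEM instances

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccessFromOutsideAllowedRange",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            # Deny anything not coming from the allowed CIDRs
            "Condition": {"NotIpAddress": {"aws:SourceIp": ALLOWED_CIDRS}},
        }
    ],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
print("Bucket policy applied to", BUCKET)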
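Also for point 4, a sketch of a CloudWatch alarm on the S3 BytesDownloaded request metric via boto3. Note that BytesDownloaded is only published after you enable a request metrics configuration (filter) on the bucket, and the bucket name, filter ID, threshold, and SNS topic ARN below are placeholders.

"""Sketch: alert when S3 download traffic exceeds a threshold (placeholders throughout)."""
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="aem-datastore-high-download-traffic",
    Namespace="AWS/S3",
    MetricName="BytesDownloaded",  # requires S3 request metrics enabled on the bucket
    Dimensions=[
        {"Name": "BucketName", "Value": "my-aem-datastore-bucket"},  # placeholder
        {"Name": "FilterId", "Value": "EntireBucket"},               # your metrics filter ID
    ],
    Statistic="Sum",
    Period=3600,                    # evaluate in 1-hour windows
    EvaluationPeriods=1,
    Threshold=50 * 1024**3,         # e.g. alert above ~50 GB/hour (placeholder)
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:placeholder-topic"],
)
print("Alarm created")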

Thanks
Pranay

4 Replies

Level 2

Anyone have any idea on this?

Employee

Hi @user65294,


Please let me know if you need any additional information on this.

Thanks
Pranay

Moderator

Hi @user65294 ,

Please let us know if you need any additional information to resolve the issue.