In this post, we'll walk through the end-to-end process of isolating logs for a sample Kubernetes service (myapp-service), sending them to OpenSearch for real-time querying, and archiving them to Amazon S3 to keep your search cluster lean. We'll cover configuring Fluent Bit via Helm and Terraform, setting up OpenSearch indices, managing IAM roles, troubleshooting VPC networking, and writing a scalable Python archival script.
Whether you're managing critical service logs or optimizing storage, this guide will provide a robust, automated solution to archive service-specific logs to S3 and maintain your OpenSearch cluster's performance.
1. Fluent Bit Configuration on EKS
We used the AWS EKS Fluent Bit Helm chart (aws-for-fluent-bit), managed via Terraform, to filter and route logs:
resource "helm_release" "fluentbit" {
name = "fluentbit"
repository = "https://aws.github.io/eks-charts"
chart = "aws-for-fluent-bit"
namespace = "kube-system"
values = [<<-EOT
opensearch:
enabled: true
index: "app-logs-001"
tls: "On"
awsAuth: "Off"
host: "your-opensearch-domain.amazonaws.com"
awsRegion: "ap-south-1"
httpUser: "admin"
httpPasswd: "${module.opensearch_credentials.password}"
cloudWatchLogs:
enabled: false
filters:
extraFilters: |
[FILTER]
Name grep
Match kube.*
Regex kubernetes.container_name myapp-service
# disable built-in S3, we'll inject a custom output next
s3:
enabled: false
outputs:
extraOutputs: |
[OUTPUT]
Name s3
Match *
Match_Regex kubernetes.container_name myapp-service
region ap-south-1
bucket myapp-logs-archive
use_put_object On
total_file_size 100M
s3_key_format /service-logs/$TAG/%Y/%m/%d/%H/%M/%S
EOT]
}
Key Points:
- We applied a grep filter to select only myapp-service logs
- We disabled the chart's built-in S3 output and added a custom [OUTPUT] s3 block capturing just those logs (a quick verification sketch follows this list)
- All other logs still flow to OpenSearch under the original index app-logs-001
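To sanity-check the split, a small boto3 sketch can list the first few objects the custom output has written to the archive bucket. The bucket name comes from the config above; the prefix is an assumption based on the s3_key_format shown:

# List a handful of archived objects to confirm the custom S3 output is writing
# under the expected prefix (prefix derived from the s3_key_format above).
import boto3

s3 = boto3.client("s3", region_name="ap-south-1")
resp = s3.list_objects_v2(
    Bucket="myapp-logs-archive",
    Prefix="service-logs/",
    MaxKeys=10,
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])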
2. OpenSearch Index Setup
We created a new index app-logs-002 with an explicit date mapping so Dashboards can filter by time:
PUT /app-logs-002
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "strict_date_optional_time||epoch_millis"
      }
    }
  }
}
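If you prefer to manage indices from code, the same request can be issued with opensearch-py; the endpoint and credentials below are placeholders mirroring the Helm values:

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "your-opensearch-domain.amazonaws.com", "port": 443}],
    http_auth=("admin", "your-password"),
    use_ssl=True,
)

# Same settings and @timestamp mapping as the PUT request above
client.indices.create(
    index="app-logs-002",
    body={
        "settings": {"number_of_shards": 3, "number_of_replicas": 1},
        "mappings": {
            "properties": {
                "@timestamp": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis",
                }
            }
        },
    },
)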
Configuring OpenSearch Dashboards:
In OpenSearch Dashboards under Stack Management → Data Views, we:
- Selected the wildcard pattern app-logs-*
- Clicked Refresh fields and set @timestamp as the time filter
- Saved and verified logs from app-logs-002 appear in Discover (see the query sketch below)
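Outside Dashboards, a quick spot-check with opensearch-py (endpoint and credentials are placeholders, as in the earlier sketch) confirms that recent myapp-service documents are queryable through the @timestamp field:

from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "your-opensearch-domain.amazonaws.com", "port": 443}],
    http_auth=("admin", "your-password"),
    use_ssl=True,
)

# Last 15 minutes of myapp-service logs across all app-logs-* indices
resp = client.search(
    index="app-logs-*",
    body={
        "size": 5,
        "query": {
            "bool": {
                "must": [{"match_phrase": {"kubernetes.container_name": "myapp-service"}}],
                "filter": [{"range": {"@timestamp": {"gte": "now-15m"}}}],
            }
        },
    },
)
print(resp["hits"]["total"])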
3. IAM Roles: IRSA vs. Node Instance Role
Our cluster did not use IRSA, so Fluent Bit inherited the worker node instance role. We identified and configured the appropriate permissions by:
- Inspecting the aws-auth ConfigMap to find the node IAM role under mapRoles
- Attaching an inline policy granting s3:PutObject on our archive bucket:
aws iam put-role-policy \
  --role-name eks-yourCluster-NodeInstanceRole \
  --policy-name fluentbit-s3-put \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::myapp-logs-archive/*"
    }]
  }'
No new IAM roles were needed—just an extension of the existing node role.
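If you manage IAM from Python rather than the CLI, a rough boto3 equivalent of the command above (same role, policy, and bucket names) looks like this:

import json

import boto3

iam = boto3.client("iam")

# Attach the same inline s3:PutObject policy to the node instance role
iam.put_role_policy(
    RoleName="eks-yourCluster-NodeInstanceRole",
    PolicyName="fluentbit-s3-put",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::myapp-logs-archive/*",
        }],
    }),
)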
4. Networking: VPC Endpoint for S3
Our EC2 instance couldn't reach S3 initially (no NAT). We resolved this by:
- Creating a Gateway VPC Endpoint for com.amazonaws.ap-south-1.s3 in our VPC
- Associating it with the private subnets of our EC2 instance
- Applying a bucket policy restricting access to that endpoint:
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowFromVpcEndpoint",
    "Effect": "Allow",
    "Principal": "*",
    "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
    "Resource": "arn:aws:s3:::myapp-logs-archive/*",
    "Condition": {
      "StringEquals": {
        "aws:sourceVpce": "vpce-0abcdef1234567890"
      }
    }
  }]
}
We confirmed connectivity via curl, the AWS CLI, and a small Python test (along the lines of the sketch below).
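The Python test was a one-off PUT along these lines (the object key is illustrative); if the gateway endpoint and bucket policy are correct, it succeeds without a NAT gateway:

import boto3

s3 = boto3.client("s3", region_name="ap-south-1")

# A tiny object proves both the route to S3 and the s3:PutObject permission
s3.put_object(
    Bucket="myapp-logs-archive",
    Key="service-logs/connectivity-test.txt",
    Body=b"vpc endpoint check",
)
print("S3 reachable via the gateway endpoint")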
5. Archival Script on EC2
We developed a Python script (archive_logs_to_s3.py) that connects to OpenSearch, scrolls through logs, and uploads them to S3 in parallel batches.
Script Overview:
The script performs the following operations:
- Connects to OpenSearch with opensearch-py using elevated timeouts and retries
- Scrolls through app-logs-001 matching myapp-service
- Uploads batches in parallel to S3 under service-logs/<DATE>/<TIME>/batch-XXXXXX.json using ThreadPoolExecutor (a sketch of the upload helper follows the snippet below)
# snippet from archive_logs_to_s3.py
# (es, ES_INDEX, BATCH_SIZE, executor and upload_batch are defined earlier in
#  the script; count comes from itertools)
resp = es.search(
    index=ES_INDEX,
    scroll="2m",
    size=BATCH_SIZE,
    body={"query": {"match_phrase": {"kubernetes.container_name": "myapp-service"}}},
    request_timeout=120,
)
sid = resp["_scroll_id"]

for batch_no in count(1):
    hits = resp["hits"]["hits"]
    if not hits:
        break  # scroll exhausted; every matching document has been queued
    executor.submit(upload_batch, batch_no, hits)
    resp = es.scroll(scroll_id=sid, scroll="2m", request_timeout=120)
    sid = resp["_scroll_id"]
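The upload_batch helper is not shown in full; here is a minimal sketch of what it could look like, assuming one newline-delimited JSON file per batch and the service-logs/<DATE>/<TIME>/batch-XXXXXX.json layout from the overview (the exact implementation in archive_logs_to_s3.py may differ):

import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3", region_name="ap-south-1")
RUN_STAMP = datetime.now(timezone.utc)  # one date/time prefix per archival run

def upload_batch(batch_no, hits):
    # service-logs/<DATE>/<TIME>/batch-XXXXXX.json
    key = (
        f"service-logs/{RUN_STAMP:%Y-%m-%d}/{RUN_STAMP:%H%M%S}/"
        f"batch-{batch_no:06d}.json"
    )
    body = "\n".join(json.dumps(h["_source"]) for h in hits)
    s3.put_object(Bucket="myapp-logs-archive", Key=key, Body=body.encode("utf-8"))
    print(f"✅ Batch {batch_no}: {len(hits):,} docs → s3://myapp-logs-archive/{key}")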
Running the Script:
We executed the script in the background and monitored its progress:
nohup python3 -u archive_logs_to_s3.py > archive_full.log 2>&1 &
tail -f archive_full.log
And saw each batch upload successfully:
✅ Batch 1: 5,000 docs → s3://myapp-logs-archive/service-logs/2025-05-16/123456/batch-000001.json
...
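Progress can also be gauged independently of the nohup log by counting the batch objects already written to the archive prefix, for example with a boto3 paginator:

import boto3

s3 = boto3.client("s3", region_name="ap-south-1")
paginator = s3.get_paginator("list_objects_v2")

# Count objects written so far under the archive prefix
total = 0
for page in paginator.paginate(Bucket="myapp-logs-archive", Prefix="service-logs/"):
    total += page.get("KeyCount", 0)
print(f"{total} batch files archived so far")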
6. Performance and Cost Analysis
Performance Metrics:
- Measured throughput (first 143 batches in 254 seconds): ~2,815 docs/s
- Projected run time: ~88M docs ÷ 2,815 docs/s ≈ 8 hours 42 minutes (recomputed in the sketch below)
Cost Breakdown:
- S3 PUT requests: ~17,662 → $0.09 in API fees
- Data transfer: All in-region via gateway → no egress charges
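The figures above can be reproduced in a few lines; batch size and document count come from earlier in the post, and the $0.005 per 1,000 PUT requests is an approximation of S3 Standard request pricing:

BATCH_SIZE = 5_000          # docs per scroll page / per S3 object
TOTAL_DOCS = 88_000_000     # approximate number of myapp-service docs

docs_per_s = 143 * BATCH_SIZE / 254             # ≈ 2,815 docs/s (143 batches in 254 s)
runtime_hours = TOTAL_DOCS / docs_per_s / 3600  # ≈ 8.7 hours
put_requests = TOTAL_DOCS / BATCH_SIZE          # ≈ 17,600 PUTs (the run logged ~17,662)
put_cost_usd = put_requests / 1_000 * 0.005     # ≈ $0.09

print(docs_per_s, runtime_hours, put_requests, put_cost_usd)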
Conclusion
By combining Helm/Terraform for infrastructure management, OpenSearch for real-time querying, IAM and VPC best practices for security, and a scalable Python script for archival, we created a robust pipeline to archive service-specific logs to S3. This approach keeps your search cluster performant, archives critical logs for compliance requirements, and minimizes AWS costs.
The solution demonstrates how thoughtful architecture and automation can solve common challenges in log management at scale. Feel free to adapt these examples to your own services and infrastructure. Happy archiving!