Jan 25, 2025 - ⧖ 6 min

Introduction

When dealing with high-concurrency workloads, scaling AWS Lambda effectively while avoiding throttling can become a challenge. This article explores a real-world scenario where an application, written in Kotlin, processed over 100,000 records using a custom asynchronous iteration method. Each record triggered an asynchronous Lambda invocation that interacted with DynamoDB. However, the setup led to 429 Too Many Requests errors, indicating throttling issues with AWS Lambda.

We will:

  1. Outline the problem faced while processing high-concurrency workloads.
  2. Understand AWS Lambda throttling mechanisms, based on the AWS Compute Blog article by James Beswick.
  3. Present solutions to mitigate throttling.
  4. Provide a real-world proof of concept (POC) to evaluate each mitigation technique.

The Challenge

Problem Context

Our workload involved processing a large file of over 100,000 records. Using Kotlin's mapAsync extension function, we implemented concurrency to invoke an AWS Lambda function for each record. The Lambda function performed a putItem operation on DynamoDB.

Here’s the Kotlin code for mapAsync:

import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Runs the transformation concurrently for every element, with no upper bound.
suspend fun <T, R> Iterable<T>.mapAsync(
    transformation: suspend (T) -> R
): List<R> = coroutineScope {
    this@mapAsync
        .map { async { transformation(it) } }
        .awaitAll()
}

// Same as above, but a Semaphore caps the number of in-flight transformations.
suspend fun <T, R> Iterable<T>.mapAsync(
    concurrency: Int,
    transformation: suspend (T) -> R
): List<R> = coroutineScope {
    val semaphore = Semaphore(concurrency)
    this@mapAsync
        .map { async { semaphore.withPermit { transformation(it) } } }
        .awaitAll()
}

While this method processed records significantly faster than a standard for loop, it caused the Lambda invocations to "flood" the system, triggering throttling. The 429 Too Many Requests errors were linked to:

  1. Concurrency Limits: Lambda limits the number of concurrent executions per account.
  2. TPS (Transactions Per Second) Limits: High TPS can overwhelm the Invoke Data Plane.
  3. Burst Limits: A limit on how quickly concurrency can scale up, using the token bucket algorithm.

Observed Errors

  • 429 Too Many Requests: Errors indicated that the Lambda invocations exceeded the allowed concurrency or burst limits.
  • DynamoDB “Provisioned Throughput Exceeded” errors were observed when spikes occurred in DynamoDB writes.
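
When handling these errors in client code, it helps to distinguish throttling from genuine failures, so that only throttles are retried. A minimal sketch, assuming the AWS SDK for Java v2 exception types:

```kotlin
import software.amazon.awssdk.services.dynamodb.model.ProvisionedThroughputExceededException
import software.amazon.awssdk.services.lambda.model.TooManyRequestsException

// Treat Lambda 429s and DynamoDB throughput errors as retryable throttles;
// anything else should fail fast rather than be retried blindly.
fun isThrottle(e: Exception): Boolean = when (e) {
    is TooManyRequestsException -> true               // Lambda Invoke API 429
    is ProvisionedThroughputExceededException -> true // DynamoDB write throttle
    else -> false
}
```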

AWS Lambda Throttling Mechanisms

AWS enforces three key throttle limits to protect its infrastructure and ensure fair resource distribution:

1. Concurrency Limits

Concurrency defines the number of in-flight Lambda executions allowed at a time. For example, if your concurrency limit is 1,000, you can have up to 1,000 Lambda functions executing simultaneously. This limit is shared across all Lambdas in your account and region.

2. TPS Limits

TPS is a derived limit based on concurrency and function duration. For example:

  • Function duration: 100 ms
  • Concurrency: 1,000

TPS = Concurrency / Function Duration (in seconds) = 1,000 / 0.1 = 10,000 TPS

However, if the average function duration drops below 100 ms, TPS is capped at 10× the concurrency.
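
As a quick sanity check, this derived limit can be computed directly (plain Kotlin, no AWS dependencies):

```kotlin
// Derived invoke TPS: concurrency divided by average duration in seconds,
// capped at 10x the concurrency for very short-running functions.
fun derivedTps(concurrency: Int, durationMs: Double): Double {
    val raw = concurrency / (durationMs / 1000.0)
    return minOf(raw, concurrency * 10.0)
}

fun main() {
    println(derivedTps(1000, 100.0)) // 10000.0
    println(derivedTps(1000, 50.0))  // would be 20000.0, but capped at 10000.0
}
```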

3. Burst Limits

The burst limit ensures that concurrency increases gradually, avoiding large spikes in cold starts. AWS uses a token bucket algorithm to regulate this:

  • Each invocation consumes a token.
  • The bucket refills at a fixed rate (e.g., 500 tokens per minute).
  • The bucket has a maximum capacity (e.g., 1,000 tokens).

For more details, refer to the AWS Lambda documentation on burst limits.
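
To build intuition for this behavior, here is a minimal token-bucket sketch; the capacity and refill numbers are illustrative, not AWS's actual values:

```kotlin
// Simplified token bucket: each invocation takes one token; tokens refill
// at a fixed rate up to a maximum capacity.
class TokenBucket(private val capacity: Double, private val refillPerSecond: Double) {
    private var tokens = capacity
    private var lastRefill = System.nanoTime()

    @Synchronized
    fun tryAcquire(): Boolean {
        val now = System.nanoTime()
        tokens = minOf(capacity, tokens + (now - lastRefill) / 1e9 * refillPerSecond)
        lastRefill = now
        if (tokens >= 1.0) { tokens -= 1.0; return true }
        return false // throttled: the caller would see a 429
    }
}

fun main() {
    val bucket = TokenBucket(capacity = 3.0, refillPerSecond = 0.0)
    repeat(4) { println(bucket.tryAcquire()) } // true, true, true, false
}
```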


Mitigation Strategies

Here are some techniques we implemented to mitigate the throttling issues:

1. Limit Concurrency Using Semaphore

We added a concurrency limit to the mapAsync function to control the number of simultaneous Lambda invocations:

val results = records.mapAsync(concurrency = 100) { record ->
    invokeLambda(record)
}

Pros:

  • Simple to implement.
  • Reduces 429 errors significantly.

Cons:

  • Slower overall processing time due to limited concurrency.

2. Retry with Exponential Backoff

We implemented a retry mechanism with exponential backoff to handle throttled requests:

import kotlinx.coroutines.delay
import kotlin.math.pow

suspend fun invokeWithRetry(record: Record, retries: Int = 3) {
    var attempts = 0
    while (attempts < retries) {
        try {
            invokeLambda(record)
            break
        } catch (e: Exception) {
            if (++attempts == retries) throw e
            // Exponential backoff: 200 ms, 400 ms, 800 ms, ...
            delay((2.0.pow(attempts) * 100).toLong())
        }
    }
}

Pros:

  • Handles transient errors gracefully.
  • Avoids overwhelming the system during retries.

Cons:

  • Adds latency.
  • Increases code complexity.

3. Use SQS for Decoupling

Instead of invoking Lambdas directly, we used SQS to queue the requests and let the Lambdas process them at a controlled rate:

[Diagram: SQS decoupling architecture — the producer pushes records to an SQS queue, and Lambda consumers drain the queue at a controlled rate before writing to DynamoDB.]
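
The producer side of this setup can be sketched as follows, assuming the AWS SDK for Java v2 and a hypothetical queue URL:

```kotlin
import software.amazon.awssdk.services.sqs.SqsClient
import software.amazon.awssdk.services.sqs.model.SendMessageBatchRequest
import software.amazon.awssdk.services.sqs.model.SendMessageBatchRequestEntry

// Enqueue records instead of invoking Lambda directly; the queue absorbs
// the burst, and consumer concurrency controls the downstream rate.
fun enqueueRecords(sqs: SqsClient, queueUrl: String, records: List<String>) {
    records.chunked(10).forEach { batch -> // SQS batch requests hold at most 10 messages
        val entries = batch.mapIndexed { i, record ->
            SendMessageBatchRequestEntry.builder()
                .id(i.toString()) // id must be unique within the batch
                .messageBody(record)
                .build()
        }
        sqs.sendMessageBatch(
            SendMessageBatchRequest.builder()
                .queueUrl(queueUrl)
                .entries(entries)
                .build()
        )
    }
}
```

On the consumer side, the queue's Lambda event source mapping can be configured with a maximum concurrency, which is what actually throttles the write rate into DynamoDB.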

Pros:

  • Decouples producers and consumers.
  • Avoids throttling by controlling the consumer rate.

Cons:

  • Adds architectural complexity.
  • Increases latency due to queueing.

Proof of Concept (POC)

We tested each mitigation strategy using the following setup:

Test Setup

  • Dataset: 100,000 records.
  • AWS Lambda: 512 MB memory, default concurrency limits.
  • Environment: Local machine (32 GB RAM, 8 cores) for testing mapAsync.

Implementing the Lambda Function

The Lambda function was written in Go and performed a putItem operation on DynamoDB:

package main

import (
    "context"
    "fmt"

    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/dynamodb"
)

type Request struct {
    TableName string                              `json:"tableName"`
    Item      map[string]*dynamodb.AttributeValue `json:"item"`
}

// Create the session and client once, at init time, so they are reused
// across warm invocations instead of being rebuilt on every request.
var svc = dynamodb.New(session.Must(session.NewSession()))

func handler(ctx context.Context, req Request) (string, error) {
    _, err := svc.PutItemWithContext(ctx, &dynamodb.PutItemInput{
        TableName: aws.String(req.TableName),
        Item:      req.Item,
    })
    if err != nil {
        return "", fmt.Errorf("failed to put item: %v", err)
    }
    return "Success", nil
}

func main() {
    lambda.Start(handler)
}

Invoking the Lambda in Kotlin

We invoked the Lambda function from Kotlin using the AWS SDK for Java v2:

import software.amazon.awssdk.core.SdkBytes
import software.amazon.awssdk.services.lambda.LambdaClient
import software.amazon.awssdk.services.lambda.model.InvokeRequest
import software.amazon.awssdk.services.lambda.model.InvokeResponse

// Create the client once and reuse it; building a new client per call is expensive.
val lambdaClient: LambdaClient = LambdaClient.create()

fun invokeLambda(record: String): String {
    val payload = """
        {
            "tableName": "MyTable",
            "item": {
                "id": { "S": "$record" },
                "value": { "S": "SomeValue" }
            }
        }
    """.trimIndent()

    val request = InvokeRequest.builder()
        .functionName("MyLambdaFunction")
        .payload(SdkBytes.fromUtf8String(payload)) // the payload must be SdkBytes, not a raw String
        .build()

    val response: InvokeResponse = lambdaClient.invoke(request)

    return response.payload().asUtf8String()
}

Results

Mitigation Strategy         Total Time   Throttled Requests   Notes
No Concurrency Limitation   15 min       1,500                High throughput but unstable
Concurrency Limit (100)     25 min       0                    Stable, slower
Retry with Backoff          20 min       200                  Improved with retries
SQS Decoupling              30 min       0                    Most stable, added latency

Conclusion

High-concurrency workloads require careful consideration of AWS Lambda’s throttling limits. By applying strategies such as concurrency control, retry mechanisms, or decoupling with SQS, you can mitigate throttling and improve system stability. Each solution has trade-offs, so the choice depends on your specific use case and performance requirements.

For more details on Lambda throttling, refer to the AWS Lambda Developer Guide and the AWS Compute Blog.
