Rate Limiting IdentityServer Endpoints

Maarten Balliauw

Your identity provider is the front door to every application in your organization. Every request to your identity infrastructure shares the same resources: CPU, memory, database connections, and cryptographic operations such as token signing.

A recent community discussion highlighted what happens when one consumer takes more than its fair share of incoming requests. A specific client application was making an excessive number of requests to the /connect/token endpoint, resulting in an unintentional denial of service: not an attacker or malicious actor, just a misbehaving client that overwhelmed the shared infrastructure. Misconfigurations are a common source of the issues we see with customers, and this was no exception.

For teams running critical identity infrastructure, this incident raises an important question: Should you add rate limiting to your Duende IdentityServer deployment?

Does Duende IdentityServer Need Rate Limiting?

In most deployments, the answer is no. Your IdentityServer instance typically serves a known set of clients and users within your organization. You might implement rate limiting for generic abuse prevention, but specific rate limits on OAuth and OpenID Connect endpoints are typically not required. Most often, your IdentityServer is consumed by applications you control. If a client is making too many requests, the right fix is usually to address the root cause: investigate misconfigured token lifetimes, missing token caching, and wasteful loops in any client application.

Duende IdentityServer does not include built-in rate limiting by design. Rate limiting is an infrastructure concern that varies between deployments. For example, a small internal deployment behind a corporate firewall has different needs than a multi-tenant SaaS platform serving thousands of organizations. The right rate-limiting strategy depends on your architecture, your threat model, and your traffic patterns.

That said, there are scenarios where rate limiting does make sense:

  • Misbehaving clients that you cannot immediately fix, such as third-party integrations or legacy applications stuck on an older deployment cycle.
  • Public-facing deployments where your authorize or token endpoints are exposed to the internet and could be targeted by automated attacks.
  • Multi-tenant environments where one tenant's traffic should not impact another tenant's ability to authenticate. Ideally, all clients have equal access to shared resources within the boundaries of what your IdentityServer supports.
  • Compliance requirements that mandate throttling controls on authentication endpoints.

If you find yourself in one of these situations, take the time to assess each option, weighing the different benefits and trade-offs.

Option 1: Rate Limit At The Network Layer

The most straightforward approach is to throttle traffic before it reaches your application. A reverse proxy, load balancer, or API gateway can enforce rate limits based on IP address, geographic region, or request path. Tools like nginx, Azure Application Gateway, Amazon API Gateway, or Cloudflare all support this.
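As an illustration, nginx's limit_req module can throttle the token endpoint by source IP. The zone name, rate, burst values, and upstream name below are arbitrary examples, not recommended settings:

```nginx
# Illustrative nginx configuration: throttle /connect/token by client IP.
# Zone name, rate, burst, and upstream name are arbitrary examples.
limit_req_zone $binary_remote_addr zone=token_endpoint:10m rate=10r/m;

server {
    listen 443 ssl;

    location /connect/token {
        # Allow short bursts; reject excess requests immediately with 429.
        limit_req zone=token_endpoint burst=5 nodelay;
        limit_req_status 429;
        proxy_pass https://identityserver;
    }
}
```

Because the key here is `$binary_remote_addr`, all users behind a shared egress IP count against the same bucket, which is exactly the coarse-grained trade-off discussed below.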

Enabling network-layer rate limiting has several advantages. No code changes are required in your IdentityServer deployment, and requests are rejected before consuming any application resources.

The downside is that you lose fine-grained control over rate-limiting decisions. While you can throttle by IP address, you cannot throttle requests based on OAuth client or user identity. This means you cannot block a specific client or user from making excessive requests.

There's also the risk of blocking legitimate traffic, for example, when traffic for multiple users is routed through a single IP address.

So while rate limiting at the network layer works well as a first line of defense, it lacks the context to make fine-grained decisions based on specific OAuth traffic characteristics.

Option 2: ASP.NET Core Rate Limiting Middleware

Since Duende IdentityServer runs on ASP.NET Core, you can use the built-in rate limiting middleware to throttle requests in the HTTP pipeline before IdentityServer processes them. This gives you more control than network-layer rate limiting, because you can partition limits by client, IP address, or any other dimension you can extract from the request.

Here is a basic example that applies a fixed window rate limit to the token endpoint:

// Program.cs
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddIdentityServer();

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
    options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(httpContext =>
        RateLimitPartition.GetFixedWindowLimiter(
            // Partition requests by the path of the endpoint.
            // For the token endpoint, limit to 10 requests per minute.
            // For all other endpoints, use a new Guid as the partition key
            // to effectively disable rate limiting.
            partitionKey: httpContext.Request.Path == "/connect/token" 
                ? "/connect/token" 
                : Guid.NewGuid().ToString(),
            factory: partition => new FixedWindowRateLimiterOptions
            {
                AutoReplenishment = true,
                PermitLimit = 10,
                QueueLimit = 0,
                Window = TimeSpan.FromMinutes(1)
            }));
});

var app = builder.Build();

app.UseRateLimiter(); // needs to go before UseIdentityServer()
app.UseIdentityServer();

app.Run();

The ASP.NET Core middleware supports several algorithm types (fixed window, sliding window, token bucket, and concurrency), chaining multiple limiters, partitioning by custom keys like client_id, and customizing rejection responses with Retry-After headers. Requests can be filtered early in the pipeline, before IdentityServer performs expensive operations like client secret validation and database lookups.
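For confidential clients that authenticate with HTTP Basic, the client_id can be read synchronously from the Authorization header and used as a partition key. The `BasicAuthClientId` helper below is our own sketch, not part of ASP.NET Core or IdentityServer:

```csharp
using System;
using System.Text;

public static class BasicAuthClientId
{
    // Returns the client_id from a "Basic base64(client_id:secret)" header,
    // or null when the header is absent or malformed.
    public static string? Parse(string? authorizationHeader)
    {
        if (authorizationHeader is null ||
            !authorizationHeader.StartsWith("Basic ", StringComparison.OrdinalIgnoreCase))
        {
            return null;
        }

        try
        {
            var decoded = Encoding.UTF8.GetString(
                Convert.FromBase64String(authorizationHeader["Basic ".Length..]));
            var separator = decoded.IndexOf(':');
            return separator > 0 ? decoded[..separator] : null;
        }
        catch (FormatException)
        {
            return null;
        }
    }
}
```

In the partitioner, you could then use `BasicAuthClientId.Parse(httpContext.Request.Headers.Authorization.ToString())` as the partition key, falling back to the remote IP address when no client can be identified. Note that clients posting their credentials in the request body cannot be identified this way without buffering the form.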

Note: For a thorough walkthrough of these capabilities, see ASP.NET Core rate limiting middleware in .NET 7 and the official Microsoft documentation.

There is one (rather important) limitation to this approach: because Duende IdentityServer does not use ASP.NET Core endpoint routing in version 7, you cannot define named rate-limiting policies and attach them to specific IdentityServer endpoints. Only the global rate limiter applies here, so you'll want to be mindful of how you partition requests to achieve more granular rate limiting. You'll also want to customize the rejection response the client receives. For example, the token endpoint should probably return a descriptive JSON error body instead of a generic 429 response.
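To customize what rejected clients receive, the middleware's OnRejected callback can set a Retry-After header and write a JSON body. This is a sketch; OAuth does not standardize a rate-limit error for the token endpoint, so the payload shape below is our own choice:

```csharp
// Program.cs (sketch): customize the response sent to rate-limited clients.
using System.Globalization;
using System.Threading.RateLimiting;

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, cancellationToken) =>
    {
        // Surface when the client may retry, if the limiter provides it.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers.RetryAfter =
                ((int)retryAfter.TotalSeconds).ToString(CultureInfo.InvariantCulture);
        }

        // The error payload shape is our own choice; OAuth does not define
        // a standard rate-limit error for the token endpoint.
        context.HttpContext.Response.ContentType = "application/json";
        await context.HttpContext.Response.WriteAsync(
            """{"error":"temporarily_unavailable","error_description":"Too many requests. Try again later."}""",
            cancellationToken);
    };

    // GlobalLimiter configuration as shown in the earlier example...
});
```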

As such, we'd recommend using the ASP.NET Core Rate Limiting Middleware for non-IdentityServer endpoints such as your login page, consent page, etc. For IdentityServer endpoints, the next option may be better suited.

Option 3: Custom Token Request Validator

Duende IdentityServer provides the ICustomTokenRequestValidator interface, which lets you inject custom logic into the token issuance pipeline. At this point in the pipeline, the request has already been validated: you know the authenticated client and user. This means you can make rate-limiting decisions with full confidence in the caller's identity.

// ClientRateLimitingValidator.cs
using System.Collections.Concurrent;
using Duende.IdentityServer.Validation;

public class ClientRateLimitingValidator : ICustomTokenRequestValidator
{
    private static readonly ConcurrentDictionary<string, ClientRequestTracker> Trackers = new();

    public Task ValidateAsync(CustomTokenRequestValidationContext context)
    {
        var clientId = context.Result.ValidatedRequest.Client.ClientId;

        var tracker = Trackers.GetOrAdd(clientId, _ => new ClientRequestTracker());

        if (!tracker.TryConsume())
        {
            context.Result.IsError = true;
            context.Result.Error = "rate_limit_exceeded";
            context.Result.ErrorDescription = "Too many token requests. Try again later.";
        }

        return Task.CompletedTask;
    }

    private class ClientRequestTracker
    {
        private int _count;
        private DateTime _windowStart = DateTime.UtcNow;
        private readonly object _lock = new();

        private const int Limit = 100;
        private static readonly TimeSpan Window = TimeSpan.FromMinutes(1);

        public bool TryConsume()
        {
            lock (_lock)
            {
                var now = DateTime.UtcNow;
                if (now - _windowStart > Window)
                {
                    _windowStart = now;
                    _count = 0;
                }

                _count++;
                return _count <= Limit;
            }
        }
    }
}

Register it in your service container:

// Program.cs
builder.Services.AddTransient<ICustomTokenRequestValidator, ClientRateLimitingValidator>();

Using this approach, you can implement fine-grained rate limiting based on client identity. Since you have full access to the validated client and current user identity, you could implement per-client or per-user limits with different thresholds for each client. This approach also integrates directly into the IdentityServer pipeline, so rejected requests return proper OAuth error responses.
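For example, a per-client threshold could be read from the client's Properties dictionary. The "rate_limit" property key below is a convention invented here for illustration, not a built-in Duende IdentityServer setting:

```csharp
using System.Collections.Generic;

public static class ClientRateLimits
{
    // Resolve a per-client limit from the client's Properties dictionary
    // (Duende's Client.Properties). The "rate_limit" key is our own
    // convention for illustration, not a built-in IdentityServer setting.
    public static int Resolve(IDictionary<string, string> properties, int defaultLimit = 100)
    {
        return properties.TryGetValue("rate_limit", out var value)
            && int.TryParse(value, out var limit)
            ? limit
            : defaultLimit;
    }
}
```

Inside ValidateAsync, you could then replace the constant Limit with `ClientRateLimits.Resolve(context.Result.ValidatedRequest.Client.Properties)` and pass it to the tracker.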

The downside is that the request has already been partially processed. Client authentication, secret validation, and database lookups have already occurred, so you are still consuming resources for requests that will ultimately be rejected. For high-volume abuse scenarios, this approach alone may not be sufficient to protect your server's resources.

Which Approach Should You Use?

Let's first reiterate that while you might implement rate limiting for generic abuse prevention, most Duende IdentityServer deployments typically do not require specific rate limits on OAuth and OpenID Connect endpoints.

If you do require rate limiting, a layered approach may work best:

  1. Network-layer rate limiting as a broad shield against volumetric attacks and obviously abusive traffic.
  2. ASP.NET Core middleware with a global limiter that catches excessive requests before they hit your application pipeline. Use endpoint-specific limiters for interactive pages such as login, consent, …
  3. Custom validators for application-aware rate limiting when you need decisions based on authenticated client or user identity.

Most deployments will only need the first layer if they need rate limiting at all. Start by investigating why a client is making excessive requests. Fixing token caching, adjusting token lifetimes, or correcting a retry loop is almost always preferable to adding rate-limiting infrastructure.

Remember, rate limiting is a safety net, not a substitute for properly configured clients.

Summary

Duende IdentityServer intentionally leaves rate limiting to the infrastructure and application layers, because the right strategy depends on your specific deployment. For most organizations, properly configured clients and token management eliminate the need for rate limiting entirely.

When you do need rate limiting, ASP.NET Core's built-in middleware and IdentityServer's extensibility points give you the tools to implement it at the right level of granularity.

Have questions or want to share your rate-limiting strategy? Join the conversation on GitHub Discussions.