The Microsoft Graph API gives you unified access to a wide range of Microsoft 365 services. Whether you’re building applications, automating administrative tasks, or pulling reports, a single endpoint covers everything from Exchange Online and SharePoint to Microsoft Teams, Entra ID, and much more.
However, as the scale of your usage increases, the likelihood of hitting throttling limits also rises. These throttling measures are in place to protect the overall health of the services and ensure fair access for all users. While this is great for the platform’s stability, it can also be a challenge if you’re trying to build applications that rely on seamless, high-frequency calls to Microsoft 365 services.
In this post, I’ll demystify Graph API throttling, explore the practical steps you can take to avoid hitting those limits, and discuss how to handle throttling gracefully when it inevitably happens.
What is Microsoft Graph API Throttling?
Throttling is essentially Microsoft’s way of managing traffic to the Graph API to ensure the service stays stable and responsive. When an application exceeds certain usage thresholds, the API will respond with a 429 Too Many Requests or 503 Service Unavailable error. This helps prevent overloading the system and ensures that all users get fair access to the services.
It’s important to understand that throttling isn’t a bug—it’s a designed feature. Some common scenarios where throttling might happen include:
- High-frequency polling (e.g., constantly checking mailbox messages every few seconds)
- Large data migrations or syncing huge volumes of data
- Burst traffic during scheduled jobs
- Poorly designed retry logic that floods the API
Without throttling, a single app that’s sending too many requests could potentially slow down or even break the experience for an entire tenant—or worse, for other tenants sharing the same infrastructure.
Throttling can happen at various levels, including:
- Per user
- Per app
- Per tenant
- Per resource type (e.g., mail, calendar, directory)
Additionally, each Microsoft 365 service behind Graph (like SharePoint or Exchange) has its own specific throttling policies to further fine-tune how the system manages traffic.
Types of Throttling Limits in Microsoft Graph API
Microsoft Graph API uses various throttling limits to ensure that resources are used fairly and the service remains stable for everyone. These limits can be applied in different ways—whether per user, per app, or even across the entire tenant. Knowing the different types of throttling will help you fine-tune your requests and avoid hitting those limits.
Global Throttling Limits
Before we get into service-specific limits, let’s start with the global throttling limit. This is a universal cap that applies across all Microsoft Graph services.
Any request type: 130,000 requests per 10 seconds per app across all tenants.
This global limit applies no matter which service you’re accessing. When you make a request, it’s evaluated against several factors, like the scope of the limit (per app across all tenants, per tenant for all apps, per app per tenant, etc.), the type of request (GET, POST, PATCH, etc.), and more. The throttling kicks in once any of these limits are reached.
Key Service-Specific Throttling Limits
Now, let’s look at the service-specific throttling limits. Each service, like SharePoint, Outlook, and Teams, has its own set of thresholds. For example, a single app calling the Excel APIs can make up to 5,000 requests per 10 seconds across all tenants, and up to 1,500 requests per 10 seconds within any one tenant.
I’ve broken down these service limits into an easy-to-reference table below. You can use it as your own quick cheat sheet, especially when the official documentation feels a bit overwhelming. That said, for a full, detailed breakdown, make sure to check out the official Microsoft Graph Throttling Limits documentation.
Service | Operation/Scope | Limit | Context |
Microsoft Teams | GET team | 30 requests/sec | Per app per tenant |
Microsoft Teams | POST channel | 30 requests/sec | Per app per tenant |
Microsoft Teams | DELETE channel | 15 requests/sec | Per app per tenant |
Microsoft Teams | GET channel message | 20 requests/sec | Per app per tenant |
Microsoft Teams | POST channel message | 50 requests/sec | Per app per tenant |
Microsoft Teams | Any request on a given team | 4 requests/sec | Per app per team (hard cap) |
Outlook | API requests | 10,000 requests/10 minutes | Per app ID + mailbox combination |
Outlook | Concurrent requests | 4 concurrent requests | Per app ID + mailbox |
Outlook | Upload (PATCH/POST/PUT) | 150 MB/5 minutes | Per app ID + mailbox |
Excel | Any request (global) | 5,000 requests/10 seconds | Per app across all tenants |
Excel | Any request (tenant-specific) | 1,500 requests/10 seconds | Per app per tenant |
Identity & Access | Small tenants (<50 users) | 3,500 Resource Units/10 seconds | Per app + tenant |
Identity & Access | Medium tenants (50–500 users) | 5,000 Resource Units/10 seconds | Per app + tenant |
Identity & Access | Large tenants (>500 users) | 8,000 Resource Units/10 seconds | Per app + tenant |
Identity & Access | Write operations | 3,000 requests/2.5 minutes | Per app + tenant |
OneNote | Delegated context | 120 requests/minute, 400/hour | Per app per user |
OneNote | Concurrent requests (delegated) | 5 concurrent requests | Per app per user |
OneNote | App-only context | 240 requests/minute, 800/hour | Per app |
OneNote | Concurrent requests (app-only) | 20 concurrent requests | Per app |
Subscriptions | POST, PUT, DELETE, PATCH | 2,000 requests/20 seconds | Per app across all tenants |
Subscriptions | POST, PUT, DELETE, PATCH | 500 requests/20 seconds | Per app per tenant |
Subscriptions | POST /reauthorize by ID | 4,000 requests/20 seconds | Per app across all tenants |
Subscriptions | POST /reauthorize by ID | 1,000 requests/20 seconds | Per app per tenant |
How Throttling is Communicated to Developers/Admins
When your application gets throttled by Microsoft Graph, the API doesn’t just cut you off—it tells you what’s happening and what to do next. Specifically, you’ll get an HTTP response like 429 Too Many Requests or 503 Service Unavailable, often with headers that guide your retry strategy. Here are the key headers to watch for:
- Retry-After: This is the most important one—it tells your app exactly how many seconds to wait before making another request. You should always honor this.
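- x-ms-throttle-limit-percentage (occasionally included): Some services, such as the identity and access endpoints, return this header once your app has used most of its limit, which gives you visibility into how close you are to the threshold. It’s not always present, so don’t depend on it.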
Example response:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
If you see a Retry-After header, back off for the specified duration before retrying. Ignoring it will just keep getting you throttled and might even lead to longer delays.
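To make that concrete, here is a minimal Python sketch of reading the header. It assumes the response headers are available as a plain dict. The function name `retry_after_seconds` and the 5-second `default` fallback are my own illustrative choices, not part of Microsoft Graph. Graph sends the value as a number of seconds; the HTTP-date branch is only there because the HTTP specification also allows that form.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, default=5.0, now=None):
    """Return how many seconds to wait, based on a Retry-After header.

    `headers` is a plain dict of response headers. `default` is a
    hypothetical fallback used when the header is missing or unparsable.
    """
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        # The usual case: a whole number of seconds.
        return float(value)
    try:
        # Fallback: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

With the example response above, `retry_after_seconds({"Retry-After": "30"})` gives a 30-second wait.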
Best Practices to Avoid Throttling
While you can’t eliminate throttling entirely—especially at scale—you can dramatically reduce how often it happens with smart API usage. Here are best practices every Microsoft 365 developer or admin should keep in mind:
1. Optimize Your Data Requests:
- Use Query Parameters: Utilize $select to retrieve only necessary fields, and $filter to narrow down results. Always request only the data you need. If you don’t need the entire user object, don’t ask for it. This reduces payload size and processing time.
- Implement Pagination: Large result sets should be paged using @odata.nextLink. Trying to pull too much data in one shot is a fast track to throttling.
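Here is a rough sketch of both ideas in Python. The helper names (`build_url`, `iter_items`) and the injected `fetch_json` callable are assumptions for illustration. In a real app, `fetch_json` would perform an authenticated GET and return the parsed JSON body.

```python
from urllib.parse import quote, urlencode

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_url(path, select=None, filter_expr=None, top=None):
    """Build a Graph URL that asks only for the fields and rows needed."""
    params = {}
    if select:
        params["$select"] = ",".join(select)
    if filter_expr:
        params["$filter"] = filter_expr
    if top:
        params["$top"] = str(top)
    # Keep "$", "," and "'" readable; encode spaces as %20.
    query = urlencode(params, safe="$,'", quote_via=quote)
    return f"{GRAPH_BASE}{path}" + (f"?{query}" if query else "")


def iter_items(first_url, fetch_json):
    """Yield items from every page by following @odata.nextLink.

    `fetch_json` is a hypothetical callable: URL in, parsed JSON dict out.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("value", [])
        # The last page has no nextLink, which ends the loop.
        url = page.get("@odata.nextLink")
```

For example, `build_url("/users", select=["id", "displayName"], top=50)` requests two fields per user instead of the whole object, and `iter_items` then walks the pages one at a time rather than pulling everything in one shot.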
2. Use Batching Wisely:
- Batch requests: Microsoft Graph supports batching up to 20 requests in a single HTTP call. This reduces network overhead, but remember: each request within the batch is still counted against throttling limits independently.
- Group Strategically: Don’t throw 20 expensive operations into one batch. Mix lightweight and heavy requests to avoid burst spikes.
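As a sketch of the mechanics, the helper below splits a list of requests into JSON batch payloads of at most 20 items each. The function name `build_batches` and the `(method, url)` tuple input are my own assumptions. Each resulting payload would be sent as the body of a POST to the `/$batch` endpoint.

```python
def build_batches(requests, batch_size=20):
    """Split requests into JSON batch payloads of at most `batch_size` items.

    Each input item is a (method, relative_url) tuple, e.g. ("GET", "/me").
    """
    if not 1 <= batch_size <= 20:
        raise ValueError("batch_size must be between 1 and 20")
    payloads = []
    for start in range(0, len(requests), batch_size):
        chunk = requests[start:start + batch_size]
        payloads.append({
            "requests": [
                # Each request needs an id so responses can be matched up.
                {"id": str(start + offset + 1), "method": method, "url": url}
                for offset, (method, url) in enumerate(chunk)
            ]
        })
    return payloads
```

Because every request inside a batch still counts against the limits, it can be worth passing a smaller `batch_size` for expensive operations rather than always filling all 20 slots.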
3. Use Change Tracking to Minimize Polling:
- Delta Queries: Instead of re-fetching the entire dataset on every run, use delta queries to retrieve only the data that has changed since the last sync.
- Change Notifications: Subscribing to change notifications (webhooks) lets your app respond to events as they happen, rather than polling the API unnecessarily. It’s much more efficient and throttle-friendly.
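A delta sync round can be sketched roughly like this. The function name `sync_changes` and the injected `fetch_json` callable are assumptions for illustration. The first call would start from a delta URL such as `https://graph.microsoft.com/v1.0/users/delta`, and later calls would start from the delta link saved by the previous round.

```python
def sync_changes(start_url, fetch_json):
    """Collect changed items and return them with the next delta link.

    `fetch_json` is a hypothetical callable: URL in, parsed JSON dict out.
    """
    changes = []
    url = start_url
    delta_link = None
    while url:
        page = fetch_json(url)
        changes.extend(page.get("value", []))
        # Intermediate pages carry @odata.nextLink; the final page carries
        # @odata.deltaLink, which is kept for the next sync round.
        delta_link = page.get("@odata.deltaLink", delta_link)
        url = page.get("@odata.nextLink")
    return changes, delta_link
```

Persisting the returned delta link between runs is what lets the next round ask only for what changed.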
4. Implement Intelligent Retry Logic:
When a 429 response is received:
- Respect Retry-After: Always wait at least the amount of time specified before retrying. Microsoft is telling you how long the system needs, so listen to it.
- Use Exponential Backoff: If the header isn’t present, implement exponential backoff to increase wait times between retries. This reduces the pressure on the API and increases your chances of a successful retry.
- Set Retry Limits: Always include a retry cap to avoid infinite loops and unnecessary load.
Example pseudocode:
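import time

wait = 1
for attempt in range(5):  # retry cap avoids an infinite loop
    try:
        make_api_call()
        break
    except Throttled:
        time.sleep(wait)
        wait = min(wait * 2, 60)  # double the wait, capped at 60 seconds
else:
    raise RuntimeError("still throttled after 5 attempts")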
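The three rules above can be combined into one helper. This is a sketch under stated assumptions, not a definitive implementation: `send` is a hypothetical callable that performs the request and returns `(status_code, headers, body)` with headers as a plain dict, and the defaults (5 retries, 1-second base delay, 60-second cap) are illustrative values.

```python
import time


def call_with_retry(send, max_retries=5, base_delay=1.0, max_delay=60.0,
                    sleep=time.sleep):
    """Call `send()` until the response is not throttled or retries run out.

    `send` is a hypothetical callable returning (status_code, headers, body).
    `sleep` is injectable so the wait logic can be tested without waiting.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status not in (429, 503):
            return status, body
        if attempt == max_retries:
            break  # retry cap reached, give up
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # The server told us how long to wait, so honor it.
            wait = float(retry_after)
        else:
            # No usable header: fall back to exponential backoff.
            wait = delay
            delay = min(delay * 2, max_delay)
        sleep(wait)
    raise RuntimeError(f"still throttled after {max_retries} retries")
```

Raising once the cap is reached, instead of looping forever, lets the caller decide whether to queue the work for later or surface the failure.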
5. Schedule Workloads During Off-Peak Hours:
- Many organizations generate peak load during business hours. If your app runs nightly reports, sync jobs, or batch processes, schedule them for early morning, evenings, or weekends when overall traffic is lower.
6. Distribute Load Intelligently:
- Spread Requests Across Users: If your app acts on behalf of many users, distribute the workload across them rather than funneling everything through a single user context.
- Use Queues or Throttling Middleware: Introduce slight delays, queues, or background workers to space out requests, especially when handling large batches.
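One simple way to space out requests is a small pacing helper that each worker calls before sending. The `Pacer` class below is a hypothetical sketch, not a Graph SDK feature. It is not thread-safe as written, so it suits a single worker loop.

```python
import time


class Pacer:
    """Space out calls so that at most `rate` calls start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def wait(self):
        """Block until the next slot is free, then reserve the one after it."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval
```

Calling `pacer.wait()` before each request with `Pacer(rate=4)` would keep a worker at or below 4 requests per second, the per-team figure listed for Teams in the table above.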
7. Monitor and Log Throttling Events:
- Always log 429 and 503 responses, including Retry-After durations and affected endpoints. This helps you identify which patterns or endpoints are more prone to throttling—and optimize accordingly.
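A minimal version of that logging might look like the sketch below. The logger name `graph.throttling`, the `throttle_counts` counter, and the `record_throttle` helper are all illustrative names of my own. In a real app you would likely send these records to whatever telemetry system you already use.

```python
import logging
from collections import Counter

logger = logging.getLogger("graph.throttling")
throttle_counts = Counter()  # throttled responses seen per endpoint


def record_throttle(status, endpoint, headers):
    """Log a throttled response and count it per endpoint.

    Returns True when the response was a throttling response (429 or 503).
    """
    if status not in (429, 503):
        return False
    retry_after = headers.get("Retry-After", "n/a")
    throttle_counts[endpoint] += 1
    logger.warning(
        "throttled status=%s endpoint=%s retry_after=%s",
        status, endpoint, retry_after,
    )
    return True
```

Reviewing `throttle_counts.most_common()` from time to time is a quick way to see which endpoints are throttled most often.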
Implementing these best practices not only improves the stability of your application but also ensures it remains respectful of shared cloud resources. A well-behaved app that handles throttling gracefully is one that scales reliably and generates fewer support tickets.
Building Resilient Apps with Microsoft Graph
Microsoft Graph API throttling is an important mechanism to ensure performance and stability for all tenants. As your app or script grows in scale and complexity, understanding how throttling works becomes crucial to keeping things smooth and reliable.
By proactively applying the strategies we’ve covered—like efficient querying, smart batching, exponential backoff, and change tracking—you can design systems that not only avoid hitting limits but also recover gracefully when they do.
If you’re working with Microsoft Graph at scale, it’s worth investing in solid telemetry. Monitor your API usage patterns, log throttling responses, and track retry behavior. These insights will help you identify bottlenecks early and fine-tune your app over time.
Stay within the limits, build with resilience in mind, and your app will be better for it.
Happy coding! And may your 429s be few and far between!