By: Joseph Carboni
I recently ran into a production issue that a lot of API integrations eventually hit: an upstream service tightened its request pacing, and a previously stable integration started failing under normal traffic.
In my case, the upstream service was FRED (Federal Reserve Economic Data). The symptom was a wave of 429 Too Many Requests responses, including cases where a single client page refresh caused cascading failures. This behavior was new.
This post walks through what changed, why my original approach broke, and the Python implementation I used to stabilize things.
What Changed
My request shape was still valid: standard query params, key usage, and JSON responses. What changed was how strictly FRED enforces its API rate limits. Exceeding either of these windows used to trigger only soft throttling from FRED’s servers:
- 2 requests per second
- 120 requests per minute
Recently, FRED changed its enforcement so that its servers reject micro-bursts immediately. The moment you cross either threshold, their firewall blocks the traffic and every request comes back as HTTP 429 Too Many Requests.
Why FRED Likely Tightened Enforcement
FRED has good reasons to enforce stricter limits, especially on shared keys:
- Fair access across many users and applications.
- Better API reliability during traffic spikes.
- Protection against abusive or accidental high-frequency clients.
- More predictable infrastructure cost and capacity planning.
FRED is public economic infrastructure, so tighter pacing controls help keep the service stable and broadly available. In practice, that means integrations like mine need explicit traffic shaping instead of optimistic retry behavior.
Why A Simple Refresh Caused 429s
The real problem was fan-out plus concurrency:
- One logical endpoint call could trigger multiple upstream FRED calls.
- A page refresh triggered several endpoints in parallel.
- Retries happened per request path, not as a coordinated global pause.
- Concurrent tasks kept sending requests while one task was backing off.
So the system kept producing bursts and getting blocked with 429 responses.
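To make the fan-out arithmetic concrete, here is a minimal sketch that counts how many requests land inside one sliding second. The endpoint count, calls per endpoint, and send offsets are hypothetical numbers chosen for illustration, not measurements from my app.

```python
from collections import deque


def max_requests_in_window(send_times: list[float], window_seconds: float) -> int:
    """Return the largest number of requests inside any sliding window."""
    window: deque[float] = deque()
    peak = 0
    for t in sorted(send_times):
        window.append(t)
        # Drop requests that fall outside the window ending at t.
        while window and window[0] <= t - window_seconds:
            window.popleft()
        peak = max(peak, len(window))
    return peak


# Hypothetical page refresh: 4 endpoints fire in parallel, and each one
# fans out into 3 upstream calls spaced 10 ms apart.
ENDPOINTS = 4
CALLS_PER_ENDPOINT = 3
send_times = [
    endpoint * 0.001 + call * 0.010
    for endpoint in range(ENDPOINTS)
    for call in range(CALLS_PER_ENDPOINT)
]

peak_per_second = max_requests_in_window(send_times, 1.0)
print(peak_per_second)  # 12 requests in one second, against a limit of 2
```

Even this modest refresh puts six times the per-second limit on the wire at once, and per-request retries only add to the pile.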
Engineering Goals
FRED data changes slowly, so responses are always served from a local cache first and long TTLs make sense. The request limits here mainly protect refreshes for stale or missing cache entries.
For a single-instance deployment, I wanted a fix that was:
- Global inside one process.
- Conservative by default.
- Aware of both short and long time windows.
- Respectful of Retry-After headers.
- Easy to tune through environment variables.
Python Pattern I Implemented
1) Centralized Limiter State
import asyncio
import os
import time
from collections import deque
_FRED_MAX_CONCURRENT_REQUESTS = max(
    1, int(os.getenv("FRED_MAX_CONCURRENT_REQUESTS", "2"))
)
_FRED_MAX_RETRIES = max(0, int(os.getenv("FRED_MAX_RETRIES", "2")))
_FRED_BACKOFF_BASE_SECONDS = float(os.getenv("FRED_BACKOFF_BASE_SECONDS", "0.75"))
_FRED_REQUEST_TIMEOUT_SECONDS = float(os.getenv("FRED_REQUEST_TIMEOUT_SECONDS", "15"))
_FRED_RATE_LIMIT_PER_SECOND = max(1, int(os.getenv("FRED_RATE_LIMIT_PER_SECOND", "1")))
_FRED_RATE_LIMIT_PER_MINUTE = max(1, int(os.getenv("FRED_RATE_LIMIT_PER_MINUTE", "90")))
2) A Dedicated Limiter Class
class FREDLimiter:
    def __init__(
        self,
        max_concurrent_requests: int,
        rate_limit_per_second: int,
        rate_limit_per_minute: int,
    ):
        # The semaphore caps in-flight requests; the timestamps cap request starts.
        self.concurrency_guard = asyncio.Semaphore(max_concurrent_requests)
        self.rate_limit_per_second = rate_limit_per_second
        self.rate_limit_per_minute = rate_limit_per_minute
        self.rate_limit_lock = asyncio.Lock()
        self.request_timestamps: deque[float] = deque()
        self.global_cooldown_until = 0.0

    async def acquire_rate_limit_slot(self) -> None:
        while True:
            wait_seconds = 0.0
            async with self.rate_limit_lock:
                now = time.monotonic()
                # Honor any cooldown set after a 429.
                if self.global_cooldown_until > now:
                    wait_seconds = max(wait_seconds, self.global_cooldown_until - now)
                minute_window_start = now - 60.0
                second_window_start = now - 1.0
                # Drop timestamps that have aged out of the minute window.
                while self.request_timestamps and self.request_timestamps[0] <= minute_window_start:
                    self.request_timestamps.popleft()
                minute_count = len(self.request_timestamps)
                # Count requests in the last second, newest first.
                second_count = 0
                for timestamp in reversed(self.request_timestamps):
                    if timestamp <= second_window_start:
                        break
                    second_count += 1
                if second_count >= self.rate_limit_per_second:
                    earliest_second_slot = self.request_timestamps[-self.rate_limit_per_second]
                    wait_seconds = max(wait_seconds, 1.0 - (now - earliest_second_slot))
                if minute_count >= self.rate_limit_per_minute:
                    earliest_minute_slot = self.request_timestamps[-self.rate_limit_per_minute]
                    wait_seconds = max(wait_seconds, 60.0 - (now - earliest_minute_slot))
                if wait_seconds <= 0:
                    self.request_timestamps.append(now)
                    return
            # Sleep outside the lock so other tasks can still set a cooldown.
            await asyncio.sleep(wait_seconds)

    async def set_global_cooldown(self, delay_seconds: float) -> None:
        if delay_seconds <= 0:
            return
        async with self.rate_limit_lock:
            candidate = time.monotonic() + delay_seconds
            # Only ever extend the cooldown, never shorten it.
            if candidate > self.global_cooldown_until:
                self.global_cooldown_until = candidate


_FRED_LIMITER = FREDLimiter(
    _FRED_MAX_CONCURRENT_REQUESTS,
    _FRED_RATE_LIMIT_PER_SECOND,
    _FRED_RATE_LIMIT_PER_MINUTE,
)
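The window arithmetic in acquire_rate_limit_slot is easier to check in isolation. The sketch below restates it as a pure function over a list of timestamps. The name seconds_until_slot and its signature are hypothetical helpers for illustration, not part of the limiter above.

```python
def seconds_until_slot(
    timestamps: list[float],
    now: float,
    per_second: int,
    per_minute: int,
) -> float:
    """Return how long to wait before one more request fits both windows.

    `timestamps` holds the send times of earlier requests, oldest first.
    """
    recent_minute = [t for t in timestamps if t > now - 60.0]
    recent_second = [t for t in recent_minute if t > now - 1.0]
    wait = 0.0
    if len(recent_second) >= per_second:
        # A slot frees up once the per_second-th most recent request ages out.
        wait = max(wait, 1.0 - (now - recent_second[-per_second]))
    if len(recent_minute) >= per_minute:
        wait = max(wait, 60.0 - (now - recent_minute[-per_minute]))
    return max(wait, 0.0)


# One request 0.25 s ago with a 1-per-second limit: wait out the rest of the second.
print(seconds_until_slot([10.0], now=10.25, per_second=1, per_minute=90))  # 0.75

# Three requests in the last minute with a 3-per-minute limit: wait for the oldest to expire.
print(seconds_until_slot([0.0, 20.0, 40.0], now=45.0, per_second=1, per_minute=3))  # 15.0

# Nothing recent: send immediately.
print(seconds_until_slot([10.0], now=11.5, per_second=1, per_minute=90))  # 0.0
```

Whichever window demands the longer wait wins, which is why the limiter takes the max of the two delays rather than the sum.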
3) Retry-After Aware Request Pipeline
# Method on the FRED client class. Assumes `import httpx` and an HTTPException
# that accepts status_code and detail (for example, FastAPI's).
async def _request_json(
    self,
    client: httpx.AsyncClient,
    url: str,
    series_id: str,
    endpoint: str,
) -> dict:
    for attempt in range(_FRED_MAX_RETRIES + 1):
        # Wait for a rate-limit slot first, then for a concurrency slot.
        await _FRED_LIMITER.acquire_rate_limit_slot()
        async with _FRED_LIMITER.concurrency_guard:
            response = await client.get(url, headers=self.USER_AGENT)
        try:
            payload = response.json()
        except ValueError:
            payload = {}
        # Retryable 429: pause every task, not only this one.
        if response.status_code == 429 and attempt < _FRED_MAX_RETRIES:
            delay = self._retry_after_seconds(response)
            if delay is None:
                delay = _FRED_BACKOFF_BASE_SECONDS * (2 ** attempt)
            await _FRED_LIMITER.set_global_cooldown(delay)
            await asyncio.sleep(delay)
            continue
        if response.status_code >= 400:
            # Final 429: still set the cooldown so other tasks back off.
            if response.status_code == 429:
                delay = self._retry_after_seconds(response)
                if delay is None:
                    delay = _FRED_BACKOFF_BASE_SECONDS * max(1, 2 ** attempt)
                await _FRED_LIMITER.set_global_cooldown(delay)
            raise HTTPException(
                status_code=response.status_code,
                detail=payload.get("error_message")
                or f"FRED upstream error for {series_id} ({endpoint})",
            )
        return payload
    # Safety net in case the loop exits without returning or raising.
    raise HTTPException(
        status_code=429,
        detail=f"FRED rate limit exceeded for {series_id} ({endpoint})",
    )
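The pipeline above calls self._retry_after_seconds(response), whose body is not shown. One way such a helper might work is sketched below, assuming it receives the raw Retry-After header value. The function name parse_retry_after and its now parameter are hypothetical, and this is not necessarily how my helper is written. HTTP allows the header to be either a number of seconds or an HTTP-date, so the sketch handles both and returns None when the value is missing or unparseable, which lets the caller fall back to exponential backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Convert a Retry-After header value into a non-negative delay in seconds."""
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None
    # Form 1: delay in whole seconds, e.g. "Retry-After: 120".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: HTTP-date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if retry_at.tzinfo is None:
        # Treat a date without an offset as UTC.
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


print(parse_retry_after("120"))      # 120.0
print(parse_retry_after("garbage"))  # None

# Fallback schedule when no usable header is present, using the 0.75 s base.
for attempt in range(3):
    print(attempt, 0.75 * (2 ** attempt))  # 0.75, then 1.5, then 3.0
```

With an httpx response, the header value would come from response.headers.get("Retry-After").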
Configuration For Single-Instance Apps
This is the baseline I recommend for a single instance:
FRED_RATE_LIMIT_PER_SECOND=1
FRED_RATE_LIMIT_PER_MINUTE=90
FRED_MAX_CONCURRENT_REQUESTS=2
FRED_MAX_RETRIES=2
FRED_BACKOFF_BASE_SECONDS=0.75
FRED_REQUEST_TIMEOUT_SECONDS=15
For the FRED-backed endpoints, I keep the cache TTL long enough to avoid unnecessary upstream traffic. In my app, that means treating FRED as slow-moving reference data rather than live telemetry: cache aggressively, serve cached content first, and only refresh when the TTL expires or the cache is stale.
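As a rough illustration of the cache-first idea, here is a minimal in-process TTL cache sketch. The TTLCache class, the one-hour TTL, and the fake_fetch stand-in are all assumptions made for the example rather than the cache my app uses. It takes one lock per key so that parallel requests for the same series share a single upstream fetch.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class TTLCache:
    """Tiny in-process cache: serve fresh entries, refetch only when stale or missing."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._entries: dict[str, tuple[float, Any]] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    def _fresh(self, key: str) -> tuple[bool, Any]:
        entry = self._entries.get(key)
        if entry is not None and time.monotonic() - entry[0] < self.ttl_seconds:
            return True, entry[1]
        return False, None

    async def get_or_fetch(self, key: str, fetch: Callable[[], Awaitable[Any]]) -> Any:
        hit, value = self._fresh(key)
        if hit:
            return value  # cache hit, no upstream call
        # One lock per key so parallel requests for the same key share one fetch.
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            hit, value = self._fresh(key)
            if hit:
                return value  # another task refreshed it while we waited
            value = await fetch()
            self._entries[key] = (time.monotonic(), value)
            return value


async def demo() -> int:
    cache = TTLCache(ttl_seconds=3600)  # hypothetical one-hour TTL
    upstream_calls = 0

    async def fake_fetch() -> dict:
        # Stand-in for a real FRED request.
        nonlocal upstream_calls
        upstream_calls += 1
        await asyncio.sleep(0.01)
        return {"series_id": "GDP"}

    # Five parallel requests for the same series, as a page refresh might issue.
    await asyncio.gather(*(cache.get_or_fetch("GDP", fake_fetch) for _ in range(5)))
    return upstream_calls


print(asyncio.run(demo()))  # 1
```

Five parallel callers result in one upstream call, so the rate limiter only ever sees traffic for entries that are genuinely stale or missing.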
If I still see 429s during peak fan-out, I tighten it further:
FRED_RATE_LIMIT_PER_SECOND=1
FRED_RATE_LIMIT_PER_MINUTE=60
FRED_MAX_CONCURRENT_REQUESTS=1
Practical Takeaways
- Treat shared API keys like a finite budget, not an unlimited pipe.
- Limit globally per process, not only per call path.
- Cache slow-moving upstream data aggressively and prefer cache hits over fresh fetches.
- Add a coordinated cooldown on 429 so concurrent traffic actually pauses.
- Preserve upstream status codes for observability and alerting.
- Start conservative, then tune upward from metrics.
The key lesson is straightforward: an integration can break because of traffic pattern changes even when the request syntax is still correct.
Results
Once I shipped this change, the upstream behavior became predictable again:
- Traffic bursts were flattened before leaving the app.
- 429 responses were largely avoided, and the rare ones that did occur no longer cascaded across parallel request paths.
- Normal refresh traffic stabilized under the shared API key.
For multi-instance deployments, the next step would be moving limiter state into Redis so every node participates in one shared global budget. For a single instance, this in-process pattern is lightweight and effective.
ABOUT THE AUTHOR
Joseph Carboni is a multifaceted programmer with a background in bioinformatics, neuroscience, and sales, now focusing on software development. He developed a ribosomal loading model and contributed to a neuroscience research paper before transitioning to a sales career, enhancing his understanding of business and client relations. Currently, he’s the Founder & CEO of Carboni Technology. Joseph welcomes collaborations and discussions via LinkedIn (Joseph Carboni) or email (joe@carbonitech.com).
