Timeouts, 502/504 Gateway Errors, and Slow Performance on BitGo Platform
Problem
Customers report a range of performance-related issues when interacting with the BitGo platform via the UI or API. Symptoms include slow page loads (5–10 seconds or longer), HTTP 502 Bad Gateway errors, HTTP 504 Gateway Timeout errors, Cloudflare-originated timeout pages, Error: Timeout of 30000ms exceeded messages, Error: socket hang up (code ECONNRESET), rate-limiting errors ("Too many requests, slow down!"), and webhook delivery interruptions. These issues affect multiple coins and endpoints including /api/v2/{coin}/wallet/{walletId}/transfer, /api/v2/{coin}/enterprise/{enterpriseId}/feeaddressbalance, /api/v2/{coin}/wallet/{walletId}/maximumSpendable, and /api/prime/trading/v1/accounts/${walletId}/trades. They can be platform-wide outages or isolated to specific endpoints or coins.
Diagnostics
- Check the BitGo status page at https://status.bitgo.com/ for any active incidents, degraded performance notices, or scheduled maintenance windows. This is the single most important first step.
- Determine scope: Ask the customer whether the issue affects all coins/wallets or specific ones. Ask for the specific API endpoint(s), coin ticker(s), and timestamps (with timezone).
- Collect standard diagnostic details: Environment (Prod/Test), Enterprise Name, Enterprise ID, Wallet ID, full API endpoint as called (including base URL), full response body (if any), requestId/reqId values, SDK/Express version, and whether they are using the UI or API.
- Identify the HTTP error code: Distinguish between 502 (Bad Gateway), 504 (Gateway Timeout), 429 (Too Many Requests / rate limiting), 500 (Internal Server Error), and client-side timeouts (no HTTP response at all, e.g., ECONNRESET or socket hang up).
- Check for rate limiting: If the customer reports "Too many requests, slow down!" or HTTP 429, determine whether multiple users or automated processes are hitting the same endpoints in rapid succession. Ask whether they have retry logic and whether it respects rate-limit headers.
- Check for Cloudflare-level errors: If the response HTML references Cloudflare (e.g., "Error code 504", Cloudflare Ray ID), this indicates the request reached Cloudflare but timed out waiting for the BitGo origin server, pointing to a server-side issue.
- Review SDK/Express version: If the customer is using BitGo SDK (@bitgo/sdk-api, @bitgo/sdk-coin-*) or BitGo Express, ask them to confirm the version. Outdated versions have been linked to slowness on specific coins (e.g., SOL, DOT, ADA).
- Check for webhook delivery issues: If the customer reports missed webhooks during an outage window, note that webhooks may need to be re-simulated after resolution.
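The error-code and Cloudflare checks above can be sketched as a small triage helper. This is a minimal illustration, not a BitGo tool: the function names, the category labels, and the assumed Ray ID format (16 hexadecimal characters following the text "Ray ID") are all hypothetical and should be adjusted to what the customer's logs actually contain.

```python
import re
from typing import Optional

# Error strings quoted in this article for requests that got no HTTP response.
CLIENT_SIDE_MARKERS = ("ECONNRESET", "socket hang up", "Timeout of 30000ms exceeded")


def classify_failure(status: Optional[int], body: str = "", error_text: str = "") -> str:
    """Return a coarse triage label for one failed request (hypothetical helper)."""
    if status is None:
        # No HTTP response at all: the connection dropped or the client gave up.
        lowered = error_text.lower()
        if any(marker.lower() in lowered for marker in CLIENT_SIDE_MARKERS):
            return "client-side timeout or connection reset"
        return "no HTTP response (unknown cause)"
    if status == 429 or "too many requests" in body.lower():
        return "rate limited"
    if status in (502, 504):
        # A body that mentions Cloudflare means the request reached the edge.
        if "cloudflare" in body.lower():
            return "gateway error via Cloudflare"
        return "gateway error"
    if status == 500:
        return "internal server error"
    return f"other HTTP {status}"


def extract_ray_id(body: str) -> Optional[str]:
    """Pull a Ray ID out of an error page, assuming a 16-hex-character ID."""
    match = re.search(r"Ray ID:?\s*(?:<[^>]+>\s*)*([0-9a-f]{16})", body, re.IGNORECASE)
    return match.group(1) if match else None
```

For example, classify_failure(None, error_text="Error: socket hang up") yields the client-side label, while a 504 whose body mentions Cloudflare yields the Cloudflare gateway label. Any Ray ID found this way belongs in the escalation details.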
Resolution
Scenario: timeout-gateway-timeouts-502#platform-wide-latency
Trigger: Multiple customers or the BitGo status page report platform-wide latency, slow UI, or widespread API timeouts at the same time.
Signals: slow UI, latency, everything taking forever to load, 502, 504, Timeout of 30000ms exceeded, platform-wide, status page incident
Steps:
- Confirm the incident by checking https://status.bitgo.com/ for active alerts.
- Acknowledge the customer's report and let them know the engineering team is aware and working to resolve the issue.
- Direct the customer to subscribe to the status page for real-time updates: https://status.bitgo.com/
- Once the status page shows the incident as resolved, follow up with the customer to confirm the issue is resolved on their end.
- If the customer reports that the issue persists after the status page shows resolution, escalate internally to the engineering team with the customer's Enterprise ID, affected endpoints, and timestamps.
Notes: Platform-wide latency events are resolved by the engineering team; support's role is communication and verification. These events are typically transient and resolve without customer-side changes.
"We are aware of some overall wallet-platform latency. Our engineering team is working diligently to resolve. We will follow up as soon as possible with further updates." (ticket #45454)
"We are currently experiencing some latency on our platform. BitGo Page status: https://status.bitgo.com/ We will update you once this issue has been fixed." (ticket #265772)
"The issue has been resolved. Please try again now. Thank you for your patience." (ticket #336088)
Scenario: timeout-gateway-timeouts-502#502-504-isolated-api
Trigger: Customer receives HTTP 502 Bad Gateway or HTTP 504 Gateway Timeout errors on specific API endpoints, but the BitGo status page shows no active incident.
Signals: 502, Bad Gateway, 504, Gateway Timeout, Cloudflare, gateway time-out, Error code 504, /api/v2/, /api/prime/trading/
Steps:
- Confirm no active incident on https://status.bitgo.com/.
- Collect the full API endpoint URL, timestamps of the errors, HTTP response body (including Cloudflare Ray ID if present), and requestId values.
- Ask the customer whether the error is intermittent or consistent, and whether it affects all requests or only requests with certain parameters (e.g., large date ranges, high limit values).
- If the customer is paginating over large date ranges or requesting large datasets (e.g., /trades?limit=25 over long periods), suggest reducing the date range or page size to avoid server-side processing timeouts.
- If the error is consistent and reproducible, escalate to the engineering team with all collected details (environment, Enterprise ID, endpoint, timestamps, Cloudflare Ray IDs, request IDs).
- Follow up with the customer once engineering provides an update.
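When suggesting a smaller date range, it can help to show the customer what splitting an import into windows looks like. The sketch below is only an illustration: date_windows is a hypothetical helper, and the names of the date-filter parameters and whether their bounds are inclusive are not covered here, so the customer should map the windows onto whatever parameters their import already uses.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def date_windows(
    start: datetime, end: datetime, max_span: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs covering [start, end).

    Each window is at most max_span long, so one long import becomes
    several shorter requests that are less likely to hit a gateway timeout.
    """
    if max_span <= timedelta(0):
        raise ValueError("max_span must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + max_span, end)
        yield cursor, window_end
        cursor = window_end


# Usage: split a nine-day import into windows of at most four days.
for window_start, window_end in date_windows(
    datetime(2024, 1, 1), datetime(2024, 1, 10), timedelta(days=4)
):
    print(window_start.date(), "to", window_end.date())
```

If a window still returns a 504, the customer can halve max_span for that window and retry before escalating.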
Notes: 504 errors from Cloudflare indicate the BitGo origin server did not respond within Cloudflare's timeout window. This is a server-side issue even when the status page does not reflect it. For the trading API endpoint https://app.bitgo.com/api/prime/trading/v1/accounts/${walletId}/trades, 504 errors have been observed with Cloudflare Ray IDs.
"We're encountering repeated issues when trying to import transactions from your API. Our process involves paginating and segmenting requests over varying date ranges — from short to long periods — but we consistently hit 504 errors from Cloudflare before the import completes." (ticket #237706)
"Today we tried to use the /api/v2/{coin}/wallet/{walletId}/transfer API at approximately 10:16 AM Panama local time and received the error Error 1.00. The remote server returned an error: (502) Bad Gateway." (ticket #108578)
Scenario: timeout-gateway-timeouts-502#api-timeout-specific-endpoints
Trigger: Customer reports API calls timing out (no response received) on specific endpoints such as /transfer/{transferId}, /maximumSpendable, or /feeaddressbalance, with errors like Timeout of 30000ms exceeded or Error: socket hang up.
Signals: timeout, 30000ms, socket hang up, ECONNRESET, feeEstimate, feeaddressbalance, maximumSpendable, transfer, more than 7000ms, IMS service returned error status 500
Steps:
- Collect the full API endpoint, coin, wallet ID, timestamps, requestId values, and the exact error message.
- Ask which coins are affected. Certain endpoints (e.g., feeEstimate, feeaddressbalance, maximumSpendable) may have coin-specific latency issues.
- If the customer is using the BitGo SDK or BitGo Express, ask for the version and recommend updating to the latest version. Engineering has resolved slowness issues via SDK updates in the past (particularly for SOL, DOT, ADA).
- If updating the SDK does not resolve the issue, or the customer is calling the REST API directly, escalate to the engineering team with all collected details including the specific endpoints and coins affected.
- Recommend the customer implement retry logic with exponential backoff as a resilience measure while the root cause is investigated.
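A minimal sketch of the retry logic with exponential backoff recommended above, for customers who ask what it looks like in practice. Everything here is an assumption for illustration: the helper names, the default of five attempts, the 0.5-second base delay, the 30-second cap, and the use of random jitter are not BitGo guidance. Because a socket hang up leaves the client unsure whether the server processed the request, this pattern is safest on read-only calls such as balance or transfer lookups.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_delays(count: int, base: float = 0.5, cap: float = 30.0) -> List[float]:
    """Upper bounds for successive waits: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * (2 ** i)) for i in range(count)]


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with jittered exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            # Full jitter: wait a random time up to the current bound so that
            # many clients retrying at once do not all hit the API together.
            sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable")
```

The customer would wrap their own request function, for example call_with_backoff(lambda: fetch_transfer(transfer_id)), where fetch_transfer stands in for whatever client call they already have, and pass the exception types their HTTP library raises for timeouts and connection resets.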
Notes: The feeEstimate function and feeaddressbalance endpoint have been specifically flagged in past tickets. Errors including Error: socket hang up with code ECONNRESET and ApiResponseError: IMS service returned error status 500 have been observed. Engineering has resolved these server-side in past incidents.
"With constant frequency, we get the following errors in the method 'feeEstimate': 1) error case: 'Error: socket hang up'. Error: socket hang up ... code: 'ECONNRESET', response: undefined ... 2) error case: ApiResponseError: IMS service returned error status 500: 'Request failed with status code 500'" (ticket #264225)
"Coins: ADA, DOT, AVAX, ALGO, XLM, BCH — /api/v2/{coin}/wallet/{wallet-id}; XRP, BTC, ETC, SOL — /api/v2/{coin}/wallet/{wallet-id}//maximumSpendable; AVAX, ETC — /api/v2/{coin}/enterprise/{enterpriseId}/feeaddressbalance" (ticket #211989)
"When we call this API, /api/v2/{coin}/wallet/{walletId}/transfer/{transferId}, have timeout error. it takes more than 7000ms" (ticket #317786)
Scenario: timeout-gateway-timeouts-502#coin-specific-sdk-slowness
Trigger: Customer reports loading/timeout errors specifically for SOL, DOT, or ADA when using the BitGo SDK or BitGo Express.
Signals: SOL, DOT, ADA, slow, loading time, error, SDK, BitGo Express, @bitgo/sdk-coin-sol, @bitgo/sdk-api
Steps:
- Ask the customer which version of the BitGo SDK (@bitgo/sdk-api, @bitgo/sdk-coin-sol, etc.) or BitGo Express they are running.
- Instruct the customer to update to the latest version of the SDK/Express. Engineering has pushed updates specifically to resolve slowness on certain coins.
- If the customer is already on the latest version, ask them to check their network connectivity and try from another network to rule out local issues.
- If the issue persists after updating, ask the customer to share the exact error message and, if possible, the script or code snippet being used.
- Escalate to engineering if the problem continues after the SDK update.
Notes: This scenario has been specifically observed with SOL, DOT, and ADA. The resolution in past tickets was to update the SDK after engineering pushed fixes.
"Could you please try to update the SDK once again? Our engineering team has pushed some new updates to resolve the issue of slowness in some coins and the issue has been resolved for may of our clients." (ticket #145666)
Scenario: timeout-gateway-timeouts-502#rate-limiting-too-many-requests
Trigger: Customer sees "Too many requests, slow down!" error message, HTTP 429 status, or error ID pattern bg-ui-* when using the UI or API.
Signals: Too many requests, slow down, 429, rate limit, bg-ui-, OTP, 2FA, too many requests
Steps:
- Determine whether the error occurs during login/OTP entry or during normal API usage.
- If during login/OTP: The error typically occurs when multiple rapid or unsuccessful OTP (one-time password) attempts are made within a short period. This is a security measure. Advise the customer to wait for a short cooldown period (typically a few minutes) before trying again. They should ensure they are entering a fresh OTP code and not reusing an expired one.
- If during API usage: Ask whether multiple users or automated systems are calling the same endpoints concurrently. Confirm the customer is respecting BitGo's rate-limit guidelines. Recommend implementing retry logic with exponential backoff and reducing concurrent request volume.
- If the customer believes they are within rate limits and the error persists, collect the ErrorID (e.g., bg-ui-* value), timestamps, and affected endpoints, then escalate to engineering.
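For API clients, respecting rate-limit headers usually means waiting as long as the server asks before retrying. The sketch below reads the standard HTTP Retry-After header, which may hold either a number of seconds or an HTTP date. Whether BitGo's 429 responses include this header is an assumption, so the helper falls back to a default wait when it is absent; the function name, the 5-second default, and the 120-second cap are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping


def retry_after_seconds(
    headers: Mapping[str, str], default: float = 5.0, cap: float = 120.0
) -> float:
    """Return how long to wait before retrying a 429, in seconds."""
    # Header names are case-insensitive, so compare in lowercase.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return default
    try:
        seconds = float(value.strip())  # delta-seconds form, e.g. "7"
    except ValueError:
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            return default  # Unparseable header: fall back to the default wait.
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        seconds = (when - datetime.now(timezone.utc)).total_seconds()
    # Never return a negative wait, and never wait longer than the cap.
    return max(0.0, min(cap, seconds))
```

A client would call time.sleep(retry_after_seconds(response_headers)) after a 429 and before the next attempt, and reduce concurrent request volume if 429s keep recurring.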
Notes: The "Too many requests, slow down!" message can appear both in the UI (with an ErrorID like bg-ui-95e850ffbc0741fe7c6037915f287331) and via API (HTTP 429). The UI rate limit on OTP attempts resolves automatically after a cooldown period.
"The error message you saw ('Too many requests, slow down!') typically occurs when there are multiple rapid or unsuccessful OTP (one-time password) attempts within a short period. This is a security measure designed to prevent unauthorized access and protect your account." (ticket #337323)
Scenario: timeout-gateway-timeouts-502#webhook-interruptions-during-outage
Trigger: Customer reports missed or delayed webhooks during or after a platform incident or authentication issue.
Signals: webhooks, missed, interruptions, webhook delivery, simulate, authentication issue
Steps:
- Confirm whether there was a recent platform incident by checking https://status.bitgo.com/ or internal incident records.
- Inform the customer that webhook delivery may be interrupted during platform incidents, including authentication-related outages.
- Advise the customer to simulate missed webhooks using the BitGo UI or API to recover any notifications that were not delivered during the incident window.
- Confirm that the underlying platform issue has been resolved and that new webhooks are being delivered normally.
Notes: Webhook interruptions are a secondary effect of platform incidents. They do not require a separate fix — but the customer must re-simulate missed webhooks to catch up.
"Your team may still see interruptions in webhook notices being sent as a result of the current authentication issue we are experiencing. You can check the status page linked in my sig for status updates. ... For any missed webhooks, your team will want to simulate the missed webhooks using our UI or our API." (ticket #216919)
Related
- ethereum-fees-and-gas-tank — Gas tank funding issues can sometimes be confused with API timeout errors on Ethereum endpoints
- bitcoin-transactions — Bitcoin-specific transaction delays may overlap with general platform latency symptoms
- bitgo-api-rate-limits — Detailed rate-limit thresholds relevant to the "Too many requests" scenario