Debugging intermittent performance degradation in our React Native application was a recurring nightmare. A user reports a slow screen transition after tapping a button, but the backend metrics—CPU, memory, database query times—all look nominal. The critical gap was correlation. We had client-side logs and backend traces, but no definitive way to link a specific user interaction within the mobile app to the cascade of microservice calls it triggered. The entire flow, from the user’s tap to the final database commit, was a frustrating black box. The root of the problem was that our observability story began at the edge of our infrastructure, not at the true origin of the request: the client itself.
Our initial concept was to invert the traditional tracing model. Instead of the first-hop service generating a trace ID, the React Native client would become the originator of the trace context. Every significant, user-initiated business process, encapsulated as a MobX action, would create the root span of a new distributed trace. This context, containing a unique `traceId` and a root `spanId`, would then be systematically injected into the headers of every subsequent API call made during that action’s execution. This approach would allow us to create a single, unified trace that visualizes the entire lifecycle of a user interaction, from the mobile client’s state change through the API Gateway and across all downstream services.
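Concretely, the injected context boils down to three request headers. A minimal sketch of what each traced request would carry (the `TraceContext` interface here is a stand-in for the store type defined later):

```typescript
// Sketch: the B3 headers attached to every API call during a traced action.
// TraceContext mirrors the shape the trace store will hold.
interface TraceContext {
  traceId: string; // 32 lowercase hex chars
  spanId: string;  // 16 lowercase hex chars
}

function b3Headers(ctx: TraceContext): Record<string, string> {
  return {
    'X-B3-TraceId': ctx.traceId,
    'X-B3-SpanId': ctx.spanId,
    'X-B3-Sampled': '1', // ask the backend to record this trace
  };
}
```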
The technology selection was critical for this to work without adding significant overhead.
- MobX: Its architecture is uniquely suited for this task. The concept of `actions` provides a clean, centralized point of interception. We can wrap or monitor every action to initiate and tear down a trace context, effectively binding the trace’s lifecycle to the business logic’s lifecycle.
- React Native: As the client platform, the main challenge was implementing a reliable, non-intrusive mechanism to inject headers into every outgoing network request, regardless of which component or service initiated it.
- Envoy Proxy: Using Envoy as our API Gateway was the linchpin. Its first-class, built-in support for distributed tracing is far superior to implementing custom logic in each backend service. Envoy can intelligently parse incoming trace headers (like B3 or W3C Trace Context), participate in the trace by creating its own spans, and propagate the context correctly to upstream services. This keeps the tracing concern out of the application logic of the microservices.
Phase 1: Instrumenting the Client-Side Trace Origin
The first step is to establish a mechanism on the client to generate and manage the trace context. We need a way to know when a user-initiated action begins and ends. A dedicated MobX store is the cleanest way to manage this state globally.
The TraceStore
This store will hold the active trace and span IDs. It’s intentionally simple, designed to hold the context for the currently executing root action.
// src/stores/TraceStore.ts
import { makeAutoObservable, runInAction } from 'mobx';
import 'react-native-get-random-values';
import { v4 as uuidv4 } from 'uuid';
// Generates a 16-character hex string for B3-style IDs
const generateTraceIdPart = (): string => {
const buffer = new Uint8Array(8);
crypto.getRandomValues(buffer);
return Array.from(buffer)
.map((b) => b.toString(16).padStart(2, '0'))
.join('');
};
interface TraceContext {
traceId: string;
spanId: string;
}
class TraceStore {
activeTraceContext: TraceContext | null = null;
constructor() {
makeAutoObservable(this);
}
/**
* Starts a new trace, generating new Trace and Span IDs.
* This should be called at the beginning of a root MobX action.
* @param {string} actionName - The name of the action for context.
*/
startTrace(actionName: string): TraceContext {
const newContext = {
traceId: generateTraceIdPart() + generateTraceIdPart(), // 32-char hex
spanId: generateTraceIdPart(), // 16-char hex
};
console.log(`[TraceStore] Starting trace for action "${actionName}": ${newContext.traceId}`);
runInAction(() => {
this.activeTraceContext = newContext;
});
return newContext;
}
/**
* Clears the active trace context.
* This should be called at the end of a root MobX action.
*/
endTrace() {
if (this.activeTraceContext) {
console.log(`[TraceStore] Ending trace: ${this.activeTraceContext.traceId}`);
runInAction(() => {
this.activeTraceContext = null;
});
}
}
get currentContext(): TraceContext | null {
return this.activeTraceContext;
}
}
export const traceStore = new TraceStore();
A key decision here is the format of the trace IDs. We’re generating B3-compatible hex-encoded IDs (`traceId` is 32 hex chars, `spanId` is 16) because the format is widely supported, especially by Envoy and Zipkin.
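For unit tests running under plain Node (where the `react-native-get-random-values` polyfill isn’t loaded), the same ID shapes can be produced with Node’s `crypto` module. This is a test-side sketch, not part of the app code:

```typescript
import { randomBytes } from 'crypto';

// Each part is 8 random bytes rendered as 16 lowercase hex chars.
const idPart = (): string => randomBytes(8).toString('hex');

// B3-compatible IDs: 32-hex traceId, 16-hex spanId.
const makeB3Ids = () => ({
  traceId: idPart() + idPart(),
  spanId: idPart(),
});
```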
Intercepting MobX Actions
Now we need to automatically call `startTrace` and `endTrace`. We can create a higher-order function (or a decorator) that wraps our MobX store actions. This keeps the tracing logic separate from the business logic.
// src/utils/traceAction.ts
import { traceStore } from '../stores/TraceStore';
/**
* A higher-order function that wraps a MobX action to provide
* automatic start and end tracing.
*
* In a real-world project, this would be more robust, potentially using
* MobX's `onAction` for a more global interception approach, but this
* explicit wrapper is clearer for demonstration.
*
* @param {string} actionName - A descriptive name for the action, used for logging.
* @param {Function} fn - The asynchronous action function to be wrapped.
* @returns {Function} The wrapped function.
*/
export function traceAction<T extends any[], R>(
actionName: string,
fn: (...args: T) => Promise<R>
): (...args: T) => Promise<R> {
return async function (...args: T): Promise<R> {
// Avoid nested traces. If a trace is already active, run inside it.
if (traceStore.currentContext) {
console.warn(`[traceAction] Action "${actionName}" called within an existing trace. Reusing context.`);
return await fn(...args);
}
traceStore.startTrace(actionName);
try {
// The core business logic is executed here
const result = await fn(...args);
return result;
} catch (error) {
// Ensure the trace context is cleared even if the action fails.
console.error(`[traceAction] Error in action "${actionName}":`, error);
throw error;
} finally {
traceStore.endTrace();
}
};
}
// Example usage in a MobX store
class UserStore {
// ... other properties
constructor() {
// ...
this.fetchUserProfile = traceAction('fetchUserProfile', this.fetchUserProfile.bind(this));
}
async fetchUserProfile(userId: string) {
// ... actual implementation to fetch data
}
}
The `traceAction` wrapper is the core of our client-side instrumentation. It ensures that a trace context exists only for the duration of the top-level asynchronous operation initiated by the user. The check for `traceStore.currentContext` prevents nested actions from overwriting the root trace ID.
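The lifecycle guarantees can be illustrated with a dependency-free sketch, in which a plain object stands in for the MobX store (names here are illustrative):

```typescript
// Dependency-free sketch of the traceAction lifecycle guarantees.
type Ctx = { traceId: string; spanId: string };

const store = {
  active: null as Ctx | null,
  start() { this.active = { traceId: 'a'.repeat(32), spanId: 'b'.repeat(16) }; },
  end() { this.active = null; },
};

function traceAction<T extends unknown[], R>(
  name: string,
  fn: (...args: T) => Promise<R>
): (...args: T) => Promise<R> {
  return async (...args: T) => {
    if (store.active) return fn(...args); // nested call: reuse the outer context
    store.start();
    try {
      return await fn(...args);
    } finally {
      store.end(); // always cleared, even if fn throws
    }
  };
}
```

Running a wrapped function shows the context is present during execution and cleared afterwards, including when the wrapped function rejects.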
Phase 2: Propagating Context via Network Requests
With the trace context managed, we now need to inject it into every outgoing API request. An `axios` interceptor is the perfect tool for this: a single point of configuration that affects all requests made with that `axios` instance.
// src/api/apiClient.ts
import axios from 'axios';
import { traceStore } from '../stores/TraceStore';
const apiClient = axios.create({
baseURL: 'http://<your-envoy-gateway-ip>:8080', // Point to the Envoy Gateway
timeout: 10000,
});
// Request Interceptor to add Trace Headers
apiClient.interceptors.request.use(
(config) => {
const context = traceStore.currentContext;
if (context) {
// B3 Propagation Headers
// These headers will be picked up by Envoy to continue the trace.
config.headers['X-B3-TraceId'] = context.traceId;
config.headers['X-B3-SpanId'] = context.spanId;
config.headers['X-B3-Sampled'] = '1'; // Signal that this trace should be recorded
console.log(`[APIClient] Injecting trace context: ${context.traceId}`);
} else {
// In a production app, you might want to generate a request ID even
// for requests not part of a user-initiated trace.
// For now, we log a warning.
console.warn(`[APIClient] Making a request outside of a trace context for URL: ${config.url}`);
}
return config;
},
(error) => {
// This part handles errors that occur when setting up the request
console.error('[APIClient] Request Interceptor Error:', error);
return Promise.reject(error);
}
);
// Response Interceptor for logging (optional, but good for debugging)
apiClient.interceptors.response.use(
(response) => {
return response;
},
(error) => {
// Log detailed error information
if (error.response) {
console.error(
`[APIClient] Response Error: ${error.response.status}`,
error.response.data
);
} else if (error.request) {
console.error('[APIClient] No response received:', error.request);
} else {
console.error('[APIClient] Error setting up request:', error.message);
}
return Promise.reject(error);
}
);
export default apiClient;
Now, any code in the React Native application that imports and uses `apiClient` will automatically have the B3 trace headers attached, provided the request is made from within a function wrapped by `traceAction`.
Phase 3: Configuring Envoy Proxy as the Tracing-Aware API Gateway
This is where the client-side work connects to the infrastructure. Envoy will act as the API Gateway, receiving requests from the mobile app, participating in the trace, and forwarding the request with the correct headers to the appropriate backend service.
Here is a complete `envoy.yaml` configuration suitable for a local or staging setup. It sets up a listener on port `8080`, routes traffic to two different backend services (`user_service` and `order_service`), and configures Zipkin tracing.
# envoy.yaml
static_resources:
listeners:
- name: listener_0
address:
socket_address:
address: 0.0.0.0
port_value: 8080
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
# This tells Envoy to generate a request ID if one isn't present.
# More importantly, it respects incoming x-request-id headers.
generate_request_id: true
# Tracing Configuration
tracing:
provider:
name: envoy.tracers.zipkin
typed_config:
"@type": type.googleapis.com/envoy.config.trace.v3.ZipkinConfig
collector_cluster: zipkin_cluster
collector_endpoint: "/api/v2/spans"
collector_endpoint_version: HTTP_JSON
# This is critical. It ensures Envoy joins the trace initiated by the client.
shared_span_context: false
# We can add custom tags to our spans for better context
custom_tags:
- tag: "envoy.node.id"
literal:
value: "api-gateway-node"
- tag: "guid:x-request-id"
request_header:
name: "x-request-id"
default_value: "unknown"
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
prefix: "/users"
route:
cluster: user_service
# Automatically retry on 5xx errors
retry_policy:
retry_on: 5xx
num_retries: 3
per_try_timeout: 2s
- match:
prefix: "/orders"
route:
cluster: order_service
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: user_service
connect_timeout: 5s
type: LOGICAL_DNS
# For Docker Compose, 'userservice' is the name of the service container.
load_assignment:
cluster_name: user_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: userservice
port_value: 3001
- name: order_service
connect_timeout: 5s
type: LOGICAL_DNS
load_assignment:
cluster_name: order_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: orderservice
port_value: 3002
- name: zipkin_cluster
connect_timeout: 1s
type: LOGICAL_DNS
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: zipkin_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: zipkin
port_value: 9411
The most important section is `tracing`:

- `provider`: We configure it to use the `zipkin` tracer.
- `collector_cluster`: This tells Envoy where to send the trace data (to our `zipkin_cluster` definition).
- `shared_span_context: false`: This is a subtle but crucial setting. When `false`, Envoy respects the incoming `X-B3-SpanId` as the parent of the span it creates. This ensures the Envoy span is correctly nested under the client’s root span in the trace hierarchy. If it were `true`, Envoy would reuse the incoming span ID, breaking the parent-child relationship.
To run this, you would typically use Docker Compose to orchestrate Envoy, Zipkin, and the backend services.
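A sketch of that Compose file follows. The service names must match the addresses referenced in `envoy.yaml` (`userservice`, `orderservice`, `zipkin`); the image tags and build paths here are assumptions to adjust for your environment:

```yaml
# docker-compose.yaml (illustrative sketch, not a pinned production setup)
services:
  envoy:
    image: envoyproxy/envoy:v1.28-latest
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml:ro
    ports:
      - "8080:8080"
    depends_on: [userservice, orderservice, zipkin]
  userservice:
    build: ./userservice
    expose: ["3001"]
  orderservice:
    build: ./orderservice
    expose: ["3002"]
  zipkin:
    image: openzipkin/zipkin
    ports:
      - "9411:9411"  # Zipkin UI at http://localhost:9411
```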
sequenceDiagram
    participant RN as React Native App
    participant MobX as MobX Store
    participant Envoy as Envoy API Gateway
    participant UserSvc as User Service
    participant OrderSvc as Order Service
    participant Zipkin
    RN->>MobX: User taps button, calls traced action `loadDashboard()`
    MobX->>MobX: `traceStore.startTrace()` -> Generates TraceID: A, SpanID: B
    RN->>Envoy: GET /users/123 (Headers: X-B3-TraceId: A, X-B3-SpanId: B)
    Envoy->>Envoy: Receives request, creates new SpanID: C (Parent: B)
    Envoy->>Zipkin: Sends span data (TraceID: A, SpanID: C, ParentID: B)
    Envoy->>UserSvc: GET /users/123 (Headers: X-B3-TraceId: A, X-B3-SpanId: C)
    UserSvc->>UserSvc: Processing...
    UserSvc->>Envoy: GET /orders?userId=123 (Headers: X-B3-TraceId: A, X-B3-SpanId: C)
    Envoy->>Envoy: Receives request, creates new SpanID: D (Parent: C)
    Envoy->>Zipkin: Sends span data (TraceID: A, SpanID: D, ParentID: C)
    Envoy->>OrderSvc: GET /orders?userId=123 (Headers: X-B3-TraceId: A, X-B3-SpanId: D)
    OrderSvc-->>Envoy: 200 OK (Order Data)
    Envoy-->>UserSvc: 200 OK (Order Data)
    UserSvc-->>Envoy: 200 OK (User + Order Data)
    Envoy-->>RN: 200 OK
    RN->>MobX: `loadDashboard()` completes
    MobX->>MobX: `traceStore.endTrace()` -> Clears context
Phase 4: Propagating Headers in Backend Services
The final piece of the puzzle is ensuring that backend services continue to propagate the trace headers when they call each other. This is crucial for building a complete trace graph. Here are two simple Express.js services demonstrating this.
// userservice/index.js
const express = require('express');
const axios = require('axios');
const app = express();
const port = 3001;
const B3_HEADERS = [
'x-request-id',
'x-b3-traceid',
'x-b3-spanid',
'x-b3-parentspanid',
'x-b3-sampled',
'x-b3-flags',
];
app.get('/users/:id', async (req, res) => {
console.log(`[UserService] Received request for user ${req.params.id}`);
console.log('[UserService] Incoming headers:', req.headers);
// Propagate trace headers for the downstream call
const propagatedHeaders = {};
B3_HEADERS.forEach(header => {
if (req.headers[header]) {
propagatedHeaders[header] = req.headers[header];
}
});
try {
// This call goes back through Envoy to reach the order service
// In a real K8s setup, this would be `http://orderservice:3002`
// but routing through the gateway ensures tracing continues seamlessly.
const ordersResponse = await axios.get(`http://envoy:8080/orders?userId=${req.params.id}`, {
headers: propagatedHeaders,
});
res.json({
userId: req.params.id,
name: 'John Doe',
orders: ordersResponse.data,
});
} catch (error) {
console.error('[UserService] Failed to fetch orders:', error.message);
res.status(500).send('Error fetching orders');
}
});
app.listen(port, () => {
console.log(`User service listening on port ${port}`);
});
// orderservice/index.js
const express = require('express');
const app = express();
const port = 3002;
app.get('/orders', (req, res) => {
const userId = req.query.userId;
console.log(`[OrderService] Received request for orders for user ${userId}`);
console.log('[OrderService] Incoming headers:', req.headers);
// In a real app, this would query a database.
res.json([
{ orderId: 'abc-123', amount: 100 },
{ orderId: 'def-456', amount: 250 },
]);
});
app.listen(port, () => {
console.log(`Order service listening on port ${port}`);
});
The critical part is the `propagatedHeaders` logic in `userservice`. It extracts all relevant B3 headers from the incoming request and forwards them on its outgoing request to the `orderservice`. This disciplined propagation is what links the service-to-service calls together under the same trace.
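Since every service repeats this extraction, it can be factored into a small helper. A dependency-free sketch (the function name is ours, not part of the services above):

```typescript
// Sketch: pick the B3 headers out of an incoming header map so they
// can be forwarded verbatim on any outgoing service-to-service call.
const B3_HEADERS = [
  'x-request-id', 'x-b3-traceid', 'x-b3-spanid',
  'x-b3-parentspanid', 'x-b3-sampled', 'x-b3-flags',
];

function pickB3(
  incoming: Record<string, string | string[] | undefined>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const h of B3_HEADERS) {
    const v = incoming[h];
    if (typeof v === 'string') out[h] = v; // ignore missing or multi-valued headers
  }
  return out;
}
```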
When this entire system is running, a single tap in the React Native app produces a comprehensive trace in Zipkin. The trace is rooted in the span ID the MobX action generated (the client doesn’t report its own spans to Zipkin, so it appears as the implicit parent of the Envoy span), followed by a child span from Envoy, which in turn has a child span from the `userservice`, which itself has a child span for the call to the `orderservice`. The black box is gone. We can now pinpoint latency at any stage of the process, whether it’s network latency between the client and the gateway, slow processing in a service, or a delay in a downstream dependency.
This architecture, however, is not without its limitations and areas for future refinement. The current client-side context management is simple and handles a single active action at a time. It does not gracefully handle concurrent, overlapping user actions, which could lead to context clashes. A more advanced implementation might explore a solution similar to AsyncLocalStorage for Node.js, but adapted for the React Native environment, to provide more robust context isolation. Furthermore, this system currently traces all initiated actions. For a high-traffic application, this would be prohibitively expensive. The next logical iteration would be to implement client-side sampling, allowing the mobile app to decide intelligently (e.g., based on a percentage or specific user IDs) which actions should generate a full trace, thus controlling the volume of data sent to the observability backend.
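That sampling decision could be as simple as a per-trace coin flip made when the trace starts; unsampled actions would still get IDs for log correlation but would send `X-B3-Sampled: 0`. A hypothetical sketch (the rate and function name are ours):

```typescript
// Hypothetical client-side sampling sketch: record roughly 10% of traces.
const SAMPLE_RATE = 0.1;

function shouldSample(rate: number = SAMPLE_RATE): boolean {
  return Math.random() < rate;
}

// In startTrace(), the decision would ride along with the context, and the
// axios interceptor would emit it as: X-B3-Sampled: sampled ? '1' : '0'
```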