Federating Geo-Distributed SQL and Log Data with Trino for a Real-Time Nuxt.js Observability Dashboard


The technical pain point that triggered this entire project was the unacceptable mean time to resolution (MTTR) for a specific class of bugs in our geo-distributed checkout service. The service relies on CockroachDB, chosen for its resilience and multi-region consistency guarantees. When a user reported a failed transaction, our on-call engineers were faced with a fragmented, manual debugging process: grab a transaction ID from a support ticket, SSH into a production bastion, query CockroachDB to find the order state, and then manually search terabytes of logs in Grafana Loki, trying to correlate timestamps and IDs. This could take anywhere from 30 minutes to several hours. It was inefficient, error-prone, and unsustainable as the platform scaled.

Our initial concept was a “Unified Transaction View” within our internal developer platform. The goal was simple on the surface: provide a single UI where an engineer could paste a transaction ID and immediately see the corresponding structured order data from CockroachDB side-by-side with the unstructured application and system logs from Loki. This required a system that could query two vastly different, distributed data sources and join the results.

The first architectural debate was whether to build a bespoke microservice to act as this data aggregator. The service would have clients for both the CockroachDB dialect of PostgreSQL and the Loki LogQL API. It would receive a request, query both systems in parallel, and then attempt to merge the data in-memory before returning a consolidated JSON payload. The primary pitfall here is the long-term maintenance burden. We would own the clients, the merging logic, the error handling, and the API schema. Every change in the underlying systems could necessitate a change in our service. It felt like building custom plumbing when a standard solution might exist.

This led us to Trino (formerly PrestoSQL). Its core value proposition is query federation. We could deploy Trino and configure two “catalogs”: one pointing to our CockroachDB cluster and another to our Loki cluster. This would, in theory, allow us to write standard SQL to query both systems as if they were just tables in a single database. For the front end, a standard Vue.js Single Page Application (SPA) was considered, but we opted for Nuxt.js. The Server-Side Rendering (SSR) would give us a faster initial load, and more importantly, its server API routes provided a clean Backend-for-Frontend (BFF) pattern. This BFF layer could encapsulate the complexity of communicating with Trino, shielding the browser from long-running queries and providing a single, clean data contract.

Trino Configuration: The Federation Backbone

The first step was to get a Trino cluster running and talking to our data sources. For local development and demonstration, a docker-compose setup is sufficient to model the production environment. In a real-world project, these would be managed deployments, likely on Kubernetes.

# docker-compose.yml
version: '3.8'

services:
  trino-coordinator:
    image: trinodb/trino:435
    ports:
      - "8080:8080"
    volumes:
      - ./trino-config/coordinator:/etc/trino
    environment:
      - TZ=UTC

  cockroachdb:
    image: cockroachdb/cockroach:v23.1.9
    command: start-single-node --insecure
    ports:
      - "26257:26257"
      - "8081:8080" # CockroachDB UI
    volumes:
      - cockroach-data:/cockroach/cockroach-data

  loki:
    image: grafana/loki:2.9.2
    command: -config.file=/etc/loki/local-config.yaml
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config:/etc/loki

volumes:
  cockroach-data:

The magic of Trino lies in its catalog configuration. These are simple property files that tell the coordinator how to connect to a data source.

First, the coordinator needs its own configuration. This defines it as the coordinator and sets basic properties like the discovery URI.

# trino-config/coordinator/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=1GB
query.max-total-memory=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080

Next, the JVM configuration is critical for stability, especially under load. A common mistake is to leave this with default settings, leading to unpredictable OutOfMemoryErrors.

# trino-config/coordinator/jvm.config
-server
-Xmx2G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
-Djava.security.krb5.conf=/etc/krb5.conf

Now for the connectors. The CockroachDB connector uses the PostgreSQL JDBC driver. In a real-world project, credentials must be managed via secrets management, not hardcoded.

# trino-config/coordinator/catalog/cockroachdb.properties
connector.name=postgresql
connection-url=jdbc:postgresql://cockroachdb:26257/defaultdb?sslmode=disable
connection-user=root
# In production, use a secrets management tool. For this example, we leave password blank for insecure mode.
# connection-password=...

# Performance tuning for the connection pool is not optional in production.
# These values prevent the pool from being a bottleneck.
connection-pool.max-size=20
connection-pool.min-size=5

The Loki connector configuration maps Loki’s API to a SQL interface. We define the schema loki_logs which will contain “tables” corresponding to our log streams.

# trino-config/coordinator/catalog/loki.properties
connector.name=loki
loki.uri=http://loki:3100
loki.schema-name=loki_logs
# This is crucial for performance. It tells Trino how many parallel requests it can make to Loki.
# Tune this based on your Loki cluster's capacity.
loki.client.max-threads=10

With this configuration, after running docker-compose up, we can use the Trino CLI to verify the connections.

# Connect to the Trino CLI
docker exec -it <trino-container-id> trino

-- Verify catalogs are registered
trino> SHOW CATALOGS;
-- Catalog
------------
-- cockroachdb
-- loki
-- system
--(3 rows)

-- Query CockroachDB
trino> SHOW SCHEMAS IN cockroachdb;

-- Note: Trino's PostgreSQL connector does not support PRIMARY KEY constraints,
-- column DEFAULTs, or CockroachDB functions like gen_random_uuid() in DDL and DML,
-- so the table is created and seeded directly against CockroachDB
-- (e.g. via `cockroach sql --insecure`):
CREATE TABLE orders (
   id UUID PRIMARY KEY,
   transaction_id VARCHAR(255) NOT NULL,
   user_id UUID,
   amount_cents INT,
   status VARCHAR(50),
   created_at TIMESTAMPTZ DEFAULT now()
);
INSERT INTO orders (id, transaction_id, user_id, amount_cents, status) VALUES
  (gen_random_uuid(), 'txn_abc123', gen_random_uuid(), 5000, 'FAILED'),
  (gen_random_uuid(), 'txn_def456', gen_random_uuid(), 2500, 'COMPLETED');

-- Back in the Trino CLI, the table is queryable through the catalog:
trino> SELECT * FROM cockroachdb.public.orders;

-- Query Loki. Loki "tables" are named after the log stream you want to query:
-- the fully qualified name is catalog.schema."stream selector", where the quoted
-- table identifier carries the Loki label selector. With the label
-- app="checkout-service", the table becomes loki."loki_logs"."{app=""checkout-service""}"
-- (embedded double quotes are escaped by doubling them, per standard SQL).
-- This query returns nothing until logs have actually been ingested.
trino> SELECT line FROM loki."loki_logs"."{app=""checkout-service""}" LIMIT 10;
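
Since the Loki stream starts out empty, it helps to seed one structured test log line before wiring up the dashboard. The following is a minimal sketch using Loki's push API, assuming the port mapping from the docker-compose file above; the script name and log fields are illustrative.

// file: scripts/seed-loki.ts (sketch, run as an ESM script)
// Pushes one structured log line into Loki so the federated queries have data.
// Assumes Loki is reachable on localhost:3100, as in the docker-compose file.
const line = JSON.stringify({
  level: 'error',
  transaction_id: 'txn_abc123',
  msg: 'payment processor returned DECLINED',
});

await fetch('http://localhost:3100/loki/api/v1/push', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    streams: [
      {
        stream: { app: 'checkout-service' },
        // Loki expects [unix-epoch-nanoseconds, log line] string tuples
        values: [[`${Date.now()}000000`, line]],
      },
    ],
  }),
});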

The Federated Query Bottleneck

The first attempt at a federated query was a naive JOIN. The idea was to join the orders table from CockroachDB with the log stream from Loki on the transaction_id. To make this work, our application logs had to be structured (e.g., JSON) and contain the transaction_id.
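
For context, the logging convention this depends on is simple: every checkout-service log line is emitted as JSON and always carries the transaction_id. Below is a minimal sketch, assuming a pino-style logger; the logger choice and helper function are illustrative, not our actual service code.

// checkout-service logging convention (sketch): structured JSON lines that
// always include the transaction_id so they can be correlated later.
import pino from 'pino';

const logger = pino({ base: { app: 'checkout-service' } });

export function logPaymentFailure(transactionId: string, reason: string) {
  logger.error({ transaction_id: transactionId, reason }, 'payment processor failed');
}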

-- Naive and extremely inefficient query
SELECT
    o.transaction_id,
    o.status,
    o.amount_cents,
    l.line
FROM
    cockroachdb.public.orders AS o
JOIN
    loki."loki_logs"."{app=\"checkout-service\"}" AS l
ON
    -- This assumes JSON logs that can be parsed.
    -- The performance problem exists regardless of the parsing method.
    json_extract_scalar(l.line, '$.transaction_id') = o.transaction_id
WHERE
    o.transaction_id = 'txn_abc123';

This query was catastrophically slow, often timing out. The EXPLAIN plan revealed the core issue. Trino could not push the JOIN predicate down to Loki. The Loki API does not support filtering logs based on the content of other tables. Consequently, Trino’s query planner decided on a terrible strategy:

  1. Fetch the single matching row from CockroachDB where transaction_id = 'txn_abc123'.
  2. Scan all logs from the {app="checkout-service"} stream in Loki for the entire retention period.
  3. Stream terabytes of log data from Loki into the Trino workers’ memory.
  4. Perform the json_extract_scalar on every single log line.
  5. Finally, perform the join in-memory.

This is a classic federated query pitfall. The solution is to manually orchestrate the query in a way that minimizes data transfer from the least efficient source (Loki). We shifted from a single declarative SQL query to a two-step procedural approach, which would be implemented in our Nuxt.js BFF.

Step 1: Get context and a time window from the fast source (CockroachDB).

-- Executed first
SELECT
    id,
    transaction_id,
    status,
    created_at
FROM
    cockroachdb.public.orders
WHERE
    transaction_id = 'txn_abc123';

This query is fast and returns a small amount of data, including the created_at timestamp.

Step 2: Use the context to construct a highly specific query for the slow source (Loki).

Using the created_at timestamp from the first query (e.g., 2023-10-27T10:30:00Z), we can construct a Loki query with a very narrow time range. This is the single most important optimization. Instead of scanning all logs, we now only scan logs from, say, 5 minutes before to 5 minutes after the transaction was created.

-- Executed second, constructed dynamically
SELECT
    ts,
    line
FROM
    loki."loki_logs"."{app=\"checkout-service\"}"
WHERE
    -- Use Loki's time pushdown capability
    ts BETWEEN TIMESTAMP '2023-10-27 10:25:00.000 Z' AND TIMESTAMP '2023-10-27 10:35:00.000 Z'
    -- Also filter by the content if possible, though time is the primary filter
    AND CONTAINS(line, 'txn_abc123')
ORDER BY ts ASC;

This approach dramatically reduces the data scanned by Loki, from terabytes to megabytes, making the entire operation feasible.
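
The only dynamic piece the BFF has to compute, then, is that time window. Here is a minimal sketch of the padding logic the API route below relies on; the ±5 minute padding is an assumption to tune per workload.

// Given the created_at string Trino returns for an order (e.g. "2023-10-27 10:30:00.123"),
// build a UTC window padded by a few minutes on each side.
export function buildLokiTimeWindow(createdAt: string, paddingMinutes = 5) {
  // Trino returns "YYYY-MM-DD HH:mm:ss.SSS"; normalize to ISO 8601 UTC before parsing.
  const center = new Date(createdAt.replace(' ', 'T') + 'Z').getTime();
  const padMs = paddingMinutes * 60 * 1000;
  return {
    start: new Date(center - padMs).toISOString(),
    end: new Date(center + padMs).toISOString(),
  };
}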

Building the Nuxt.js BFF API Route

With a viable query strategy, we built the API endpoint in our Nuxt 3 application. This endpoint acts as the orchestrator.
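
The route reads its Trino connection details from Nuxt's runtimeConfig. A minimal sketch of the corresponding nuxt.config.ts follows, assuming server-only keys that are overridden in production via NUXT_-prefixed environment variables; the default values are placeholders.

// file: nuxt.config.ts (sketch)
export default defineNuxtConfig({
  runtimeConfig: {
    // Server-only values, overridable via NUXT_TRINO_HOST, NUXT_TRINO_PORT, NUXT_TRINO_USER
    trinoHost: 'localhost',
    trinoPort: '8080',
    trinoUser: 'debug-dashboard',
  },
});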

// file: server/api/debug/[transactionId].ts
import { BasicAuth, Trino } from 'trino-client';
import { z } from 'zod';

// Define the expected shape of our data
const OrderSchema = z.object({
  id: z.string(),
  transaction_id: z.string(),
  status: z.string(),
  created_at: z.string(), // Trino returns timestamps as strings
});

const LogEntrySchema = z.object({
  ts: z.string(),
  line: z.string(),
});

// Robust configuration management is key for production.
// Nuxt 3's runtimeConfig is a good fit for this (see nuxt.config.ts above).
const { trinoHost, trinoPort, trinoUser } = useRuntimeConfig();

const trinoClient = Trino.create({
  server: `http://${trinoHost}:${trinoPort}`,
  // trino-client expects credentials via its auth option
  auth: new BasicAuth(trinoUser),
  // catalog and schema can be set here or fully qualified in each query
});

// A utility to run queries and handle potential errors.
// In a real-world app, this would include more sophisticated retry logic and monitoring.
async function executeQuery<T>(query: string, schema: z.ZodType<T>): Promise<T[]> {
  try {
    const iterator = await trinoClient.query(query);
    const results = [];
    for await (const result of iterator) {
      if (result.data && result.columns) {
        // Here we map columns to object properties for easier consumption
        const mappedData = result.data.map(row => {
            const obj: { [key: string]: any } = {};
            result.columns.forEach((col, index) => {
                obj[col.name] = row[index];
            });
            return obj;
        });
        results.push(...mappedData);
      }
    }
    return z.array(schema).parse(results);
  } catch (error) {
    console.error(`Trino query failed for query: ${query}`, error);
    // Throw a structured error that the event handler can catch
    throw createError({
      statusCode: 502,
      statusMessage: 'Bad Gateway',
      message: 'Failed to execute query against the data federation layer.',
    });
  }
}

export default defineEventHandler(async (event) => {
  const transactionId = getRouterParam(event, 'transactionId');

  if (!transactionId) {
    throw createError({ statusCode: 400, message: 'Transaction ID is required.' });
  }

  // The ID is interpolated into SQL strings below (trino-client has no parameter
  // binding), so reject anything that does not look like one of our transaction IDs.
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(transactionId)) {
    throw createError({ statusCode: 400, message: 'Transaction ID has an invalid format.' });
  }

  // Step 1: Fetch transaction context from CockroachDB
  const getOrderQuery = `
    SELECT id, transaction_id, status, created_at
    FROM cockroachdb.public.orders
    WHERE transaction_id = '${transactionId}'
    LIMIT 1
  `;
  const orders = await executeQuery(getOrderQuery, OrderSchema);

  if (orders.length === 0) {
    throw createError({ statusCode: 404, message: `Transaction with ID ${transactionId} not found.` });
  }
  const order = orders[0];
  const orderCreatedAt = new Date(order.created_at.replace(' ', 'T') + 'Z');

  // Step 2: Dynamically build the Loki query with a narrow time window
  const startTime = new Date(orderCreatedAt.getTime() - 5 * 60 * 1000).toISOString();
  const endTime = new Date(orderCreatedAt.getTime() + 5 * 60 * 1000).toISOString();
  
  // A pitfall: date formatting for SQL is tricky. Ensure ISO 8601 format.
  // Note the doubled quotes inside the quoted Loki table identifier (standard SQL escaping).
  const getLogsQuery = `
    SELECT ts, line
    FROM loki."loki_logs"."{app=""checkout-service""}"
    WHERE ts BETWEEN from_iso8601_timestamp('${startTime}') AND from_iso8601_timestamp('${endTime}')
    AND strpos(line, '${transactionId}') > 0
    ORDER BY ts ASC
  `;
  
  const logs = await executeQuery(getLogsQuery, LogEntrySchema);

  return {
    order,
    logs,
  };
});

This API route is the heart of the system. It encapsulates the two-step logic, provides robust error handling, and uses Zod for runtime type validation, which prevents malformed data from Trino from crashing the application.

Crafting the Vue.js Frontend

The final piece was the UI. We used Nuxt 3 and Vue 3’s Composition API with <script setup>, which makes components clean and readable. The useAsyncData composable is ideal for fetching data from our API route, as it handles both server-side and client-side data fetching seamlessly.

<!-- file: pages/debug/[id].vue -->
<script setup lang="ts">
// Define the structure of the data we expect from our API
interface LogEntry {
  ts: string;
  line: string;
}
interface Order {
  id: string;
  transaction_id: string;
  status: string;
  created_at: string;
}
interface DebugData {
  order: Order;
  logs: LogEntry[];
}

const route = useRoute();
const transactionId = computed(() => route.params.id as string);

// useAsyncData fetches the data. It returns the data, and states for pending/error.
// This is the key to building a responsive UI that handles loading and failure states.
const { data: debugData, pending, error } = await useAsyncData<DebugData>(
  `debug-data-${transactionId.value}`,
  () => $fetch(`/api/debug/${transactionId.value}`),
  { watch: [transactionId] } // Re-fetch if the transaction ID changes
);

// A simple utility to format timestamps for display
const formatTimestamp = (ts: string) => {
  return new Date(ts.replace(' ', 'T') + 'Z').toLocaleString();
};

// A utility to attempt to parse and colorize JSON logs
const parseLogLine = (line: string) => {
  try {
    const logJson = JSON.parse(line);
    const level = logJson.level || 'info';
    let colorClass = 'text-gray-400';
    if (level === 'error') colorClass = 'text-red-400';
    if (level === 'warn') colorClass = 'text-yellow-400';
    return { isJson: true, content: logJson, colorClass };
  } catch (e) {
    return { isJson: false, content: line, colorClass: 'text-gray-500' };
  }
};
</script>

<template>
  <div class="bg-gray-900 text-white min-h-screen p-8 font-sans">
    <header class="mb-8">
      <h1 class="text-3xl font-bold">Transaction Debugger</h1>
      <p class="text-gray-400">Transaction ID: <span class="font-mono bg-gray-700 px-2 py-1 rounded">{{ transactionId }}</span></p>
    </header>

    <!-- Loading State: A common mistake is to not handle this, leading to a blank page. -->
    <div v-if="pending" class="text-center text-xl text-blue-400">
      <p>Loading federated data from CockroachDB and Loki...</p>
    </div>

    <!-- Error State: Crucial for user feedback on failed queries. -->
    <div v-else-if="error" class="bg-red-900 border border-red-500 p-4 rounded">
      <h2 class="text-xl font-bold text-red-300">Error Fetching Data</h2>
      <p class="text-red-400 mt-2">{{ error.message }}</p>
      <pre class="bg-black p-2 mt-2 rounded overflow-x-auto">{{ error.data }}</pre>
    </div>

    <!-- Success State -->
    <div v-else-if="debugData" class="grid grid-cols-1 lg:grid-cols-3 gap-8">
      <!-- Left Column: Structured CockroachDB Data -->
      <div class="lg:col-span-1 bg-gray-800 p-6 rounded-lg shadow-lg">
        <h2 class="text-2xl font-semibold mb-4 border-b border-gray-600 pb-2">Order Details (CockroachDB)</h2>
        <dl class="space-y-3">
          <div>
            <dt class="font-medium text-gray-400">Transaction ID</dt>
            <dd class="font-mono text-lg">{{ debugData.order.transaction_id }}</dd>
          </div>
          <div>
            <dt class="font-medium text-gray-400">Internal Order ID</dt>
            <dd class="font-mono text-sm text-gray-300">{{ debugData.order.id }}</dd>
          </div>
          <div>
            <dt class="font-medium text-gray-400">Status</dt>
            <dd :class="debugData.order.status === 'FAILED' ? 'text-red-400' : 'text-green-400'" class="font-bold text-lg">
              {{ debugData.order.status }}
            </dd>
          </div>
          <div>
            <dt class="font-medium text-gray-400">Created At</dt>
            <dd class="font-mono text-gray-300">{{ formatTimestamp(debugData.order.created_at) }}</dd>
          </div>
        </dl>
      </div>

      <!-- Right Column: Unstructured Loki Logs -->
      <div class="lg:col-span-2 bg-gray-800 p-6 rounded-lg shadow-lg">
        <h2 class="text-2xl font-semibold mb-4 border-b border-gray-600 pb-2">Associated Logs (Loki)</h2>
        <div class="bg-black rounded p-4 h-[60vh] overflow-y-auto font-mono text-sm">
          <div v-if="debugData.logs.length === 0" class="text-gray-500">
            No logs found for this transaction in the specified time window.
          </div>
          <div v-else v-for="(log, index) in debugData.logs" :key="index" class="flex">
            <span class="text-gray-600 mr-4 select-none">{{ formatTimestamp(log.ts) }}</span>
            <pre :class="parseLogLine(log.line).colorClass" class="whitespace-pre-wrap flex-1">{{ JSON.stringify(parseLogLine(log.line).content, null, 2) }}</pre>
          </div>
        </div>
      </div>
    </div>
  </div>
</template>

This component provides a complete user experience, handling the three critical states: pending, error, and success. The UI is split clearly to present the structured versus unstructured data, achieving the project’s original goal.

Architecture and Testing Considerations

The final architecture is clean and effective. The client browser talks only to the Nuxt.js BFF, which in turn orchestrates queries via Trino to the underlying data stores.

graph TD
    A[User's Browser] -->|HTTP Request| B(Nuxt.js Server);
    B -->|useAsyncData call| C{Nuxt API Route /api/debug/...};
    C -->|SQL Query 1| D(Trino Coordinator);
    D -->|JDBC| E[CockroachDB Cluster];
    E -->|Order Data| D;
    D -->|Order Data| C;
    C -->|Constructs SQL Query 2| D;
    D -->|Loki API Query| F[Loki Cluster];
    F -->|Log Data| D;
    D -->|Log Data| C;
    C -->|JSON Payload| B;
    B -->|Renders HTML| A;

Testing such a system requires a multi-layered approach. Unit testing the Vue component is straightforward with Vitest and @vue/test-utils. We can mock the $fetch call to provide sample data and assert that the component renders correctly for pending, error, and success states.

// file: test/pages/debug.spec.ts
import { mount } from '@vue/test-utils';
import { describe, it, expect, vi } from 'vitest';
import { ref, nextTick } from 'vue';
import DebugPage from '~/pages/debug/[id].vue';

// Mock the Nuxt composables
vi.mock('#app', () => ({
  useRoute: () => ({
    params: { id: 'txn_abc123' }
  }),
  useAsyncData: vi.fn((key, handler) => {
    // Mock a successful response
    const mockData = {
      order: { id: 'uuid-1', transaction_id: 'txn_abc123', status: 'FAILED', created_at: '2023-10-27 10:30:00.123' },
      logs: [{ ts: '2023-10-27 10:30:01.456', line: '{"level":"error","msg":"payment processor failed"}' }]
    };
    return { data: ref(mockData), pending: ref(false), error: ref(null) };
  }),
}));

describe('Transaction Debugger Page', () => {
  it('renders order and log data correctly', async () => {
    const wrapper = mount(DebugPage);
    await nextTick(); // Wait for component to update
    
    expect(wrapper.html()).toContain('Order Details (CockroachDB)');
    expect(wrapper.html()).toContain('txn_abc123');
    expect(wrapper.html()).toContain('payment processor failed');
    expect(wrapper.find('.text-red-400').text()).toBe('FAILED');
  });
});

Integration testing is more complex. The best strategy is to use a framework like Testcontainers to spin up the entire docker-compose stack (Trino, CockroachDB, Loki) in a CI pipeline, seed it with test data, and then run tests against the live Nuxt.js server API endpoint to validate the end-to-end query logic.
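
Below is a hedged sketch of that setup with the testcontainers Node library; the service name, wait condition, and seeding steps are assumptions to adapt to the actual stack.

// file: test/integration/debug-endpoint.spec.ts (sketch)
import { DockerComposeEnvironment, StartedDockerComposeEnvironment, Wait } from 'testcontainers';
import { describe, it, expect, beforeAll, afterAll } from 'vitest';

describe('GET /api/debug/:transactionId (end to end)', () => {
  let environment: StartedDockerComposeEnvironment;

  beforeAll(async () => {
    // Bring up Trino, CockroachDB and Loki from the same compose file used for dev.
    // Container name and wait strategy are assumptions; adjust to your compose project.
    environment = await new DockerComposeEnvironment('.', 'docker-compose.yml')
      .withWaitStrategy('trino-coordinator-1', Wait.forLogMessage('SERVER STARTED'))
      .up();
    // Seed CockroachDB and push a test log line into Loki here, then start the
    // built Nuxt server (e.g. .output/server/index.mjs) pointed at this stack.
  }, 120_000);

  afterAll(async () => {
    await environment?.down();
  });

  it('returns the order and its correlated logs', async () => {
    const res = await fetch('http://localhost:3000/api/debug/txn_abc123');
    expect(res.status).toBe(200);
    const body = await res.json();
    expect(body.order.transaction_id).toBe('txn_abc123');
    expect(Array.isArray(body.logs)).toBe(true);
  });
});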

This solution is not without its limitations. Performance depends entirely on the two-step query optimization: any query that cannot be narrowly constrained by time will suffer. The Trino cluster itself can become a bottleneck or a single point of failure if not deployed with high availability in mind. Future iterations could involve splitting Trino into a dedicated coordinator with separate worker nodes, adding Trino resource groups to guard against runaway queries, and exploring a push model that streams logs to the UI over WebSockets as they arrive in Loki, rather than the current request/response model.

