Our Snowflake bill was becoming a black box. The finance team saw a rising monthly total, and engineering saw a dashboard page responsible for a significant portion of daily compute credits. The problem was that this “dashboard” was a composite of two dozen data-driven React components, each with its own lifecycle, rendering logic, and data dependencies. Snowflake’s QUERY_HISTORY
view could tell us which queries were expensive, but it couldn’t tell us which specific UI element was responsible for triggering them. We were flying blind, unable to connect user-facing features to their direct financial impact.
The initial thought was to simply tag our queries. The Snowflake QUERY_TAG
session parameter is built for this. But what should the tag be? A manually maintained string? A route name? These are too coarse. A single route could host multiple components, any one of which could be the culprit. We needed a stable, unique, and automatically generated identifier at the component level. That’s when we looked at our front-end build process. We were already using CSS Modules for style isolation, which generates unique class names like DashboardWidget_revenue-chart__a3f4e
. This hash is deterministic based on the component’s file path and class name. It was the perfect, granular correlation ID we were looking for.
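The exact shape of that class name comes from the bundler. As a point of reference, a minimal css-loader configuration that produces this pattern looks roughly like the sketch below; the localIdentName value is an assumption about a typical webpack setup, not our exact config.
// webpack.config.js (excerpt) -- illustrative only
module.exports = {
  module: {
    rules: [
      {
        test: /\.module\.css$/,
        use: [
          'style-loader',
          {
            loader: 'css-loader',
            options: {
              modules: {
                // Yields names like DashboardWidget_revenue-chart__a3f4e:
                // [name] = file name, [local] = authored class name,
                // [hash:base64:5] = deterministic digest of the file path and class name
                localIdentName: '[name]_[local]__[hash:base64:5]',
              },
            },
          },
        ],
      },
    ],
  },
};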
Our architecture began to take shape:
- Frontend Instrumentation: Capture front-end performance metrics (like Largest Contentful Paint) for specific components using a Prometheus client. Tag these metrics with the component’s unique CSS Module hash.
- Backend Tagging: When a component fetches data, pass its CSS Module hash to the backend. The backend API will then set the QUERY_TAG on the Snowflake session before executing the query.
- Data Correlation: Ingest Prometheus metrics into Snowflake. Join the performance data with Snowflake’s ACCOUNT_USAGE.QUERY_HISTORY view on the shared CSS Module hash.
This created a direct line of sight from a specific piece of the UI, through its performance characteristics, to its exact query cost.
graph TD
    subgraph Browser
        A[React Component with CSS Module hash] -->|1. Renders| B(PerformanceObserver captures LCP);
        A -->|2. Fetches Data| C{API Call with hash};
        B -->|3. Sends Metric| D[Prometheus Pushgateway];
    end
    subgraph Backend
        C --> E[API Server];
        E -->|4. Sets QUERY_TAG| F[Snowflake Session];
        F -->|5. Executes Query| G[Snowflake Warehouse];
    end
    subgraph Observability Pipeline
        D --> H[Prometheus Server];
        H -->|6. Scrapes Metrics| I(ETL Script);
        G -- "ACCOUNT_USAGE.QUERY_HISTORY" --> J[Snowflake Data Warehouse];
        I -->|7. Loads Data| K(frontend_metrics Table);
    end
    subgraph Analysis
        J --> L{Correlation Query};
        K --> L;
        L --> M[Cost & Performance Report];
    end
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style G fill:#29b5e8,stroke:#333,stroke-width:2px
    style H fill:#e6522c,stroke:#333,stroke-width:2px
Phase 1: Front-End Instrumentation with Prometheus and CSS Modules
The first step is to reliably capture component-level performance and its identifier. We cannot just grab any class name; we need the one that uniquely identifies our data-driven component.
A typical data widget might look like this. Note the use of styles.container
. This is the key.
// src/components/DataWidget/DataWidget.jsx
import React, { useEffect, useState } from 'react';
import styles from './DataWidget.module.css';
import { useComponentTracker } from '../../hooks/useComponentTracker';
import { fetchData } from '../../api';
const DataWidget = ({ queryId, title }) => {
const [data, setData] = useState(null);
const [isLoading, setIsLoading] = useState(true);
// The ref will be attached to the component's root element.
const containerRef = React.useRef(null);
// Custom hook to handle performance tracking.
// We pass the unique class name generated by CSS Modules.
useComponentTracker({
ref: containerRef,
componentName: 'DataWidget',
componentHash: styles.container
});
useEffect(() => {
let isMounted = true;
const loadData = async () => {
setIsLoading(true);
try {
// The CSS Module hash is passed to the API call.
const result = await fetchData(queryId, styles.container);
if (isMounted) {
setData(result);
}
} catch (error) {
console.error(`Failed to fetch data for ${title}:`, error);
// In a real project, set an error state here.
} finally {
if (isMounted) {
setIsLoading(false);
}
}
};
loadData();
return () => {
isMounted = false;
};
}, [queryId, title]);
return (
<div ref={containerRef} className={styles.container}>
<h3 className={styles.title}>{title}</h3>
{isLoading ? <p>Loading...</p> : <pre>{JSON.stringify(data, null, 2)}</pre>}
</div>
);
};
export default DataWidget;
The magic happens in the useComponentTracker
hook. This hook uses the PerformanceObserver
API to listen for largest-contentful-paint
entries that belong to our tracked component. A common pitfall is observing the whole document, which is noisy. We must filter events to only those whose element is our component or a child of it.
// src/hooks/useComponentTracker.js
import { useEffect } from 'react';
import { sendMetricToGateway } from '../utils/prometheus';
export const useComponentTracker = ({ ref, componentName, componentHash }) => {
useEffect(() => {
if (!ref.current) {
return;
}
// We must ensure the componentHash is a string and not undefined.
// CSS Modules can sometimes return an empty object during server-side rendering.
if (typeof componentHash !== 'string' || componentHash === '') {
console.warn(`[useComponentTracker] Invalid componentHash for ${componentName}. Tracking disabled.`);
return;
}
const observer = new PerformanceObserver((list) => {
const entries = list.getEntries();
entries.forEach((entry) => {
// Critical check: Is the LCP element our component or inside it?
if (ref.current && ref.current.contains(entry.element)) {
const lcpTimeInSeconds = entry.startTime / 1000;
console.log(`LCP for ${componentName} (${componentHash}): ${lcpTimeInSeconds.toFixed(2)}s`);
// Send this metric to our Prometheus Pushgateway.
sendMetricToGateway({
metricName: 'frontend_component_lcp_seconds',
help: 'Largest Contentful Paint time for a specific component',
labels: {
component_name: componentName,
component_hash: componentHash,
path: window.location.pathname,
},
value: lcpTimeInSeconds,
}).catch(error => {
// Do not crash the app if metrics fail to send. Just log it.
console.error('Failed to send performance metric to gateway:', error);
});
// Once we've reported the LCP for this component, we can stop observing
// to avoid sending redundant data and reduce performance overhead.
observer.disconnect();
}
});
});
observer.observe({ type: 'largest-contentful-paint', buffered: true });
return () => {
observer.disconnect();
};
}, [ref, componentName, componentHash]);
};
The final piece of the front-end puzzle is sending the metric. We can’t have the browser talk directly to Prometheus. Instead, we use the Prometheus Pushgateway, which is designed for short-lived jobs.
// src/utils/prometheus.js
// In a real-world app, this would be in an environment variable. Note that a
// POST to the Pushgateway replaces previously pushed series that share the
// same metric name within this job group, so pushing under a per-component
// grouping key (e.g. .../job/frontend_app/component_hash/<hash>) avoids
// components overwriting each other between Prometheus scrapes.
const PUSHGATEWAY_URL = 'http://localhost:9091/metrics/job/frontend_app';
/**
* Sends a single metric value to the Prometheus Pushgateway.
* @param {object} metric - The metric object.
* @param {string} metric.metricName - The name of the metric.
* @param {string} metric.help - The help text for the metric.
* @param {object} metric.labels - Key-value pairs of labels.
* @param {number} metric.value - The numeric value of the metric.
*/
export const sendMetricToGateway = async ({ metricName, help, labels, value }) => {
// Construct the label string for Prometheus format.
const labelString = Object.entries(labels)
.map(([key, val]) => `${key}="${val.replace(/"/g, '\\"')}"`) // Escape quotes
.join(',');
const prometheusMetricString = `# HELP ${metricName} ${help}\n`
+ `# TYPE ${metricName} gauge\n`
+ `${metricName}{${labelString}} ${value}\n`;
try {
const response = await fetch(PUSHGATEWAY_URL, {
method: 'POST',
headers: {
'Content-Type': 'text/plain',
},
body: prometheusMetricString,
mode: 'cors', // Important for cross-origin requests
});
if (!response.ok) {
// A common mistake is not checking the response.
// The Pushgateway can return errors for malformed metrics.
const errorText = await response.text();
throw new Error(`Pushgateway returned ${response.status}: ${errorText}`);
}
} catch (error) {
// We re-throw so the caller can handle it, but provide context.
throw new Error(`Failed to push metric to gateway: ${error.message}`);
}
};
Phase 2: Backend Query Tagging
With the front-end reporting its performance, the backend must now tag the corresponding Snowflake query. The front-end API call fetchData(queryId, styles.container) now carries the crucial componentHash.
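The fetchData helper itself isn’t shown elsewhere in this post; here is a minimal sketch of what it might look like, assuming it simply POSTs to the /api/data endpoint defined below.
// src/api.js -- a minimal sketch; the real module is not shown in this post.
export const fetchData = async (queryId, componentHash) => {
  // Forward the CSS Module hash alongside the query identifier so the backend
  // can attach it to the Snowflake QUERY_TAG.
  const response = await fetch('/api/data', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ queryId, componentHash }),
  });
  if (!response.ok) {
    throw new Error(`Data API returned ${response.status} for query "${queryId}"`);
  }
  return response.json();
};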
Our Node.js Express server receives this and uses the snowflake-sdk to set the QUERY_TAG.
// src/server/api.js
const express = require('express');
const snowflake = require('snowflake-sdk');
const app = express();
app.use(express.json());
// Snowflake connection configuration. Use environment variables in production.
const connection = snowflake.createConnection({
account: process.env.SNOWFLAKE_ACCOUNT,
username: process.env.SNOWFLAKE_USER,
password: process.env.SNOWFLAKE_PASSWORD,
warehouse: process.env.SNOWFLAKE_WAREHOUSE,
database: process.env.SNOWFLAKE_DATABASE,
schema: process.env.SNOWFLAKE_SCHEMA,
});
// A simple in-memory map for demo purposes.
// In reality, this would be a more robust query lookup system.
const queryRegistry = {
'sales-by-region': 'SELECT region, SUM(sales) FROM sales_table GROUP BY region ORDER BY SUM(sales) DESC;',
'monthly-active-users': 'SELECT DATE_TRUNC(\'month\', event_timestamp), COUNT(DISTINCT user_id) FROM user_events GROUP BY 1;',
};
// Connect once at startup and reuse the connection for every request;
// a snowflake-sdk connection can only have connect() called on it once.
const connectionReady = new Promise((resolve, reject) => {
  connection.connect((err, conn) => {
    if (err) {
      console.error('Unable to connect to Snowflake: ' + err.message);
      return reject(err);
    }
    console.log('Successfully connected to Snowflake.');
    resolve(conn);
  });
});
app.post('/api/data', async (req, res) => {
const { queryId, componentHash } = req.body;
if (!queryId || !componentHash) {
return res.status(400).json({ error: 'queryId and componentHash are required.' });
}
const sqlText = queryRegistry[queryId];
if (!sqlText) {
return res.status(404).json({ error: 'Query not found.' });
}
  try {
    // Reuse the single connection established at startup.
    await connectionReady;
const tagPayload = JSON.stringify({
componentHash,
source: 'dashboard-api',
requestId: require('crypto').randomUUID(), // For tracing
});
// The core logic: set the query tag for the session.
// A pitfall is forgetting that this is session-specific.
// If you're using a connection pool, you must ensure this is set
// before each transaction that needs tagging.
await executeSnowflakeQuery(`ALTER SESSION SET QUERY_TAG = '${tagPayload.replace(/'/g, "''")}';`);
console.log(`Executing query for ${componentHash} with tag: ${tagPayload}`);
const results = await executeSnowflakeQuery(sqlText);
// It's good practice to unset the tag after use to avoid accidental tagging of other queries.
await executeSnowflakeQuery('ALTER SESSION UNSET QUERY_TAG;');
res.json(results);
} catch (error) {
console.error('Snowflake query execution failed:', error);
res.status(500).json({ error: 'Internal server error during data retrieval.' });
}
});
function executeSnowflakeQuery(sqlText) {
return new Promise((resolve, reject) => {
connection.execute({
sqlText,
complete: (err, stmt, rows) => {
if (err) {
// Provide context in the error message.
return reject(new Error(`Snowflake error for query "${sqlText.substring(0, 50)}...": ${err.message}`));
}
resolve(rows);
},
});
});
}
const PORT = 3001;
app.listen(PORT, () => {
console.log(`Server running on http://localhost:${PORT}`);
});
Phase 3: Correlation in the Data Warehouse
Now we have two separate data streams:
- Performance metrics in Prometheus, tagged with component_hash.
- Query execution history in Snowflake, tagged with a JSON payload containing componentHash.
The final step is to bring them together. We’ll write a simple Python script that runs on a schedule (e.g., hourly via a cron job or Airflow DAG). This script queries the Prometheus HTTP API, formats the data, and loads it into a dedicated Snowflake table.
# scripts/prometheus_to_snowflake_etl.py
import os
import requests
import snowflake.connector
import json
from datetime import datetime
# --- Configuration (use environment variables) ---
PROM_URL = os.getenv('PROMETHEUS_URL', 'http://localhost:9090')
SNOWFLAKE_USER = os.getenv('SNOWFLAKE_USER')
SNOWFLAKE_PASSWORD = os.getenv('SNOWFLAKE_PASSWORD')
SNOWFLAKE_ACCOUNT = os.getenv('SNOWFLAKE_ACCOUNT')
SNOWFLAKE_WAREHOUSE = os.getenv('SNOWFLAKE_WAREHOUSE')
SNOWFLAKE_DATABASE = 'ANALYTICS'
SNOWFLAKE_SCHEMA = 'OBSERVABILITY'
TARGET_TABLE = 'FRONTEND_PERFORMANCE_METRICS'
METRIC_NAME = 'frontend_component_lcp_seconds'
def query_prometheus():
    """Queries Prometheus for the latest value of our custom metric."""
    try:
        # Let requests build and URL-encode the instant-query request for us.
        response = requests.get(f'{PROM_URL}/api/v1/query', params={'query': METRIC_NAME})
response.raise_for_status() # Raises HTTPError for bad responses (4xx or 5xx)
data = response.json()
if data['status'] != 'success':
raise Exception(f"Prometheus query failed: {data.get('error', 'Unknown error')}")
return data['data']['result']
except requests.exceptions.RequestException as e:
print(f"Error querying Prometheus: {e}")
return []
def load_data_to_snowflake(data):
"""Loads formatted data into a Snowflake table."""
if not data:
print("No data to load.")
return
ctx = None
try:
ctx = snowflake.connector.connect(
user=SNOWFLAKE_USER,
password=SNOWFLAKE_PASSWORD,
account=SNOWFLAKE_ACCOUNT,
warehouse=SNOWFLAKE_WAREHOUSE,
database=SNOWFLAKE_DATABASE,
schema=SNOWFLAKE_SCHEMA
)
cs = ctx.cursor()
# A common mistake is not preparing the schema. Ensure table exists.
cs.execute(f"""
CREATE TABLE IF NOT EXISTS {TARGET_TABLE} (
TIMESTAMP_UTC TIMESTAMP_NTZ,
COMPONENT_NAME VARCHAR,
COMPONENT_HASH VARCHAR,
PATH VARCHAR,
LCP_SECONDS FLOAT
);
""")
# Using bind variables is critical to prevent SQL injection.
insert_sql = f"INSERT INTO {TARGET_TABLE} (TIMESTAMP_UTC, COMPONENT_NAME, COMPONENT_HASH, PATH, LCP_SECONDS) VALUES (%s, %s, %s, %s, %s)"
# Transform Prometheus data into rows
rows_to_insert = []
for item in data:
metric = item['metric']
value = item['value'] # [timestamp, value]
rows_to_insert.append((
datetime.utcfromtimestamp(float(value[0])),
metric.get('component_name'),
metric.get('component_hash'),
metric.get('path'),
float(value[1])
))
cs.executemany(insert_sql, rows_to_insert)
print(f"Successfully inserted {len(rows_to_insert)} rows into Snowflake.")
except snowflake.connector.Error as e:
print(f"Snowflake error: {e}")
finally:
if ctx:
ctx.close()
def main():
print("Starting Prometheus to Snowflake ETL process...")
prom_data = query_prometheus()
load_data_to_snowflake(prom_data)
print("ETL process finished.")
if __name__ == "__main__":
main()
With the FRONTEND_PERFORMANCE_METRICS
table populated, we can now run the final correlation query in Snowflake. This query joins our performance data with the QUERY_HISTORY
view, parsing the JSON from the QUERY_TAG
to link everything together.
-- cost_attribution_analysis.sql
USE WAREHOUSE ANALYTICS_WH;
USE DATABASE ANALYTICS;
USE SCHEMA OBSERVABILITY;
-- This query joins backend query costs with frontend performance metrics
-- at the component level.
WITH ParsedQueryHistory AS (
-- First, we parse the JSON from the QUERY_TAG. A TRY_PARSE_JSON is safer
-- as it returns NULL on failure instead of erroring out.
SELECT
QUERY_ID,
TRY_PARSE_JSON(QUERY_TAG) AS TAG_JSON,
WAREHOUSE_NAME,
TOTAL_ELAPSED_TIME / 1000 AS ELAPSED_SECONDS,
        -- Simplified cost estimate: elapsed hours * $3, i.e. assuming an
        -- X-Small warehouse (1 credit/hour) at $3 per credit and a query that
        -- has the warehouse to itself. Substitute your own warehouse size and
        -- credit price in production.
        (TOTAL_ELAPSED_TIME / 1000 / 3600) * 3.00 AS ESTIMATED_COST_USD
FROM
"SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
WHERE
START_TIME >= DATEADD('day', -7, CURRENT_TIMESTAMP())
AND TAG_JSON IS NOT NULL
AND TAG_JSON:componentHash IS NOT NULL
),
AggregatedFrontendMetrics AS (
-- Next, we aggregate the frontend performance data to get averages
-- per component hash.
SELECT
COMPONENT_HASH,
COMPONENT_NAME,
AVG(LCP_SECONDS) AS AVG_LCP_SECONDS,
COUNT(*) AS METRIC_COUNT
FROM
FRONTEND_PERFORMANCE_METRICS
WHERE
TIMESTAMP_UTC >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY
COMPONENT_HASH, COMPONENT_NAME
)
-- Finally, join them on the component hash.
SELECT
fm.COMPONENT_NAME,
fm.COMPONENT_HASH,
fm.AVG_LCP_SECONDS,
COUNT(qh.QUERY_ID) AS TOTAL_QUERIES,
SUM(qh.ESTIMATED_COST_USD) AS TOTAL_ESTIMATED_COST,
AVG(qh.ELAPSED_SECONDS) AS AVG_QUERY_DURATION_SECONDS,
fm.METRIC_COUNT
FROM
ParsedQueryHistory qh
JOIN
AggregatedFrontendMetrics fm
ON qh.TAG_JSON:componentHash::STRING = fm.COMPONENT_HASH
GROUP BY
fm.COMPONENT_NAME,
fm.COMPONENT_HASH,
fm.AVG_LCP_SECONDS,
fm.METRIC_COUNT
ORDER BY
TOTAL_ESTIMATED_COST DESC;
Running this query gave us the exact data we needed. We discovered that a rarely used “Year-over-Year Growth Projection” component, identified by its hash DataWidget_projection-chart__b81c2
, was responsible for over 30% of the dashboard’s total query cost. Its average LCP was also a dismal 8.5 seconds. The data was clear: this component provided low user value at a high performance and financial cost. We immediately moved to replace it with a lighter, pre-aggregated summary, which cut our warehouse costs for that page by a quarter and improved its load time.
The system isn’t without its limitations. The ETL job from Prometheus to Snowflake introduces a latency of up to an hour, meaning our analysis is not real-time. A streaming solution using Kafka and the Snowflake Kafka Connector would be a more robust, albeit complex, alternative. Furthermore, CSS Module hashes change with every build if the file path or class name is altered, which can complicate long-term historical analysis. A more permanent solution might involve adding a static data-component-id
attribute to our components, but using the CSS Module hash provided a zero-friction way to prove the concept’s value. The current model also attributes cost on a one-to-one basis; it doesn’t elegantly handle costs from shared data sources or complex state management libraries that decouple data fetching from rendering, which remains an area for future exploration.
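For illustration, the static-identifier alternative might look something like the sketch below. Both the attribute name and the naming convention are assumptions; we have not built this.
// src/components/DataWidget/DataWidget.jsx (hypothetical variant, sketch only)
import React from 'react';
import styles from './DataWidget.module.css';
import { useComponentTracker } from '../../hooks/useComponentTracker';
// A hand-assigned identifier that survives renames, refactors, and rebuilds.
const COMPONENT_ID = 'data-widget:sales-by-region';
const DataWidget = ({ queryId, title }) => {
  const containerRef = React.useRef(null);
  // Pass the static ID everywhere the CSS Module hash is used today.
  useComponentTracker({ ref: containerRef, componentName: 'DataWidget', componentHash: COMPONENT_ID });
  return (
    <div ref={containerRef} className={styles.container} data-component-id={COMPONENT_ID}>
      <h3 className={styles.title}>{title}</h3>
      {/* data fetching as before, but calling fetchData(queryId, COMPONENT_ID) */}
    </div>
  );
};
export default DataWidget;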