The initial requirement seemed straightforward: build a lightweight, internal dashboard to visualize high-frequency performance metrics from our distributed services. The key challenge was the nature of the data—it wasn’t simple counters. We needed to visualize distributions, percentiles, and correlations, which meant our existing client-side charting libraries (like Chart.js) were inadequate. Sending raw data points, sometimes hundreds of thousands per minute, to the browser to be processed and rendered was a non-starter. It would cripple the user’s browser and create an unacceptable network load.
This reality forced us toward a server-side rendering architecture. The concept was simple: a backend service would query the raw data, perform the necessary aggregations and statistical calculations, generate a plot as an image, and serve that static image to the UI. The browser’s only job would be to display it. This approach keeps the client lightweight and leverages the server’s computational power.
Our technology stack selection was driven by pragmatism. We needed a database that could handle high-throughput writes of semi-structured metric data. A relational schema felt too rigid. MongoDB, a document-based NoSQL database, was a natural fit. For the backend, Python’s data science ecosystem is unparalleled. Matplotlib stood out as the tool for generating the complex, publication-quality plots we required. To serve this, FastAPI was chosen for its asynchronous capabilities and high performance, crucial for a service intended to be low-latency. Finally, to ensure the entire system was reproducible and easy to deploy, Docker and Docker Compose were non-negotiable.
Here is the system architecture we set out to build:
graph TD
    subgraph "Docker Environment"
        subgraph "Metrics Producer (Mock)"
            A[Python Script] -- metrics (JSON) --> B
        end
        subgraph "Monitoring Service (FastAPI)"
            B(API: /ingest) -- writes --> C[MongoDB]
            D(API: /plot) -- reads --> C
            D -- generates plot --> E[Matplotlib Engine]
            E -- image bytes --> D
            F[WebSocket] -- notifies --> G
        end
        subgraph "Web UI"
            G[Browser] -- HTTP GET --> D
            G -- receives --> H{Plot Image}
            G -- establishes connection --> F
        end
    end
The plan was to containerize the FastAPI application and the MongoDB instance, linking them together with Docker Compose. A separate script would act as a mock metrics producer, bombarding our /ingest endpoint with data to simulate a real-world load.
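For reference, a minimal producer could look like the sketch below. This is an illustration, not the exact script we ran; it assumes the requests library is available and matches the /ingest payload shape defined later in main.py.

producer.py (illustrative sketch)

import random
import time

import requests  # assumed available in the producer's environment

SERVICES = ["payment-service", "auth-service", "search-service"]

while True:
    payload = {
        "service_name": random.choice(SERVICES),
        "metric_name": "latency_ms",
        # Log-normal values give a realistic, right-skewed latency distribution.
        "value": random.lognormvariate(4.0, 0.5),
    }
    requests.post("http://localhost:8000/ingest", json=payload, timeout=5)
    time.sleep(0.01)  # roughly 100 metrics per second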
The First Implementation: A Naive but Functional Baseline
The first step was to get a working prototype. We defined our docker-compose.yml to orchestrate the services. In a real-world project you would never hardcode credentials like this, but for a local development setup it is an acceptable simplification.
docker-compose.yml
version: '3.8'

services:
  mongodb:
    image: mongo:6.0
    container_name: metrics_db
    ports:
      - "27017:27017"
    volumes:
      - mongo_data:/data/db
    environment:
      - MONGO_INITDB_ROOT_USERNAME=admin
      - MONGO_INITDB_ROOT_PASSWORD=password
    restart: unless-stopped

  monitoring_service:
    build: .
    container_name: monitoring_api
    ports:
      - "8000:8000"
    volumes:
      - ./app:/app
    environment:
      - MONGO_URL=mongodb://admin:password@mongodb:27017/
      - DB_NAME=metrics_db
    depends_on:
      - mongodb
    restart: unless-stopped

volumes:
  mongo_data:
The Dockerfile for our FastAPI application is standard. A common mistake is to install dependencies on every build; using a multi-stage build or simply copying the requirements.txt first helps leverage Docker’s layer caching.
Dockerfile
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install build tools for any Python packages that must compile from source.
# (Matplotlib and NumPy ship prebuilt wheels for this image, but
# build-essential is a cheap safety net for transitive dependencies.)
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency file and install dependencies
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt

# Copy the content of the local src directory to the working directory
COPY ./app /app

# Command to run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
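The requirements.txt itself never made it into these notes; a plausible version covering everything this article uses would look like the following (package names only; pin versions as appropriate for your environment):

requirements.txt (illustrative)

fastapi
uvicorn[standard]
motor
matplotlib
numpy
cachetools
# test-only dependencies (could live in a separate requirements-dev.txt)
pytest
pytest-asyncio
mongomock-motor
httpx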
The application code itself was split into a few key components. First, configuration management. Reading from environment variables is critical for containerized applications.
app/config.py
import os
from functools import lru_cache

class Settings:
    MONGO_URL: str = os.getenv("MONGO_URL", "mongodb://localhost:27017/")
    DB_NAME: str = os.getenv("DB_NAME", "metrics_db")

@lru_cache()
def get_settings():
    return Settings()
Next, the database connection logic using motor, the asynchronous driver for MongoDB. A singleton pattern for the client ensures we’re not creating new connections for every request, which is a significant performance pitfall.
app/database.py
from motor.motor_asyncio import AsyncIOMotorClient

from .config import get_settings

class MongoDB:
    client: AsyncIOMotorClient = None

db = MongoDB()

async def connect_to_mongo():
    settings = get_settings()
    db.client = AsyncIOMotorClient(settings.MONGO_URL)
    # This is critical. Without creating an index, queries on large collections
    # will be disastrously slow. We create an index on timestamp for time-series queries.
    collection = db.client[settings.DB_NAME]["metrics"]
    await collection.create_index([("timestamp", -1)])
    await collection.create_index([("service_name", 1)])
    print("Connected to MongoDB and ensured indexes exist.")

async def close_mongo_connection():
    db.client.close()
    print("Closed MongoDB connection.")

def get_database():
    settings = get_settings()
    return db.client[settings.DB_NAME]
The core application logic resided in main.py. We defined Pydantic models for data validation and set up two endpoints: /ingest for receiving metrics and /plot/histogram/{service_name} for generating and returning a plot.
app/main.py (Initial Version)
import io
import logging
from datetime import datetime, timedelta

from fastapi import FastAPI, Depends, HTTPException
from fastapi.responses import Response
from pydantic import BaseModel, Field
from motor.motor_asyncio import AsyncIOMotorDatabase

import matplotlib
matplotlib.use('Agg')  # Use non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

from .database import connect_to_mongo, close_mongo_connection, get_database

# Setup logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()
app.add_event_handler("startup", connect_to_mongo)
app.add_event_handler("shutdown", close_mongo_connection)

class Metric(BaseModel):
    service_name: str
    metric_name: str
    value: float
    timestamp: datetime = Field(default_factory=datetime.utcnow)

@app.post("/ingest")
async def ingest_metric(metric: Metric, db: AsyncIOMotorDatabase = Depends(get_database)):
    try:
        await db["metrics"].insert_one(metric.dict())
        return {"status": "ok"}
    except Exception as e:
        logger.error(f"Failed to ingest metric: {e}")
        raise HTTPException(status_code=500, detail="Internal server error")

@app.get("/plot/histogram/{service_name}")
async def get_latency_histogram(service_name: str, db: AsyncIOMotorDatabase = Depends(get_database)):
    """
    Generates a histogram of latency values for a given service over the last 10 minutes.
    """
    try:
        # Query for data in the last 10 minutes
        end_time = datetime.utcnow()
        start_time = end_time - timedelta(minutes=10)
        cursor = db["metrics"].find({
            "service_name": service_name,
            "metric_name": "latency_ms",
            "timestamp": {"$gte": start_time, "$lt": end_time}
        })
        values = [doc['value'] for doc in await cursor.to_list(length=100000)]
        if not values:
            raise HTTPException(status_code=404, detail="No data found for the specified service and time range.")

        # --- Matplotlib Plotting Logic ---
        fig, ax = plt.subplots(figsize=(10, 6))

        # Calculate statistics
        p95 = np.percentile(values, 95)
        p99 = np.percentile(values, 99)
        mean_val = np.mean(values)

        ax.hist(values, bins=50, alpha=0.7, label='Latency Distribution')
        ax.axvline(mean_val, color='r', linestyle='--', linewidth=2, label=f'Mean: {mean_val:.2f}ms')
        ax.axvline(p95, color='orange', linestyle='--', linewidth=2, label=f'P95: {p95:.2f}ms')
        ax.axvline(p99, color='purple', linestyle='--', linewidth=2, label=f'P99: {p99:.2f}ms')

        ax.set_title(f'Latency Distribution for "{service_name}" (Last 10 Mins)')
        ax.set_xlabel('Latency (ms)')
        ax.set_ylabel('Frequency')
        ax.legend()
        ax.grid(True, which='both', linestyle='--', linewidth=0.5)
        fig.tight_layout()
        # --- End Plotting Logic ---

        # Save plot to an in-memory buffer
        buf = io.BytesIO()
        fig.savefig(buf, format='png', dpi=100)
        plt.close(fig)  # Important: close the figure to free up memory
        buf.seek(0)

        return Response(content=buf.getvalue(), media_type="image/png")
    except HTTPException as http_exc:
        raise http_exc
    except Exception as e:
        logger.error(f"Error generating plot for {service_name}: {e}")
        raise HTTPException(status_code=500, detail="Failed to generate plot.")
With this in place, we could build and run the stack (docker-compose up --build). A simple HTML file with some JavaScript allowed us to test the UI.
ui/index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Monitoring Dashboard</title>
</head>
<body>
<h1>Service Latency</h1>
<img id="plot-image" src="http://localhost:8000/plot/histogram/payment-service" alt="Loading plot...">
</body>
</html>
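Before opening the page, a quick smoke test from the host confirms the endpoint actually returns a PNG. This is a hypothetical helper snippet (assuming the requests library), not part of the deployed stack:

smoke_test.py (hypothetical)

import requests

resp = requests.get("http://localhost:8000/plot/histogram/payment-service")
resp.raise_for_status()
assert resp.headers["content-type"] == "image/png"

# Save the rendered plot locally for a visual check.
with open("histogram.png", "wb") as f:
    f.write(resp.content)
print("Saved histogram.png")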
It worked. But it was slow. Refreshing the page took several hundred milliseconds, even with a modest amount of data. The problem was obvious: every single request involved a database query and a full re-rendering of the plot by Matplotlib. This architecture would not scale to multiple users or frequent refreshes.
Iteration Two: Introducing Caching and Real-Time Updates
The first performance bottleneck to tackle was the redundant plot generation. A real-world project would use Redis for this, but to keep the stack lean, Python’s cachetools library provided a simple, in-memory, time-aware LRU cache. This was a pragmatic trade-off for our internal tool.
We modified the plotting endpoint to cache the generated PNG bytes. A cache hit would return the stored image almost instantly, bypassing both the database query and the CPU-intensive Matplotlib rendering.
app/main.py (with Caching)
# ... (imports)
from cachetools import TTLCache
from functools import wraps

# ... (app setup)

# Cache for 10 seconds, max 128 items
plot_cache = TTLCache(maxsize=128, ttl=10)

def cache_plot(func):
    @wraps(func)
    async def wrapper(service_name: str, db: AsyncIOMotorDatabase = Depends(get_database)):
        cache_key = service_name
        if cache_key in plot_cache:
            logger.info(f"Cache hit for {service_name}")
            return Response(content=plot_cache[cache_key], media_type="image/png")
        logger.info(f"Cache miss for {service_name}")
        response = await func(service_name, db)
        # Only cache successful responses
        if response.status_code == 200:
            plot_cache[cache_key] = response.body
        return response
    return wrapper

@app.get("/plot/histogram/{service_name}")
@cache_plot
async def get_latency_histogram(service_name: str, db: AsyncIOMotorDatabase = Depends(get_database)):
    # ... (the exact same database query and Matplotlib logic as before)
    # This core logic is now only executed on a cache miss.
This change dramatically improved perceived performance for repeated requests. However, it created a new problem: the data was now stale for up to 10 seconds. The dashboard was no longer “live.” To fix this, we introduced WebSockets. The plan was to have the UI establish a WebSocket connection. The backend would then push a simple notification message whenever a plot was updated (i.e., after a cache miss and successful re-generation). The client-side JavaScript, upon receiving this message, would then reload the image source, forcing a fetch of the newly cached plot.
app/main.py (with WebSocket)
# ... (imports)
from fastapi import WebSocket, WebSocketDisconnect
from typing import List

# ... (app setup and caching)

class ConnectionManager:
    def __init__(self):
        self.active_connections: List[WebSocket] = []

    async def connect(self, websocket: WebSocket):
        await websocket.accept()
        self.active_connections.append(websocket)

    def disconnect(self, websocket: WebSocket):
        self.active_connections.remove(websocket)

    async def broadcast(self, message: str):
        for connection in self.active_connections:
            await connection.send_text(message)

manager = ConnectionManager()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    await manager.connect(websocket)
    try:
        while True:
            await websocket.receive_text()  # Just keep the connection alive
    except WebSocketDisconnect:
        manager.disconnect(websocket)

# We must modify the cache decorator to broadcast an update
def cache_plot_and_notify(func):
    @wraps(func)
    async def wrapper(service_name: str, db: AsyncIOMotorDatabase = Depends(get_database)):
        cache_key = service_name
        if cache_key in plot_cache:
            logger.info(f"Cache hit for {service_name}")
            return Response(content=plot_cache[cache_key], media_type="image/png")
        logger.info(f"Cache miss for {service_name}")
        response = await func(service_name, db)
        if response.status_code == 200:
            plot_cache[cache_key] = response.body
            # Notify connected clients that a new plot is available
            await manager.broadcast(f"update:{service_name}")
        return response
    return wrapper

@app.get("/plot/histogram/{service_name}")
@cache_plot_and_notify  # Use the new decorator
async def get_latency_histogram(service_name: str, db: AsyncIOMotorDatabase = Depends(get_database)):
    # ... (same logic)
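One caveat with the manager above: broadcast raises if any client vanished without a clean close, aborting the loop before the remaining clients are notified. A slightly more defensive variant (a sketch, not what we shipped) drops dead connections as it goes:

    async def broadcast(self, message: str):
        # Iterate over a copy so we can safely remove dead connections mid-loop.
        for connection in list(self.active_connections):
            try:
                await connection.send_text(message)
            except Exception:
                # The client vanished without a clean close; forget it.
                self.disconnect(connection)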
The UI JavaScript needed a corresponding update to handle the WebSocket messages.
ui/index.html (with WebSocket logic)
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Monitoring Dashboard</title>
</head>
<body>
<h1>Service Latency</h1>
<img id="plot-image" src="http://localhost:8000/plot/histogram/payment-service" alt="Loading plot...">
<script>
const img = document.getElementById('plot-image');
const serviceName = 'payment-service'; // Or get dynamically
const socket = new WebSocket('ws://localhost:8000/ws');
socket.onmessage = function(event) {
console.log('Received message:', event.data);
const [action, updatedService] = event.data.split(':');
if (action === 'update' && updatedService === serviceName) {
// Add a cache-busting query parameter to force reload
img.src = `http://localhost:8000/plot/histogram/${serviceName}?t=` + new Date().getTime();
}
};
socket.onopen = function(event) {
console.log('WebSocket connection established.');
};
socket.onclose = function(event) {
console.log('WebSocket connection closed.');
};
</script>
</body>
</html>
This combination of caching and WebSocket notifications provided a responsive, near-real-time user experience while protecting the backend from being overwhelmed.
Final Polish: Production-Grade Considerations
Before calling this solution “done,” a few more production-oriented details were necessary.
First, testing. A system without tests is a system waiting to fail. We added a basic unit test for the plotting logic using pytest and mongomock-motor. This ensures that the core data processing and image generation logic remains correct during refactoring.
app/test_main.py
import pytest
from datetime import datetime
from mongomock_motor import AsyncMongoMockClient
from fastapi.testclient import TestClient

from .main import app, get_database

# Use a single mock client so the test setup and the endpoint under test
# share the same in-memory database.
mock_client = AsyncMongoMockClient()

def override_get_database():
    return mock_client["test_db"]

app.dependency_overrides[get_database] = override_get_database

client = TestClient(app)

@pytest.mark.asyncio
async def test_plot_generation():
    # Setup: ingest mock data into the mock DB. The endpoint filters on a
    # 10-minute window, so each document needs a current timestamp.
    db = override_get_database()
    mock_data = [
        {
            "service_name": "test-service",
            "metric_name": "latency_ms",
            "value": 100 + i,
            "timestamp": datetime.utcnow(),
        }
        for i in range(50)
    ]
    await db["metrics"].insert_many(mock_data)

    # Test the endpoint
    response = client.get("/plot/histogram/test-service")
    assert response.status_code == 200
    assert response.headers['content-type'] == 'image/png'
    # Check if the response body looks like a PNG file
    assert response.content.startswith(b'\x89PNG\r\n\x1a\n')

@pytest.mark.asyncio
async def test_plot_no_data():
    response = client.get("/plot/histogram/non-existent-service")
    assert response.status_code == 404
This test suite, though simple, provides a safety net. It verifies that the endpoint produces a valid PNG image when data exists and handles the “no data” case gracefully.
The final lingering issue is that Matplotlib’s rendering, even when fast, is a synchronous, CPU-bound operation. In an asynchronous framework like FastAPI, long-running synchronous calls block the event loop. For very complex plots, this could still be a problem. A more robust solution would be to run the Matplotlib code in a worker thread via run_in_threadpool, which FastAPI re-exports from Starlette. For our current load, this was deemed an over-optimization, but it’s a known limitation of this architecture. The CPU-bound nature of Matplotlib rendering remains the primary per-instance bottleneck under heavy concurrent load. Future iterations might offload plotting to a dedicated fleet of worker processes managed by a queue like Celery, completely decoupling plot generation from the API request-response cycle.