Propagating Zipkin Trace Contexts into a ModSecurity WAF via Nginx for Correlated Security Auditing


We hit a wall during a production incident. A Web Application Firewall (WAF) alert fired for a suspected cross-site scripting attack. The security team escalated, providing a source IP, a timestamp, and a generic alert signature. The on-call engineer was then tasked with finding the context. This meant manually correlating that sparse information with application logs across a dozen Java microservices. By the time we pieced together the user session and the specific business transaction that was targeted, hours had passed. The fundamental disconnect was clear: our security monitoring and our application observability were two separate worlds. The WAF knew about the attack vector, but the application tracing system (Zipkin) knew about the business context. There was no bridge between them.

The initial concept was to force these two worlds to collide. If every WAF log entry, especially for a blocked request, could contain the corresponding distributed trace ID, the investigation process would shrink from hours to seconds. An engineer could take the traceId from the WAF alert, paste it into Zipkin, and immediately see whether the request ever reached an application at all, and, for requests that were flagged but not blocked, the full trace with the service, endpoint, and all available request parameters. This correlation would be a game-changer for our incident response.

Our stack was fairly standard: Java services built with Spring Boot, instrumented with Spring Cloud Sleuth's successor, Micrometer Tracing, for Zipkin-compatible trace generation, all sitting behind an Nginx ingress layer. The WAF of choice was ModSecurity, running as a dynamic module within Nginx. The entire deployment process was managed by Jenkins pipelines. The challenge was to make ModSecurity, a component focused purely on request content inspection, aware of a high-level application concept like a trace ID.

The core of the implementation breaks down into four parts:

  1. Ensuring the Java application properly generates and propagates trace headers.
  2. Configuring Nginx to not only proxy the request but also to preserve and handle these trace headers.
  3. Modifying the ModSecurity configuration to inspect, capture, and log the trace ID from the request headers.
  4. Automating the entire build and deployment of this integrated stack using a Jenkins pipeline to ensure consistency and reliability.

1. The Java Service: Establishing the Source of Truth

Before anything else, the trace context must exist. Our Java services use Spring Boot, which makes trace instrumentation straightforward with the Micrometer Tracing library (the successor to Spring Cloud Sleuth).

The key dependencies in the pom.xml are minimal:

<dependencies>
    <!-- Standard Spring Boot Web dependency -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- Actuator for health checks and other management endpoints -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>

    <!-- Micrometer Tracing with the Brave Bridge -->
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-tracing-bridge-brave</artifactId>
    </dependency>

    <!-- Zipkin Reporter to send traces to a Zipkin server -->
    <dependency>
        <groupId>io.zipkin.reporter2</groupId>
        <artifactId>zipkin-reporter-brave</artifactId>
    </dependency>
</dependencies>

The configuration in application.properties connects the application to the Zipkin collector and defines the sampling rate. For a development or testing environment, sampling 100% of requests is acceptable. In a real-world project, this would be a much lower fraction.

src/main/resources/application.properties:

# Server port
server.port=8090

# Application Name used in Zipkin
spring.application.name=user-service

# Management endpoints configuration
management.endpoints.web.exposure.include=*
management.endpoint.health.show-details=always

# Tracing Configuration
# This enables tracing and sets the probability of a trace being sampled. '1.0' means every request.
management.tracing.sampling.probability=1.0

# Zipkin endpoint where the application will send its trace data.
# The default is http://localhost:9411/api/v2/spans
management.zipkin.tracing.endpoint=http://zipkin:9411/api/v2/spans
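
One caveat on propagation: depending on the Spring Boot 3 release, Micrometer Tracing defaults to W3C traceparent headers rather than B3. Since everything downstream in this setup keys off X-B3-TraceId, it is worth pinning the propagation format explicitly; a minimal sketch, assuming current Spring Boot property names:

# Emit and accept the multi-header B3 format (X-B3-TraceId, X-B3-SpanId, ...)
# so the headers that Nginx and ModSecurity inspect are always present.
management.tracing.propagation.type=b3_multi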

The application code itself is a simple REST controller. The tracing instrumentation is entirely automatic; no special code is needed to create spans for web requests. The framework intercepts incoming requests, checks for B3 propagation headers (X-B3-TraceId, X-B3-SpanId, etc.), joins the existing trace if they are present or starts a new one if they are not, and adds the headers to any outgoing requests.

src/main/java/com/example/userservice/UserController.java:

package com.example.userservice;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import java.util.Map;
import java.util.HashMap;

@RestController
public class UserController {

    private static final Logger logger = LoggerFactory.getLogger(UserController.class);

    @GetMapping("/api/user")
    public Map<String, String> getUser(@RequestParam("id") String userId) {
        // The logging framework, if configured correctly, will automatically
        // include the traceId and spanId in the log output.
        logger.info("Fetching user details for user ID: {}", userId);

        // Simulate some business logic
        Map<String, String> userDetails = new HashMap<>();
        userDetails.put("userId", userId);
        userDetails.put("name", "John Doe");
        userDetails.put("status", "active");

        return userDetails;
    }
}

With this setup, any HTTP request to /api/user?id=123 results in a trace being generated and sent to Zipkin. The crucial point for our goal is that the headers we care about travel on the incoming request as Nginx sees it; the application does not echo them back in its HTTP response, but it maintains the context internally and propagates it on any downstream calls it makes.
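
For the "configured correctly" part of the controller's logging comment, the trace context can be surfaced in application logs via the MDC fields that the tracing bridge populates. A hedged sketch, assuming the default traceId/spanId MDC key names (recent Spring Boot releases add a similar correlation pattern out of the box):

# Addition to application.properties: prefix each log level with the
# application name, trace ID, and span ID taken from the MDC.
logging.pattern.level=%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]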

2. Nginx and ModSecurity: The Edge Enforcement Point

Our Nginx instance acts as a reverse proxy. It needs to be configured to enable ModSecurity and proxy requests to our Java service. The most complex part is getting ModSecurity to log the X-B3-TraceId header when it blocks a request.

We’ll use a Docker-based setup for reproducibility. The Nginx configuration file is the control center for this integration.

nginx/nginx.conf:

worker_processes auto;

events {
    worker_connections 1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    # Log format definition
    # We include the B3 trace ID here for access logging, which is separate
    # from the ModSecurity audit log but useful for general request tracing.
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for" '
                      'trace_id=$http_x_b3_traceid';

    access_log  /var/log/nginx/access.log  main;
    error_log   /var/log/nginx/error.log   warn;

    sendfile        on;
    keepalive_timeout  65;

    server {
        listen 80;
        server_name localhost;

        # Enable ModSecurity. The rules file path is critical.
        modsecurity on;
        modsecurity_rules_file /etc/nginx/modsec/main.conf;

        location / {
            # Pass all requests to the upstream Java application
            proxy_pass http://user-service:8090;

            # --- Critical Header Propagation ---
            # Ensure that the trace context headers are passed to the upstream service.
            # If these are not present, Spring Boot will generate new ones.
            # If they are present, Spring Boot will join the existing trace.
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;

            # B3 Propagation Headers for Zipkin/Brave
            proxy_set_header X-B3-TraceId $http_x_b3_traceid;
            proxy_set_header X-B3-SpanId $http_x_b3_spanid;
            proxy_set_header X-B3-ParentSpanId $http_x_b3_parentspanid;
            proxy_set_header X-B3-Sampled $http_x_b3_sampled;
            proxy_set_header X-B3-Flags $http_x_b3_flags;
        }
    }
}

This configuration does two things: it sets up a custom log_format for the Nginx access.log that includes the trace ID, and it correctly forwards all B3 headers to the upstream Java application. However, this does not solve the core problem: getting the trace ID into the separate modsec_audit.log when a request is blocked.

The ModSecurity configuration itself consists of several files. The main file enables the OWASP Core Rule Set (CRS) and loads our custom rule.

nginx/modsec/main.conf:

# Load the main ModSecurity configuration file.
Include /etc/nginx/modsec/modsecurity.conf

# Load the OWASP Core Rule Set.
# This provides generic protection against common attack vectors.
Include /etc/nginx/modsec/owasp-crs/crs-setup.conf
Include /etc/nginx/modsec/owasp-crs/rules/*.conf

# Load our custom rules file. This is where the magic happens.
Include /etc/nginx/modsec/custom-rules.conf

nginx/modsec/modsecurity.conf:

# This is a trimmed-down ModSecurity configuration. The two directives that
# matter most here are SecRuleEngine and SecAuditEngine.

# Enable the rule engine so matching rules can actually block requests.
# The upstream recommended default is DetectionOnly, which only logs.
SecRuleEngine On

# Turn on the audit engine. It can be On, Off, or RelevantOnly.
# RelevantOnly logs only transactions that trigger a warning or error.
SecAuditEngine RelevantOnly

# Log file for audit events
SecAuditLog /var/log/nginx/modsec_audit.log

# Specify which parts of the transaction to log.
SecAuditLogParts ABCIJDEFHZ

# Use JSON format for audit logs, which is easier for machines to parse.
SecAuditLogType JSON

Now for the critical piece. The audit log does record the raw request headers (part B), but alerting pipelines typically key off the structured messages array, and the Nginx connector offers no way to promote an individual header into its own audit-log field. A common mistake is to assume you can simply add a variable to a log format string; that does not work with the ModSecurity v3 Nginx connector. The solution is a workaround: a passive rule that always matches, does nothing to the request (pass), and uses the logdata action to embed the trace ID in the audit log's messages.

nginx/modsec/custom-rules.conf:

# Rule ID: 1000
# Phase: 1 (runs after request headers are received)
# Description: This rule's sole purpose is to capture the X-B3-TraceId header
#              and embed it into the audit log for correlation.
#
# Actions:
#   - pass: Never block the request. This is purely for logging.
#   - nolog: Don't create a standard log entry for this rule match itself.
#   - auditlog: Ensure this rule's data is written to the audit log if
#               any other rule in the transaction causes it to be logged.
#   - msg: A descriptive message for the log.
#   - logdata: The key part. We capture the header value here.
#              The format 'TraceID:%{...}' makes it easily parsable.
SecRule REQUEST_HEADERS:X-B3-TraceId "@rx ." \
    "id:1000,phase:1,pass,nolog,auditlog,msg:'Trace Context Captured',logdata:'TraceID:%{REQUEST_HEADERS.X-B3-TraceId}'"

# Rule ID: 1001
# Description: This rule logs a marker when a trace ID is missing. This helps
#              identify observability gaps or misconfigured clients.
SecRule &REQUEST_HEADERS:X-B3-TraceId "@eq 0" \
    "id:1001,phase:1,pass,nolog,auditlog,msg:'Observability Gap Detected',logdata:'TraceID:Not-Provided'"

This configuration is robust. If a X-B3-TraceId header exists, Rule 1000 captures it. If it doesn’t, Rule 1001 logs that it was missing. When another rule (like an SQL injection rule from the OWASP CRS) blocks the request, the logdata from our custom rule is included in the final audit entry because the transaction is marked for logging.

3. Automation with Jenkins: Building the Immutable Stack

Manually configuring this environment is prone to error. We need to package the entire setup into Docker images and use a Jenkins pipeline to build and deploy them. This ensures every environment is identical.

First, the Dockerfile for our Java service:
user-service/Dockerfile:

FROM openjdk:17-slim

ARG JAR_FILE=target/*.jar
COPY ${JAR_FILE} app.jar

ENTRYPOINT ["java", "-jar", "/app.jar"]

Next, the more complex Dockerfile for our custom Nginx image. This image will bundle our specific nginx.conf, modsecurity.conf, the OWASP CRS, and our custom rule.
nginx/Dockerfile:

# Use an official Nginx image that includes the ModSecurity v3 connector
FROM owasp/modsecurity-nginx:1.21.0-3.0.8

# Remove default configurations
RUN rm /etc/nginx/conf.d/default.conf

# Copy our custom Nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf

# Copy ModSecurity base configuration and our custom rules
COPY modsec/modsecurity.conf /etc/nginx/modsec/modsecurity.conf
COPY modsec/main.conf /etc/nginx/modsec/main.conf
COPY modsec/custom-rules.conf /etc/nginx/modsec/custom-rules.conf

# The base image already contains the OWASP Core Rule Set at /etc/nginx/modsec/owasp-crs
# We just need to ensure our main.conf points to it correctly.

To orchestrate the services, we use docker-compose.yml:
docker-compose.yml:

version: '3.8'

services:
  zipkin:
    image: openzipkin/zipkin:latest
    ports:
      - "9411:9411"

  user-service:
    build:
      context: ./user-service
    ports:
      - "8090:8090"
    depends_on:
      - zipkin
    environment:
      # Ensure the service can find Zipkin using the service name
      - MANAGEMENT_ZIPKIN_TRACING_ENDPOINT=http://zipkin:9411/api/v2/spans

  nginx-waf:
    build:
      context: ./nginx
    ports:
      - "80:80"
    depends_on:
      - user-service
    volumes:
      # Mount log directories to the host for easier inspection
      - ./logs/nginx:/var/log/nginx
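
Before wiring this into Jenkins, the stack can be brought up locally for a sanity check, assuming the directory layout used above:

# Build the images and start Zipkin, the Java service, and the Nginx/ModSecurity WAF
docker-compose up --build -d

# Follow the WAF container's output while sending test requests
docker-compose logs -f nginx-waf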

Finally, the Jenkinsfile automates the build and deployment process. This declarative pipeline defines the stages for building each component and deploying the stack.

Jenkinsfile:

pipeline {
    agent any

    environment {
        DOCKER_REGISTRY = 'your-docker-registry'
        APP_NAME = 'user-service'
        WAF_NAME = 'nginx-waf'
        // A dedicated tag variable; avoid shadowing Jenkins' built-in BUILD_NUMBER
        IMAGE_TAG = "build-${env.BUILD_ID}"
    }

    stages {
        stage('Checkout') {
            steps {
                checkout scm
            }
        }

        stage('Build Java Service') {
            steps {
                script {
                    // Use a Maven container to build the project to avoid host dependencies.
                    // Run the build from the service directory so the jar lands in
                    // user-service/target, where the service Dockerfile expects it.
                    docker.image('maven:3.8-openjdk-17').inside {
                        dir('user-service') {
                            sh 'mvn clean package'
                        }
                    }
                }
            }
        }

        stage('Build and Push Service Image') {
            steps {
                script {
                    def serviceImage = docker.build("${DOCKER_REGISTRY}/${APP_NAME}:${IMAGE_TAG}", './user-service')
                    docker.withRegistry("https://${DOCKER_REGISTRY}", 'docker-registry-credentials') {
                        serviceImage.push()
                    }
                }
            }
        }

        stage('Build and Push Nginx WAF Image') {
            steps {
                script {
                    def wafImage = docker.build("${DOCKER_REGISTRY}/${WAF_NAME}:${IMAGE_TAG}", './nginx')
                    docker.withRegistry("https://${DOCKER_REGISTRY}", 'docker-registry-credentials') {
                        wafImage.push()
                    }
                }
            }
        }

        stage('Deploy') {
            steps {
                echo 'Deployment step would be here.'
                echo 'In a real-world project, this would use kubectl, Ansible, or'
                echo 'simply docker-compose up on the target node after pulling the new images.'
                // Example:
                // sh 'docker-compose pull'
                // sh 'docker-compose up -d --force-recreate'
            }
        }
    }
}

4. Verifying the Solution

With the stack deployed, we can test the correlation. First, a legitimate request:

# Generate a 128-bit trace ID and a 64-bit span ID for the request
TRACE_ID=$(openssl rand -hex 16)
SPAN_ID=$(openssl rand -hex 8)

curl -H "X-B3-TraceId: ${TRACE_ID}" \
     -H "X-B3-SpanId: ${SPAN_ID}" \
     "http://localhost/api/user?id=123"

This request will pass through the WAF, hit the Java service, and a full trace will appear in Zipkin associated with the provided TRACE_ID. The Nginx access.log will show the request with trace_id=..., but the modsec_audit.log will be empty because no rules were triggered.
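
Rather than clicking through the Zipkin UI, the same check can be scripted against Zipkin's HTTP API. A quick sketch using the standard /api/v2/trace endpoint (spans are reported asynchronously, so give it a second or two):

# A non-empty JSON array of spans confirms the request cleared the WAF and
# reached the instrumented service.
curl -s "http://localhost:9411/api/v2/trace/${TRACE_ID}"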

Now, a malicious request simulating an SQL injection:

# Generate a new trace ID and span ID for the malicious request
MALICIOUS_TRACE_ID=$(openssl rand -hex 16)
MALICIOUS_SPAN_ID=$(openssl rand -hex 8)

curl -i -H "X-B3-TraceId: ${MALICIOUS_TRACE_ID}" \
        -H "X-B3-SpanId: ${MALICIOUS_SPAN_ID}" \
        "http://localhost/api/user?id=1'%20or%20'1'='1"

The response will be an immediate HTTP/1.1 403 Forbidden. The Java service is never reached.

Now we check the logs. Tailing the ModSecurity audit log (./logs/nginx/modsec_audit.log on the host, given the volume mount above) will reveal a JSON entry similar to this:

{
  "transaction": {
    "client_ip": "172.21.0.1",
    "time_stamp": "Fri Oct 27 11:30:00 2023",
    "server_id": "...",
    "request": {
      "method": "GET",
      "uri": "/api/user?id=1' or '1'='1",
      "http_version": 1.1,
      "headers": {
        "Host": "localhost",
        "X-B3-TraceId": "..."
      }
    },
    "response": {
      "http_code": 403,
      ...
    },
    "producer": { ... },
    "messages": [
      {
        "message": "Trace Context Captured",
        "details": {
          "match": "...",
          "ruleId": "1000",
          ...
          "data": "TraceID:a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
        }
      },
      {
        "message": "SQL Injection Attack Detected via libinjection.",
        "details": {
          "match": "...",
          "ruleId": "942100",
          ...
          "data": "Matched Data: sqli_fingerprint..."
        }
      }
    ]
  }
}

The key is in the "messages" array. We see the OWASP rule that fired (ruleId: 942100) and, right alongside it, the message from our custom rule (ruleId: 1000) containing the exact TraceID. We have successfully bridged the gap. An automated alert from this log can now include the trace ID, directly linking the security event to the application context.
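
As a sketch of what such an alert pipeline could key on, a jq one-liner can lift the trace ID out of each relevant audit entry. This assumes the JSON layout shown above, with one audit record per line; field names can vary slightly between ModSecurity builds:

# Print the trace ID captured by rule 1000 for every audited transaction
jq -r '.transaction.messages[]?
       | select(.details.ruleId == "1000")
       | .details.data
       | sub("^TraceID:"; "")' ./logs/nginx/modsec_audit.log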

A quick search in Zipkin for this traceId will show nothing, which is itself valuable information. An empty trace result confirms the request was blocked at the edge and never reached the application code, precisely matching the evidence from the WAF log.

This architecture provides a complete audit trail from the network edge to the application logic. The visualization of this flow is critical for understanding where the request was stopped.

sequenceDiagram
    participant Client
    participant Nginx_WAF as Nginx (with ModSecurity)
    participant JavaApp as Java Application
    participant Zipkin
    participant SIEM as Log Aggregator / SIEM

    Client->>+Nginx_WAF: GET /api/user?id=1' or '1'='1' (Header: X-B3-TraceId: abc)
    Note over Nginx_WAF: ModSecurity Rule 1000 captures 'TraceID:abc'.
    Note over Nginx_WAF: ModSecurity Rule 942100 detects SQLi.
    Nginx_WAF-->>SIEM: Audit Log: { rule:942100, data:'SQLi...', messages:['...TraceID:abc...'] }
    Nginx_WAF-->>-Client: HTTP 403 Forbidden

    Note right of Client: Investigation starts.
    SIEM-->>Engineer: Alert: SQLi detected on Nginx. TraceID=abc.
    Engineer->>+Zipkin: Search for trace 'abc'
    Zipkin-->>-Engineer: No trace found.
    Note over Engineer: Conclusion: request blocked at the WAF, never reached the application layer.

This solution isn’t without its limitations. The logdata approach, while effective, is a workaround. It pollutes the messages field of the audit log, which might complicate log parsing if not handled carefully. A more native solution would involve a custom-built ModSecurity logger or a different WAF that has first-class support for custom header logging. Furthermore, the performance impact of running ModSecurity with the full OWASP CRS needs careful benchmarking in a production environment. The added latency, while typically small, can be significant under high load. Future iterations could explore using the OpenTelemetry Nginx module for more structured and comprehensive telemetry data emission directly from the proxy layer, potentially unifying access logs, metrics, and trace propagation into a single, cohesive framework.

