The core of our reporting engine was a battle-hardened Java 8 monolith. Its persistence layer was a sprawling collection of MyBatis XML mappers, containing thousands of lines of hand-optimized SQL for complex, multi-table joins. This system was stable but brittle. Any change was a high-risk endeavor. The business, however, needed a new, dynamic dashboard: a React application using Relay for its data fetching. Relay requires GraphQL, which made our existing REST endpoints a non-starter; they were rigid, over-fetching, and deeply coupled to the monolith’s internal models.
A full rewrite was out of the question due to risk and time constraints. The only viable path forward was the Strangler Fig pattern. We decided to introduce a new service, written in Go using the Echo framework, to act as a GraphQL facade. This new service would sit in front of the monolith, intercepting requests for new functionality. The initial implementation would simply translate GraphQL queries into calls to the monolith’s existing REST APIs. Over time, we would migrate query logic directly into the Go service, “strangling” the monolith’s reporting responsibilities one query at a time.
This approach introduced a new set of problems, primarily around security and operational complexity. How do we secure a public-facing GraphQL endpoint? How do we manage a hybrid CI/CD process for both a Go microservice and a Java monolith without slowing down development? In a real-world project, the answer always comes down to automation. This is the story of how we built a secure, CI-driven pipeline using CircleCI to manage the deployment of the services and the configuration of an AWS WAF to protect our new GraphQL endpoint.
The Initial Architectural Layout
The target architecture had to accommodate the coexistence of the old and new stacks. The flow was designed to minimize initial disruption.
graph TD
    subgraph "Browser"
        A[React App with Relay]
    end
    subgraph "AWS Infrastructure"
        B[AWS WAF]
        C[ALB - Application Load Balancer]
        D[Go/Echo GraphQL Service]
        E[Legacy Java/MyBatis Monolith]
        F[Shared PostgreSQL Database]
    end
    A -- GraphQL Queries --> B
    B -- Filtered Traffic --> C
    C -- /graphql --> D
    D -- REST API Calls --> E
    D -- Read-Only SQL --> F
    E -- MyBatis SQL --> F
    style A fill:#D6EAF8
    style D fill:#D5F5E3
    style E fill:#FDEDEC
A key decision was to allow the new Go service read-only access to the shared database from day one. This was a calculated risk. While it creates coupling at the data layer, it provides an immediate performance escape hatch for simple queries that don’t require the monolith’s complex business logic, avoiding the latency of an extra network hop through the monolith’s antiquated REST layer. All write operations, without exception, had to go through the monolith to preserve transactional integrity managed by its domain logic. A common mistake is to start writing to the same database from two different services without a distributed transaction manager, which is a direct path to data corruption.
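To make the read-only rule structural rather than a matter of discipline, it can also be enforced at the connection level. The following is a minimal sketch, not our production code: it assumes the pgx stdlib driver, a Postgres role with SELECT-only grants, and a hypothetical internal/database package; setting default_transaction_read_only makes the database itself reject any write the Go service might attempt.

// internal/database/readonly.go (illustrative sketch, not the production code)
package database

import (
    "database/sql"
    "fmt"
    "time"

    _ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)

// ReadOnlyConfig holds connection details for the shared reporting database.
// The field names here are assumptions for the sketch.
type ReadOnlyConfig struct {
    Host     string
    Port     int
    Name     string
    User     string // a role granted SELECT only
    Password string
}

// ConnectReadOnly opens a pool that can only issue reads. Two layers of
// protection: the role has SELECT-only grants, and default_transaction_read_only
// makes Postgres reject writes even if a grant is ever widened by mistake.
func ConnectReadOnly(cfg ReadOnlyConfig) (*sql.DB, error) {
    dsn := fmt.Sprintf(
        "postgres://%s:%s@%s:%d/%s?default_transaction_read_only=on",
        cfg.User, cfg.Password, cfg.Host, cfg.Port, cfg.Name,
    )
    db, err := sql.Open("pgx", dsn)
    if err != nil {
        return nil, err
    }
    db.SetMaxOpenConns(20)
    db.SetConnMaxLifetime(30 * time.Minute)
    return db, db.Ping()
}

The resolver shown later reinforces the same rule at the application layer by opening its transactions with sql.TxOptions{ReadOnly: true}.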
Building the Go GraphQL Façade with Echo
The Go service’s main responsibility was to expose a GraphQL schema and resolve queries. We chose Echo for its speed and middleware-first design, and graphql-go/graphql was a pragmatic choice for the GraphQL implementation.
The initial resolver for a user’s profile illustrates the façade pattern. It calls the legacy monolith’s REST endpoint.
// internal/resolver/user_resolver.go
package resolver

import (
    "context"
    "encoding/json"
    "fmt"
    "io/ioutil"
    "net/http"
    "time"

    "github.com/my-org/graphql-strangler/internal/config"
    "github.com/my-org/graphql-strangler/internal/model"
    log "github.com/sirupsen/logrus"
)

// MonolithClient is a dedicated client for communicating with the legacy system.
type MonolithClient struct {
    Client  *http.Client
    BaseURL string
}

func NewMonolithClient(cfg *config.Config) *MonolithClient {
    return &MonolithClient{
        Client: &http.Client{
            Timeout: time.Duration(cfg.Monolith.TimeoutSeconds) * time.Second,
            Transport: &http.Transport{
                MaxIdleConns:        100,
                MaxIdleConnsPerHost: 100,
            },
        },
        BaseURL: cfg.Monolith.ApiURL,
    }
}

// UserResolver handles GraphQL queries for the User type.
type UserResolver struct {
    Monolith *MonolithClient
}

// GetUserByID fetches a user by making a REST call to the legacy monolith.
// This demonstrates the core of the Strangler Fig façade pattern.
func (r *UserResolver) GetUserByID(ctx context.Context, id string) (*model.User, error) {
    reqURL := fmt.Sprintf("%s/v1/users/%s", r.Monolith.BaseURL, id)
    req, err := http.NewRequestWithContext(ctx, "GET", reqURL, nil)
    if err != nil {
        log.WithFields(log.Fields{"error": err, "url": reqURL}).Error("Failed to create request for monolith")
        return nil, fmt.Errorf("internal error creating request")
    }

    // In a real-world project, you must propagate tracing headers.
    // e.g., req.Header.Set("X-Request-ID", tracing.GetRequestID(ctx))
    req.Header.Set("Accept", "application/json")

    log.WithField("url", reqURL).Info("Calling legacy monolith for user data")
    resp, err := r.Monolith.Client.Do(req)
    if err != nil {
        log.WithFields(log.Fields{"error": err, "url": reqURL}).Error("Failed to reach monolith")
        return nil, fmt.Errorf("failed to communicate with legacy system")
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        log.WithFields(log.Fields{
            "status_code": resp.StatusCode,
            "url":         reqURL,
        }).Warn("Non-200 response from monolith")
        return nil, fmt.Errorf("legacy system returned status %d", resp.StatusCode)
    }

    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.WithField("error", err).Error("Failed to read monolith response body")
        return nil, fmt.Errorf("error reading response")
    }

    var user model.User
    if err := json.Unmarshal(body, &user); err != nil {
        log.WithField("error", err).Error("Failed to unmarshal user from monolith response")
        return nil, fmt.Errorf("error decoding legacy user data")
    }

    return &user, nil
}
The server setup in Echo is straightforward. We defined a single /graphql endpoint.
// cmd/server/main.go
package main

import (
    "net/http"

    "github.com/labstack/echo/v4"
    "github.com/labstack/echo/v4/middleware"
    log "github.com/sirupsen/logrus"

    "github.com/my-org/graphql-strangler/internal/config"
    "github.com/my-org/graphql-strangler/internal/handler"
    "github.com/my-org/graphql-strangler/internal/resolver"
    "github.com/my-org/graphql-strangler/internal/schema"
)

func main() {
    cfg, err := config.Load()
    if err != nil {
        log.Fatalf("Failed to load configuration: %v", err)
    }

    e := echo.New()

    // Standard middleware
    e.Use(middleware.Logger())
    e.Use(middleware.Recover())
    e.Use(middleware.RequestID())

    // Initialize dependencies
    monolithClient := resolver.NewMonolithClient(cfg)
    // db, err := database.Connect(cfg.Database) ...

    // Setup GraphQL handler
    rootResolver := resolver.NewRootResolver(monolithClient /*, db */)
    gqlSchema := schema.MustNewSchema(rootResolver)
    graphqlHandler := handler.NewGraphQLHandler(gqlSchema)

    e.POST("/graphql", graphqlHandler.ServeHTTP)
    e.GET("/health", func(c echo.Context) error {
        return c.String(http.StatusOK, "OK")
    })

    log.Infof("Starting server on port %s", cfg.Server.Port)
    if err := e.Start(":" + cfg.Server.Port); err != nil {
        log.Fatalf("Server failed to start: %v", err)
    }
}
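The handler package referenced above is a thin wrapper around graphql-go/graphql that was elided from the listing. The sketch below shows roughly how that wiring might look; the request struct and the error handling are assumptions, and NewGraphQLHandler and ServeHTTP are simply the names already used in main.go.

// internal/handler/graphql_handler.go (illustrative sketch of the elided wiring)
package handler

import (
    "net/http"

    "github.com/graphql-go/graphql"
    "github.com/labstack/echo/v4"
)

// graphqlRequest mirrors the standard GraphQL-over-HTTP POST body.
type graphqlRequest struct {
    Query         string                 `json:"query"`
    OperationName string                 `json:"operationName"`
    Variables     map[string]interface{} `json:"variables"`
}

// GraphQLHandler executes queries against a pre-built schema.
type GraphQLHandler struct {
    schema graphql.Schema
}

func NewGraphQLHandler(schema graphql.Schema) *GraphQLHandler {
    return &GraphQLHandler{schema: schema}
}

// ServeHTTP is wired to POST /graphql in main.go.
func (h *GraphQLHandler) ServeHTTP(c echo.Context) error {
    var req graphqlRequest
    if err := c.Bind(&req); err != nil {
        return c.JSON(http.StatusBadRequest, map[string]string{"error": "invalid request body"})
    }
    result := graphql.Do(graphql.Params{
        Schema:         h.schema,
        RequestString:  req.Query,
        OperationName:  req.OperationName,
        VariableValues: req.Variables,
        Context:        c.Request().Context(),
    })
    // graphql-go reports resolver errors inside the result payload,
    // so a 200 with an "errors" array is the normal failure mode.
    return c.JSON(http.StatusOK, result)
}

The schema package’s MustNewSchema would similarly wrap graphql.NewSchema, panicking at startup if the root resolver and the type definitions don’t line up.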
This gave us a functioning GraphQL service, but it was sitting unprotected, and its deployment was a manual process.
Automating the Hybrid Build with CircleCI
The next step was to create a unified CI/CD pipeline. The pipeline needed to:
- Build and test the legacy Java/MyBatis monolith.
- Build and test the new Go/Echo GraphQL service.
- Build Docker images for both services.
- Push the images to a container registry (ECR).
- Trigger a deployment (e.g., update an ECS service).
- Update AWS WAF rules to protect the newly deployed endpoint.
This is where CircleCI’s workflows and orbs became critical.
# .circleci/config.yml
version: 2.1

orbs:
  aws-cli: circleci/aws-cli@x.y.z
  aws-ecr: circleci/aws-ecr@x.y.z
  maven: circleci/maven@x.y.z
  golang: circleci/go@x.y.z

# Reusable command for AWS authentication
commands:
  configure-aws:
    description: "Configure AWS credentials"
    steps:
      - aws-cli/setup:
          aws-access-key-id: AWS_ACCESS_KEY_ID
          aws-secret-access-key: AWS_SECRET_ACCESS_KEY
          aws-region: AWS_REGION

jobs:
  # == Job for Legacy Java/MyBatis Monolith ==
  build-and-test-monolith:
    docker:
      - image: cimg/openjdk:8.0-node
    steps:
      - checkout
      - maven/with_cache:
          steps:
            - run:
                name: "Build and Test Monolith"
                command: mvn clean install
      - persist_to_workspace:
          root: ./monolith-app/target
          paths:
            - monolith-app-*.jar

  # == Job for New Go/GraphQL Service ==
  build-and-test-graphql-service:
    executor: golang/default
    steps:
      - checkout
      - golang/load-cache
      - run:
          name: "Vet and Lint"
          command: |
            go vet ./...
            go install honnef.co/go/tools/cmd/staticcheck@latest
            staticcheck ./...
      - run:
          name: "Run Unit Tests"
          command: go test -v -race -coverprofile=coverage.txt ./...
      - golang/save-cache
      - persist_to_workspace:
          root: .
          paths:
            - graphql-service

  # == Job for Building and Pushing Docker Images ==
  build-and-push-images:
    docker:
      - image: cimg/base:stable
    steps:
      - setup_remote_docker:
          version: 20.10.18
      - checkout
      - attach_workspace:
          at: /tmp/workspace
      - configure-aws
      - aws-ecr/build-and-push-image:
          repo: 'monolith-repo'
          tag: 'latest,${CIRCLE_SHA1}'
          dockerfile: 'monolith-app/Dockerfile'
          path: '.'
      - aws-ecr/build-and-push-image:
          repo: 'graphql-strangler-repo'
          tag: 'latest,${CIRCLE_SHA1}'
          dockerfile: 'graphql-service/Dockerfile'
          path: '.'

  # == Job for Updating WAF Rules ==
  update-waf-rules:
    docker:
      - image: cimg/base:stable
    environment:
      WAF_RULE_GROUP_NAME: "GraphQL-Protection-Rules"
      WAF_SCOPE: "REGIONAL"
    steps:
      - checkout
      - configure-aws
      - run:
          name: "Fetch current WAF Rule Group lock token"
          command: |
            # You MUST fetch the LockToken before any update to prevent race conditions.
            # This is a critical step in production automation.
            TOKEN=$(aws wafv2 get-rule-group \
              --name ${WAF_RULE_GROUP_NAME} \
              --scope ${WAF_SCOPE} \
              --id $(aws wafv2 list-rule-groups --scope ${WAF_SCOPE} --query "RuleGroups[?Name=='${WAF_RULE_GROUP_NAME}'].Id" --output text) \
              --query 'LockToken' --output text)
            echo "export WAF_LOCK_TOKEN=${TOKEN}" >> $BASH_ENV
      - run:
          name: "Apply GraphQL WAF rules"
          command: |
            aws wafv2 update-rule-group \
              --name ${WAF_RULE_GROUP_NAME} \
              --scope ${WAF_SCOPE} \
              --id $(aws wafv2 list-rule-groups --scope ${WAF_SCOPE} --query "RuleGroups[?Name=='${WAF_RULE_GROUP_NAME}'].Id" --output text) \
              --rules file://./infrastructure/waf-rules.json \
              --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=${WAF_RULE_GROUP_NAME} \
              --lock-token ${WAF_LOCK_TOKEN}

workflows:
  build-deploy-secure:
    jobs:
      - build-and-test-monolith
      - build-and-test-graphql-service
      - build-and-push-images:
          requires:
            - build-and-test-monolith
            - build-and-test-graphql-service
          filters:
            branches:
              only:
                - main
      # The pitfall here is making security an afterthought. The WAF update
      # should be part of the same atomic deployment workflow.
      - update-waf-rules:
          requires:
            - build-and-push-images
          filters:
            branches:
              only:
                - main
Securing the GraphQL Endpoint with a CI-Driven WAF
GraphQL endpoints are vulnerable to a unique class of attacks that generic WAFs might miss.
- Introspection Queries: Malicious actors can use introspection to map out your entire schema, discovering private types and fields. While useful for development tools, it should be disabled or restricted in production.
- Denial of Service (DoS): A deeply nested or circular query can exhaust server resources (CPU, memory) very quickly. For example, query { user { friends { friends { friends ... } } } }.
- Field Suggestion Attacks: Requesting non-existent fields can trigger expensive lookup logic or reveal information through error messages.
Our WAF strategy was to create a specific AWS WAF v2 rule group managed via our CircleCI pipeline. This ensures our security posture evolves with our application code.
Here is the waf-rules.json file referenced in the CircleCI job. It contains two rules: one that blocks common introspection query patterns and another that limits query depth by inspecting the request body.
// infrastructure/waf-rules.json
[
  {
    "Name": "BlockGraphQLIntrospection",
    "Priority": 10,
    "Action": {
      "Block": {}
    },
    "Statement": {
      "ByteMatchStatement": {
        "SearchString": "__schema",
        "FieldToMatch": {
          "Body": {
            "OversizeHandling": "CONTINUE"
          }
        },
        "TextTransformations": [
          {
            "Priority": 0,
            "Type": "NONE"
          }
        ],
        "PositionalConstraint": "CONTAINS"
      }
    },
    "VisibilityConfig": {
      "SampledRequestsEnabled": true,
      "CloudWatchMetricsEnabled": true,
      "MetricName": "BlockGraphQLIntrospectionRule"
    }
  },
  {
    "Name": "LimitQueryDepth",
    "Priority": 20,
    "Action": {
      "Block": {}
    },
    "Statement": {
      "RegexMatchStatement": {
        "RegexString": "(\\{[^{}]*){8}",
        "FieldToMatch": {
          "Body": {
            "OversizeHandling": "CONTINUE"
          }
        },
        "TextTransformations": [
          {
            "Priority": 0,
            "Type": "NONE"
          }
        ]
      }
    },
    "VisibilityConfig": {
      "SampledRequestsEnabled": true,
      "CloudWatchMetricsEnabled": true,
      "MetricName": "LimitQueryDepthRule"
    }
  }
]
The regex (\\{[^{}]*){8} is a pragmatic approach to depth limiting at the WAF layer. It matches eight consecutive opening braces with no closing brace between them, which blocks any query nested more than roughly seven levels deep. While not foolproof, it effectively stops pathologically deep queries before they hit the Go application server. The real solution for query complexity analysis belongs inside the application, but the WAF provides a crucial first line of defense.
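That application-side check is straightforward to sketch with graphql-go’s own parser packages: walk the parsed document before execution and reject anything nested too deeply. The package name, the MaxQueryDepth value, and the decision not to expand fragment spreads are illustrative assumptions rather than our exact production code.

// internal/middleware/depth_limit.go (illustrative sketch)
package middleware

import (
    "fmt"

    "github.com/graphql-go/graphql/language/ast"
    "github.com/graphql-go/graphql/language/parser"
    "github.com/graphql-go/graphql/language/source"
)

const MaxQueryDepth = 7 // assumption: mirrors the WAF rule

// CheckQueryDepth parses the raw query and rejects it if any operation
// nests selection sets deeper than MaxQueryDepth.
func CheckQueryDepth(query string) error {
    doc, err := parser.Parse(parser.ParseParams{
        Source: source.NewSource(&source.Source{Body: []byte(query), Name: "GraphQL request"}),
    })
    if err != nil {
        return fmt.Errorf("malformed query: %w", err)
    }
    for _, def := range doc.Definitions {
        if op, ok := def.(*ast.OperationDefinition); ok {
            if d := selectionDepth(op.SelectionSet, 1); d > MaxQueryDepth {
                return fmt.Errorf("query depth %d exceeds limit of %d", d, MaxQueryDepth)
            }
        }
    }
    return nil
}

// selectionDepth recursively measures how deeply fields are nested.
func selectionDepth(set *ast.SelectionSet, depth int) int {
    if set == nil {
        return depth - 1 // a leaf field contributes its parent's depth
    }
    deepest := depth
    for _, sel := range set.Selections {
        var child int
        switch s := sel.(type) {
        case *ast.Field:
            child = selectionDepth(s.SelectionSet, depth+1)
        case *ast.InlineFragment:
            child = selectionDepth(s.SelectionSet, depth) // fragments don't add depth
        default:
            child = depth // fragment spreads are not expanded in this sketch
        }
        if child > deepest {
            deepest = child
        }
    }
    return deepest
}

A check like this runs in the Echo handler before graphql.Do, so a hostile query costs a parse rather than a full resolver fan-out to the monolith.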
The most critical part of the CircleCI job is fetching the LockToken before applying an update. AWS WAF uses this token for optimistic locking to prevent concurrent modifications from clobbering each other. Forgetting it in an automated pipeline leads to intermittent and frustrating WAFOptimisticLockException failures.
The Reality of Data Access
With the CI/CD pipeline and WAF in place, we began migrating queries off the monolith’s REST APIs and directly to the database from the Go service. This is where the true complexity of the Strangler Fig pattern reveals itself.
Consider a report that requires data from both a legacy table managed by MyBatis and a new table owned by the GraphQL service.
// internal/resolver/report_resolver.go
package resolver

import (
    "context"
    "database/sql"

    "github.com/my-org/graphql-strangler/internal/model"
)

type ReportResolver struct {
    DB       *sql.DB // Connection to the shared database
    Monolith *MonolithClient
}

// GetConsolidatedReport demonstrates a hybrid data fetching strategy.
func (r *ReportResolver) GetConsolidatedReport(ctx context.Context, reportID string) (*model.ConsolidatedReport, error) {
    tx, err := r.DB.BeginTx(ctx, &sql.TxOptions{ReadOnly: true})
    if err != nil {
        return nil, err
    }
    defer tx.Rollback() // Best practice, even for read-only transactions

    // Step 1: Fetch core data directly from the database.
    // This query is new and optimized for the GraphQL service.
    var coreData model.CoreReportData
    err = tx.QueryRowContext(ctx, "SELECT id, name, created_at FROM reports WHERE id = $1", reportID).Scan(
        &coreData.ID, &coreData.Name, &coreData.CreatedAt,
    )
    if err != nil {
        // Handle sql.ErrNoRows, etc.
        return nil, err
    }

    // Step 2: Fetch complex, legacy summary data by calling the monolith.
    // Rewriting the complex MyBatis query is too risky at this stage.
    // This call does NOT participate in the Go service's transaction.
    legacySummary, err := r.Monolith.GetLegacySummaryForReport(ctx, reportID)
    if err != nil {
        return nil, err
    }

    // Combine the results into a single GraphQL response model
    consolidatedReport := &model.ConsolidatedReport{
        ID:          coreData.ID,
        ReportName:  coreData.Name,
        GeneratedAt: coreData.CreatedAt,
        Summary:     legacySummary.Data, // Data from monolith
    }

    // No commit is needed for a read-only transaction.
    return consolidatedReport, nil
}
This hybrid resolver highlights the architectural trade-off. We gain performance for the CoreReportData
query, but we lose transactional consistency between the two data sources. The call to r.Monolith.GetLegacySummaryForReport
is a separate, atomic operation. If the data it relies on changes between our direct DB read and the monolith’s execution, we could serve a report with inconsistent data. In our use case—a non-financial reporting dashboard—this transient inconsistency was an acceptable risk, weighed against the significant development effort of rewriting the legacy logic. For a transactional system, this approach would be untenable.
The monolith’s MyBatis XML mapper for that legacy summary might look something like this, explaining why we were so hesitant to touch it:
<!-- com/my-org/legacy/persistence/ReportMapper.xml -->
<select id="getLegacySummaryForReport" resultType="com.myorg.LegacySummary" parameterType="string">
    SELECT
        r.id as reportId,
        (SELECT COUNT(DISTINCT s.user_id)
           FROM sales_transactions s
          WHERE s.report_id = r.id AND s.status = 'COMPLETED'
            AND s.created_at &gt; r.start_date
            AND s.created_at &lt; r.end_date) as unique_purchasers,
        (SELECT SUM(li.price * li.quantity)
           FROM line_items li
           JOIN sales_transactions s ON s.id = li.transaction_id
          WHERE s.report_id = r.id AND s.status = 'COMPLETED'
            AND s.is_refunded = false
            AND li.product_id NOT IN (SELECT id FROM excluded_products)) as total_revenue
    FROM
        reports r
    WHERE
        r.id = #{reportId}
        AND r.is_active = true
</select>
The logic embedded in these subqueries represents business rules that are difficult to untangle and safely replicate. The pragmatic path was to encapsulate it behind the monolith’s API and call it from the Go service.
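For completeness, the Go side of that encapsulation is just one more method on MonolithClient, mirroring GetUserByID. The sketch below shows the rough shape; the /v1/reports/{id}/summary path and the LegacySummary fields are assumptions, since the real endpoint and payload belong to the monolith’s API.

// internal/resolver/report_client.go (illustrative sketch)
package resolver

import (
    "context"
    "encoding/json"
    "fmt"
    "net/http"
)

// LegacySummary mirrors the JSON returned by the monolith's summary endpoint.
// The field names are assumptions for this sketch.
type LegacySummary struct {
    ReportID string          `json:"reportId"`
    Data     json.RawMessage `json:"data"` // passed through to the GraphQL response untouched
}

// GetLegacySummaryForReport wraps the legacy MyBatis-backed report summary
// behind a plain REST call, exactly like GetUserByID does for users.
func (c *MonolithClient) GetLegacySummaryForReport(ctx context.Context, reportID string) (*LegacySummary, error) {
    reqURL := fmt.Sprintf("%s/v1/reports/%s/summary", c.BaseURL, reportID) // hypothetical path
    req, err := http.NewRequestWithContext(ctx, "GET", reqURL, nil)
    if err != nil {
        return nil, fmt.Errorf("internal error creating request")
    }
    req.Header.Set("Accept", "application/json")

    resp, err := c.Client.Do(req)
    if err != nil {
        return nil, fmt.Errorf("failed to communicate with legacy system")
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("legacy system returned status %d", resp.StatusCode)
    }

    var summary LegacySummary
    if err := json.NewDecoder(resp.Body).Decode(&summary); err != nil {
        return nil, fmt.Errorf("error decoding legacy summary data")
    }
    return &summary, nil
}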
This entire architecture, a delicate balance of new and old, is only manageable because of the robust, automated pipeline. The CircleCI workflow ensures that every change, whether in the Go service, the Java monolith, or the WAF rules, is tested, built, and deployed in a consistent, repeatable manner. It provides the safety net required to perform this kind of high-stakes architectural surgery on a living system.
The current implementation is not a final state. The tight coupling at the database level remains a significant source of technical debt. Future iterations will likely involve introducing a Change Data Capture (CDC) pipeline with Debezium to stream data from the legacy database into a new data store owned exclusively by the GraphQL service, achieving true service independence. Furthermore, the WAF rules are static; a more sophisticated implementation could involve a Lambda function that analyzes GraphQL query logs and dynamically updates WAF IP sets to block malicious actors in near real-time. The current system, however, provided a stable and secure bridge, allowing us to deliver immediate value to the business while paving a clear, incremental path away from the monolith.