Architecting a Secure Multi-Tenant Model Serving Gateway with OIDC and JPA-Driven Authorization

Architecture

The core challenge is building a unified model serving infrastructure for a multi-tenant SaaS platform. Each tenant possesses their own fine-tuned models and proprietary data. The architecture must guarantee absolute data isolation and access control at every layer—from the user interface down to the machine learning inference endpoint. Concurrently, it must maintain strict ACID compliance for all metadata operations, such as model promotions, versioning, or tenant onboarding, which are non-negotiable transactional boundaries in a production system.

A naive approach would be to treat each service as a distinct security domain. For instance, the ML serving layer could use static API keys, managed by a primary backend application. This is a common pattern but introduces significant operational friction and security vulnerabilities. A compromised API key provides long-lived access, revocation is a manual and error-prone process, and there is no unified mechanism for managing user sessions or permissions across the frontend, backend, and ML services. This fractures the security posture and creates an untenable management burden as the number of tenants and models scales. The risk of cross-tenant data leakage via a misconfigured or stolen key is unacceptably high.

A superior architectural choice is a federated identity model using OpenID Connect (OIDC). In this design, a central Identity Provider (IdP) becomes the single source of truth for authentication. All services—the Chakra UI frontend, the JPA/Hibernate-based backend, and the BentoML model serving API—are configured as OIDC clients or resource servers. They do not manage credentials. Instead, they consume and validate JSON Web Tokens (JWTs) issued by the IdP. This approach centralizes security policy, enables single sign-on (SSO), and uses short-lived, stateless tokens that drastically reduce the attack surface. More importantly, custom claims within the JWT, such as a tenant_id, become the immutable and cryptographically verifiable basis for enforcing data access policies across the entire distributed system.
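To make the role of the custom claim concrete, the following is a minimal sketch of what a token payload carrying tenant_id might look like and how its claims can be inspected. The header, claim values, and the helper names (b64url_encode, b64url_decode, peek_claims) are hypothetical, and the function deliberately skips signature verification, so it is suitable for debugging only, never for authorization.

```python
import base64
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding JWTs strip off."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def peek_claims(token: str) -> dict:
    """Return the payload claims WITHOUT verifying the signature (inspection only)."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(payload_b64))


# Hypothetical token contents; a real token is issued and signed by the IdP.
example_header = {"alg": "RS256", "typ": "JWT", "kid": "example-key-1"}
example_claims = {
    "iss": "https://your-oidc-provider.com/realms/your-realm",
    "sub": "user-123",
    "exp": 1700000000,
    "tenant_id": "tenant-a",  # the custom claim every service scopes by
}
example_token = ".".join([
    b64url_encode(json.dumps(example_header).encode()),
    b64url_encode(json.dumps(example_claims).encode()),
    "signature-placeholder",  # not a real signature
])

print(peek_claims(example_token)["tenant_id"])
```

The payload is only encoded, not encrypted, so the tenant_id is trustworthy only after the signature, issuer, and expiry have been verified against the IdP's published keys.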

The trade-off for this stronger, more coherent security model is increased implementation complexity and a dependency on the IdP’s availability. For a serious multi-tenant application, however, the guarantees of a federated identity model are a foundational requirement rather than a luxury. The risk of cross-tenant data leakage in a decoupled, key-based system is a critical failure point, which makes the OIDC-based architecture the more defensible choice.

Core Architectural Flow

The interaction model is orchestrated around the OIDC JWT. The token acts as a secure passport, carrying the necessary context (user identity, tenant ID, roles) for each service to perform its function without needing to trust any other service directly, only the IdP.

sequenceDiagram
    participant User
    participant ChakraUI as Chakra UI Frontend
    participant IdP as OIDC Provider (e.g., Keycloak)
    participant Backend as Spring Boot Backend (JPA/Hibernate)
    participant BentoML as BentoML Service
    participant Database as Relational DB

    User->>ChakraUI: Initiates Login
    ChakraUI->>IdP: Redirects for Authentication
    IdP-->>User: Presents Login Form
    User->>IdP: Submits Credentials
    IdP-->>ChakraUI: Returns Authorization Code
    ChakraUI->>IdP: Exchanges Code for JWT
    IdP-->>ChakraUI: Responds with JWT (containing tenant_id)

    Note over ChakraUI: Stores JWT securely

    ChakraUI->>Backend: API Request with "Authorization: Bearer {JWT}"
    Backend->>IdP: Fetches JWKS (once, then caches)
    Backend->>Backend: Validates JWT signature and claims
    Backend->>Backend: Extracts tenant_id from JWT
    Note over Backend: Applies Hibernate Interceptor for Tenant Isolation
    Backend->>Database: Executes SQL with "WHERE tenant_id = ?"
    Database-->>Backend: Returns Tenant-Scoped Data
    Backend-->>ChakraUI: API Response

    ChakraUI->>BentoML: Inference Request with "Authorization: Bearer {JWT}"
    BentoML->>IdP: Fetches JWKS (once, then caches)
    BentoML->>BentoML: Validates JWT signature and claims
    BentoML->>BentoML: Extracts tenant_id from JWT
    Note over BentoML: Loads model specific to tenant_id
    BentoML->>BentoML: Performs Inference
    BentoML-->>ChakraUI: Inference Result

This sequence ensures that both the data persistence layer and the model inference layer are independently secured and scoped by the same verifiable identity token. The Hibernate Interceptor in the backend provides a transparent, application-wide enforcement of data boundaries, while the BentoML middleware performs an analogous check at the ML gateway.

Backend Implementation: JPA/Hibernate with Tenant-Scoped ACID Transactions

The backend, built with Spring Boot and Spring Security, acts as an OIDC Resource Server. Its primary responsibilities are to manage tenant and model metadata, serve data to the frontend, and ensure all database interactions are strictly isolated by tenant.

First, the Spring Security configuration is established to validate incoming JWTs.

application.yml

spring:
  security:
    oauth2:
      resourceserver:
        jwt:
          # The issuer-uri is used to auto-discover OIDC provider configuration,
          # including the jwks-uri for token signature validation.
          issuer-uri: https://your-oidc-provider.com/realms/your-realm
  jpa:
    hibernate:
      ddl-auto: update  # Convenient for a demo; prefer "validate" plus schema migrations in production
    properties:
      hibernate:
        # The tenant-scoping hook is registered programmatically (see HibernateConfig),
        # so no session_factory.* property needs to be set here.
        show_sql: true

The core of the data isolation mechanism is a statement-level hook in Hibernate. Unlike Hibernate Filters, which must be explicitly enabled per session, a hook registered on the session factory sees every SQL statement before it is sent to the database and can rewrite it. In Hibernate 5 this hook was Interceptor.onPrepareStatement; Hibernate 6, which ships with Spring Boot 3, removed that method in favor of StatementInspector. Either way, the result is a low-level tenant-separation check that application code cannot easily bypass.

TenantContext.java

package com.example.multitenant.config;

// A ThreadLocal is used to hold the tenant identifier for the duration of a single request.
// This is a standard pattern for propagating request-scoped context without passing it
// as a method parameter through every layer of the application.
public final class TenantContext {

    private static final ThreadLocal<String> currentTenant = new ThreadLocal<>();

    public static String getCurrentTenant() {
        return currentTenant.get();
    }

    public static void setCurrentTenant(String tenantId) {
        if (tenantId == null || tenantId.trim().isEmpty()) {
            throw new IllegalArgumentException("Tenant ID cannot be null or empty.");
        }
        currentTenant.set(tenantId);
    }

    public static void clear() {
        currentTenant.remove();
    }
}

JwtTenantFilter.java
This servlet filter runs after Spring Security has validated the JWT. It reads the tenant_id claim from the authenticated principal and populates the TenantContext.

package com.example.multitenant.config;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.security.oauth2.jwt.Jwt;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

import java.io.IOException;

@Component
public class JwtTenantFilter extends OncePerRequestFilter {

    private static final Logger log = LoggerFactory.getLogger(JwtTenantFilter.class);
    private static final String TENANT_ID_CLAIM = "tenant_id"; // Custom claim in your JWT

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
            throws ServletException, IOException {
        try {
            var authentication = SecurityContextHolder.getContext().getAuthentication();
            if (authentication != null && authentication.isAuthenticated() && authentication.getPrincipal() instanceof Jwt jwt) {
                String tenantId = jwt.getClaimAsString(TENANT_ID_CLAIM);

                if (tenantId != null && !tenantId.isBlank()) {
                    log.debug("Setting TenantContext for tenant: {}", tenantId);
                    TenantContext.setCurrentTenant(tenantId);
                } else {
                    // An authenticated request without a tenant claim is an invalid state in a
                    // multi-tenant system. Reject it rather than continue with no tenant scope,
                    // because unscoped queries would see every tenant's rows.
                    log.warn("Authenticated request is missing '{}' claim.", TENANT_ID_CLAIM);
                    response.sendError(HttpServletResponse.SC_FORBIDDEN, "Missing tenant claim");
                    return;
                }
            }
            filterChain.doFilter(request, response);
        } finally {
            // Crucial: Always clear the ThreadLocal after the request is complete
            // to prevent memory leaks and cross-request contamination in thread pools.
            log.debug("Clearing TenantContext.");
            TenantContext.clear();
        }
    }
}

TenantInterceptor.java
This is the Hibernate hook that transparently adds the WHERE clause. Hibernate 6 (the version that pairs with the jakarta.* imports above) no longer has Interceptor.onPrepareStatement, so the class implements StatementInspector instead.

package com.example.multitenant.config;

import org.hibernate.resource.jdbc.spi.StatementInspector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import java.util.regex.Pattern;

// The inspector rewrites the SQL just before it's prepared.
// This is a powerful, low-level way to enforce data boundaries.
@Component
public class TenantInterceptor implements StatementInspector {

    private static final Logger log = LoggerFactory.getLogger(TenantInterceptor.class);

    // The tenant ID is spliced into the SQL text, so it must never contain quote
    // characters. This conservative allow-list is an assumption about the ID format;
    // adjust it to match the identifiers your IdP actually issues.
    private static final Pattern SAFE_TENANT_ID = Pattern.compile("[A-Za-z0-9_-]+");

    @Override
    public String inspect(String sql) {
        String tenantId = TenantContext.getCurrentTenant();
        if (tenantId == null) {
            // If no tenant is set, we proceed without modification. This might be
            // necessary for system-level operations or public tables. A real-world
            // implementation would have more sophisticated logic here.
            return sql;
        }
        if (!SAFE_TENANT_ID.matcher(tenantId).matches()) {
            // Fail closed instead of concatenating an unexpected value into SQL.
            throw new IllegalStateException("Refusing to build SQL for an unsafe tenant ID.");
        }

        // A naive implementation might just append "WHERE tenant_id = ...".
        // A more robust implementation must correctly handle existing WHERE clauses,
        // subqueries, joins, and different SQL statement types (SELECT, UPDATE, DELETE).
        // This example uses a simplified string manipulation approach.
        if (isSelectUpdateOrDelete(sql) && !sql.contains("tenant_id =")) {
            if (sql.contains(" where ")) {
                sql = sql.replace(" where ", " where tenant_id = '" + tenantId + "' and ");
            } else if (sql.contains(" group by ")) {
                sql = sql.replace(" group by ", " where tenant_id = '" + tenantId + "' group by ");
            } else if (sql.contains(" order by ")) {
                sql = sql.replace(" order by ", " where tenant_id = '" + tenantId + "' order by ");
            } else {
                // For statements without a WHERE, GROUP BY, or ORDER BY clause
                sql += " where tenant_id = '" + tenantId + "'";
            }
            log.trace("Modified SQL: {}", sql);
        }

        return sql;
    }

    private boolean isSelectUpdateOrDelete(String sql) {
        String lowerCaseSql = sql.trim().toLowerCase();
        return lowerCaseSql.startsWith("select") || lowerCaseSql.startsWith("update") || lowerCaseSql.startsWith("delete");
    }
}

Finally, the inspector and the filter need to be registered.

HibernateConfig.java and SecurityConfig.java

// In a @Configuration class for Hibernate
// (HibernatePropertiesCustomizer lives in org.springframework.boot.autoconfigure.orm.jpa)
@Configuration
public class HibernateConfig {

    @Autowired
    private TenantInterceptor tenantInterceptor;

    @Bean
    public HibernatePropertiesCustomizer hibernatePropertiesCustomizer() {
        // Pass the Spring-managed instance so Hibernate uses the same bean.
        return properties -> properties.put("hibernate.session_factory.statement_inspector", tenantInterceptor);
    }
}

// In your Spring Security @Configuration class
@Configuration
@EnableWebSecurity
public class SecurityConfig {
    @Autowired
    private JwtTenantFilter jwtTenantFilter;

    @Bean
    public SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception {
        http
            .authorizeHttpRequests(authorize -> authorize.anyRequest().authenticated())
            .oauth2ResourceServer(oauth2 -> oauth2.jwt(Customizer.withDefaults()))
            // Add our custom filter after the authentication filter to ensure
            // the SecurityContext is populated before we try to read it.
            .addFilterAfter(jwtTenantFilter, BearerTokenAuthenticationFilter.class);
        return http.build();
    }
}

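With this in place, queries against any entity that has a tenant_id column are filtered automatically. The rewrite only touches SELECT, UPDATE, and DELETE statements; INSERTs are not covered, so new rows must have their tenantId set explicitly (for example from TenantContext). A service method annotated with @Transactional then runs inside a single database transaction whose reads and writes are tenant-scoped.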

@Entity
public class ModelMetadata {
    @Id @GeneratedValue
    private Long id;
    private String modelName;
    private String version;
    private String tenantId; // This column is crucial
    // getters and setters
}

@Service
public class ModelService {
    @Autowired
    private ModelRepository repository;

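    @Transactional // Ensures ACID properties for this operation
    public void promoteModel(String modelName, String newVersion) {
        // The interceptor ensures this query will implicitly become:
        // "SELECT ... FROM model_metadata WHERE model_name = ? AND tenant_id = 'current_tenant'"
        ModelMetadata currentModel = repository.findByModelName(modelName);
        if (currentModel == null) {
            // Nothing is visible to this tenant under that name; fail clearly
            // instead of hitting a NullPointerException below.
            throw new IllegalArgumentException("No model named '" + modelName + "' for the current tenant.");
        }

        // Any modification is also scoped and part of the same transaction.
        // The entity is managed, so the change is flushed at commit; save() is kept for readability.
        currentModel.setVersion(newVersion);
        repository.save(currentModel);
    }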
}
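The combination of tenant scoping and transactional atomicity can be illustrated outside of Hibernate. The sketch below uses Python's built-in sqlite3 module as a stand-in database; the table layout mirrors ModelMetadata, while the function name promote_model, the sample rows, and the "exactly one row" rule are assumptions made for illustration.

```python
import sqlite3


def promote_model(conn: sqlite3.Connection, tenant_id: str,
                  model_name: str, new_version: str) -> None:
    """Update one tenant's model version atomically; raise if nothing matched."""
    with conn:  # commits on success, rolls back if an exception escapes
        cursor = conn.execute(
            "UPDATE model_metadata SET version = ? "
            "WHERE model_name = ? AND tenant_id = ?",
            (new_version, model_name, tenant_id),
        )
        if cursor.rowcount != 1:
            raise LookupError(f"no model {model_name!r} for tenant {tenant_id!r}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model_metadata ("
    "id INTEGER PRIMARY KEY, model_name TEXT, version TEXT, tenant_id TEXT)"
)
conn.executemany(
    "INSERT INTO model_metadata (model_name, version, tenant_id) VALUES (?, ?, ?)",
    [("churn", "v1", "tenant-a"), ("churn", "v1", "tenant-b")],
)
conn.commit()

# Both tenants own a model called "churn"; only tenant-a's row is promoted.
promote_model(conn, "tenant-a", "churn", "v2")
versions = dict(conn.execute(
    "SELECT tenant_id, version FROM model_metadata WHERE model_name = 'churn'"
))
print(versions)
```

Here the tenant predicate is passed as a bound parameter, which is the safer option whenever the query is written by hand rather than rewritten after the fact.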

ML Service Implementation: BentoML with JWT Middleware

The BentoML service must perform the same JWT validation as the backend. This prevents unauthorized access to the inference endpoints and ensures it loads the correct model for the requesting tenant. This is achieved by implementing custom BentoML Middleware.

service.py

import bentoml
import jwt
import requests
import logging
from functools import lru_cache
from bentoml.io import NumpyNdarray
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

# --- Configuration ---
# In a real application, these should come from environment variables or a config file.
OIDC_ISSUER_URI = "https://your-oidc-provider.com/realms/your-realm"
TENANT_ID_CLAIM = "tenant_id"
JWKS_URI = f"{OIDC_ISSUER_URI}/protocol/openid-connect/certs"

# --- Logging Setup ---
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# --- JWT Validation Logic ---
# Caching the JWKS response is critical for performance. Without it, every
# request would trigger a network call to the IdP.
@lru_cache(maxsize=1)
def get_jwks():
    """Fetches and caches the JSON Web Key Set from the OIDC provider.

    Raises on failure: lru_cache does not cache exceptions, so a transient IdP
    error is retried on the next request instead of being cached forever.
    """
    logger.info(f"Fetching JWKS from {JWKS_URI}")
    response = requests.get(JWKS_URI, timeout=5)
    response.raise_for_status()
    return response.json()

def validate_jwt(token: str):
    """
    Validates the JWT signature and claims against the OIDC provider's public keys.
    Returns (claims, None) on success or (None, error_message) on failure.
    """
    try:
        jwks = get_jwks()
    except (requests.exceptions.RequestException, ValueError) as e:
        logger.error(f"Failed to fetch JWKS: {e}")
        return None, "JWKS not available"

    try:
        unverified_header = jwt.get_unverified_header(token)
        kid = unverified_header.get("kid")
        # Find the published key whose ID matches the one named in the token header.
        jwk = next((k for k in jwks.get("keys", []) if k.get("kid") == kid), None)
        if jwk is None:
            return None, "Unable to find corresponding KID"

        decoded_token = jwt.decode(
            token,
            # PyJWT expects a key object, not the raw JWK dictionary.
            jwt.PyJWK(jwk).key,
            algorithms=["RS256"],
            # Issuer and audience validation are crucial for security
            issuer=OIDC_ISSUER_URI,
            # Audience checking is disabled only to keep this example generic;
            # in production, pass audience=... for your API instead.
            options={"verify_aud": False},
        )
        return decoded_token, None
    except jwt.ExpiredSignatureError:
        logger.warning("Token has expired.")
        return None, "Token has expired"
    except jwt.PyJWTError as e:
        logger.error(f"JWT validation error: {e}")
        return None, f"Invalid token: {e}"

# --- BentoML Middleware for Authentication ---
class OIDCMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        auth_header = request.headers.get("Authorization")
        if not auth_header or not auth_header.startswith("Bearer "):
            return JSONResponse(status_code=401, content={"error": "Authorization header missing or invalid"})
        
        token = auth_header.split(" ")[1]
        decoded_token, error = validate_jwt(token)
        
        if error:
            return JSONResponse(status_code=401, content={"error": error})
        
        tenant_id = decoded_token.get(TENANT_ID_CLAIM)
        if not tenant_id:
            return JSONResponse(status_code=403, content={"error": "Tenant ID claim missing from token"})
        
        # Inject the tenant_id into the request state so the endpoint can access it.
        request.state.tenant_id = tenant_id
        
        response = await call_next(request)
        return response

# --- BentoML Service Definition ---
# A per-tenant deployment would keep models tagged like 'tenant-a_model:latest' and
# 'tenant-b_model:latest' in the BentoML model store. This example loads one shared model.
model_runner = bentoml.sklearn.get("generic_iris_model:latest").to_runner()

svc = bentoml.Service("multi_tenant_classifier", runners=[model_runner])
svc.add_asgi_middleware(OIDCMiddleware)

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify(input_data: NumpyNdarray, ctx: bentoml.Context) -> NumpyNdarray:
    # Access the tenant_id injected by the middleware.
    tenant_id = ctx.request.state.tenant_id
    logger.info(f"Processing inference request for tenant: {tenant_id}")
    
    # In a real system, you would use the tenant_id to dynamically select the correct model.
    # For example:
    # runner = bentoml.sklearn.get(f"{tenant_id}_model:latest").to_runner()
    # await runner.async_run(input_data)
    # This example uses a single runner for simplicity but demonstrates the principle.
    
    result = await model_runner.async_run(input_data)
    return result

The bentofile.yaml would simply reference this service definition. This architecture ensures the ML service is a self-contained, secure component that adheres to the same security standards as the rest of the platform.
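The dynamic model selection hinted at in the service comments could be sketched as follows. This is an illustration under stated assumptions, not BentoML's API: the tenant ID format, the helper names (model_tag_for, TenantModelCache), and the stand-in loader are hypothetical, and only the tag pattern "{tenant_id}_model:latest" is taken from the example above. Because the tenant ID ends up inside a model tag, the sketch validates it first and keeps a bounded number of loaded models in memory.

```python
import re
from collections import OrderedDict
from typing import Any, Callable

# Assumption: tenant IDs are short lowercase slugs. Adjust to your IdP's format.
_TENANT_ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{0,62}")


def model_tag_for(tenant_id: str) -> str:
    """Map a tenant ID from the JWT to a model tag, rejecting unexpected input."""
    if not _TENANT_ID_PATTERN.fullmatch(tenant_id):
        raise ValueError(f"unexpected tenant id: {tenant_id!r}")
    return f"{tenant_id}_model:latest"


class TenantModelCache:
    """Keeps at most max_entries loaded models, evicting the least recently used."""

    def __init__(self, loader: Callable[[str], Any], max_entries: int = 8) -> None:
        self._loader = loader
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, tenant_id: str) -> Any:
        tag = model_tag_for(tenant_id)
        if tag in self._entries:
            self._entries.move_to_end(tag)  # mark as most recently used
            return self._entries[tag]
        model = self._loader(tag)  # may raise if the tenant has no model
        self._entries[tag] = model
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # drop the least recently used
        return model


# Stand-in loader for the demo; a real one might wrap a model-store lookup.
loaded_tags = []


def fake_loader(tag: str) -> str:
    loaded_tags.append(tag)
    return f"<model {tag}>"


cache = TenantModelCache(fake_loader, max_entries=2)
cache.get("tenant-a")
cache.get("tenant-a")  # served from the cache; the loader is not called again
print(loaded_tags)
```

Bounding the cache matters because each loaded model occupies memory; with many tenants, an unbounded dictionary of models would eventually exhaust the serving process.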

Frontend Integration: Chakra UI with OIDC Authentication

The frontend, built with React and Chakra UI, uses a library like oidc-client-ts to manage the OIDC authentication flow. Its role is to acquire the JWT and attach it to all subsequent API calls.

authService.ts

import { UserManager, User } from 'oidc-client-ts';

const userManager = new UserManager({
  authority: 'https://your-oidc-provider.com/realms/your-realm',
  client_id: 'frontend-client',
  redirect_uri: window.location.origin + '/callback',
  response_type: 'code',
  scope: 'openid profile email',
});

export const login = () => {
  return userManager.signinRedirect();
};

export const getUser = (): Promise<User | null> => {
  return userManager.getUser();
};

// This function is the key integration point. It retrieves the token
// and must be used to configure the API client.
export const getAccessToken = async (): Promise<string | null> => {
  const user = await getUser();
  if (user && !user.expired) {
    return user.access_token;
  }
  return null;
};

apiClient.ts (using Axios)

import axios, { AxiosInstance } from 'axios';
import { getAccessToken } from './authService';

const apiClient = axios.create({
  baseURL: '/api', // Backend API
});

const mlApiClient = axios.create({
  baseURL: '/ml-api', // BentoML API (routed via a gateway)
});

// Use an interceptor to dynamically attach the token to every request.
const setupInterceptor = (client: AxiosInstance) => {
  client.interceptors.request.use(
    async (config) => {
      const token = await getAccessToken();
      if (token) {
        config.headers.Authorization = `Bearer ${token}`;
      }
      return config;
    },
    (error) => {
      return Promise.reject(error);
    }
  );
};

setupInterceptor(apiClient);
setupInterceptor(mlApiClient);

export { apiClient, mlApiClient };

A sample React component using Chakra UI would then use these clients to fetch data or request inference.

InferenceComponent.tsx

import React, { useState } from 'react';
import axios from 'axios';
import { Box, Button, Input, Text, useToast } from '@chakra-ui/react';
import { mlApiClient } from './apiClient';

export const InferenceComponent = () => {
  const [input, setInput] = useState('');
  const [result, setResult] = useState('');
  const toast = useToast();

  const handleInference = async () => {
    try {
      // Parse the text input into a nested array that the NumpyNdarray endpoint accepts
      const data = JSON.parse(input);
      const response = await mlApiClient.post('/classify', data);
      setResult(JSON.stringify(response.data));
    } catch (error) {
      // Axios errors carry the server's JSON body; anything else (such as a
      // JSON.parse failure on malformed input) falls back to the error's own message.
      const fallback = error instanceof Error ? error.message : 'An unexpected error occurred.';
      const description = axios.isAxiosError(error)
        ? (error.response?.data?.error ?? fallback)
        : fallback;
      toast({
        title: 'Inference Error',
        description,
        status: 'error',
        duration: 5000,
        isClosable: true,
      });
    }
  };

  return (
    <Box p={4} borderWidth="1px" borderRadius="lg">
      <Text mb={2}>Enter data for inference (e.g., [[5.1, 3.5, 1.4, 0.2]]):</Text>
      <Input
        placeholder="Numpy array-like data"
        value={input}
        onChange={(e) => setInput(e.target.value)}
        mb={4}
      />
      <Button colorScheme="blue" onClick={handleInference}>
        Run Inference
      </Button>
      {result && <Text mt={4}>Result: {result}</Text>}
    </Box>
  );
};

This completes the end-to-end flow, creating a cohesive and secure system where identity and authorization are managed centrally and enforced consistently across all architectural components.

Architectural Limitations and Future Considerations

This architecture, while robust, introduces a hard dependency on the OIDC provider. An outage at the IdP will impact the ability of users to acquire new tokens, effectively disabling new sessions across the entire platform. Services can continue to operate with existing valid tokens until they expire, but the system’s resilience is now tied to the IdP’s availability. Mitigating this requires a highly available IdP deployment.
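One way to soften the IdP dependency on the validation side is to let resource servers keep using the last known key set when a refresh fails. The sketch below is a hypothetical helper (JwksCache is not part of any library mentioned here); the fetch function and clock are injected so the behavior can be exercised without a network, and the TTL value is an arbitrary assumption.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Caches a JWKS document with a TTL and serves the stale copy if a refresh fails."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._jwks: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._jwks is not None and now - self._fetched_at < self._ttl:
            return self._jwks  # still fresh, no network call
        try:
            self._jwks = self._fetch()
            self._fetched_at = now
        except Exception:
            if self._jwks is None:
                raise  # nothing cached yet: the caller must reject the request
            # Otherwise keep serving the stale key set until the IdP recovers.
        return self._jwks


# Demo with a fake IdP and a fake clock (both hypothetical).
fake_now = [0.0]
idp_is_up = [True]


def fake_fetch() -> dict:
    if not idp_is_up[0]:
        raise ConnectionError("IdP unreachable")
    return {"keys": [{"kid": "example-key-1"}]}


cache = JwksCache(fake_fetch, ttl_seconds=300.0, clock=lambda: fake_now[0])
first = cache.get()
idp_is_up[0] = False   # simulate an outage
fake_now[0] = 301.0    # the cached copy is now past its TTL
stale = cache.get()    # refresh fails, stale keys are returned
print(stale == first)
```

The trade-off is that a key the IdP has rotated out or revoked stays trusted for as long as the outage lasts, so the acceptable staleness window is a policy decision rather than a purely technical one.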

The SQL-rewriting hook is a powerful tool for enforcing data isolation, but its string-manipulation approach is brittle. It can produce wrong or invalid SQL for joins, subqueries, and hand-written native queries, and it does not cover stored procedure calls or INSERT statements. An alternative is Hibernate’s @Filter annotation, which is integrated with the Hibernate metamodel but must be enabled explicitly for each session, creating a potential point of failure if a developer forgets to do so. Hibernate 6 also offers a @TenantId annotation for discriminator-column multi-tenancy, which is worth evaluating before maintaining custom rewriting logic. A defense-in-depth strategy could use both approaches: filters for explicit, query-level clarity and the rewriting hook as a fail-safe.
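The brittleness is easy to reproduce. The following is a Python port of the simplified rewriting rules (where, group by, order by, append) with no input validation, written only to show where they break; the function name add_tenant_filter and the sample queries are hypothetical.

```python
def add_tenant_filter(sql: str, tenant_id: str) -> str:
    """Naive rewrite: splice a tenant predicate into the SQL text."""
    predicate = f"tenant_id = '{tenant_id}'"
    if " where " in sql:
        return sql.replace(" where ", f" where {predicate} and ")
    if " group by " in sql:
        return sql.replace(" group by ", f" where {predicate} group by ")
    if " order by " in sql:
        return sql.replace(" order by ", f" where {predicate} order by ")
    return f"{sql} where {predicate}"


# 1. A simple single-table query is rewritten as intended.
simple = add_tenant_filter(
    "select m.id from model_metadata m where m.model_name = ?", "tenant-a"
)

# 2. With a subquery, every " where " is rewritten, including the inner one,
#    even if the inner table (audit_log here) has no tenant_id column.
nested = add_tenant_filter(
    "select m.id from model_metadata m "
    "where m.id in (select a.model_id from audit_log a where a.ok = 1)",
    "tenant-a",
)

# 3. Without validation, a crafted tenant ID changes the meaning of the query.
injected = add_tenant_filter("select * from model_metadata", "x' or '1'='1")

print(simple)
print(nested)
print(injected)
```

Case 2 is a correctness problem and case 3 a security problem; the second is why any text-based rewrite should validate the tenant ID strictly before concatenating it.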

For future iterations, the authorization logic within the BentoML service could be externalized to a dedicated policy engine like Open Policy Agent (OPA). The BentoML middleware would then simply pass the JWT and request context to OPA for an authorization decision. This decouples policy from the service code, allowing security rules to be updated and managed independently of application deployments, which is a significant advantage in a complex, rapidly evolving environment.
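As a rough sketch of that hand-off, the middleware could build a request for OPA's REST Data API, which accepts a JSON body with an "input" document and answers with a "result". The policy path modelserving/allow, the roles claim, the local OPA address, and both helper names are assumptions for illustration; this version forwards already-validated claims rather than the raw token.

```python
import json
import urllib.request


def build_opa_request(opa_base_url: str, policy_path: str, claims: dict,
                      method: str, path: str) -> urllib.request.Request:
    """Build (but do not send) a POST to OPA's Data API for an allow/deny decision."""
    body = {
        "input": {
            "tenant_id": claims.get("tenant_id"),
            "subject": claims.get("sub"),
            "roles": claims.get("roles", []),  # hypothetical claim name
            "method": method,
            "path": path,
        }
    }
    return urllib.request.Request(
        url=f"{opa_base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(opa_response_body: bytes) -> bool:
    """Deny by default: only an explicit boolean true counts as permission."""
    decision = json.loads(opa_response_body)
    return decision.get("result") is True


# Example inputs; the address and policy path are placeholders.
request = build_opa_request(
    "http://localhost:8181", "modelserving/allow",
    {"sub": "user-123", "tenant_id": "tenant-a"},
    method="POST", path="/classify",
)
print(request.full_url)
print(is_allowed(b'{"result": true}'), is_allowed(b"{}"))
```

Treating a missing result as a denial matters because OPA returns an empty object when the queried rule is undefined, and an undefined decision should never grant access.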
