The core challenge is building a unified model serving infrastructure for a multi-tenant SaaS platform. Each tenant possesses their own fine-tuned models and proprietary data. The architecture must guarantee absolute data isolation and access control at every layer—from the user interface down to the machine learning inference endpoint. Concurrently, it must maintain strict ACID compliance for all metadata operations, such as model promotions, versioning, or tenant onboarding, which are non-negotiable transactional boundaries in a production system.
A naive approach would be to treat each service as a distinct security domain. For instance, the ML serving layer could use static API keys, managed by a primary backend application. This is a common pattern but introduces significant operational friction and security vulnerabilities. A compromised API key provides long-lived access, revocation is a manual and error-prone process, and there is no unified mechanism for managing user sessions or permissions across the frontend, backend, and ML services. This fractures the security posture and creates an untenable management burden as the number of tenants and models scales. The risk of cross-tenant data leakage via a misconfigured or stolen key is unacceptably high.
A superior architectural choice is a federated identity model using OpenID Connect (OIDC). In this design, a central Identity Provider (IdP) becomes the single source of truth for authentication. All services (the Chakra UI frontend, the JPA/Hibernate-based backend, and the BentoML model serving API) are configured as OIDC clients or resource servers. They do not manage credentials. Instead, they consume and validate JSON Web Tokens (JWTs) issued by the IdP. This approach centralizes security policy, enables single sign-on (SSO), and uses short-lived, stateless tokens that drastically reduce the attack surface. More importantly, custom claims within the JWT, such as a tenant_id, become the immutable and cryptographically verifiable basis for enforcing data access policies across the entire distributed system.
The trade-off for this enhanced security and coherence is increased implementation complexity and a dependency on the IdP’s availability. However, for any serious multi-tenant application, the security guarantees and operational simplicity of a federated identity model are not a luxury but a foundational requirement. The risk of data leakage in a decoupled, key-based system was deemed a critical failure point, making the OIDC-based architecture the only viable path forward.
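To make "cryptographically verifiable" concrete, the following is a minimal sketch of why a tenant_id claim cannot be altered without detection. It assumes an HS256 (shared-secret) signature purely so it runs on the standard library; a real IdP would typically sign with RS256 and publish public keys. The secret and claim values are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """Return the claims if the signature matches, otherwise None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
            return None
        return json.loads(b64url_decode(payload_b64))
    except ValueError:
        # Malformed token: wrong segment count, bad base64, or bad JSON.
        return None


SECRET = b"demo-secret"  # hypothetical; stands in for the IdP's signing key
token = sign_token({"sub": "user-1", "tenant_id": "tenant-a"}, SECRET)

# An attacker swaps the payload for another tenant but cannot recompute the signature.
header_b64, _, signature_b64 = token.split(".")
forged_payload = b64url(json.dumps({"sub": "user-1", "tenant_id": "tenant-b"}).encode("utf-8"))
forged = f"{header_b64}.{forged_payload}.{signature_b64}"
```

Here `verify_token(token, SECRET)` returns the original claims, while `verify_token(forged, SECRET)` returns `None`, which is the property every downstream service relies on when it trusts tenant_id.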
Core Architectural Flow
The interaction model is orchestrated around the OIDC JWT. The token acts as a secure passport, carrying the necessary context (user identity, tenant ID, roles) for each service to perform its function without needing to trust any other service directly, only the IdP.
sequenceDiagram
    participant User
    participant ChakraUI as Chakra UI Frontend
    participant IdP as OIDC Provider (e.g., Keycloak)
    participant Backend as Spring Boot Backend (JPA/Hibernate)
    participant BentoML as BentoML Service
    participant Database as Relational DB
    User->>ChakraUI: Initiates Login
    ChakraUI->>IdP: Redirects for Authentication
    IdP-->>User: Presents Login Form
    User->>IdP: Submits Credentials
    IdP-->>ChakraUI: Returns Authorization Code
    ChakraUI->>IdP: Exchanges Code for JWT
    IdP-->>ChakraUI: Responds with JWT (containing tenant_id)
    Note over ChakraUI: Stores JWT securely
    ChakraUI->>Backend: API Request with "Authorization: Bearer"
    Backend->>IdP: Fetches JWKS (once, then caches)
    Backend->>Backend: Validates JWT signature and claims
    Backend->>Backend: Extracts tenant_id from JWT
    Note over Backend: Applies Hibernate Interceptor for Tenant Isolation
    Backend->>Database: Executes SQL with "WHERE tenant_id = ?"
    Database-->>Backend: Returns Tenant-Scoped Data
    Backend-->>ChakraUI: API Response
    ChakraUI->>BentoML: Inference Request with "Authorization: Bearer"
    BentoML->>IdP: Fetches JWKS (once, then caches)
    BentoML->>BentoML: Validates JWT signature and claims
    BentoML->>BentoML: Extracts tenant_id from JWT
    Note over BentoML: Loads model specific to tenant_id
    BentoML->>BentoML: Performs Inference
    BentoML-->>ChakraUI: Inference Result
This sequence ensures that both the data persistence layer and the model inference layer are independently secured and scoped by the same verifiable identity token. The Hibernate Interceptor in the backend provides a transparent, application-wide enforcement of data boundaries, while the BentoML middleware performs an analogous check at the ML gateway.
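The "validates JWT signature and claims" steps in the diagram cover more than the signature. As a hedged sketch, assuming the signature has already been verified, a service might check the remaining claims as below before trusting tenant_id. The function name, the audience value, and the 30-second leeway are illustrative assumptions, not part of the original design.

```python
import time
from typing import Any, Mapping, Optional


class ClaimError(Exception):
    """Raised when the claims of a signature-verified token are not acceptable."""


def tenant_from_claims(
    claims: Mapping[str, Any],
    *,
    issuer: str,
    audience: str,
    now: Optional[float] = None,
    leeway: float = 30.0,
) -> str:
    """Check iss, aud, and exp, then return the tenant_id claim."""
    current = time.time() if now is None else now

    if claims.get("iss") != issuer:
        raise ClaimError("unexpected issuer")

    # The aud claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        raise ClaimError("token not intended for this service")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp + leeway < current:
        raise ClaimError("token expired or missing exp")

    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id.strip():
        raise ClaimError("missing tenant_id claim")
    return tenant_id
```

Checking the audience matters in this architecture because the same IdP issues tokens for several services; without it, a token minted for one client could be replayed against another.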
Backend Implementation: JPA/Hibernate with Tenant-Scoped ACID Transactions
The backend, built with Spring Boot and Spring Security, acts as an OIDC Resource Server. Its primary responsibilities are to manage tenant and model metadata, serve data to the frontend, and ensure all database interactions are strictly isolated by tenant.
First, the Spring Security configuration is established to validate incoming JWTs.
application.yml
spring:
  security:
    oauth2:
      resourceserver:
        jwt:
          # The issuer-uri is used to auto-discover OIDC provider configuration,
          # including the jwks-uri for token signature validation.
          issuer-uri: https://your-oidc-provider.com/realms/your-realm
  jpa:
    hibernate:
      ddl-auto: update
    properties:
      hibernate:
        # We will register our custom interceptor programmatically.
        # This property is shown for awareness of Hibernate's capabilities.
        # session_factory.interceptor: com.example.multitenant.config.TenantInterceptor
        show_sql: true
The core of the data isolation mechanism is a Hibernate Interceptor. Unlike Hibernate Filters, which must be explicitly enabled per session, an interceptor can be configured globally to inspect and modify every single SQL statement before it is sent to the database. This provides a robust, low-level guarantee of tenant separation that developers cannot easily bypass.
TenantContext.java
package com.example.multitenant.config;

// A ThreadLocal is used to hold the tenant identifier for the duration of a single request.
// This is a standard pattern for propagating request-scoped context without passing it
// as a method parameter through every layer of the application.
public final class TenantContext {
    private static final ThreadLocal<String> currentTenant = new ThreadLocal<>();

    public static String getCurrentTenant() {
        return currentTenant.get();
    }

    public static void setCurrentTenant(String tenantId) {
        if (tenantId == null || tenantId.trim().isEmpty()) {
            throw new IllegalArgumentException("Tenant ID cannot be null or empty.");
        }
        currentTenant.set(tenantId);
    }

    public static void clear() {
        currentTenant.remove();
    }
}
JwtTenantFilter.java
This servlet filter runs on every incoming request after Spring Security has validated the JWT. It reads the tenant_id claim from the authenticated token and populates the TenantContext.
package com.example.multitenant.config;

import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.security.core.context.SecurityContextHolder;
import org.springframework.security.oauth2.jwt.Jwt;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;
import java.io.IOException;

@Component
public class JwtTenantFilter extends OncePerRequestFilter {
    private static final Logger log = LoggerFactory.getLogger(JwtTenantFilter.class);
    private static final String TENANT_ID_CLAIM = "tenant_id"; // Custom claim in your JWT

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
            throws ServletException, IOException {
        try {
            var authentication = SecurityContextHolder.getContext().getAuthentication();
            if (authentication != null && authentication.isAuthenticated() && authentication.getPrincipal() instanceof Jwt jwt) {
                String tenantId = jwt.getClaimAsString(TENANT_ID_CLAIM);
                if (tenantId == null || tenantId.isBlank()) {
                    // Access without a tenant ID claim is an invalid state for a multi-tenant system.
                    // Reject the request: continuing would leave the TenantContext empty, and the
                    // interceptor does not scope queries when no tenant is set.
                    log.warn("Authenticated request is missing '{}' claim.", TENANT_ID_CLAIM);
                    response.sendError(HttpServletResponse.SC_FORBIDDEN, "Token is missing the tenant_id claim.");
                    return;
                }
                log.debug("Setting TenantContext for tenant: {}", tenantId);
                TenantContext.setCurrentTenant(tenantId);
            }
            filterChain.doFilter(request, response);
        } finally {
            // Crucial: Always clear the ThreadLocal after the request is complete
            // to prevent memory leaks and cross-request contamination in thread pools.
            log.debug("Clearing TenantContext.");
            TenantContext.clear();
        }
    }
}
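The cross-request contamination that the finally block guards against is easy to reproduce. The sketch below is a language-neutral illustration in Python rather than the backend's Java: it uses threading.local in place of ThreadLocal and a single-worker pool so that consecutive "requests" reuse the same thread. All function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_context = threading.local()  # plays the role of TenantContext's ThreadLocal


def set_tenant(tenant_id):
    _context.tenant_id = tenant_id


def get_tenant():
    return getattr(_context, "tenant_id", None)


def clear_tenant():
    if hasattr(_context, "tenant_id"):
        del _context.tenant_id


def handle_request_leaky(tenant_id):
    """Sets the tenant when the token carries one, but never clears it."""
    if tenant_id is not None:
        set_tenant(tenant_id)
    return get_tenant()


def handle_request_safe(tenant_id):
    """Same logic, with the cleanup in a finally block as in the filter."""
    try:
        if tenant_id is not None:
            set_tenant(tenant_id)
        return get_tenant()
    finally:
        clear_tenant()


def run_sequentially(handler, tenant_ids):
    # One worker thread, so every request is served by the same pooled thread.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return [pool.submit(handler, tenant_id).result() for tenant_id in tenant_ids]


# Second request has no tenant claim (None) and follows a tenant-a request.
leaky = run_sequentially(handle_request_leaky, ["tenant-a", None])
safe = run_sequentially(handle_request_safe, ["tenant-a", None])
```

With the leaky handler, the second request observes `tenant-a` left behind by the first; with the safe handler it observes no tenant at all.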
TenantInterceptor.java
This is the Hibernate interceptor that transparently adds the WHERE clause. The jakarta.* imports used by this backend imply Hibernate 6, where Interceptor.onPrepareStatement is no longer available; the statement-level hook there is StatementInspector, so the class implements that interface.
package com.example.multitenant.config;

import org.hibernate.resource.jdbc.spi.StatementInspector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import java.util.regex.Pattern;

// The inspector sees each SQL string just before it is prepared and may return a modified one.
// This is a powerful, low-level way to enforce data boundaries.
@Component
public class TenantInterceptor implements StatementInspector {
    private static final Logger log = LoggerFactory.getLogger(TenantInterceptor.class);
    // The tenant ID is concatenated into the SQL text, so only a conservative character set
    // is accepted. The exact pattern is an assumption; adapt it to your tenant ID format.
    private static final Pattern SAFE_TENANT_ID = Pattern.compile("[A-Za-z0-9_-]{1,64}");

    @Override
    public String inspect(String sql) {
        String tenantId = TenantContext.getCurrentTenant();
        if (tenantId == null) {
            // If no tenant is set, we proceed without modification. This might be
            // necessary for system-level operations or public tables. A real-world
            // implementation would have more sophisticated logic here.
            return sql;
        }
        if (!SAFE_TENANT_ID.matcher(tenantId).matches()) {
            throw new IllegalStateException("Refusing to build SQL for a malformed tenant ID.");
        }
        // A naive implementation might just append "WHERE tenant_id = ...".
        // A more robust implementation must correctly handle existing WHERE clauses,
        // subqueries, joins, and different SQL statement types (SELECT, UPDATE, DELETE).
        // This example uses simplified string manipulation: it rewrites every occurrence
        // of the matched keyword and assumes an unqualified tenant_id column is unambiguous.
        if (isSelectUpdateOrDelete(sql) && !sql.contains("tenant_id =")) {
            String predicate = " where tenant_id = '" + tenantId + "'";
            if (sql.contains(" where ")) {
                sql = sql.replace(" where ", predicate + " and ");
            } else if (sql.contains(" group by ")) {
                sql = sql.replace(" group by ", predicate + " group by ");
            } else if (sql.contains(" order by ")) {
                sql = sql.replace(" order by ", predicate + " order by ");
            } else {
                // For statements without a WHERE, GROUP BY, or ORDER BY clause
                sql += predicate;
            }
            log.trace("Modified SQL: {}", sql);
        }
        return sql;
    }

    private boolean isSelectUpdateOrDelete(String sql) {
        String lowerCaseSql = sql.trim().toLowerCase();
        return lowerCaseSql.startsWith("select") || lowerCaseSql.startsWith("update") || lowerCaseSql.startsWith("delete");
    }
}
Finally, the interceptor and filter need to be registered.
HibernateConfig.java and SecurityConfig.java
// In a @Configuration class for Hibernate
@Configuration
public class HibernateConfig {
    @Autowired
    private TenantInterceptor tenantInterceptor;

    // Spring Boot applies HibernatePropertiesCustomizer beans when it builds the
    // EntityManagerFactory, which lets us pass the Spring-managed instance to Hibernate.
    @Bean
    public HibernatePropertiesCustomizer hibernatePropertiesCustomizer() {
        return properties -> properties.put("hibernate.session_factory.statement_inspector", tenantInterceptor);
    }
}
// In your Spring Security @Configuration class
@Configuration
@EnableWebSecurity
public class SecurityConfig {
    @Autowired
    private JwtTenantFilter jwtTenantFilter;

    @Bean
    public SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception {
        http
            .authorizeHttpRequests(authorize -> authorize.anyRequest().authenticated())
            .oauth2ResourceServer(oauth2 -> oauth2.jwt(Customizer.withDefaults()))
            // Add our custom filter after the authentication filter to ensure
            // the SecurityContext is populated before we try to read it.
            .addFilterAfter(jwtTenantFilter, BearerTokenAuthenticationFilter.class);
        return http.build();
    }
}
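With this in place, every SELECT, UPDATE, and DELETE that Hibernate issues while a tenant is set is filtered by tenant_id, so each table reached this way needs a tenant_id column. INSERT statements are not rewritten, so the application must set tenantId on new entities itself. A service method annotated with @Transactional runs inside a single database transaction, which makes an operation such as a model promotion both atomic and tenant-scoped.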
@Entity
public class ModelMetadata {
    @Id @GeneratedValue
    private Long id;
    private String modelName;
    private String version;
    private String tenantId; // Mapped to the tenant_id column; this column is crucial
    // getters and setters
}

@Service
public class ModelService {
    @Autowired
    private ModelRepository repository;

    @Transactional // Ensures the read and the update commit or roll back together
    public void promoteModel(String modelName, String newVersion) {
        // The interceptor ensures this query will implicitly become:
        // "select ... from model_metadata where tenant_id = 'current_tenant' and model_name = ?"
        ModelMetadata currentModel = repository.findByModelName(modelName);
        if (currentModel == null) {
            // Either the model does not exist or it belongs to another tenant.
            throw new IllegalArgumentException("No model named '" + modelName + "' for the current tenant.");
        }
        // Any modification is also scoped and part of the same transaction.
        currentModel.setVersion(newVersion);
        repository.save(currentModel);
    }
}
ML Service Implementation: BentoML with JWT Middleware
The BentoML service must perform the same JWT validation as the backend. This prevents unauthorized access to the inference endpoints and ensures it loads the correct model for the requesting tenant. This is achieved by implementing custom BentoML Middleware.
service.py
import logging

import bentoml
import jwt
import numpy as np
from bentoml.io import NumpyNdarray
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

# --- Configuration ---
# In a real application, these should come from environment variables or a config file.
OIDC_ISSUER_URI = "https://your-oidc-provider.com/realms/your-realm"
TENANT_ID_CLAIM = "tenant_id"
# Keycloak-style JWKS path; other providers publish theirs in the OIDC discovery document.
JWKS_URI = f"{OIDC_ISSUER_URI}/protocol/openid-connect/certs"

# --- Logging Setup ---
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# --- JWT Validation Logic ---
# Caching the JWKS response is critical for performance. Without it, every
# request would trigger a network call to the IdP. PyJWT's PyJWKClient fetches the
# key set and caches it, and it does not cache a failed fetch the way an lru_cache
# around a function returning None would.
jwks_client = jwt.PyJWKClient(JWKS_URI)


def validate_jwt(token: str):
    """
    Validates the JWT signature and claims against the OIDC provider's public keys.
    Returns (claims, None) on success and (None, error_message) on failure.
    """
    try:
        # Selects the published key whose "kid" matches the token header.
        signing_key = jwks_client.get_signing_key_from_jwt(token)
        decoded_token = jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            issuer=OIDC_ISSUER_URI,
            # Audience verification is disabled only to keep the example short.
            # In production, pass audience=... for this service and remove this option.
            options={"verify_aud": False},
        )
        return decoded_token, None
    except jwt.ExpiredSignatureError:
        logger.warning("Token has expired.")
        return None, "Token has expired"
    except jwt.PyJWTError as e:
        # Also covers key lookup failures, which PyJWT reports as PyJWKClientError.
        logger.error(f"JWT validation error: {e}")
        return None, f"Invalid token: {e}"


# --- BentoML Middleware for Authentication ---
class OIDCMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        auth_header = request.headers.get("Authorization")
        if not auth_header or not auth_header.startswith("Bearer "):
            return JSONResponse(status_code=401, content={"error": "Authorization header missing or invalid"})
        token = auth_header.split(" ", 1)[1]
        decoded_token, error = validate_jwt(token)
        if error:
            return JSONResponse(status_code=401, content={"error": error})
        tenant_id = decoded_token.get(TENANT_ID_CLAIM)
        if not tenant_id:
            return JSONResponse(status_code=403, content={"error": "Tenant ID claim missing from token"})
        # Expose the tenant_id to downstream ASGI components.
        request.state.tenant_id = tenant_id
        return await call_next(request)


# --- BentoML Service Definition ---
# This assumes you have models tagged like 'tenant-a-model:latest', 'tenant-b-model:latest'
# in your local BentoML model store.
model_runner = bentoml.sklearn.get("generic_iris_model:latest").to_runner()
svc = bentoml.Service("multi_tenant_classifier", runners=[model_runner])
svc.add_asgi_middleware(OIDCMiddleware)


@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify(input_data: np.ndarray, ctx: bentoml.Context) -> np.ndarray:
    # bentoml.Context exposes the request headers, so the tenant is read again from the
    # token the middleware has already validated (the signing key is cached by then).
    token = ctx.request.headers["authorization"].split(" ", 1)[1]
    claims, _ = validate_jwt(token)
    tenant_id = claims[TENANT_ID_CLAIM]
    logger.info(f"Processing inference request for tenant: {tenant_id}")
    # In a real system, you would use the tenant_id to dynamically select the correct model.
    # For example:
    # runner = bentoml.sklearn.get(f"{tenant_id}_model:latest").to_runner()
    # await runner.async_run(input_data)
    # This example uses a single runner for simplicity but demonstrates the principle.
    result = await model_runner.async_run(input_data)
    return result
The bentofile.yaml would simply reference this service definition. This architecture ensures the ML service is a self-contained, secure component that adheres to the same security standards as the rest of the platform.
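The service above leaves per-tenant model selection as a comment. One way it could look is sketched below, under two assumptions that are not in the original: tenant IDs match a restricted pattern, and models are loaded through an injected loader function (for instance, one wrapping a BentoML model-store lookup). The tag format follows the commented example; the class and function names are hypothetical.

```python
import re
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

# Assumed tenant ID format; rejecting anything else keeps a crafted claim
# from being interpolated into a model tag.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9_-]{0,62}")


def model_tag_for(tenant_id: str) -> str:
    """Map a tenant ID to the tag of that tenant's model."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"{tenant_id}_model:latest"


class TenantModelCache(Generic[T]):
    """Keeps the most recently used tenant models in memory (LRU eviction)."""

    def __init__(self, loader: Callable[[str], T], max_models: int = 8) -> None:
        self._loader = loader
        self._max_models = max_models
        self._models: "OrderedDict[str, T]" = OrderedDict()

    def get(self, tenant_id: str) -> T:
        tag = model_tag_for(tenant_id)
        if tag in self._models:
            self._models.move_to_end(tag)  # mark as most recently used
            return self._models[tag]
        model = self._loader(tag)
        self._models[tag] = model
        if len(self._models) > self._max_models:
            self._models.popitem(last=False)  # evict the least recently used
        return model
```

Bounding the cache matters once the tenant count grows, since holding every tenant's model in memory at once is rarely feasible. This sketch is not thread-safe; a lock around `get` would be needed if it is called from multiple threads.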
Frontend Integration: Chakra UI with OIDC Authentication
The frontend, built with React and Chakra UI, uses a library like oidc-client-ts to manage the OIDC authentication flow. Its role is to acquire the JWT and attach it to all subsequent API calls.
authService.ts
import { UserManager, User } from 'oidc-client-ts';

const userManager = new UserManager({
  authority: 'https://your-oidc-provider.com/realms/your-realm',
  client_id: 'frontend-client',
  redirect_uri: window.location.origin + '/callback',
  response_type: 'code',
  scope: 'openid profile email',
});

export const login = () => {
  return userManager.signinRedirect();
};

export const getUser = (): Promise<User | null> => {
  return userManager.getUser();
};

// This function is the key integration point. It retrieves the token
// and must be used to configure the API client.
export const getAccessToken = async (): Promise<string | null> => {
  const user = await getUser();
  if (user && !user.expired) {
    return user.access_token;
  }
  return null;
};
apiClient.ts (using Axios)
import axios, { AxiosInstance } from 'axios';
import { getAccessToken } from './authService';

const apiClient = axios.create({
  baseURL: '/api', // Backend API
});

const mlApiClient = axios.create({
  baseURL: '/ml-api', // BentoML API (routed via a gateway)
});

// Use an interceptor to dynamically attach the token to every request.
const setupInterceptor = (client: AxiosInstance) => {
  client.interceptors.request.use(
    async (config) => {
      const token = await getAccessToken();
      if (token) {
        config.headers.Authorization = `Bearer ${token}`;
      }
      return config;
    },
    (error) => {
      return Promise.reject(error);
    }
  );
};

setupInterceptor(apiClient);
setupInterceptor(mlApiClient);

export { apiClient, mlApiClient };
A sample React component using Chakra UI would then use these clients to fetch data or request inference.
InferenceComponent.tsx
import React, { useState } from 'react';
import axios from 'axios';
import { Box, Button, Input, Text, useToast } from '@chakra-ui/react';
import { mlApiClient } from './apiClient';

export const InferenceComponent = () => {
  const [input, setInput] = useState('');
  const [result, setResult] = useState('');
  const toast = useToast();

  const handleInference = async () => {
    try {
      // Input processing to convert string to numpy array format
      const data = JSON.parse(input);
      const response = await mlApiClient.post('/classify', data);
      setResult(JSON.stringify(response.data));
    } catch (error) {
      // Show the service's error message when the failure is an HTTP error response;
      // invalid JSON input and network failures fall back to the generic message.
      const description =
        (axios.isAxiosError(error) && error.response?.data?.error) ||
        'An unexpected error occurred.';
      toast({
        title: 'Inference Error',
        description,
        status: 'error',
        duration: 5000,
        isClosable: true,
      });
    }
  };

  return (
    <Box p={4} borderWidth="1px" borderRadius="lg">
      <Text mb={2}>Enter data for inference (e.g., [[5.1, 3.5, 1.4, 0.2]]):</Text>
      <Input
        placeholder="Numpy array-like data"
        value={input}
        onChange={(e) => setInput(e.target.value)}
        mb={4}
      />
      <Button colorScheme="blue" onClick={handleInference}>
        Run Inference
      </Button>
      {result && <Text mt={4}>Result: {result}</Text>}
    </Box>
  );
};
This completes the end-to-end flow, creating a cohesive and secure system where identity and authorization are managed centrally and enforced consistently across all architectural components.
Architectural Limitations and Future Considerations
This architecture, while robust, introduces a hard dependency on the OIDC provider. An outage at the IdP will impact the ability of users to acquire new tokens, effectively disabling new sessions across the entire platform. Services can continue to operate with existing valid tokens until they expire, but the system’s resilience is now tied to the IdP’s availability. Mitigating this requires a highly available IdP deployment.
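Services can only keep validating existing tokens during an IdP outage if they still hold the signing keys. A minimal sketch of that idea follows: a JWKS cache that refreshes after a TTL but falls back to the last known keys when the refresh fails. The class name, the injected fetch function, and the TTL value are assumptions for illustration.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Caches a JWKS document and serves stale keys if a refresh fails."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._keys: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._keys is not None and now - self._fetched_at < self._ttl:
            return self._keys  # still fresh, no network call
        try:
            self._keys = self._fetch()
            self._fetched_at = now
        except Exception:
            if self._keys is None:
                raise  # nothing cached yet, so the failure must surface
            # IdP unreachable: keep validating against the last known keys.
        return self._keys
```

The trade-off is that stale keys delay the effect of a key rotation or revocation at the IdP, so a production version would likely cap how long stale keys may be served.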
The Hibernate Interceptor is a powerful tool for enforcing data isolation, but its string-manipulation approach to modifying SQL is brittle. It can produce invalid or wrongly scoped SQL for joins, subqueries, hand-written native queries, and stored procedure calls. An alternative is to use Hibernate's @Filter annotation, which is more integrated with the Hibernate metamodel but requires explicitly enabling the filter for each session, creating a potential point of failure if a developer forgets to do so. A defense-in-depth strategy could involve using both the interceptor as a fail-safe and filters for explicit, query-level clarity.
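The brittleness is easiest to see by running the rewriting rules against a few statements. The sketch below is a Python port of the string-manipulation logic described for the interceptor, written only to probe its behavior; the function name and sample statements are hypothetical.

```python
def scope_sql(sql: str, tenant_id: str) -> str:
    """Apply the interceptor's tenant-scoping rules to one SQL string."""
    statement = sql.strip().lower()
    if not statement.startswith(("select", "update", "delete")) or "tenant_id =" in sql:
        return sql
    predicate = f" where tenant_id = '{tenant_id}'"
    if " where " in sql:
        return sql.replace(" where ", predicate + " and ")
    if " group by " in sql:
        return sql.replace(" group by ", predicate + " group by ")
    if " order by " in sql:
        return sql.replace(" order by ", predicate + " order by ")
    return sql + predicate


# Lowercase SQL of the kind Hibernate generates is scoped as intended.
generated = scope_sql("select * from model_metadata where model_name = ?", "t1")

# A hand-written native query with uppercase keywords is not recognized as
# having a WHERE clause, so a second one is appended and the SQL is invalid.
native = scope_sql("SELECT * FROM model_metadata WHERE model_name = ?", "t1")

# INSERT statements pass through untouched, so nothing stops a row from
# being written with another tenant's ID.
inserted = scope_sql("insert into model_metadata (model_name, tenant_id) values (?, ?)", "t1")
```

The first call yields `select * from model_metadata where tenant_id = 't1' and model_name = ?`, while the second ends with two WHERE clauses. Cases like these are why a parser-based rewrite or the metamodel-aware @Filter is the safer primary mechanism.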
For future iterations, the authorization logic within the BentoML service could be externalized to a dedicated policy engine like Open Policy Agent (OPA). The BentoML middleware would then simply pass the JWT and request context to OPA for an authorization decision. This decouples policy from the service code, allowing security rules to be updated and managed independently of application deployments, which is a significant advantage in a complex, rapidly evolving environment.
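As a rough sketch of that hand-off, the middleware could build an input document from the validated claims and post it to OPA's Data API, which accepts a JSON body with an "input" field and returns the rule's value under "result". The OPA address, the policy path mlplatform/authz/allow, and the roles claim name are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Mapping

# Hypothetical OPA sidecar address and policy path.
OPA_URL = "http://localhost:8181/v1/data/mlplatform/authz/allow"


def build_opa_input(claims: Mapping[str, Any], method: str, path: str) -> dict:
    """Package the request context that the policy will evaluate."""
    return {
        "input": {
            "tenant_id": claims.get("tenant_id"),
            "roles": claims.get("roles", []),
            "method": method,
            "path": path.strip("/").split("/"),
        }
    }


def is_allowed(opa_response: Mapping[str, Any]) -> bool:
    """OPA omits "result" when the rule is undefined; treat that as a deny."""
    return opa_response.get("result") is True


def query_opa(document: dict, url: str = OPA_URL, timeout: float = 2.0) -> bool:
    """POST the input document to OPA and interpret the decision."""
    request = urllib.request.Request(
        url,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return is_allowed(json.load(response))
```

Defaulting to deny when the result is missing keeps a misnamed or undeployed policy from silently granting access.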