The mandate was clear: build a new internal administrative service that couldn’t trust the network and couldn’t rely on manually provisioned credentials. Every internal tool we had deployed previously suffered from the same original sin—a collection of API keys, database passwords, and TLS certificates checked into environment files, managed via Docker secrets populated by a CI runner, or worse, left in a project’s docker-compose.yml commented as “for dev only.” This approach was fragile, insecure, and an operational nightmare. For this project, the service had to bootstrap its own identity and secrets, and user access had to be gated by our corporate single sign-on.
Our chosen stack was Docker Swarm for orchestration due to its operational simplicity for our scale, Tornado for the asynchronous Python service, HashiCorp Vault as our central secrets broker, and SAML for federated identity. The core challenge wasn’t using any one of these technologies, but orchestrating their interaction to achieve a zero-trust startup sequence. The application container, upon starting, would know nothing. It had to securely prove its identity to Vault, fetch its entire configuration—including the SAML specifics needed to authenticate human users—and only then begin serving traffic.
This is the log of how we built it, the problems we hit, and the patterns we established.
Technical Pain Point: The Empty Container Problem
A freshly scheduled container on a Docker Swarm node is fundamentally untrusted. It has no inherent, verifiable identity that another system, like Vault, can immediately accept. The most common anti-pattern is to bake a Vault token into the Docker image or pass it as an environment variable. This is a non-starter; the token is static, long-lived, and exposed.
Our initial concept was to leverage Vault’s AppRole authentication backend. AppRole is designed for machine-to-machine authentication. An “AppRole” is a set of policies. To authenticate, a client needs a RoleID (publicly known, like a username) and a SecretID (a secret credential, like a password). The RoleID could be baked into the image, but the SecretID is the critical bootstrap secret. How do we deliver this SecretID to the container securely and ephemerally?
Kubernetes solves this elegantly with Service Account Tokens automatically mounted into pods, which Vault’s Kubernetes auth method can then validate. Docker Swarm has no direct equivalent. We had to devise a secure introduction mechanism. The solution was to use a trusted orchestrator (our CI/CD pipeline) to request a short-lived, single-use SecretID from Vault, but with a twist: we’d request a wrapped SecretID.
A wrapped secret in Vault is encrypted and can only be unwrapped once using a temporary “wrapping token.” The CI/CD process would look like this:
- CI/CD authenticates to Vault with its own high-privilege policy.
- It generates a new SecretID for the application’s AppRole.
- It asks Vault to wrap this SecretID, getting back a short-lived wrapping token.
- The CI/CD pipeline injects this wrapping token into a Docker Secret, which is then attached to the Swarm service.
The container starts, reads the wrapping token from the Docker Secret file, unwraps it to get the real SecretID, and then proceeds with the AppRole login. The wrapping token is useless after its first use, significantly shrinking the attack surface.
Step 1: Configuring Vault and the Swarm Stack
First, we need a foundational docker-compose.yml to deploy Vault and our Tornado application on a Swarm cluster. For this demonstration, Vault runs in dev mode, which is not suitable for production but simplifies setup.
# docker-compose.yml
version: '3.8'

services:
  vault:
    image: hashicorp/vault:1.15
    ports:
      - "8200:8200"
    environment:
      - VAULT_DEV_ROOT_TOKEN_ID=root
      - VAULT_ADDR=http://127.0.0.1:8200
    cap_add:
      - IPC_LOCK
    command: server -dev -dev-listen-address="0.0.0.0:8200"

  admin_app:
    image: my-secure-app:latest # We will build this image
    # Note: `build` is ignored by `docker stack deploy`; build and push
    # the image separately. It is kept here for local `docker compose` use.
    build:
      context: ./app
    environment:
      - VAULT_ADDR=http://vault:8200
      - VAULT_ROLE_ID=${VAULT_ROLE_ID} # Substituted at deployment time
    secrets:
      - vault_wrapping_token
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure

secrets:
  vault_wrapping_token:
    external: true
The key parts here are the VAULT_ROLE_ID environment variable and the vault_wrapping_token Docker secret. The deployer is responsible for creating this secret.
Next, we configure Vault. This is done via its CLI or API after the container starts.
# Wait for vault to be up
# Run these commands from a machine with Vault CLI or exec into the container
export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_TOKEN='root'
# 1. Enable AppRole auth method
vault auth enable approle
# 2. Enable KV v2 secrets engine
vault secrets enable -path=secret kv-v2
# 3. Create a policy for our app
# This policy allows reading SAML configs and database credentials
vault policy write admin-app-policy - <<EOF
path "secret/data/admin-app/saml" {
  capabilities = ["read"]
}
path "secret/data/admin-app/database" {
  capabilities = ["read"]
}
EOF
# 4. Create the AppRole
# We bind the policy to this role.
# We set a short TTL for the generated token for security.
vault write auth/approle/role/admin-app \
token_policies="admin-app-policy" \
token_ttl=1h \
token_max_ttl=4h
# 5. Get the RoleID
# This is considered non-sensitive and can be baked into the image or CI vars.
vault read auth/approle/role/admin-app/role-id
# Output:
# Key Value
# --- -----
# role_id <some-role-id> -> This goes into VAULT_ROLE_ID
# 6. Store some secrets for the app to fetch later
vault kv put secret/admin-app/saml \
sp_private_key=@/path/to/sp.key \
sp_public_cert=@/path/to/sp.crt \
idp_metadata_xml=@/path/to/idp-metadata.xml
vault kv put secret/admin-app/database \
username="dbuser" \
password="supersecretpassword"
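One gotcha worth internalizing before writing client code: KV v2 wraps every secret in version metadata, so the payload you stored sits two levels deep in the read response. A simplified illustration (the response shape is abbreviated; real Vault responses carry additional fields):

```python
# Simplified shape of a KV v2 read response for secret/admin-app/database.
sample_response = {
    "data": {
        "data": {"username": "dbuser", "password": "supersecretpassword"},
        "metadata": {"version": 1, "created_time": "2024-01-01T00:00:00Z"},
    }
}

def extract_kv2_payload(response: dict) -> dict:
    """KV v2 nests the stored key/value pairs under data -> data."""
    return response["data"]["data"]

print(extract_kv2_payload(sample_response)["username"])  # dbuser
```

Forgetting the second level of nesting (and reading `response["data"]` directly) is a classic first-run failure with hvac and KV v2.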
Now for the deployment-time magic. The CI/CD script generates and injects the wrapped SecretID.
# This would be in a CI/CD deployment script

# 1. Generate a response-wrapped SecretID for the role.
#    There is no standalone `vault wrap` command; instead, -wrap-ttl asks
#    Vault to wrap the response itself. The wrapping token is valid for
#    5 minutes, and the plaintext SecretID never touches the CI environment.
WRAPPING_TOKEN=$(vault write -field=wrapping_token -wrap-ttl=5m -f \
  auth/approle/role/admin-app/secret-id)

# 2. Create the Docker secret before deploying the stack.
#    The secret name must match what's in docker-compose.yml.
#    (On redeploys, remove the stale secret first: docker secret rm vault_wrapping_token)
echo "$WRAPPING_TOKEN" | docker secret create vault_wrapping_token -

# 3. Deploy the stack, injecting the RoleID as an environment variable
export VAULT_ROLE_ID=$(vault read -field=role_id auth/approle/role/admin-app/role-id)
docker stack deploy -c docker-compose.yml my_secure_stack
The application now has a one-time-use token to fetch its real credential.
Step 2: The Tornado Application’s Bootstrap Logic
The core of the application is a Python service using Tornado. On startup, it must perform the Vault authentication sequence before it even thinks about listening on a port.
Here is the app/ directory structure:
app/
├── Dockerfile
├── requirements.txt
├── vault_client.py
└── main.py
requirements.txt:
tornado==6.4
hvac==1.2.1
python3-saml==1.16.0
vault_client.py contains the logic for the bootstrap sequence.
# app/vault_client.py
import hvac
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


class VaultClient:
    """
    A client to handle the secure introduction to HashiCorp Vault.

    1. Reads a wrapping token from a Docker secret.
    2. Unwraps the token to get a SecretID.
    3. Performs an AppRole login to get a client token.
    4. Provides a configured hvac client for further secret retrieval.
    """

    def __init__(self, vault_addr: str, role_id: str, wrapping_token_path: str):
        self.vault_addr = vault_addr
        self.role_id = role_id
        self.wrapping_token_path = wrapping_token_path
        self.client = None

    def initialize(self) -> bool:
        """
        Performs the full authentication and initialization sequence.
        Returns True on success, False on failure.
        """
        try:
            wrapping_token = self._read_wrapping_token()
            if not wrapping_token:
                return False

            # Initialize a temporary client just for unwrapping
            unwrap_client = hvac.Client(url=self.vault_addr)
            unwrap_client.token = wrapping_token

            logging.info("Attempting to unwrap SecretID from wrapping token...")
            # The unwrap operation is a one-time deal.
            unwrap_response = unwrap_client.sys.unwrap()
            secret_id = unwrap_response['data']['secret_id']
            logging.info("Successfully unwrapped SecretID.")

            # Now perform the AppRole login
            approle_client = hvac.Client(url=self.vault_addr)
            logging.info(f"Performing AppRole login with RoleID: {self.role_id[:8]}...")
            login_response = approle_client.auth.approle.login(
                role_id=self.role_id,
                secret_id=secret_id
            )

            # We got our client token. This is the goal of the bootstrap.
            client_token = login_response['auth']['client_token']
            logging.info("AppRole login successful. Client token acquired.")

            # Create the final, authenticated client instance
            self.client = hvac.Client(url=self.vault_addr, token=client_token)
            if not self.client.is_authenticated():
                logging.error("Client is not authenticated even after AppRole login.")
                return False

            logging.info("Vault client is fully initialized and authenticated.")
            return True
        except hvac.exceptions.InvalidRequest as e:
            logging.error(f"HVAC Invalid Request during initialization: {e}. Check Vault policies and roles.")
            return False
        except hvac.exceptions.Forbidden as e:
            logging.error(f"HVAC Forbidden error: {e}. Wrapping token might be expired or already used.")
            return False
        except Exception as e:
            logging.error(f"An unexpected error occurred during Vault initialization: {e}")
            return False

    def _read_wrapping_token(self) -> str | None:
        """Reads the wrapping token from the file path provided by Docker secrets."""
        try:
            with open(self.wrapping_token_path, 'r') as f:
                token = f.read().strip()
            if not token:
                logging.error(f"Wrapping token file is empty: {self.wrapping_token_path}")
                return None
            logging.info(f"Read wrapping token from {self.wrapping_token_path}")
            return token
        except FileNotFoundError:
            logging.error(f"Wrapping token file not found at: {self.wrapping_token_path}")
            return None
        except IOError as e:
            logging.error(f"Could not read wrapping token file: {e}")
            return None

    def read_secret(self, path: str) -> dict | None:
        """Reads a secret from the KV v2 engine."""
        if not self.client or not self.client.is_authenticated():
            logging.error("Cannot read secret: Vault client is not initialized or authenticated.")
            return None
        try:
            logging.info(f"Reading secret from path: {path}")
            response = self.client.secrets.kv.v2.read_secret_version(path=path)
            # The actual data is nested under 'data' -> 'data' for KV v2
            return response['data']['data']
        except hvac.exceptions.InvalidPath:
            logging.error(f"Secret path not found in Vault: {path}")
            return None
        except Exception as e:
            logging.error(f"Failed to read secret from {path}: {e}")
            return None
This VaultClient class encapsulates the entire bootstrap process. main.py will use it to fetch configuration before starting the web server.
Step 3: Integrating SAML with Dynamically Fetched Configuration
The next challenge was configuring the python3-saml library. It typically takes a static JSON configuration file. Our configuration, however, lives in Vault. The application must fetch it at runtime and construct the settings dictionary dynamically.
main.py orchestrates this.
# app/main.py
import asyncio
import os
import sys
import logging
from urllib.parse import urlparse

import tornado.httpserver
import tornado.ioloop
import tornado.web
from onelogin.saml2.auth import OneLogin_Saml2_Auth
from onelogin.saml2.idp_metadata_parser import OneLogin_Saml2_IdPMetadataParser

from vault_client import VaultClient

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Global state (in a real app, manage this better)
SAML_SETTINGS = None


def build_saml_request(req):
    """Builds a SAML request dictionary for python3-saml from a Tornado request."""
    # This is a helper to adapt Tornado's request object to the format
    # expected by the SAML library.
    http_host = req.headers.get('Host', 'localhost')
    server_port = urlparse(req.full_url()).port or (443 if req.protocol == 'https' else 80)
    return {
        'https': 'on' if req.protocol == 'https' else 'off',
        'http_host': http_host,
        'script_name': req.path,
        'server_port': str(server_port),
        'get_data': {k: v[0].decode('utf-8') for k, v in req.query_arguments.items()},
        'post_data': {k: v[0].decode('utf-8') for k, v in req.body_arguments.items()},
        'query_string': req.query
    }


class BaseHandler(tornado.web.RequestHandler):
    def get_current_user(self):
        return self.get_secure_cookie("user_email")


class MainHandler(BaseHandler):
    @tornado.web.authenticated
    def get(self):
        email = self.get_current_user().decode('utf-8')
        db_creds = self.application.settings.get('db_credentials')
        self.write(f"Hello, {email}. You are authenticated.<br>")
        self.write(f"I have fetched these DB creds from Vault: {db_creds}")


class SamlLoginHandler(BaseHandler):
    def get(self):
        req = build_saml_request(self.request)
        auth = OneLogin_Saml2_Auth(req, SAML_SETTINGS)
        # This redirects the user to the IdP for authentication
        self.redirect(auth.login())


class AcsHandler(BaseHandler):
    """Assertion Consumer Service (ACS) handler."""

    def post(self):
        req = build_saml_request(self.request)
        auth = OneLogin_Saml2_Auth(req, SAML_SETTINGS)
        auth.process_response()
        errors = auth.get_errors()
        if errors:
            logging.error(f"SAML ACS Error: {errors}, Reason: {auth.get_last_error_reason()}")
            self.set_status(401)
            self.write("SAML authentication failed.")
            return
        if not auth.is_authenticated():
            self.set_status(401)
            self.write("Not authenticated via SAML.")
            return
        # SAML authentication succeeded. 'nameId' is typically the user's
        # email or username.
        user_email = auth.get_nameid()
        logging.info(f"User authenticated successfully: {user_email}")
        # In a real app, you would check attributes, provision a session, etc.
        self.set_secure_cookie("user_email", user_email)
        self.redirect("/")


class MetadataHandler(BaseHandler):
    def get(self):
        req = build_saml_request(self.request)
        auth = OneLogin_Saml2_Auth(req, SAML_SETTINGS)
        settings = auth.get_settings()
        metadata = settings.get_sp_metadata()
        errors = settings.validate_metadata(metadata)
        if len(errors) == 0:
            self.set_header('Content-Type', 'text/xml')
            self.write(metadata)
        else:
            self.set_status(500)
            self.write(', '.join(errors))


async def make_app(vault_client: VaultClient) -> tornado.web.Application:
    """
    Fetches configuration from Vault and constructs the Tornado application.
    """
    global SAML_SETTINGS

    # 1. Fetch SAML configuration from Vault
    saml_secrets = vault_client.read_secret("admin-app/saml")
    if not saml_secrets:
        raise RuntimeError("Failed to fetch SAML secrets from Vault.")

    # A pitfall here is ensuring the secrets are in the exact format the SAML
    # library needs. The private key and cert must not have extra whitespace
    # or encoding issues.
    sp_private_key = saml_secrets['sp_private_key']
    sp_public_cert = saml_secrets['sp_public_cert']
    idp_metadata_xml = saml_secrets['idp_metadata_xml']

    # 2. Fetch other application secrets, e.g., database credentials
    db_credentials = vault_client.read_secret("admin-app/database")
    if not db_credentials:
        raise RuntimeError("Failed to fetch database credentials from Vault.")

    # 3. Dynamically construct the SAML settings dictionary.
    # python3-saml does not accept raw metadata XML in its settings dict, so
    # we parse the IdP metadata into the structure the library expects.
    # In a real-world project, the entityId and ACS URL should also come from config.
    idp_data = OneLogin_Saml2_IdPMetadataParser.parse(idp_metadata_xml)
    SAML_SETTINGS = {
        "strict": True,
        "debug": True,
        "sp": {
            "entityId": "http://localhost:8888/saml/metadata/",
            "assertionConsumerService": {
                "url": "http://localhost:8888/saml/acs/",
                "binding": "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST"
            },
            "x509cert": sp_public_cert,
            "privateKey": sp_private_key,
        },
        "idp": idp_data["idp"],
    }
    logging.info("SAML settings successfully constructed from Vault secrets.")

    return tornado.web.Application(
        [
            (r"/", MainHandler),
            (r"/saml/login/", SamlLoginHandler),
            (r"/saml/acs/", AcsHandler),
            (r"/saml/metadata/", MetadataHandler),
        ],
        cookie_secret="a_secret_key_that_should_also_come_from_vault",
        login_url="/saml/login/",
        # Pass fetched secrets to handlers
        db_credentials=db_credentials
    )


async def main():
    """Main entry point for the application."""
    vault_addr = os.getenv("VAULT_ADDR")
    role_id = os.getenv("VAULT_ROLE_ID")
    wrapping_token_path = "/run/secrets/vault_wrapping_token"
    if not all([vault_addr, role_id]):
        logging.critical("VAULT_ADDR and VAULT_ROLE_ID must be set.")
        sys.exit(1)

    # --- Secure Bootstrap Sequence ---
    vault_client = VaultClient(vault_addr, role_id, wrapping_token_path)
    if not vault_client.initialize():
        logging.critical("Failed to initialize Vault client. Shutting down.")
        sys.exit(1)

    # --- Application Initialization ---
    try:
        app = await make_app(vault_client)
        http_server = tornado.httpserver.HTTPServer(app)
        http_server.listen(8888)
        logging.info("Server started and listening on port 8888")
        # The IOLoop is already running under run_sync; just keep this
        # coroutine alive. (IOLoop.start() is not awaitable.)
        await asyncio.Event().wait()
    except Exception as e:
        logging.critical(f"Failed to create or start the application: {e}")
        sys.exit(1)


if __name__ == "__main__":
    tornado.ioloop.IOLoop.current().run_sync(main)
Finally, the Dockerfile to package it all.
# app/Dockerfile
FROM python:3.11-slim

# python3-saml depends on xmlsec, which needs native libraries to build.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential pkg-config libxml2-dev libxmlsec1-dev libxmlsec1-openssl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
The Full Picture: A Zero-Trust Flow
The complete, end-to-end process is now visible. We can represent it with a sequence diagram.
sequenceDiagram
    participant CI/CD
    participant Vault
    participant Docker Swarm
    participant TornadoApp
    CI/CD->>Vault: Generate & Wrap SecretID for 'admin-app'
    Vault-->>CI/CD: Wrapping Token
    CI/CD->>Docker Swarm: docker secret create vault_wrapping_token (content=token)
    CI/CD->>Docker Swarm: docker stack deploy (with RoleID)
    Docker Swarm->>TornadoApp: Start Container (mounts secret at /run/secrets/vault_wrapping_token)
    TornadoApp->>TornadoApp: Read Wrapping Token from file
    TornadoApp->>Vault: Unwrap SecretID using Wrapping Token
    Vault-->>TornadoApp: Plain SecretID
    TornadoApp->>Vault: AppRole Login(RoleID, SecretID)
    Vault-->>TornadoApp: Client Token
    TornadoApp->>Vault: Read secret/admin-app/saml (using Client Token)
    Vault-->>TornadoApp: SAML Config (SP Key, IdP Meta)
    TornadoApp->>Vault: Read secret/admin-app/database
    Vault-->>TornadoApp: DB Credentials
    TornadoApp->>TornadoApp: Configure SAML library & DB connections
    TornadoApp->>TornadoApp: Start Tornado HTTP Server on Port 8888
    participant User
    User->>TornadoApp: GET /
    TornadoApp-->>User: Redirect to /saml/login/
    User->>TornadoApp: GET /saml/login/
    TornadoApp-->>User: Redirect to IdP for authentication
    participant IdP
    User->>IdP: Authenticates (user/pass/MFA)
    IdP-->>User: POST SAML Assertion to /saml/acs/
    User->>TornadoApp: POST /saml/acs/ (with assertion)
    TornadoApp->>TornadoApp: Validate SAML Assertion
    TornadoApp-->>User: Set session cookie & Redirect to /
    User->>TornadoApp: GET / (now with valid session)
    TornadoApp-->>User: 200 OK (Authenticated Content)
This architecture successfully decouples the application from its secrets. The Docker image contains no credentials. The runtime configuration contains only a non-sensitive RoleID and a path to a one-time-use token. The service dynamically pulls everything it needs to operate from a central, audited, and secure source. This is a significant step up from traditional configuration management.
The primary limitation of this specific implementation is its reliance on the CI/CD system as the “trusted introducer.” The security of the entire bootstrap process hinges on the security of the pipeline runner and its Vault token. For higher security environments, one might explore platform-level attestations if the orchestrator supports it. Furthermore, the client token obtained by the application is relatively long-lived (1 hour in our config). For services handling extremely sensitive data, a more sophisticated approach would involve the application renewing its own token periodically, a feature Vault’s tokens support, and ensuring the application code can handle token expiry and re-authentication gracefully without a full restart.
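As a sketch of what that in-process renewal might look like, assuming an authenticated hvac client whose token is renewable (the AppRole default exposes self-renewal as `client.auth.token.renew_self()`); this is illustrative, not the code we shipped:

```python
import asyncio
import logging

def renew_interval(ttl_seconds: int, fraction: float = 0.5) -> float:
    """Renew at a fraction of the token TTL so we never race expiry."""
    return max(ttl_seconds * fraction, 1.0)

async def renew_token_forever(client, ttl_seconds: int) -> None:
    """Keep a Vault client token alive from the running event loop.

    `client` is assumed to be an authenticated hvac.Client; hvac exposes
    token self-renewal as client.auth.token.renew_self().
    """
    while True:
        await asyncio.sleep(renew_interval(ttl_seconds))
        try:
            client.auth.token.renew_self()
            logging.info("Vault token renewed.")
        except Exception:
            # On failure, a real service would re-run the full AppRole
            # bootstrap rather than retrying blindly.
            logging.exception("Vault token renewal failed")
            return
```

Scheduled from Tornado after the server starts, this would be something like `asyncio.create_task(renew_token_forever(vault_client.client, 3600))`, with the failure branch triggering a fresh secure introduction.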