Our platform team was a bottleneck. Every feature squad, each running a Python backend and a collection of micro-frontends, needed isolated environments for development, staging, and QA. The process was a tangled mess of JIRA tickets, semi-automated Terraform modules, and manual kubectl commands. A request for a new “staging” environment for the “payments” squad would trigger a week-long process involving at least three different engineers. It was slow, error-prone, and wildly inconsistent. The goal became clear: create a fully self-service, declarative API for application environments. A developer should be able to define their entire stack—backend service, database, and frontend hosting—in a single YAML file, commit it to git, and have a fully provisioned, ready-to-use environment materialize within minutes.
The Initial Concept and Technology Rationale
The core idea was to build an Internal Developer Platform (IDP) fronted by a simple, Kubernetes-native API. We didn’t want developers to write Terraform or understand the intricacies of AWS IAM policies. They should declare what they need, not how to build it.
Our technology choices were deliberate:
Crossplane: We evaluated standard Infrastructure as Code tools like Terraform. The problem is that they are primarily one-shot execution tools. They don’t maintain a constant reconciliation loop. We chose Crossplane because it extends the Kubernetes API, turning our cluster into a universal control plane. We could define our infrastructure as custom Kubernetes resources, and Crossplane’s controllers would work relentlessly to ensure the real-world state matched our declared state. This continuous reconciliation is the key to a truly declarative system.
Python: While Crossplane provides the declarative infrastructure layer, some procedural logic is unavoidable. We needed a robust scripting layer within our CI/CD pipelines to validate developer manifests, apply them to the cluster, and orchestrate subsequent steps like application deployment. Python, with its excellent kubernetes client and boto3 libraries, was the lingua franca of our backend teams, making it the obvious choice for this orchestration logic.
Micro-frontends: This was our existing architectural pattern. Each frontend is a separate, buildable artifact of static files (JS, CSS, HTML). The challenge was to integrate the deployment of these static assets into the same unified, declarative workflow that managed the backend infrastructure.
Phase 1: Defining the Platform API with a CompositeResourceDefinition (XRD)
The first step was to define the “shape” of our self-service API. This is what developers would interact with. In Crossplane, this is done using a CompositeResourceDefinition (XRD). It’s analogous to creating a custom resource definition (CRD), but for a higher-level abstraction that will be composed of other resources.
Our UnifiedEnvironment XRD needed to capture the essential inputs from a developer: the application name, the desired version of the Python backend image, the size of the database, and any frontend-specific configurations.
Here is the complete XRD. A real-world project would have many more options, but this captures the core structure.
---
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
name: unifiedenvironments.platform.acme.com
spec:
group: platform.acme.com
names:
kind: UnifiedEnvironment
listKind: UnifiedEnvironmentList
plural: unifiedenvironments
singular: unifiedenvironment
claimNames:
kind: EnvironmentClaim
listKind: EnvironmentClaimList
plural: environmentclaims
singular: environmentclaim
versions:
- name: v1alpha1
served: true
referenceable: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
properties:
# --- Backend Configuration ---
backend:
type: object
description: Configuration for the Python backend service.
properties:
image:
type: string
description: The full container image URI for the backend service.
port:
type: integer
description: The port the backend application listens on.
default: 8000
replicas:
type: integer
description: Number of replicas for the backend deployment.
default: 1
required:
- image
# --- Database Configuration ---
database:
type: object
description: Configuration for the required PostgreSQL database.
properties:
size:
type: string
description: The desired size of the database.
enum: ["small", "medium", "large"]
default: "small"
required:
- size
# --- Frontend Configuration ---
frontend:
type: object
description: Configuration for the micro-frontend static assets.
properties:
appName:
type: string
description: A unique name for the frontend application, used for resource naming.
required:
- appName
required:
- backend
- database
- frontend
status:
type: object
properties:
databaseHost:
type: string
description: The connection host for the provisioned database.
frontendBucketName:
type: string
description: The name of the S3 bucket created for the frontend assets.
ready:
type: boolean
description: Indicates if all underlying resources are provisioned and ready.
The critical parts here are the spec and status fields. The spec is the developer’s desired state. We use OpenAPI v3 schema validation to enforce rules, like ensuring database.size is one of the allowed values. The status sub-resource is where Crossplane will write back the outcomes of the provisioning process, like the generated database hostname or the S3 bucket name. Our Python orchestrator will poll this status to know when to proceed.
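Because the XRD declares claimNames, developers interact with a namespaced EnvironmentClaim rather than the cluster-scoped UnifiedEnvironment itself. For illustration, a claim might look like the following (the names and image URI are hypothetical):
---
apiVersion: platform.acme.com/v1alpha1
kind: EnvironmentClaim
metadata:
  name: payments-staging   # illustrative environment name
  namespace: payments
spec:
  backend:
    image: registry.acme.com/payments-api:1.4.2  # illustrative image URI
    replicas: 2
    # port is omitted; the schema default of 8000 applies
  database:
    size: small
  frontend:
    appName: payments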
Phase 2: Implementing the Provisioning Logic with a Composition
With the API defined, we now need to implement the logic that translates a UnifiedEnvironment resource into actual infrastructure. This is done with a Composition. It’s a template that maps the inputs from the XRD to a set of concrete managed resources from Crossplane providers (like provider-aws and provider-kubernetes).
This is where the real power lies. A single Composition can create resources across multiple clouds and providers.
---
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
name: unifiedenvironment.aws.platform.acme.com
labels:
provider: aws
spec:
compositeTypeRef:
apiVersion: platform.acme.com/v1alpha1
kind: UnifiedEnvironment
# Define the resources to be created when a UnifiedEnvironment is instantiated.
resources:
  # 1. A dedicated Kubernetes Namespace for isolation, created via
  #    provider-kubernetes (raw cluster resources are wrapped in an Object).
  - name: namespace
    base:
      apiVersion: kubernetes.crossplane.io/v1alpha1
      kind: Object
      spec:
        # Provider config for provider-kubernetes; the name is illustrative.
        providerConfigRef:
          name: kubernetes-default
        forProvider:
          manifest:
            apiVersion: v1
            kind: Namespace
            metadata:
              labels:
                team: platform-team
    patches:
      - fromFieldPath: "metadata.name"
        toFieldPath: "spec.forProvider.manifest.metadata.name"
        transforms:
          - type: string
            string:
              fmt: "%s-environment"
  # 2. An S3 bucket for the micro-frontend static assets.
  - name: frontend-bucket
    base:
      apiVersion: s3.aws.upbound.io/v1beta1
      kind: Bucket
      spec:
        forProvider:
          region: us-east-1
          acl: public-read
        # We must specify a provider config to use for AWS credentials.
        providerConfigRef:
          name: aws-default
    patches:
      # Combine the app name with the composite's UID for global uniqueness,
      # e.g. acme-mfe-payments-a1b2c3d4.
      - type: CombineFromComposite
        combine:
          strategy: string
          variables:
            - fromFieldPath: "spec.frontend.appName"
            - fromFieldPath: "metadata.uid"
          string:
            fmt: "acme-mfe-%s-%s"
        toFieldPath: "metadata.name"
        policy:
          fromFieldPath: Required
      # Write the bucket name back to the status of the UnifiedEnvironment CR.
      - type: ToCompositeFieldPath
        fromFieldPath: "status.atProvider.id"
        toFieldPath: "status.frontendBucketName"
  # 3. A PostgreSQL instance via the official Crossplane SQL provider.
  #    In a real scenario, this would likely be an RDSInstance or CloudSQL resource.
  - name: postgres-db
    base:
      apiVersion: postgresql.sql.crossplane.io/v1alpha1
      kind: Database
      spec:
        # This provider also needs a config pointing to a PostgreSQL server.
        providerConfigRef:
          name: postgresql-provider
        # This is the most critical part for application connectivity.
        # We instruct Crossplane to write the connection secret into the
        # newly created namespace (the namespace is patched in below).
        writeConnectionSecretToRef:
          name: "db-connection-details"
          namespace: "placeholder" # patched below
    patches:
      - fromFieldPath: "metadata.name"
        toFieldPath: "metadata.name"
      - fromFieldPath: "metadata.name"
        toFieldPath: "spec.writeConnectionSecretToRef.namespace"
        transforms:
          - type: string
            string:
              fmt: "%s-environment"
    connectionDetails:
      - fromConnectionSecretKey: "username"
      - fromConnectionSecretKey: "password"
      - fromConnectionSecretKey: "endpoint"
      - fromConnectionSecretKey: "port"
  # 4. The Kubernetes Deployment for the Python backend, again via provider-kubernetes.
  - name: backend-deployment
    base:
      apiVersion: kubernetes.crossplane.io/v1alpha1
      kind: Object
      spec:
        providerConfigRef:
          name: kubernetes-default
        forProvider:
          manifest:
            apiVersion: apps/v1
            kind: Deployment
            metadata:
              name: backend
            spec:
              replicas: 1
              selector:
                matchLabels:
                  app: backend
              template:
                metadata:
                  labels:
                    app: backend
                spec:
                  containers:
                    - name: backend-container
                      image: placeholder-image # patched below
                      ports:
                        - containerPort: 8000
                      envFrom:
                        - secretRef:
                            name: "db-connection-details" # Consume the secret!
    patches:
      - fromFieldPath: "metadata.name"
        toFieldPath: "spec.forProvider.manifest.metadata.namespace"
        transforms:
          - type: string
            string:
              fmt: "%s-environment"
      - fromFieldPath: "spec.backend.replicas"
        toFieldPath: "spec.forProvider.manifest.spec.replicas"
      - fromFieldPath: "spec.backend.image"
        toFieldPath: "spec.forProvider.manifest.spec.template.spec.containers[0].image"
      - fromFieldPath: "spec.backend.port"
        toFieldPath: "spec.forProvider.manifest.spec.template.spec.containers[0].ports[0].containerPort"
A common pitfall here is managing secrets. How does the Python application get the credentials for the database that Crossplane just created? The writeConnectionSecretToRef field is the answer. It tells the PostgreSQL provider to take the connection details it generates (host, user, password) and write them into a standard Kubernetes Secret named db-connection-details inside the newly created namespace. The Deployment resource then uses envFrom to mount this secret as environment variables, making them available to the Python application seamlessly.
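On the application side, consuming the injected variables is a one-liner. A minimal sketch follows; the variable names match the connectionDetails keys above, and the database name "app" is an assumption for illustration:
# settings.py, inside the Python backend -- a minimal sketch.
import os

def database_dsn() -> str:
    """Build a PostgreSQL DSN from the env vars mounted via envFrom."""
    user = os.environ["username"]
    password = os.environ["password"]
    host = os.environ["endpoint"]
    port = os.environ.get("port", "5432")
    # "app" as the database name is an assumption for illustration.
    return f"postgresql://{user}:{password}@{host}:{port}/app"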
Phase 3: The Python Orchestrator for CI/CD
Now we have the declarative layer. The next piece is the procedural “glue” that runs in our CI pipeline. This Python script is responsible for taking a developer’s manifest, performing sanity checks, and interacting with the Kubernetes API to manage the UnifiedEnvironment custom resource.
# orchestrator.py
import os
import sys
import time
import logging
import yaml
from kubernetes import client, config
from kubernetes.client.rest import ApiException
# --- Configuration ---
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
API_GROUP = "platform.acme.com"
API_VERSION = "v1alpha1"
RESOURCE_PLURAL = "unifiedenvironments"
POLL_INTERVAL_SECONDS = 15
MAX_WAIT_MINUTES = 20
# --- Kubernetes Client Setup ---
def get_k8s_client():
"""Initializes and returns a Kubernetes API client."""
try:
# Assumes running inside a pod with a service account
config.load_incluster_config()
except config.ConfigException:
try:
# Fallback for local development
config.load_kube_config()
except config.ConfigException:
raise RuntimeError("Could not configure Kubernetes client.")
return client.CustomObjectsApi()
def validate_manifest(manifest_path):
"""
Performs basic validation on the developer's manifest.
A production system would use a schema validation library like pykwalify.
"""
if not os.path.exists(manifest_path):
raise FileNotFoundError(f"Manifest file not found: {manifest_path}")
with open(manifest_path, 'r') as f:
data = yaml.safe_load(f)
if not all(k in data for k in ["apiVersion", "kind", "metadata", "spec"]):
raise ValueError("Manifest missing required top-level keys.")
logging.info(f"Manifest {manifest_path} passed basic validation.")
return data
def apply_environment(api_client, manifest_data):
    """
    Applies the UnifiedEnvironment manifest to the cluster using server-side apply.
    Composite resources (XRs) are cluster-scoped, so no namespace is involved;
    the namespaced EnvironmentClaim would use the *_namespaced_* variants instead.
    """
    name = manifest_data["metadata"]["name"]
    try:
        logging.info(f"Applying UnifiedEnvironment '{name}'...")
        api_client.patch_cluster_custom_object(
            group=API_GROUP,
            version=API_VERSION,
            plural=RESOURCE_PLURAL,
            name=name,
            body=manifest_data,
            field_manager="PipelineOrchestrator",
            force=True,  # Take ownership of any conflicting fields
            # Server-side apply requires this content type; a plain merge
            # patch would 404 when the resource does not exist yet. Needs a
            # recent kubernetes client that accepts the _content_type kwarg.
            _content_type="application/apply-patch+yaml",
        )
        logging.info(f"Successfully applied UnifiedEnvironment '{name}'.")
    except ApiException as e:
        logging.error(f"Failed to apply manifest for '{name}': {e.body}")
        raise
def wait_for_ready(api_client, resource_name):
    """
    Polls the status of the UnifiedEnvironment resource until it is ready.
    """
    start_time = time.time()
    max_wait_seconds = MAX_WAIT_MINUTES * 60
    logging.info(f"Waiting for UnifiedEnvironment '{resource_name}' to become ready...")
    while time.time() - start_time < max_wait_seconds:
        try:
            resource = api_client.get_cluster_custom_object(
                group=API_GROUP,
                version=API_VERSION,
                plural=RESOURCE_PLURAL,
                name=resource_name
            )
# Check the status conditions populated by Crossplane
if 'status' in resource and 'conditions' in resource['status']:
is_ready = any(
cond['type'] == 'Ready' and cond['status'] == 'True'
for cond in resource['status']['conditions']
)
if is_ready:
logging.info(f"UnifiedEnvironment '{resource_name}' is ready.")
return resource # Return the full object for later use
logging.info(f"'{resource_name}' not ready yet. Polling again in {POLL_INTERVAL_SECONDS}s.")
except ApiException as e:
if e.status == 404:
logging.warning(f"Resource '{resource_name}' not found yet. Retrying...")
else:
logging.error(f"API error while polling '{resource_name}': {e.body}")
raise
time.sleep(POLL_INTERVAL_SECONDS)
raise TimeoutError(f"Timed out after {MAX_WAIT_MINUTES} minutes waiting for '{resource_name}'.")
if __name__ == "__main__":
if len(sys.argv) != 2:
print(f"Usage: python {sys.argv[0]} <path_to_manifest.yaml>")
sys.exit(1)
manifest_file = sys.argv[1]
try:
api = get_k8s_client()
manifest = validate_manifest(manifest_file)
resource_name = manifest["metadata"]["name"]
apply_environment(api, manifest)
ready_resource = wait_for_ready(api, resource_name)
# This is where we would trigger the next stage of the pipeline
bucket_name = ready_resource.get('status', {}).get('frontendBucketName')
if bucket_name:
logging.info(f"NEXT_STEP: Trigger frontend deployment to S3 bucket: {bucket_name}")
            # Expose the value to the CI/CD system; on GitHub Actions, e.g.:
            # with open(os.environ["GITHUB_OUTPUT"], "a") as fh:
            #     fh.write(f"bucket_name={bucket_name}\n")
else:
logging.warning("Could not find frontendBucketName in status.")
except (FileNotFoundError, ValueError, RuntimeError, ApiException, TimeoutError) as err:
logging.error(f"Orchestration failed: {err}")
sys.exit(1)
This script is designed to be executed by a CI/CD runner (like GitLab CI or GitHub Actions). The wait_for_ready function is critical. A common mistake is to kubectl apply and immediately move to the next step. Infrastructure provisioning takes time. This function polls the custom resource’s .status.conditions field, which Crossplane updates as it works. Only when the Ready condition is True can we be certain that the database is available and the S3 bucket exists.
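For reference, a successfully provisioned resource carries a status roughly like the following; the exact reason strings vary by provider, and the bucket suffix is illustrative:
status:
  conditions:
    - type: Synced
      status: "True"
      reason: ReconcileSuccess
    - type: Ready
      status: "True"
      reason: Available
  frontendBucketName: acme-mfe-payments-a1b2c3d4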
Phase 4: Integrating the Micro-frontend Deployment
The final piece of the puzzle is deploying the static frontend assets. After the Python orchestrator confirms the UnifiedEnvironment is ready, it extracts the provisioned S3 bucket name from the resource’s status. The CI pipeline can then proceed to a stage that builds the React/Vue/Angular application and syncs the output to this bucket.
A small Python script using boto3 can handle this upload.
# deploy_frontend.py
import boto3
import logging
import mimetypes
import os
import sys
from botocore.exceptions import ClientError
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
def sync_to_s3(source_dir, bucket_name):
"""
Syncs a local directory to an S3 bucket.
Assumes AWS credentials are configured via environment variables or IAM role.
"""
s3_client = boto3.client('s3')
logging.info(f"Starting sync of '{source_dir}' to bucket '{bucket_name}'...")
    # Guess each file's content type so browsers render the assets correctly.
    # A real implementation would be more robust: cache headers, deletions, etc.
    for root, _, files in os.walk(source_dir):
        for filename in files:
            local_path = os.path.join(root, filename)
            relative_path = os.path.relpath(local_path, source_dir)
            s3_key = relative_path.replace("\\", "/")  # For Windows compatibility
            content_type, _encoding = mimetypes.guess_type(local_path)
            extra_args = {"ContentType": content_type} if content_type else {}
            try:
                s3_client.upload_file(local_path, bucket_name, s3_key, ExtraArgs=extra_args)
                logging.info(f"Uploaded {local_path} to s3://{bucket_name}/{s3_key}")
except ClientError as e:
logging.error(f"Failed to upload {local_path}: {e}")
return False
logging.info("Sync completed successfully.")
return True
if __name__ == "__main__":
try:
build_directory = os.environ['BUILD_DIR']
target_bucket = os.environ['S3_BUCKET_NAME']
except KeyError as e:
logging.error(f"Missing required environment variable: {e}")
sys.exit(1)
if not sync_to_s3(build_directory, target_bucket):
sys.exit(1)
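In the pipeline, the script is driven entirely by environment variables; an invocation might look like this (the bucket name is illustrative):
BUILD_DIR=dist S3_BUCKET_NAME=acme-mfe-payments-a1b2c3d4 python deploy_frontend.py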
The CI/CD pipeline now has a clear, ordered flow:
1. Developer commits a my-feature-env.yaml manifest.
2. Pipeline triggers and executes orchestrator.py with the manifest path.
3. orchestrator.py applies the manifest and waits for the UnifiedEnvironment to be Ready.
4. orchestrator.py extracts the bucket name from the status and passes it to the next stage.
5. The next stage runs the frontend build, then executes deploy_frontend.py to upload the assets.
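For teams on GitHub Actions, the provisioning stage can be wired up roughly like this; the workflow layout, paths, and names are illustrative, not a drop-in pipeline:
# .github/workflows/environments.yaml -- an illustrative sketch
on:
  push:
    paths:
      - "environments/**"

jobs:
  provision:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install kubernetes pyyaml
      # Assumes the runner already has cluster access (kubeconfig or
      # in-cluster service account) for the orchestrator to use.
      - run: python orchestrator.py environments/my-feature-env.yaml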
A major pitfall we hit here was permissions. The CI runner’s service account needs Kubernetes permissions to manage our custom resources, but it also needs AWS permissions to upload to S3. We solved this using IAM Roles for Service Accounts (IRSA) on our EKS cluster. This allows us to associate a Kubernetes service account with an AWS IAM role, providing secure, keyless access to AWS resources from within the pod running the pipeline job.
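In practice, the IRSA binding is just an annotation on the pipeline’s service account; a sketch with a placeholder account ID and role name:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-runner       # placeholder name
  namespace: ci
  annotations:
    # IRSA: EKS injects temporary credentials for this IAM role into pods
    # running under this service account. The ARN is a placeholder.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/ci-environment-deployer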
The Final Workflow
The result is a powerful, fully automated system that abstracts away massive complexity from our developers.
graph TD
  subgraph "Git Repository"
    A[Developer commits env.yaml]
  end
  subgraph "CI/CD Pipeline"
    B[Trigger Pipeline] --> C{Run Python Orchestrator}
    C --> D[1. Validate Manifest]
    D --> E[2. Apply UnifiedEnvironment CR to K8s]
    E --> F{3. Poll CR Status}
    F -- Not Ready --> F
    F -- Ready --> G[4. Extract Bucket Name from Status]
    G --> H{Run Frontend Deploy}
    H --> I[Build Static Assets]
    I --> J[Sync to S3 Bucket via Python/boto3]
  end
  subgraph "Kubernetes Cluster"
    K[API Server]
    subgraph Crossplane
      L[Crossplane Controller]
      M[Provider-AWS Controller]
      N[Provider-K8s Controller]
    end
  end
  subgraph AWS
    O[S3 Bucket]
    P[RDS Database]
  end
  E --> K
  K --> L
  L -- Reads Composition --> M & N
  M --> O & P
  N --> Q[K8s Namespace, Deployment, Secret]
  J --> O
  A --> B
This system turned a week-long, manual process into a five-minute, git-driven workflow. It improved consistency, reduced errors, and freed up the platform team to work on higher-value problems.
The current Python script is a pragmatic solution that lives within the CI/CD pipeline, but it has limitations. It’s procedural and only runs on a git commit. A more advanced architecture would involve replacing this script with a dedicated Kubernetes Operator, perhaps written using a framework like Kopf. This operator would watch for changes to UnifiedEnvironment resources and orchestrate the frontend deployment reactively, providing a true control loop for the entire application stack, not just the infrastructure. Furthermore, while Crossplane handles provisioning, a robust de-provisioning strategy with finalizers is needed to ensure that external resources, like backups or DNS entries not directly managed by the Composition, are cleaned up properly when an environment is deleted.
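As a sketch of that direction, a minimal Kopf handler reacting to the status field might look like this; the module name is hypothetical, and the sync step is a stand-in for the boto3 logic above:
# operator_sketch.py -- a minimal Kopf sketch of the reactive alternative.
import kopf

@kopf.on.field("platform.acme.com", "v1alpha1", "unifiedenvironments",
               field="status.frontendBucketName")
def on_bucket_ready(old, new, name, logger, **_):
    """Fires whenever Crossplane writes the bucket name into the status."""
    if new and new != old:
        logger.info(f"Bucket '{new}' is ready for environment '{name}'.")
        # From here we would locate the latest frontend build for this
        # environment and sync it to S3, reusing the deploy_frontend.py logic.
Started with kopf run operator_sketch.py, this provides the continuous watch loop that the commit-triggered CI script cannot.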