Constructing a Git-Based Event Sourcing System for Dynamic Mobile App Configuration with Serverless Lua


The core problem was latency, but not network latency. It was human latency. The cycle time for pushing a simple configuration change—adjusting a feature flag, tweaking UI copy, or modifying a validation rule—to our mobile user base was measured in days, sometimes weeks. The bottleneck was the mandatory app store review process. This operational friction meant we couldn’t A/B test effectively, respond to market needs quickly, or hotfix minor logic flaws without a full release. We were building a modern mobile application on a deployment model that felt archaic.

Our initial concept was to externalize configuration. A simple key-value store or a JSON file in S3 seemed like the obvious first step. But this approach lacked crucial features: auditability, versioning, and the ability to handle complex, dynamic rules. We needed to know who changed what, when, and why. We needed the ability to roll back to any point in time instantly. And we needed to move beyond static values to executable logic, for instance, “enable this feature only for users in Germany on weekends.”

This led to a more radical proposal: treat mobile configuration as code, and build a system around the toolchain developers already trust: Git. The architecture would be built on four pillars:

  1. Git as the Source of Truth: All configurations, both static JSON and dynamic Lua scripts, would live in a dedicated Git repository. The commit history would serve as our immutable ledger.
  2. Event Sourcing as the Architectural Pattern: Instead of storing the current state of the configuration, we would store the sequence of changes (commits). A git push would be treated as an event to be processed.
  3. Serverless for Reactive Processing: AWS Lambda functions would act as the engine, reacting to Git push events to process changes, update read models, and serve configurations. This avoids managing servers for a workflow that is inherently spiky.
  4. Lua for Sandboxed Dynamic Logic: To safely execute business rules, we would embed a Lua interpreter within our Serverless functions. Lua is lightweight, fast, and provides a secure sandbox, preventing configuration scripts from impacting the host environment.

This wasn’t about building a simple remote config service. It was about building a robust, auditable, and developer-centric platform for controlling mobile application behavior in real-time.

The first decision point was how to structure the Git repository. A flat structure would become unmanageable. We settled on a hierarchical model that mirrored the application’s domain.

/
├── configs/
│   └── v1/
│       ├── checkout/
│       │   ├── payment_options.json
│       │   └── shipping_rates.json
│       └── home/
│           ├── featured_banner.json
│           └── promotions.json
└── scripts/
    └── v1/
        ├── checkout/
        │   └── validate_postal_code.lua
        └── home/
            └── is_eligible_for_promotion.lua

This structure allows us to version the entire configuration set (v1, v2, etc.) and organize configurations and scripts by feature (checkout, home). Pull requests become the mechanism for change control, providing code review and discussion for configuration changes.
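
Because every change arrives through a pull request, the repository is also the natural place to validate configurations before they merge. The sketch below shows one possible pre-merge CI check written in Go; it assumes the layout above, parses every .json file, and compiles (without executing) every .lua file using github.com/yuin/gopher-lua, the same pure-Go Lua implementation the serving layer embeds later. The file path and library choice are illustrative, not prescriptive.

// tools/validate-configs/main.go (hypothetical pre-merge check, not part of the runtime pipeline)
package main

import (
	"encoding/json"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"

	lua "github.com/yuin/gopher-lua"
)

func main() {
	L := lua.NewState()
	defer L.Close()

	err := filepath.WalkDir(".", func(path string, d fs.DirEntry, walkErr error) error {
		if walkErr != nil {
			return walkErr
		}
		if d.IsDir() {
			if d.Name() == ".git" {
				return filepath.SkipDir // don't descend into Git metadata
			}
			return nil
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		switch {
		case strings.HasSuffix(path, ".json"):
			// Static configs must at least be well-formed JSON.
			if !json.Valid(data) {
				return fmt.Errorf("invalid JSON: %s", path)
			}
		case strings.HasSuffix(path, ".lua"):
			// Compile (but do not run) dynamic rules to catch syntax errors early.
			if _, err := L.LoadString(string(data)); err != nil {
				return fmt.Errorf("lua compile error in %s: %w", path, err)
			}
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("all configs and scripts are valid")
}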

With the source of truth defined, the next challenge was reacting to changes. We configured a webhook in our Git provider (GitHub) to fire on every push to the main branch. This webhook points to an AWS API Gateway endpoint, which triggers our first Lambda function: the Ingestion Lambda.

A common mistake here is to put all the processing logic into this single function. This creates a brittle, monolithic serverless function. A better practice is to use the ingestion function for one purpose only: to validate the webhook payload and enqueue a job for asynchronous processing. This decouples the ingestion from the heavy lifting and provides durability through an SQS queue.

Here’s the core of the Node.js Ingestion Lambda:

// ingest-lambda/index.js
const { SQSClient, SendMessageCommand } = require("@aws-sdk/client-sqs");
const crypto = require("crypto");

const sqsClient = new SQSClient({ region: process.env.AWS_REGION });
const QUEUE_URL = process.env.QUEUE_URL;
const GITHUB_SECRET = process.env.GITHUB_SECRET;

exports.handler = async (event) => {
    // 1. Validate the webhook signature to ensure it's from GitHub.
    // Use a constant-time comparison so an attacker can't learn the expected
    // signature byte-by-byte from response timing.
    const signature = event.headers['x-hub-signature-256'] || '';
    const hmac = crypto.createHmac('sha256', GITHUB_SECRET);
    hmac.update(event.body);
    const expectedSignature = `sha256=${hmac.digest('hex')}`;

    const signatureIsValid =
        signature.length === expectedSignature.length &&
        crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expectedSignature));

    if (!signatureIsValid) {
        console.error("Signature mismatch");
        return { statusCode: 401, body: "Invalid signature" };
    }

    // 2. Parse the payload and extract relevant commit information
    const payload = JSON.parse(event.body);
    
    // We only care about pushes to the main branch
    if (payload.ref !== 'refs/heads/main') {
        return { statusCode: 200, body: "Ignoring non-main branch push" };
    }
    
    // A single push can contain multiple commits. We only need the head commit:
    // the processor materializes the full repository state at that commit, so the
    // intermediate commits are folded into the same projection.
    const commit = payload.head_commit;
    if (!commit) {
        return { statusCode: 200, body: "No head_commit found" };
    }
    
    const message = {
        commitId: commit.id,
        author: commit.author.name,
        message: commit.message,
        timestamp: commit.timestamp,
        repoUrl: payload.repository.clone_url,
    };

    // 3. Enqueue the message for the processor
    const command = new SendMessageCommand({
        QueueUrl: QUEUE_URL,
        MessageBody: JSON.stringify(message),
        MessageGroupId: "config-processor", // For FIFO queues to ensure order
        MessageDeduplicationId: commit.id   // FIFO queues also require a dedup ID unless content-based deduplication is enabled
    });

    try {
        await sqsClient.send(command);
        console.log(`Successfully enqueued job for commit ${commit.id}`);
        return { statusCode: 202, body: "Accepted" };
    } catch (error) {
        console.error("Failed to enqueue message", error);
        return { statusCode: 500, body: "Internal server error" };
    }
};

This function is lean. It validates, extracts, and sends. The real work happens in the Processor Lambda, which is triggered by messages appearing in our SQS queue.

graph TD
    A[Git Push to main] -->|Webhook| B(API Gateway);
    B --> C{Ingestion Lambda};
    C -->|Validates & Enqueues| D[SQS FIFO Queue];
    D -->|Triggers| E{Processor Lambda};

The Processor Lambda is the heart of the Event Sourcing engine. Its job is to materialize the state of the Git repository at a specific commit and update a fast-access read model. The primary challenge inside a Lambda environment is dealing with the filesystem. Lambda containers have an ephemeral, writable /tmp directory (512 MB by default, configurable up to 10 GB), which is where we must clone the Git repository.

A naive implementation might clone the entire repository on every invocation. This is slow and inefficient. In a real-world project, the pitfall is that repository size grows over time, increasing Lambda execution duration and cost. The pragmatic solution is to use a shallow clone.

Here’s a conceptual implementation of the Processor Lambda in Python, highlighting the critical sections.

# processor-lambda/main.py
import os
import json
import boto3
import subprocess
from datetime import datetime

# AWS clients
dynamodb = boto3.resource('dynamodb')
secrets_manager = boto3.client('secretsmanager')

# DynamoDB table names from environment variables
EVENT_STORE_TABLE = os.environ['EVENT_STORE_TABLE']
READ_MODEL_TABLE = os.environ['READ_MODEL_TABLE']
GIT_SECRET_ARN = os.environ['GIT_SECRET_ARN']

event_store = dynamodb.Table(EVENT_STORE_TABLE)
read_model = dynamodb.Table(READ_MODEL_TABLE)

# Fetch Git credentials securely from AWS Secrets Manager
secret = secrets_manager.get_secret_value(SecretId=GIT_SECRET_ARN)
git_credentials = json.loads(secret['SecretString'])
GIT_USERNAME = git_credentials['username']
GIT_TOKEN = git_credentials['token']

def handler(event, context):
    for record in event['Records']:
        message = json.loads(record['body'])
        commit_id = message['commitId']
        repo_url = message['repoUrl']
        
        # Authenticated repo URL
        auth_repo_url = f"https://{GIT_USERNAME}:{GIT_TOKEN}@{repo_url.split('//')[1]}"
        
        # --- 1. Git Operations in /tmp ---
        repo_path = f"/tmp/{commit_id}"
        if os.path.exists(repo_path):
            # Clean up previous invocation artifacts if any
            subprocess.run(['rm', '-rf', repo_path], check=True)
            
        try:
            # Shallow clone for efficiency
            subprocess.run(
                ['git', 'clone', '--depth', '1', '--branch', 'main', auth_repo_url, repo_path],
                check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
            )
            # A depth-1 clone only contains the current tip of main. If main has
            # moved on since the webhook fired, fetch the exact commit explicitly
            # so the checkout below cannot fail.
            subprocess.run(
                ['git', '-C', repo_path, 'fetch', '--depth', '1', 'origin', commit_id],
                check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
            )
            # Ensure we are at the exact commit ID from the webhook
            subprocess.run(
                ['git', '-C', repo_path, 'checkout', commit_id],
                check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
            )
        except subprocess.CalledProcessError as e:
            print(f"Git operation failed: {e.stderr.decode()}")
            # Re-queue or move to DLQ logic would go here
            raise e

        # --- 2. Record Event in Event Store ---
        # This creates our immutable, auditable log
        event_item = {
            'eventId': f"evt_{datetime.utcnow().isoformat()}",
            'commitId': commit_id,
            'commitMessage': message['message'],
            'author': message['author'],
            'processedAt': datetime.utcnow().isoformat(),
        }
        event_store.put_item(Item=event_item)

        # --- 3. Update the Read Model Projection ---
        # Walk the cloned repo and update DynamoDB
        for root, dirs, files in os.walk(repo_path):
            if '.git' in dirs:
                dirs.remove('.git') # don't walk the git directory
                
            for file_name in files:
                file_path = os.path.join(root, file_name)
                
                # Create a unique key for the read model based on file path
                # e.g., /tmp/commit_id/configs/v1/home/featured_banner.json -> configs/v1/home/featured_banner.json
                relative_path = os.path.relpath(file_path, repo_path)
                
                with open(file_path, 'r') as f:
                    content = f.read()

                # In a production system, you'd add versioning here for optimistic locking
                read_model.put_item(
                    Item={
                        'configKey': relative_path,
                        'content': content,
                        'commitId': commit_id,
                        'lastUpdated': datetime.utcnow().isoformat()
                    }
                )
        
        # --- 4. Cleanup ---
        subprocess.run(['rm', '-rf', repo_path], check=True)

    return {"status": "success"}

This Processor Lambda does two crucial things:

  1. Records the Event: It writes a record to the EventStore table. This table is append-only and provides the audit trail, linking our internal event ID to a specific Git commit.
  2. Projects the State: It updates the ReadModel table. This DynamoDB table is optimized for fast lookups by the mobile client’s backend. The configKey (e.g., scripts/v1/home/is_eligible_for_promotion.lua) becomes the primary key for direct fetching (see the table sketch after this list).
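
For reference, here is a minimal sketch of how the two tables could be declared with the AWS SDK for Go v2. The table names and the choice of eventId and configKey as partition keys mirror the item shapes used above but are assumptions; in practice these definitions would live in CloudFormation, CDK, or Terraform rather than in application code.

// infra/create-tables/main.go (hypothetical; normally expressed as infrastructure-as-code)
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func createTables(ctx context.Context, client *dynamodb.Client) error {
	// Append-only event store: one item per processed commit, keyed by eventId.
	if _, err := client.CreateTable(ctx, &dynamodb.CreateTableInput{
		TableName:   aws.String("ConfigEventStore"),
		BillingMode: types.BillingModePayPerRequest,
		AttributeDefinitions: []types.AttributeDefinition{
			{AttributeName: aws.String("eventId"), AttributeType: types.ScalarAttributeTypeS},
		},
		KeySchema: []types.KeySchemaElement{
			{AttributeName: aws.String("eventId"), KeyType: types.KeyTypeHash},
		},
	}); err != nil {
		return err
	}

	// Read model projection: latest content per file, keyed by configKey for direct lookups.
	_, err := client.CreateTable(ctx, &dynamodb.CreateTableInput{
		TableName:   aws.String("ConfigReadModel"),
		BillingMode: types.BillingModePayPerRequest,
		AttributeDefinitions: []types.AttributeDefinition{
			{AttributeName: aws.String("configKey"), AttributeType: types.ScalarAttributeTypeS},
		},
		KeySchema: []types.KeySchemaElement{
			{AttributeName: aws.String("configKey"), KeyType: types.KeyTypeHash},
		},
	})
	return err
}

func main() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		panic(err)
	}
	if err := createTables(context.TODO(), dynamodb.NewFromConfig(cfg)); err != nil {
		panic(err)
	}
}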

The final piece of the puzzle is the API that the mobile app consumes. This is another Lambda function, the API Endpoint Lambda, exposed via API Gateway. Its sole responsibility is to serve the configuration. For static JSON, it’s a simple lookup in the ReadModel table. The interesting part is handling the Lua scripts.

To run Lua inside AWS Lambda, we can’t just import lua; we need a Lua runtime. One route is a Lambda Layer: compile a Lua interpreter, package it into a zip file, and attach it as a layer to the function. For better performance and safety, though, this endpoint function is best written in a compiled language like Go or Rust, both of which have excellent libraries for embedding Lua. The example below uses gopher-lua, a Lua VM implemented in pure Go, so the interpreter ships inside the function binary and no separate layer is needed.

Here’s a conceptual Go implementation for the API Endpoint Lambda that executes a Lua script.

// api-endpoint-lambda/main.go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"os"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
	lua "github.com/yuin/gopher-lua"
)

var ddbClient *dynamodb.Client
var readModelTable string

func init() {
	cfg, err := config.LoadDefaultConfig(context.TODO())
	if err != nil {
		panic(fmt.Sprintf("unable to load SDK config, %v", err))
	}
	ddbClient = dynamodb.NewFromConfig(cfg)
	readModelTable = os.Getenv("READ_MODEL_TABLE")
}

// Represents the data passed from the mobile app
type RequestBody struct {
	ConfigKey string                 `json:"configKey"`
	Context   map[string]interface{} `json:"context"`
}

// Fetches script content from our DynamoDB read model
func fetchScriptFromReadModel(ctx context.Context, key string) (string, error) {
	input := &dynamodb.GetItemInput{
		TableName: &readModelTable,
		Key: map[string]types.AttributeValue{
			"configKey": &types.AttributeValueMemberS{Value: key},
		},
	}
	result, err := ddbClient.GetItem(ctx, input)
	if err != nil {
		return "", err
	}
	if result.Item == nil {
		return "", fmt.Errorf("configKey not found: %s", key)
	}
	
	content, ok := result.Item["content"].(*types.AttributeValueMemberS)
	if !ok {
		return "", fmt.Errorf("invalid content format for key: %s", key)
	}

	return content.Value, nil
}

// Converts a Go map to a gopher-lua LTable
func mapToLTable(L *lua.LState, data map[string]interface{}) *lua.LTable {
	table := L.NewTable()
	for k, v := range data {
		switch val := v.(type) {
		case string:
			table.RawSetString(k, lua.LString(val))
		case float64:
			table.RawSetString(k, lua.LNumber(val))
		case bool:
			table.RawSetString(k, lua.LBool(val))
		// Add other type conversions as needed
		}
	}
	return table
}

func handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	var body RequestBody
	err := json.Unmarshal([]byte(request.Body), &body)
	if err != nil {
		return events.APIGatewayProxyResponse{Body: "Invalid request body", StatusCode: 400}, nil
	}
    
    // 1. Fetch the Lua script from DynamoDB
	scriptContent, err := fetchScriptFromReadModel(ctx, body.ConfigKey)
	if err != nil {
		// Log the internal error
		fmt.Printf("Failed to fetch script: %v\n", err)
		return events.APIGatewayProxyResponse{Body: "Configuration not found", StatusCode: 404}, nil
	}

    // 2. Initialize a new Lua state (VM) for each invocation for isolation
	L := lua.NewState()
	defer L.Close()
	
	// 3. Pre-load the mobile app's context into the Lua environment
	contextTable := mapToLTable(L, body.Context)
	L.SetGlobal("mobile_context", contextTable)

    // 4. Execute the script
	if err := L.DoString(scriptContent); err != nil {
		fmt.Printf("Lua execution error: %v\n", err)
        // A common mistake is returning the raw Lua error to the client.
        // This can leak implementation details. Return a generic error instead.
		return events.APIGatewayProxyResponse{Body: "Error executing configuration logic", StatusCode: 500}, nil
	}
	
    // 5. Extract the result. We expect the script to return a table.
	result := L.Get(-1) // get the value from the top of the stack
	if tbl, ok := result.(*lua.LTable); ok {
		// In a real implementation, you would convert this LTable back to a JSON string.
		// For simplicity, we'll just confirm it's a table.
		responseBody, _ := json.Marshal(map[string]string{"result": "success", "type": "table"})
		return events.APIGatewayProxyResponse{
			Body:       string(responseBody),
			StatusCode: 200,
			Headers:    map[string]string{"Content-Type": "application/json"},
		}, nil
	}

	return events.APIGatewayProxyResponse{Body: "Lua script did not return a table", StatusCode: 500}, nil
}

func main() {
	lambda.Start(handler)
}
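
The LTable-to-JSON conversion skipped over in step 5 is mechanical but worth sketching. A minimal version, meant to sit alongside the handler in api-endpoint-lambda/main.go (it only needs the lua and encoding/json imports already present); array-style tables, userdata, and functions are deliberately ignored here.

// Converts a gopher-lua value into a plain Go value that encoding/json understands.
func lValueToGo(v lua.LValue) interface{} {
	switch val := v.(type) {
	case lua.LBool:
		return bool(val)
	case lua.LNumber:
		return float64(val)
	case lua.LString:
		return string(val)
	case *lua.LTable:
		// Treat every table as a map; keys are stringified.
		m := make(map[string]interface{})
		val.ForEach(func(key, item lua.LValue) {
			m[key.String()] = lValueToGo(item)
		})
		return m
	default:
		return nil
	}
}

With this helper in place, step 5 becomes json.Marshal(lValueToGo(result)), and the marshalled bytes form the response body.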

The overall architecture now looks like this:

sequenceDiagram
    participant Dev as Developer
    participant Git
    participant Ingestion as Ingestion Lambda
    participant SQS
    participant Processor as Processor Lambda
    participant EventStore as Event Store (DDB)
    participant ReadModel as Read Model (DDB)
    participant MobileApp as Mobile App
    participant APIEndpoint as API Endpoint Lambda

    Dev->>+Git: git push
    Git->>+Ingestion: Webhook
    Ingestion->>+SQS: Enqueue Commit Job
    SQS-->>+Processor: Trigger
    Processor->>+Git: Clone Repo at Commit
    Processor->>+EventStore: Store Event
    Processor->>+ReadModel: Update Projection
    
    par
        MobileApp->>+APIEndpoint: Request Config ('logic.lua', context)
        APIEndpoint->>+ReadModel: Fetch 'logic.lua'
        APIEndpoint->>APIEndpoint: Execute Lua script with context
        APIEndpoint-->>-MobileApp: Return result (JSON)
    and
        MobileApp->>+APIEndpoint: Request Config ('config.json')
        APIEndpoint->>+ReadModel: Fetch 'config.json'
        APIEndpoint-->>-MobileApp: Return result (JSON)
    end

This system achieves our goals. Changes are versioned, auditable, and controlled via pull requests. Deployments are near-instantaneous, triggered by a git push. Complex logic is handled safely in a sandboxed environment. The event-sourced nature provides a full history, allowing us to not only roll back but also to potentially replay the entire history to rebuild the read model from scratch if needed.
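
Rebuilding the read model is mostly a matter of re-driving the existing pipeline. Below is a hypothetical replay job in Go: it scans the event store, orders events by processedAt, and re-enqueues one message per commit onto the same FIFO queue so the Processor Lambda re-materializes each state in order. The repository URL comes from an environment variable here (CONFIG_REPO_URL is an assumed name), since the event items above do not store it.

// tools/replay/main.go (hypothetical read-model rebuild job)
package main

import (
	"context"
	"encoding/json"
	"os"
	"sort"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/dynamodb"
	ddbtypes "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
	"github.com/aws/aws-sdk-go-v2/service/sqs"
)

type storedEvent struct {
	CommitID    string
	ProcessedAt string
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		panic(err)
	}
	ddb := dynamodb.NewFromConfig(cfg)
	queue := sqs.NewFromConfig(cfg)

	// 1. Read every recorded event. A Scan is fine here: replays are rare, offline operations.
	var events []storedEvent
	var startKey map[string]ddbtypes.AttributeValue
	for {
		out, err := ddb.Scan(ctx, &dynamodb.ScanInput{
			TableName:         aws.String(os.Getenv("EVENT_STORE_TABLE")),
			ExclusiveStartKey: startKey,
		})
		if err != nil {
			panic(err)
		}
		for _, item := range out.Items {
			commit, _ := item["commitId"].(*ddbtypes.AttributeValueMemberS)
			processed, _ := item["processedAt"].(*ddbtypes.AttributeValueMemberS)
			if commit != nil && processed != nil {
				events = append(events, storedEvent{commit.Value, processed.Value})
			}
		}
		if len(out.LastEvaluatedKey) == 0 {
			break
		}
		startKey = out.LastEvaluatedKey
	}

	// 2. Sort by processing time so commits are replayed in their original order.
	sort.Slice(events, func(i, j int) bool { return events[i].ProcessedAt < events[j].ProcessedAt })

	// 3. Re-enqueue each commit; the FIFO queue keeps the replay strictly ordered.
	for _, ev := range events {
		body, _ := json.Marshal(map[string]string{
			"commitId":  ev.CommitID,
			"repoUrl":   os.Getenv("CONFIG_REPO_URL"), // single config repo, supplied via env
			"author":    "replay",
			"message":   "read model rebuild",
			"timestamp": ev.ProcessedAt,
		})
		if _, err := queue.SendMessage(ctx, &sqs.SendMessageInput{
			QueueUrl:               aws.String(os.Getenv("QUEUE_URL")),
			MessageBody:            aws.String(string(body)),
			MessageGroupId:         aws.String("config-processor"),
			MessageDeduplicationId: aws.String("replay-" + ev.CommitID),
		}); err != nil {
			panic(err)
		}
	}
}

One caveat: the processor as written appends a fresh event for every message it handles, so a replay would also grow the event store; a production version would flag replay messages or skip the event-store write for them.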

However, this architecture is not without its limitations and trade-offs. The use of Git as a database introduces latency in the write path; the clone/checkout process in the Processor Lambda can be slow for large repositories. While shallow clones mitigate this, a more advanced solution might involve using AWS EFS to persist the repository across Lambda invocations, turning clones into faster git pull operations. Furthermore, the single SQS FIFO queue ensures ordering but creates a serial processing bottleneck. For a high-throughput system, one might partition the configurations and use multiple queues or a service like Kinesis to process changes in parallel while maintaining order per configuration key. Finally, the security of the Lua sandbox is paramount. It must be heavily restricted, disabling access to network, filesystem, and other OS-level libraries to prevent malicious code execution.
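
The last point deserves to be concrete. With gopher-lua, hardening means not opening the dangerous standard libraries in the first place, stripping the base functions that can load arbitrary code or files, and bounding execution time through the state's context. A minimal sketch, replacing the plain lua.NewState() used in the endpoint above; the whitelisted library set is an illustrative choice, not a requirement.

// Builds a restricted Lua state for executing configuration scripts. The caller
// supplies a context that is already bounded, e.g. with context.WithTimeout, so a
// misbehaving script cannot stall the Lambda.
func newSandboxedState(ctx context.Context) *lua.LState {
	L := lua.NewState(lua.Options{SkipOpenLibs: true})

	// Open only the libraries that configuration logic legitimately needs;
	// os, io, and debug are never loaded at all.
	for _, lib := range []struct {
		name string
		open lua.LGFunction
	}{
		{lua.LoadLibName, lua.OpenPackage}, // must be opened first
		{lua.BaseLibName, lua.OpenBase},
		{lua.TabLibName, lua.OpenTable},
		{lua.StringLibName, lua.OpenString},
		{lua.MathLibName, lua.OpenMath},
	} {
		if err := L.CallByParam(lua.P{Fn: L.NewFunction(lib.open), NRet: 0, Protect: true}, lua.LString(lib.name)); err != nil {
			panic(err)
		}
	}

	// The base library still exposes a few escape hatches; remove them.
	for _, name := range []string{"dofile", "loadfile", "load", "loadstring"} {
		L.SetGlobal(name, lua.LNil)
	}

	L.SetContext(ctx)
	return L
}

The handler would call this in place of lua.NewState() in step 2, wrapping the request context with context.WithTimeout and keeping defer L.Close() as before.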

