The mandate was simple: no public internet access for our production build and deployment environment. Our entire infrastructure runs inside a strictly controlled, air-gapped Virtual Private Cloud (VPC). For our backend Elixir services, this was manageable: we built release artifacts with `mix release`, packaged them in containers with all dependencies vendored, and pushed them to our internal registry. The real friction emerged with our frontend assets. The web application relies on a modern JavaScript stack, with Babel at its core for transpilation. The standard `npm install && npx babel ...` workflow, which assumes unfettered access to the public npm registry, was a non-starter.
Our initial workaround was a painful, multi-step manual process. A developer would build the assets on their local machine, commit the compiled artifacts to a separate repository, and then a sanitized CI/CD pipeline would pull these pre-built assets and deploy them. This was fragile, slow, and a constant source of “it works on my machine” issues. It completely broke the ethos of a repeatable, automated build pipeline. We needed an internal, automated service that could reliably transpile our JavaScript assets within our secure perimeter.
The first thought was to stand up a dedicated Jenkins or GitLab runner inside the VPC, equipped with a private npm registry mirror. This is a common pattern, but it felt heavy and ill-suited for the specific task. We needed something lightweight, concurrent, and exceptionally fault-tolerant. A stuck `npm install` shouldn't require an administrator to SSH into a runner and kill a process. We wanted a system where build jobs were isolated processes that could fail and be cleaned up automatically without affecting other pending jobs. This line of thinking led us to Elixir and OTP. The BEAM's model of lightweight, isolated processes with built-in supervision is a perfect match for managing potentially unreliable external tasks like a Babel build. We decided to build a dedicated Elixir service, codenamed "AssetForge," to act as an on-demand, concurrent build orchestrator.
The core concept was an Elixir application that exposes an internal API. This API would accept a git repository URL and a commit hash. The service would then:
- Check out the specified code into a temporary, isolated directory.
- Run `npm install` against our internal, mirrored npm registry.
- Execute the Babel CLI to transpile the assets.
- On success, push the compiled assets to a designated S3 bucket inside the VPC.
- On failure, log the error and clean up the temporary directory.
- Each build job would run in its own supervised OTP process, ensuring that a single failure could not crash the entire system.
The first step was setting up the Elixir project and outlining the core components.
$ mix new asset_forge --sup
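The `--sup` flag generates an application module with a supervision tree alongside the usual skeleton; the generated files that matter for this article are:

# Project skeleton created by `mix new asset_forge --sup`
asset_forge/
├── mix.exs
├── lib/
│   ├── asset_forge.ex               # public entry point (start_build/2)
│   └── asset_forge/
│       └── application.ex           # supervision tree
└── test/
    ├── test_helper.exs
    └── asset_forge_test.exs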
Our supervision tree would be simple but robust. The main application supervisor would oversee a `DynamicSupervisor`, which would be responsible for starting and stopping our `BuildWorker` GenServers on demand. A `DynamicSupervisor` is ideal here because we don't know how many builds will be running at any given time.
# lib/asset_forge/application.ex
defmodule AssetForge.Application do
@moduledoc false
use Application
@impl true
def start(_type, _args) do
children = [
{DynamicSupervisor, name: AssetForge.BuildSupervisor, strategy: :one_for_one}
]
opts = [strategy: :one_for_one, name: AssetForge.Supervisor]
Supervisor.start_link(children, opts)
end
end
The real work happens inside the `BuildWorker`, a `GenServer` that encapsulates the state and logic for a single build job. It's started with the necessary context (repo, commit hash) and manages the entire lifecycle of the build.
A public-facing module, `AssetForge`, provides the entry point to start a new build.
# lib/asset_forge.ex
defmodule AssetForge do
@doc """
Starts a new build job asynchronously.
"""
def start_build(repo_url, commit_hash) do
spec = {AssetForge.BuildWorker, {repo_url, commit_hash}}
DynamicSupervisor.start_child(AssetForge.BuildSupervisor, spec)
end
end
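Kicking off a build from another internal service (or an IEx session on the node) is then a single call. The repository URL and commit hash below are hypothetical placeholders:

# Hypothetical invocation; the repo URL and commit hash are placeholders.
{:ok, worker_pid} =
  AssetForge.start_build(
    "ssh://git@git.internal.example/frontend/webapp.git",
    "4f2a9c1d"
  )

# The call returns as soon as the BuildWorker process is started; cloning,
# npm install, and the Babel run all happen asynchronously inside it.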
The `BuildWorker` itself needs to handle the sequence of operations. We decided to use `Task.async/1` within the `GenServer`'s `init/1` callback. This keeps `init/1` from blocking the caller and allows the `DynamicSupervisor` to immediately start tracking the new process while the actual work happens in the background.
# lib/build_worker.ex
defmodule AssetForge.BuildWorker do
  # Build workers are one-shot: :temporary ensures the DynamicSupervisor does
  # not restart a build that has already finished (or failed).
  use GenServer, restart: :temporary
require Logger
# Time in milliseconds to allow for the entire build process.
@build_timeout 300_000 # 5 minutes
def start_link({repo_url, commit_hash}) do
GenServer.start_link(__MODULE__, {repo_url, commit_hash})
end
@impl true
def init({repo_url, commit_hash}) do
# The state holds all necessary information for the build.
state = %{
repo_url: repo_url,
commit_hash: commit_hash,
work_dir: create_work_dir(),
status: :pending,
build_task: nil
}
# Start the build process in a separate task so init doesn't block.
# The GenServer process will monitor this task.
build_task = Task.async(fn -> run_build_flow(state.work_dir, repo_url, commit_hash) end)
    # Returning a timeout arms handle_info(:timeout, state) if the build task never reports back.
    {:ok, %{state | build_task: build_task}, @build_timeout}
end
  # The GenServer waits for the build task to complete.
  @impl true
  def handle_info({ref, result}, %{build_task: %Task{ref: ref}} = state) do
    new_state =
      case result do
        {:ok, build_output_path} ->
          Logger.info("Build succeeded. Artifacts at: #{build_output_path}")
          # In a real system, we'd trigger the VPC deployment here.
          # e.g., AssetForge.Deployer.upload_to_vpc_s3(build_output_path)
          %{state | status: :success}

        {:error, reason} ->
          Logger.error("Build failed: #{inspect(reason)}")
          %{state | status: :failed}
      end

    # Clean up the working directory regardless of outcome.
    File.rm_rf!(state.work_dir)
    Logger.info("Cleaned up work directory: #{state.work_dir}")
    # The process has done its job and can now terminate.
    {:stop, :normal, new_state}
  end
  # If nothing arrives within @build_timeout (armed via init/1's return value),
  # the GenServer receives :timeout. This handler is the failsafe against stuck builds.
@impl true
def handle_info(:timeout, state) do
Logger.error("Build timed out for repo #{state.repo_url} at commit #{state.commit_hash}")
# Ensure the task is killed
Task.shutdown(state.build_task, :brutal_kill)
File.rm_rf!(state.work_dir)
{:stop, :shutdown, %{state | status: :timeout}}
end
# --- Private Helper Functions ---
defp create_work_dir do
# Generate a unique directory for each build to ensure isolation.
tmp_path = System.tmp_dir!()
build_id = :crypto.strong_rand_bytes(8) |> Base.encode16()
work_dir = Path.join(tmp_path, "asset_forge_#{build_id}")
File.mkdir_p!(work_dir)
work_dir
end
defp run_build_flow(work_dir, repo_url, commit_hash) do
with {:ok, _} <- git_clone(work_dir, repo_url, commit_hash),
{:ok, _} <- npm_install(work_dir),
{:ok, build_path} <- babel_transpile(work_dir) do
{:ok, build_path}
    else
      # CommandRunner reports failures as {:error, {exit_code, output}}.
      {:error, {exit_code, output}} when is_integer(exit_code) ->
        {:error, "Command failed with exit code #{exit_code}. Output: #{output}"}

      {:error, reason} ->
        {:error, reason}
    end
end
end
The most critical part is the interaction with external commands: `git`, `npm`, and `babel`. Using `System.cmd/3` is a common approach, but for better control over I/O, error handling, and process management, Elixir's `Port` is a superior tool. A `Port` allows the BEAM to communicate with an external OS process through standard I/O streams, which gives us a robust way to capture logs and errors.
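For contrast, the same step written with `System.cmd/3` would look roughly like the sketch below (here `work_dir` stands in for the build directory). It works, but the call blocks until the external command exits and has no built-in timeout; with a `Port` we can impose our own via `receive ... after`.

# Rough System.cmd/3 equivalent, shown for comparison only.
{output, exit_status} =
  System.cmd("npm", ["install"],
    cd: work_dir,
    stderr_to_stdout: true
  )

case exit_status do
  0 -> {:ok, output}
  _ -> {:error, {exit_status, output}}
end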
We built a small utility module to wrap the `Port` interaction, making it easier to reuse and test.
# lib/asset_forge/command_runner.ex
defmodule AssetForge.CommandRunner do
  require Logger

  # Timeout for external commands.
  @command_timeout 60_000 # 1 minute

  # Declaring a callback makes this module a behaviour, which also lets us
  # generate a test mock for it later on.
  @callback run(String.t(), [String.t()], keyword()) :: {:ok, String.t()} | {:error, term()}

  @doc """
  Executes a command in a specified directory using a Port.
  Returns {:ok, output} or {:error, {exit_code, output}}.
  """
  def run(executable, args, opts \\ []) do
    work_dir = Keyword.get(opts, :cd, File.cwd!())
    # :spawn_executable requires a full path, so resolve it against PATH first.
    exe = System.find_executable(executable) || raise "executable not found: #{executable}"

    port_opts = [
      :binary,
      {:args, args},
      {:cd, work_dir},
      :exit_status,
      :hide,
      :use_stdio,
      # Merge stderr into stdout so failures show up in the captured output.
      :stderr_to_stdout
    ]

    port = Port.open({:spawn_executable, exe}, port_opts)
    # Collect the merged output and wait for the exit status.
    collect_output(port, "")
  end

  defp collect_output(port, output) do
    receive do
      {^port, {:data, data}} ->
        # stdout and stderr arrive on one stream thanks to :stderr_to_stdout.
        collect_output(port, output <> data)

      {^port, {:exit_status, 0}} ->
        Logger.debug("Command successful. Output: #{output}")
        {:ok, output}

      {^port, {:exit_status, exit_code}} ->
        Logger.error("Command failed with exit code #{exit_code}. Output: #{output}")
        {:error, {exit_code, output}}
    after
      @command_timeout ->
        Port.close(port)
        {:error, {:timeout, "Command timed out after #{@command_timeout}ms"}}
    end
  end
end
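As a quick illustration of the contract (the arguments here are arbitrary examples): a zero exit status comes back as `{:ok, output}`, anything else as `{:error, {exit_code, output}}`.

# Illustrative calls; the arguments are arbitrary examples.
{:ok, _version} = AssetForge.CommandRunner.run("git", ["--version"])

# A failing command returns its exit code and the captured output.
{:error, {_exit_code, _output}} =
  AssetForge.CommandRunner.run("git", ["clone", "not-a-real-repo"], cd: System.tmp_dir!())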
With this `CommandRunner`, our `run_build_flow` helpers become much cleaner and more robust.
# lib/build_worker.ex (helper function implementations)
defmodule AssetForge.BuildWorker do
  # ... existing code ...

  # Resolve the command runner from the application environment so the test
  # configuration can swap in a mock; production falls back to the real module.
  defp command_runner do
    Application.get_env(:asset_forge, :command_runner, AssetForge.CommandRunner)
  end

  defp git_clone(work_dir, repo_url, commit_hash) do
    Logger.info("Cloning #{repo_url} into #{work_dir}")

    case command_runner().run("git", ["clone", repo_url, "."], cd: work_dir) do
      {:ok, _} ->
        Logger.info("Checking out commit #{commit_hash}")
        command_runner().run("git", ["checkout", commit_hash], cd: work_dir)

      error ->
        error
    end
  end

  defp npm_install(work_dir) do
    Logger.info("Running npm install in #{work_dir}")
    # --registry points to our internal, air-gapped mirror.
    # This configuration is critical.
    registry_url = Application.fetch_env!(:asset_forge, :npm_registry)
    command_runner().run("npm", ["install", "--registry=#{registry_url}"], cd: work_dir)
  end

  defp babel_transpile(work_dir) do
    Logger.info("Running babel transpilation in #{work_dir}")
    # These paths would be configured based on project conventions.
    source_dir = Path.join(work_dir, "src")
    output_dir = Path.join(work_dir, "dist")
    File.mkdir_p!(output_dir)

    # Using npx to ensure we use the project's local babel version.
    case command_runner().run("npx", ["babel", source_dir, "--out-dir", output_dir], cd: work_dir) do
      {:ok, _} -> {:ok, output_dir}
      error -> error
    end
  end
end
Configuration is managed through the application environment. The production values live in `config/runtime.exs`, so the registry URL and bucket name are read from environment variables at boot rather than being hardcoded or baked in at compile time.
# config/runtime.exs
import Config
config :asset_forge,
# This URL points to our internal Verdaccio/Artifactory instance
# within the VPC.
npm_registry: System.get_env("NPM_REGISTRY_URL"),
# Configuration for the S3 bucket where assets are deployed.
deployment_bucket: System.get_env("ASSET_DEPLOYMENT_BUCKET")
# Logger configuration for production
config :logger, :console,
format: "$time $metadata[$level] $message\n",
metadata: [:request_id]
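One piece the worker only hints at is the upload step (`AssetForge.Deployer.upload_to_vpc_s3/1`). Below is a minimal sketch of what that module could look like, assuming the `ex_aws` and `ex_aws_s3` packages are available from our internal Hex mirror and configured for the in-VPC S3 endpoint; a real implementation would add retries and content-type handling.

# lib/asset_forge/deployer.ex (sketch only; assumes ex_aws + ex_aws_s3 are
# mirrored internally and configured for the in-VPC S3 endpoint)
defmodule AssetForge.Deployer do
  require Logger

  def upload_to_vpc_s3(build_path) do
    bucket = Application.fetch_env!(:asset_forge, :deployment_bucket)

    build_path
    |> Path.join("**/*")
    |> Path.wildcard()
    |> Enum.filter(&File.regular?/1)
    |> Enum.each(fn file ->
      # The object key is the file's path relative to the build output directory.
      key = Path.relative_to(file, build_path)
      Logger.info("Uploading #{key} to s3://#{bucket}")

      bucket
      |> ExAws.S3.put_object(key, File.read!(file))
      |> ExAws.request!()
    end)

    :ok
  end
end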
To visualize the flow, the entire process can be mapped out.
sequenceDiagram
    participant Client as API Client
    participant AssetForge as AssetForge Service
    participant Supervisor as BuildSupervisor
    participant Worker as BuildWorker (GenServer)
    participant Runner as CommandRunner (Port)
    participant Infra as VPC Infrastructure
    Client->>+AssetForge: start_build(repo, hash)
    AssetForge->>+Supervisor: start_child(BuildWorker, {repo, hash})
    Supervisor->>+Worker: start_link()
    Note right of Worker: init() called, Task.async started
    Supervisor-->>-AssetForge: {:ok, pid}
    AssetForge-->>-Client: (ack)
    Worker->>+Runner: run("git clone ...")
    Runner-->>-Worker: {:ok, _}
    Worker->>+Runner: run("npm install ...")
    Runner-->>-Worker: {:ok, _}
    Worker->>+Runner: run("npx babel ...")
    Runner-->>-Worker: {:ok, build_path}
    Worker->>Infra: Upload assets from build_path to S3
    Note right of Worker: Task completes successfully
    Worker-->>-Supervisor: :normal exit
Testing this system required careful consideration of the external dependencies. We can't run `git` and `npm` in our unit tests. The solution was to use Mox to create a mock for our `CommandRunner`; since Mox can only mock behaviours, this is exactly why `CommandRunner` declares a `@callback` for `run/3`.
# test/test_helper.exs
ExUnit.start()
Mox.defmock(AssetForge.CommandRunnerMock, for: AssetForge.CommandRunner)
And in `config/test.exs`, we tell our application to use the mock instead of the real module. The `BuildWorker` helpers look this setting up at runtime, so production deployments keep using the real `CommandRunner`.
# config/test.exs
import Config

config :asset_forge,
  command_runner: AssetForge.CommandRunnerMock,
  # Dummy value so npm_install/1 can build its --registry flag in tests.
  npm_registry: "http://registry.npm.invalid"
This allows us to write tests that verify the `BuildWorker`'s logic without ever touching the network or shelling out to real `git`, `npm`, or Babel processes.
# test/asset_forge/build_worker_test.exs
defmodule AssetForge.BuildWorkerTest do
  # The mock is invoked from the worker's internal Task, not the test process,
  # so we run Mox in global mode (which requires async: false).
  use ExUnit.Case, async: false
  import Mox

  alias AssetForge.CommandRunnerMock

  setup :set_mox_global
  setup :verify_on_exit!
test "a successful build flow executes all commands and stops normally" do
repo_url = "..."
commit_hash = "..."
# Expect all external commands to be called in order
expect(CommandRunnerMock, :run, fn "git", ["clone", ^repo_url, "."], _opts -> {:ok, ""} end)
expect(CommandRunnerMock, :run, fn "git", ["checkout", ^commit_hash], _opts -> {:ok, ""} end)
expect(CommandRunnerMock, :run, fn "npm", ["install", _], _opts -> {:ok, ""} end)
expect(CommandRunnerMock, :run, fn "npx", ["babel", _, _, _], _opts -> {:ok, ""} end)
# Start the worker, which will trigger the flow
{:ok, pid} = AssetForge.BuildWorker.start_link({repo_url, commit_hash})
# Assert that the process terminates cleanly after the work is done.
ref = Process.monitor(pid)
assert_receive {:DOWN, ^ref, :process, ^pid, :normal}
end
test "a failing npm install stops the flow and terminates" do
repo_url = "..."
commit_hash = "..."
    # Three calls happen before the flow aborts: clone, checkout, then the failing npm install.
    expect(CommandRunnerMock, :run, 3, fn
      "git", ["clone", _, _], _ -> {:ok, ""}
      "git", ["checkout", _], _ -> {:ok, ""}
      # Simulate npm failure
      "npm", _, _ -> {:error, {1, "npm failed"}}
    end)
{:ok, pid} = AssetForge.BuildWorker.start_link({repo_url, commit_hash})
    ref = Process.monitor(pid)
    # Even on failure, our GenServer translates the error into a :normal stop;
    # the failure itself is surfaced in the logs rather than in the exit reason.
    assert_receive {:DOWN, ^ref, :process, ^pid, :normal}
# A more advanced test could check for the log output.
end
end
The final implementation provided a stable, observable, and resilient service. Build jobs are isolated, timeouts prevent stuck processes from consuming resources indefinitely, and the supervision tree ensures the service as a whole remains healthy even if individual builds fail. It solved our air-gapped deployment problem in a way that felt native to the Elixir ecosystem, turning a brittle manual process into a reliable piece of infrastructure.
This architecture is not without its limitations. The current system executes commands directly on the host machine, which presents a security risk if a malicious `package.json` script were introduced. A future iteration must sandbox the entire build process within a short-lived container (e.g., using Docker or gVisor), which the Elixir service would orchestrate. Furthermore, the build queue logic is non-existent; builds are started immediately. A more sophisticated system would use a queuing mechanism to control concurrency and prioritize jobs, perhaps by introducing another GenServer to act as a pool manager in front of the `DynamicSupervisor`, as sketched below. The mechanism of using OS ports also carries a performance penalty due to data serialization between the BEAM and the external process; however, for a task as coarse-grained as a full npm/Babel build, this overhead is negligible compared to the benefits of process isolation and robustness.
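To make that last idea concrete, here is a rough sketch of such a pool manager. It is not part of the system described above; the module name and concurrency limit are illustrative. It would run under the application supervisor and become the only caller of `AssetForge.start_build/2`.

# lib/asset_forge/build_queue.ex (sketch of the pool-manager idea; the name
# and @max_concurrent limit are hypothetical, not part of the current system)
defmodule AssetForge.BuildQueue do
  use GenServer

  @max_concurrent 4

  def start_link(_opts), do: GenServer.start_link(__MODULE__, nil, name: __MODULE__)

  def enqueue(repo_url, commit_hash), do: GenServer.cast(__MODULE__, {:enqueue, {repo_url, commit_hash}})

  @impl true
  def init(_), do: {:ok, %{queue: :queue.new(), running: 0}}

  @impl true
  def handle_cast({:enqueue, job}, state) do
    {:noreply, maybe_start(%{state | queue: :queue.in(job, state.queue)})}
  end

  @impl true
  def handle_info({:DOWN, _ref, :process, _pid, _reason}, state) do
    # A build finished (or crashed); free the slot and start the next job.
    {:noreply, maybe_start(%{state | running: state.running - 1})}
  end

  defp maybe_start(%{running: running} = state) when running >= @max_concurrent, do: state

  defp maybe_start(state) do
    case :queue.out(state.queue) do
      {{:value, {repo, hash}}, rest} ->
        {:ok, pid} = AssetForge.start_build(repo, hash)
        # Monitor the worker so we know when its slot frees up.
        Process.monitor(pid)
        maybe_start(%{state | queue: rest, running: state.running + 1})

      {:empty, _} ->
        state
    end
  end
end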