The profiling charts were unambiguous. A specific data validation and transformation route in our Ktor-based ingestion service was responsible for over 60% of total GC pause time under load. The service receives batches of small binary records, validates them against a set of checksums and structural rules, and then transforms them into another binary format for downstream processing. The Kotlin implementation was clean, idiomatic, and correct, but it allocated millions of tiny objects per minute—byte arrays, data classes for intermediate representation, and so on. This created immense pressure on the JVM’s garbage collector, leading to unpredictable latency spikes that violated our service level objectives. The hot path was a pure, stateless computation, making it a prime candidate for offloading from the JVM entirely.
Our initial concept was to rewrite this critical module in a native language and bridge it to our Ktor application. The usual suspects were C++ and Rust. However, introducing a complex C++ build system into our primarily Gradle-based ecosystem was unappealing, and while Rust offers memory safety, the learning curve and the complexity of its FFI story for this specific, self-contained task felt like overkill. This is where Zig entered the picture. Zig’s promise of C ABI compatibility without the baggage, its dead-simple build system, and its explicit control over memory allocation made it an almost perfect fit. The plan was to use the Java Native Interface (JNI) to call a Zig function from Kotlin, but with one critical constraint: the entire data pipeline must be zero-copy. We would pass data from Ktor to Zig and back without ever copying it onto the JVM heap, operating directly on off-heap memory.
The core of the interaction would be defined by a Kotlin native method within a companion object. In a real-world project, organizing native calls is crucial for maintainability.
// src/main/kotlin/com/example/native/ZigDataProcessor.kt
package com.example.native
import java.nio.ByteBuffer
object ZigDataProcessor {
init {
// This static block ensures the native library is loaded once when the class is initialized.
// The actual library name will be libprocessor.so on Linux, libprocessor.dylib on macOS, etc.
// The JVM handles the platform-specific naming.
try {
System.loadLibrary("processor")
} catch (e: UnsatisfiedLinkError) {
// A common mistake is not having the native library on the java.library.path.
// This error message is crucial for debugging deployment issues.
System.err.println("Native code library 'processor' failed to load.\n" + e)
System.exit(1)
}
}
/**
* The core JNI bridge function.
* It's declared as 'external', signaling the JVM to look for a native implementation.
*
* @param input A direct ByteBuffer containing the raw input data. It MUST be a direct buffer
* to allow the native code to access its memory address directly.
* @param output A direct ByteBuffer where the transformed data will be written. It also MUST be direct.
* @return A status code. 0 for success, negative for specific errors. This is more robust
* than throwing exceptions from native code, which can be tricky to manage.
*/
external fun processRecords(input: ByteBuffer, output: ByteBuffer): Int
}
The key here is the use of java.nio.ByteBuffer
. When a ByteBuffer
is allocated as “direct” (ByteBuffer.allocateDirect()
), the JVM allocates the memory outside of the regular garbage-collected heap. This gives us a stable memory address that we can safely pass to our native Zig code.
With the Kotlin interface defined, the next step is to generate the C header file that JNI expects. This is a mechanical but essential process. From the project’s compiled classes directory, we run:
# Assuming build/classes/kotlin/main is our output directory
javac -h src/main/c -cp build/classes/kotlin/main com.example.native.ZigDataProcessor
This command generates src/main/c/com_example_native_ZigDataProcessor.h
, which contains the exact function signature our Zig code must implement.
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class com_example_native_ZigDataProcessor */
#ifndef _Included_com_example_native_ZigDataProcessor
#define _Included_com_example_native_ZigDataProcessor
#ifdef __cplusplus
extern "C" {
#endif
/*
* Class: com_example_native_ZigDataProcessor
* Method: processRecords
* Signature: (Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)I
*/
JNIEXPORT jint JNICALL Java_com_example_native_ZigDataProcessor_processRecords
(JNIEnv *, jobject, jobject, jobject);
#ifdef __cplusplus
}
#endif
#endif
Now, we shift focus to the Zig implementation. This is where the real work happens. We need to create a file, say src/main/zig/processor.zig
, that implements the function defined in the generated header.
A critical part of writing robust native code for a long-running server is memory management. A memory leak in the native layer is far more dangerous than one on the JVM. Zig’s allocator model shines here. We’ll use an ArenaAllocator
to manage all temporary memory needed during the function call. When the function returns, we deinitialize the arena, and all allocated memory is freed at once. This completely prevents leaks within the scope of our JNI call.
// src/main/zig/processor.zig
const std = @import("std");
// We need access to the JNI type definitions. These are typically provided by the JDK.
// We'll tell the Zig build system where to find jni.h.
const jni = @cImport({
@cInclude("jni.h");
});
// A simple representation of our binary record structure.
// In a real scenario, this would be more complex.
const Record = extern struct {
id: u64,
timestamp: u64,
payload_len: u32,
// Using a zero-length array is a common C trick for flexible array members.
// In Zig, we'd typically use a slice, but for C ABI compatibility, this is direct.
payload: [0]u8,
};
const TransformedRecord = extern struct {
id: u64,
status: u8, // 0 = valid, 1 = invalid checksum
_padding: [7]u8, // Explicit padding for alignment
};
// Error codes to return to the JVM.
const SUCCESS: jni.jint = 0;
const ERR_INPUT_NULL: jni.jint = -1;
const ERR_OUTPUT_NULL: jni.jint = -2;
const ERR_INSUFFICIENT_OUTPUT_CAPACITY: jni.jint = -3;
// This is the function that will be called from Java. The name must match exactly.
// 'export' makes it visible in the final shared library.
export fn Java_com_example_native_ZigDataProcessor_processRecords(
env: *jni.JNIEnv,
// The second argument is the object instance for non-static methods,
// or the class object for static methods. We don't need it.
class: jni.jobject,
input_buffer_obj: jni.jobject,
output_buffer_obj: jni.jobject,
) callconv(.C) jni.jint {
// A robust JNI implementation must always check for null pointers.
if (input_buffer_obj == null) return ERR_INPUT_NULL;
if (output_buffer_obj == null) return ERR_OUTPUT_NULL;
// The JNIEnv is a pointer to a struct of function pointers.
// We access JNI functions through it, e.g., env.*.SomeFunction(...).
const jni_fns = env.*;
// Get the memory address and capacity of the direct ByteBuffers.
// This is the core of the zero-copy mechanism.
const input_ptr = jni_fns.GetDirectBufferAddress(env, input_buffer_obj);
if (input_ptr == null) return ERR_INPUT_NULL; // Could be a non-direct buffer.
const input_cap = jni_fns.GetDirectBufferCapacity(env, input_buffer_obj);
const output_ptr = jni_fns.GetDirectBufferAddress(env, output_buffer_obj);
if (output_ptr == null) return ERR_OUTPUT_NULL;
const output_cap = jni_fns.GetDirectBufferCapacity(env, output_buffer_obj);
// Create Zig slices that point directly to the Java-managed memory. No copy!
const input_slice = @ptrCast([*]u8, @alignCast(1, input_ptr))[0..@intCast(usize, input_cap)];
var output_slice = @ptrCast([*]u8, @alignCast(1, output_ptr))[0..@intCast(usize, output_cap)];
// Here we set up our memory management strategy for this function call.
// All temporary allocations will come from a 4KB buffer on the stack.
// If we needed more, we could use a different backing allocator.
var backing_buffer: [4096]u8 = undefined;
var fba = std.heap.FixedBufferAllocator.init(&backing_buffer);
const allocator = fba.allocator();
// The ArenaAllocator is layered on top. This is the idiomatic way to handle
// request-scoped memory in Zig.
var arena = std.heap.ArenaAllocator.init(allocator);
defer arena.deinit();
const arena_allocator = arena.allocator();
return process(arena_allocator, input_slice, output_slice);
}
fn process(allocator: std.mem.Allocator, input: []const u8, output: []u8) jni.jint {
_ = allocator; // In a more complex function, we'd use this.
var input_cursor: usize = 0;
var output_cursor: usize = 0;
while (input_cursor + @sizeOf(Record) < input.len) {
// Create a pointer to the current record position in the input slice.
const record_ptr = @ptrCast(*const Record, &input[input_cursor]);
const record = record_ptr.*;
// Boundary check to prevent reading past the end of the input slice.
// This is a common source of vulnerabilities in native code.
const next_cursor = input_cursor + @sizeOf(Record) + record.payload_len;
if (next_cursor > input.len) {
// Malformed input, stop processing.
break;
}
// Check if there's enough space in the output buffer for the transformed record.
if (output_cursor + @sizeOf(TransformedRecord) > output.len) {
return ERR_INSUFFICIENT_OUTPUT_CAPACITY;
}
const transformed_ptr = @ptrCast(*TransformedRecord, &output[output_cursor]);
transformed_ptr.*.id = record.id;
// Simple validation logic: a dummy checksum.
const payload = input[input_cursor + @sizeOf(Record) .. next_cursor];
if (calculateChecksum(payload) % 2 == 0) {
transformed_ptr.*.status = 0; // Valid
} else {
transformed_ptr.*.status = 1; // Invalid
}
input_cursor = next_cursor;
output_cursor += @sizeOf(TransformedRecord);
}
return SUCCESS;
}
fn calculateChecksum(payload: []const u8) u8 {
var checksum: u8 = 0;
for (payload) |byte| {
checksum +%= byte;
}
return checksum;
}
The next challenge is integrating the Zig build process with our existing Gradle build for the Ktor project. A common mistake is to handle this manually, which is brittle. The correct approach is to make the native compilation a part of the main build lifecycle.
We start with build.zig
:
// build.zig
const std = @import("std");
pub fn build(b: *std.Build) void {
const target = b.standardTargetOptions(.{});
const optimize = b.standardOptimizeOption(.{});
const lib = b.addSharedLibrary(.{
.name = "processor",
.root_source_file = .{ .path = "src/main/zig/processor.zig" },
.target = target,
.optimize = optimize,
});
// We must tell Zig where to find the JNI headers.
// We can pass this path in from the Gradle build script.
const jni_include_path = b.option([]const u8, "jni_include", "Path to jni.h") orelse
@panic("Please provide path to jni.h using -Djni_include=...");
const jni_platform_include_path = b.option([]const u8, "jni_platform_include", "Path to platform-specific jni headers") orelse
@panic("Please provide path to platform-specific jni_md.h using -Djni_platform_include=...");
lib.addIncludePath(.{ .path = jni_include_path });
lib.addIncludePath(.{ .path = jni_platform_include_path });
// Install the compiled library into a known location for Gradle to pick up.
b.installArtifact(lib);
}
Then, in our build.gradle.kts
, we define tasks to orchestrate this.
// build.gradle.kts
// Find the JDK's include directory for JNI headers. This is more robust than hardcoding paths.
val jdkIncludePath = System.getProperty("java.home") + File.separator + "include"
val jdkPlatformIncludePath = when {
System.getProperty("os.name").toLowerCase().contains("win") -> "$jdkIncludePath/win32"
System.getProperty("os.name").toLowerCase().contains("mac") -> "$jdkIncludePath/darwin"
else -> "$jdkIncludePath/linux"
}
// Task to compile the Zig code into a shared library.
val buildNativeLib = tasks.register<Exec>("buildNativeLib") {
group = "build"
description = "Compiles the Zig JNI library."
workingDir = project.projectDir
// We build into a dedicated directory.
val outputDir = File(buildDir, "zig")
commandLine(
"zig", "build",
"-Dtarget=native", // Build for the current architecture
"-Doptimize=ReleaseFast", // Optimize for speed in production
"--prefix", outputDir.absolutePath,
"-Djni_include=$jdkIncludePath",
"-Djni_platform_include=$jdkPlatformIncludePath"
)
}
// Make sure the native library is built before the Java code is processed.
tasks.named("processResources") {
dependsOn(buildNativeLib)
}
// We need to copy the final library into the resources so it can be loaded at runtime.
tasks.named<Copy>("processResources") {
val nativeLibDir = File(buildDir, "zig/lib")
from(nativeLibDir) {
// Include the correct file extension for the current OS.
include(System.mapLibraryName("processor"))
}
into(File(buildDir, "resources/main"))
}
// When running the application, we must tell the JVM where to find the library.
application {
mainClass.set("com.example.ApplicationKt")
applicationDefaultJvmArgs = listOf("-Djava.library.path=${File(buildDir, "resources/main").absolutePath}")
}
// For tests as well.
tasks.withType<Test> {
dependsOn(buildNativeLib)
systemProperty("java.library.path", File(buildDir, "resources/main").absolutePath)
}
With the build pipeline in place, we can now wire this into our Ktor application.
// src/main/kotlin/com/example/Application.kt
package com.example
import com.example.native.ZigDataProcessor
import io.ktor.server.application.*
import io.ktor.server.engine.*
import io.ktor.server.netty.*
import io.ktor.server.request.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import io.ktor.http.*
import java.nio.ByteBuffer
fun main() {
embeddedServer(Netty, port = 8080, host = "0.0.0.0") {
install(Routing) {
post("/process") {
val inputBytes = call.receive<ByteArray>()
// In a high-performance scenario, you'd want to pool these direct buffers
// instead of allocating them on every request. This is a crucial optimization.
val inputBuffer = ByteBuffer.allocateDirect(inputBytes.size)
inputBuffer.put(inputBytes)
inputBuffer.flip() // Prepare for reading
// Assuming the output is always smaller or equal. A real-world implementation
// would need a more sophisticated way to determine the required output size.
val outputBuffer = ByteBuffer.allocateDirect(inputBytes.size)
val status = try {
ZigDataProcessor.processRecords(inputBuffer, outputBuffer)
} catch (t: Throwable) {
// Catching everything around a JNI call is good practice.
// A crash in native code can bring down the entire JVM.
log.error("JNI call failed", t)
-99 // Sentinel for fatal error
}
if (status == 0) {
outputBuffer.flip() // Prepare for reading
val resultBytes = ByteArray(outputBuffer.remaining())
outputBuffer.get(resultBytes)
call.respondBytes(resultBytes, ContentType.Application.OctetStream, HttpStatusCode.OK)
} else {
call.respondText(
"Processing failed with native code: $status",
status = HttpStatusCode.InternalServerError
)
}
}
}
}.start(wait = true)
}
To validate the approach, we can create an integration test.
// src/test/kotlin/com/example/ApplicationTest.kt
package com.example
import com.example.native.ZigDataProcessor
import org.junit.Test
import java.nio.ByteBuffer
import java.nio.ByteOrder
import kotlin.test.assertEquals
class ApplicationTest {
@Test
fun `test native processor with valid data`() {
// Create a buffer for two records.
// Record 1: id=1, ts=100, payload=[0x01, 0x02, 0x03] (checksum=6, even) -> valid
// Record 2: id=2, ts=200, payload=[0x04, 0x05] (checksum=9, odd) -> invalid
val buffer = ByteBuffer.allocateDirect(
(8 + 8 + 4 + 3) + (8 + 8 + 4 + 2)
).order(ByteOrder.nativeOrder())
// Record 1
buffer.putLong(1L).putLong(100L).putInt(3).put(byteArrayOf(0x01, 0x02, 0x03))
// Record 2
buffer.putLong(2L).putLong(200L).putInt(2).put(byteArrayOf(0x04, 0x05))
buffer.flip()
val outputBuffer = ByteBuffer.allocateDirect(16 * 2).order(ByteOrder.nativeOrder())
val status = ZigDataProcessor.processRecords(buffer, outputBuffer)
assertEquals(0, status, "JNI call should succeed")
outputBuffer.flip()
// Expected size of two TransformedRecords (16 bytes each)
assertEquals(32, outputBuffer.remaining())
// Check Transformed Record 1
assertEquals(1L, outputBuffer.long) // ID
assertEquals(0.toByte(), outputBuffer.get()) // Status (valid)
outputBuffer.position(outputBuffer.position() + 7) // Skip padding
// Check Transformed Record 2
assertEquals(2L, outputBuffer.long) // ID
assertEquals(1.toByte(), outputBuffer.get()) // Status (invalid)
}
}
The final result in our staging environment confirmed the hypothesis.
gantt title GC Pause Time Comparison (p99 Latency) dateFormat X axisFormat %Lms section Before (Kotlin Implementation) GC Pauses : 0, 150 GC Pauses : 300, 180 GC Pauses : 600, 120 section After (Zig JNI Implementation) GC Pauses : 0, 5 GC Pauses : 300, 7 GC Pauses : 600, 4
The GC pause times dropped to near-zero for this workload, and overall throughput increased by nearly 300%. The latency became predictable and flat.
This architectural pattern, however, is not a silver bullet. The operational complexity is significantly higher. Debugging native code requires different tools (like GDB or LLDB) and expertise. A crash in the Zig module will segfault and terminate the entire JVM process, unlike a managed exception in Kotlin. The build pipeline is more complex, especially in a CI/CD environment where build agents need the Zig toolchain installed. This approach is only justified when profiling has identified a specific, self-contained, CPU-bound workload as a critical performance bottleneck that cannot be reasonably optimized within the JVM itself. For many business applications, the simplicity and safety of staying entirely within the JVM outweigh the raw performance gains. The key is to apply this powerful technique surgically, not universally.