The initial attempt was naive: a central Go server, WebSocket connections, and a database field acting as a lock. When a client started typing, it would acquire a lock. Any other client attempting to edit would be blocked. In staging, this worked flawlessly. In the first user test with more than three people, the user experience collapsed. The UI felt sluggish and unresponsive, with users constantly seeing a “document is locked” message. The core issue was a fundamental misunderstanding of the problem domain. Real-time collaboration isn’t a transactional database problem; it’s a distributed systems problem in miniature, and it forces an immediate confrontation with the CAP theorem.
Our requirements were clear: users must always be able to type into their local editor (Availability), and the system must function even if some clients have intermittent network connectivity (Partition Tolerance). According to the CAP theorem, by prioritizing ‘A’ and ‘P’, we must necessarily sacrifice strong Consistency. We cannot guarantee that every user sees the exact same document state at the exact same millisecond. Instead, we must aim for eventual consistency, where all replicas (clients) will converge to the same state over time.
This decision immediately invalidated locking mechanisms and made traditional Operational Transformation (OT) seem too complex, with its notorious need for a central, ordered log of operations and transformation functions to handle concurrent changes. The path forward was to adopt a data structure designed for AP systems: a Conflict-free Replicated Data Type (CRDT). For our text editor, we chose a sequence CRDT modeled on LSEQ, which assigns each character a variable-length positional identifier.
The Go Backend: Implementing the LSEQ CRDT Core
An LSEQ represents a sequence of characters as a tree-like structure. Each character, or Atom, is associated with a unique, ordered Identifier. This identifier doesn’t just specify the character’s position; it encodes its origin (site ID), allowing for deterministic insertion without central coordination.
Here is the core data model in Go. In a real-world project, this would live in a dedicated crdt package.
// crdt/lseq.go
package crdt

import (
	"log"
	"math/rand"
	"sync"
	"time"
)

// Identifier represents the position of an Atom in the sequence.
// It's a list of integers that forms a path in a conceptual tree.
// SiteID distinguishes which collaborator generated this identifier.
type Identifier struct {
	Position []uint32 `json:"position"`
	SiteID   uint64   `json:"siteId"`
}

// Atom is the fundamental unit, representing a character and its unique position.
type Atom struct {
	Value       rune       `json:"value"`
	ID          Identifier `json:"id"`
	IsTombstone bool       `json:"isTombstone"` // Marks the atom as deleted
}

// LSEQTree holds the entire sequence of atoms.
// The core logic for insertion and deletion resides here.
// In a production system, a more efficient data structure like a B-tree or
// skip list might be used instead of a simple slice to speed up lookups.
type LSEQTree struct {
	mu     sync.RWMutex
	Atoms  []Atom
	SiteID uint64
	clock  uint32
}

// NewLSEQTree creates a new LSEQ tree for a given collaborator site.
// We add boundary atoms to simplify insertion logic.
func NewLSEQTree(siteID uint64) *LSEQTree {
	tree := &LSEQTree{
		SiteID: siteID,
		Atoms:  make([]Atom, 0),
		clock:  0,
	}
	// Add "beginning-of-file" and "end-of-file" markers. These simplify the
	// identifier generation logic by ensuring there's always a left and right boundary.
	tree.Atoms = append(tree.Atoms, Atom{ID: Identifier{Position: []uint32{0}, SiteID: 0}})
	tree.Atoms = append(tree.Atoms, Atom{ID: Identifier{Position: []uint32{uint32(1) << 16}, SiteID: 0}})
	return tree
}

// findInsertIndex finds the correct position to insert a new atom.
// It performs a linear scan, which is a performance bottleneck for large documents.
// A production implementation should use a more efficient search algorithm.
func (t *LSEQTree) findInsertIndex(atom Atom) int {
	// A simple linear search for the insertion point.
	// This is O(n) and a major candidate for optimization.
	for i := 1; i < len(t.Atoms); i++ {
		if compareIdentifiers(atom.ID, t.Atoms[i].ID) < 0 {
			return i
		}
	}
	return len(t.Atoms)
}

// ApplyOp processes an incoming operation (insert or delete).
// This function must be idempotent: applying the same operation multiple
// times has no additional effect.
func (t *LSEQTree) ApplyOp(op Operation) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.applyOpLocked(op)
}

// applyOpLocked performs the actual mutation and must be called with t.mu
// held. Go's sync.Mutex is not reentrant, so LocalInsert and LocalDelete
// call this directly; calling ApplyOp while holding the lock would deadlock.
func (t *LSEQTree) applyOpLocked(op Operation) {
	switch op.Type {
	case "insert":
		// Check if this atom already exists to ensure idempotency.
		for _, existingAtom := range t.Atoms {
			if identifiersEqual(existingAtom.ID, op.Atom.ID) {
				// Already processed this op; common in distributed systems with at-least-once delivery.
				return
			}
		}
		insertIndex := t.findInsertIndex(op.Atom)
		t.Atoms = append(t.Atoms[:insertIndex], append([]Atom{op.Atom}, t.Atoms[insertIndex:]...)...)
	case "delete":
		// Deletion is handled by marking an atom as a tombstone.
		// We never truly remove atoms, which is a potential memory issue for long-running documents.
		for i, atom := range t.Atoms {
			if identifiersEqual(atom.ID, op.Atom.ID) {
				t.Atoms[i].IsTombstone = true
				return // Found and marked; a no-op if it was already marked.
			}
		}
	}
}

// LocalInsert generates a new atom at a specific linear index and returns the operation.
func (t *LSEQTree) LocalInsert(index int, value rune) Operation {
	t.mu.Lock()
	defer t.mu.Unlock()
	// The index here corresponds to the visible text, so we need to map it to
	// the underlying Atoms slice, skipping tombstones.
	realIndex := t.findRealIndex(index)
	prevID := t.Atoms[realIndex].ID
	nextID := t.Atoms[realIndex+1].ID
	newID := t.generateIdentifier(prevID, nextID)
	newAtom := Atom{Value: value, ID: newID, IsTombstone: false}
	op := Operation{
		Type: "insert",
		Atom: newAtom,
	}
	// Immediately apply the change locally via the unlocked variant (we
	// already hold t.mu). The caller is responsible for broadcasting the operation.
	t.applyOpLocked(op)
	return op
}

// LocalDelete marks an atom at a specific linear index as a tombstone and returns the operation.
func (t *LSEQTree) LocalDelete(index int) Operation {
	t.mu.Lock()
	defer t.mu.Unlock()
	realIndex := t.findRealIndex(index) + 1 // +1 because deletion targets the character at the index
	// This check is crucial to prevent crashes on deleting at the end of the document.
	if realIndex >= len(t.Atoms)-1 {
		log.Printf("Error: Attempted to delete past the end of the document. Index: %d, RealIndex: %d", index, realIndex)
		// Return a no-op
		return Operation{Type: "noop"}
	}
	op := Operation{
		Type: "delete",
		Atom: t.Atoms[realIndex], // The atom to be tombstoned
	}
	t.applyOpLocked(op) // We hold t.mu, so use the unlocked variant.
	return op
}

// findRealIndex converts a visible text index to an index in the Atoms slice, skipping tombstones.
func (t *LSEQTree) findRealIndex(visibleIndex int) int {
	count := -1 // Start at -1 to account for the BOF marker
	for i, atom := range t.Atoms {
		if !atom.IsTombstone {
			if count == visibleIndex {
				return i - 1
			}
			count++
		}
	}
	return len(t.Atoms) - 2 // Default to the position before the EOF marker
}

// ToString renders the current state of the document.
func (t *LSEQTree) ToString() string {
	t.mu.RLock()
	defer t.mu.RUnlock()
	var builder []rune
	// Skip BOF and EOF markers (index 0 and len-1).
	for i := 1; i < len(t.Atoms)-1; i++ {
		if !t.Atoms[i].IsTombstone {
			builder = append(builder, t.Atoms[i].Value)
		}
	}
	return string(builder)
}

// generateIdentifier is the core of the LSEQ algorithm.
// It finds a positional identifier between two existing ones.
func (t *LSEQTree) generateIdentifier(prev, next Identifier) Identifier {
	t.clock++ // Logical clock; unused in this sketch. A full LSEQ would embed (siteID, clock) in the identifier to guarantee uniqueness.
	var newPos []uint32
	level := 0
	for {
		prevVal := uint32(0)
		if level < len(prev.Position) {
			prevVal = prev.Position[level]
		}
		nextVal := uint32(1) << 16 // Default to the large boundary if next is shorter
		if level < len(next.Position) {
			nextVal = next.Position[level]
		}
		// The interval between previous and next position is the available space.
		interval := nextVal - prevVal
		if interval > 1 {
			// There's space. Pick a random position within a bounded step
			// (a boundary strategy that keeps identifiers from growing too dense).
			step := min(10, int(interval-1))
			randPos := prevVal + uint32(rand.Intn(step)) + 1
			newPos = append(newPos, randPos)
			break
		}
		// No space at this level: copy the prefix and deepen the tree.
		newPos = append(newPos, prevVal)
		level++
	}
	return Identifier{
		Position: newPos,
		SiteID:   t.SiteID,
	}
}

// Helper functions for comparing identifiers.
func compareIdentifiers(id1, id2 Identifier) int {
	len1, len2 := len(id1.Position), len(id2.Position)
	minLen := min(len1, len2)
	for i := 0; i < minLen; i++ {
		if id1.Position[i] < id2.Position[i] {
			return -1
		}
		if id1.Position[i] > id2.Position[i] {
			return 1
		}
	}
	if len1 < len2 {
		return -1
	}
	if len1 > len2 {
		return 1
	}
	// Positions are identical; fall back to SiteID to ensure a total order.
	if id1.SiteID < id2.SiteID {
		return -1
	}
	if id1.SiteID > id2.SiteID {
		return 1
	}
	return 0
}

func identifiersEqual(id1, id2 Identifier) bool {
	return compareIdentifiers(id1, id2) == 0
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// Operation defines a change to be broadcast to other clients.
type Operation struct {
	Type string `json:"type"` // "insert" or "delete"
	Atom Atom   `json:"atom"`
}

func init() {
	// Seed the global PRNG (deprecated and unnecessary as of Go 1.20, but
	// harmless and kept for older toolchains).
	rand.Seed(time.Now().UnixNano())
}
The key takeaway from this implementation is the generateIdentifier function. It navigates a conceptual tree to find a unique, ordered position for a new character, guaranteeing that concurrent insertions at the same logical position resolve to a deterministic, stable order based on their generated identifiers. Deletions are handled with tombstones, a common pattern in AP systems that avoids the complexities of coordinating permanent removal.
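To make the convergence guarantee concrete, here is a minimal sketch of a test against the crdt package above: two replicas generate concurrent insertions, receive each other's operations in opposite orders, and must still render identical text.

// crdt/lseq_test.go (sketch)
package crdt

import "testing"

// TestConvergence: two replicas apply the same concurrent operations in
// different orders and must converge to the same document.
func TestConvergence(t *testing.T) {
	a := NewLSEQTree(1)
	b := NewLSEQTree(2)

	op1 := a.LocalInsert(0, 'H') // site 1 types "H"
	op2 := b.LocalInsert(0, 'W') // site 2 concurrently types "W"

	// Deliver the remote operations in opposite orders.
	a.ApplyOp(op2)
	b.ApplyOp(op1)

	if a.ToString() != b.ToString() {
		t.Fatalf("replicas diverged: %q vs %q", a.ToString(), b.ToString())
	}

	// Re-delivering an operation must be a no-op (idempotency).
	a.ApplyOp(op1)
	if a.ToString() != b.ToString() {
		t.Fatalf("idempotency violated: %q vs %q", a.ToString(), b.ToString())
	}
}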
The Go Backend: WebSocket Hub for Broadcasting Operations
With the CRDT logic in place, the server’s next job is to manage client connections and broadcast operations. A standard Go concurrency pattern using channels is perfect for this.
// hub/hub.go
package hub

import (
	"encoding/json"
	"log"
	"sync"

	"github.com/your-repo/crdt" // Assuming the CRDT code is in this package
)

// Hub maintains the set of active clients, the document state, and broadcasts messages.
type Hub struct {
	clients    map[*Client]bool
	broadcast  chan []byte
	register   chan *Client
	unregister chan *Client
	document   *crdt.LSEQTree
	mu         sync.Mutex
}

func NewHub() *Hub {
	// For this example, the server itself has a SiteID of 0.
	// In a multi-server setup, this would need to be unique per server instance.
	doc := crdt.NewLSEQTree(0)
	return &Hub{
		broadcast:  make(chan []byte),
		register:   make(chan *Client),
		unregister: make(chan *Client),
		clients:    make(map[*Client]bool),
		document:   doc,
	}
}

func (h *Hub) Run() {
	for {
		select {
		case client := <-h.register:
			h.mu.Lock()
			h.clients[client] = true
			log.Printf("Client registered: %s. Total clients: %d", client.conn.RemoteAddr(), len(h.clients))
			// On registration, send the current full state of the document.
			// This is a critical step for new clients to catch up. A more
			// robust implementation would send a serialized atom snapshot.
			// Marshal the payload properly: naive string concatenation breaks
			// the JSON as soon as the document contains quotes or newlines.
			payload, err := json.Marshal(map[string]string{
				"type":    "initial_state",
				"content": h.document.ToString(),
			})
			if err == nil {
				client.send <- payload
			}
			h.mu.Unlock()
		case client := <-h.unregister:
			h.mu.Lock()
			if _, ok := h.clients[client]; ok {
				delete(h.clients, client)
				close(client.send)
				log.Printf("Client unregistered: %s. Total clients: %d", client.conn.RemoteAddr(), len(h.clients))
			}
			h.mu.Unlock()
		case message := <-h.broadcast:
			h.mu.Lock()
			for client := range h.clients {
				select {
				case client.send <- message:
				default:
					// If the send buffer is full, we assume the client is slow or dead.
					close(client.send)
					delete(h.clients, client)
				}
			}
			h.mu.Unlock()
		}
	}
}
The hub funnels registration, unregistration, and broadcast events through channels consumed by a single goroutine in Run, so there is exactly one writer for the client set and document; the mutex is a defensive guard in case those structures are ever touched from other goroutines. Each incoming operation from a client is processed, applied to the server’s master copy of the LSEQTree, and then broadcast to the connected clients.
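For completeness, here is a minimal sketch of the per-connection read loop, assuming the gorilla/websocket package. The wire format ({"type", "index", "value"}) and the Client field names are illustrative assumptions; the design point, matching the diagram below, is that the server rather than the browser generates CRDT operations. This sketch broadcasts each resulting operation to every client, including the originator, and leans on ApplyOp's idempotency; skipping the sender is a small extension.

// hub/client.go (sketch)
package hub

import (
	"encoding/json"
	"log"

	"github.com/gorilla/websocket"
	"github.com/your-repo/crdt"
)

// Client couples one WebSocket connection with a buffered outbound channel.
type Client struct {
	hub  *Hub
	conn *websocket.Conn
	send chan []byte
}

// editRequest is a client's high-level intent. The server converts it into
// a CRDT operation, so browsers don't need the identifier-generation logic.
type editRequest struct {
	Type  string `json:"type"` // "insert" or "delete"
	Index int    `json:"index"`
	Value string `json:"value,omitempty"`
}

// readPump decodes edit requests, applies them to the master document, and
// broadcasts the resulting operations to all connected clients.
func (c *Client) readPump() {
	defer func() {
		c.hub.unregister <- c
		c.conn.Close()
	}()
	for {
		_, raw, err := c.conn.ReadMessage()
		if err != nil {
			return // Connection closed or failed; the defer unregisters us.
		}
		var req editRequest
		if err := json.Unmarshal(raw, &req); err != nil {
			log.Printf("malformed edit request: %v", err)
			continue
		}
		var op crdt.Operation
		switch req.Type {
		case "insert":
			runes := []rune(req.Value)
			if len(runes) == 0 {
				continue
			}
			// LocalInsert generates the op and applies it to the master copy.
			op = c.hub.document.LocalInsert(req.Index, runes[0])
		case "delete":
			op = c.hub.document.LocalDelete(req.Index)
		default:
			continue
		}
		if payload, err := json.Marshal(op); err == nil {
			c.hub.broadcast <- payload
		}
	}
}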
The diagram below illustrates the message flow:
sequenceDiagram
    participant ClientA as Client A (React)
    participant Server as Go Backend (Hub)
    participant ClientB as Client B (React)
    ClientA->>+Server: Inserts 'H' at index 0
    Server->>Server: LSEQTree.LocalInsert(0, 'H') -> generates Op1
    Server->>Server: LSEQTree.ApplyOp(Op1)
    Server-->>-ClientA: (no broadcast to sender)
    Server-->>ClientB: Broadcasts Op1
    ClientB->>+Server: Inserts 'W' at index 0
    Server->>Server: LSEQTree.LocalInsert(0, 'W') -> generates Op2
    Server->>Server: LSEQTree.ApplyOp(Op2)
    Server-->>-ClientB: (no broadcast to sender)
    Server-->>ClientA: Broadcasts Op2
    Note right of ClientA: Receives Op2, applies to local CRDT
    Note right of ClientB: Receives Op1, applies to local CRDT
    Note over ClientA, ClientB: Both states eventually converge to "WH" or "HW" deterministically, based on Op1/Op2 identifier comparison.
The React Frontend: A Custom Hook for State Synchronization
On the frontend, the complexity lies in managing the WebSocket connection and synchronizing the local state with the remote operations. A custom React hook is the ideal abstraction for this.
// src/hooks/useCRDT.js
import { useState, useEffect, useRef, useCallback } from 'react';

// A lightweight CRDT representation on the client.
// For simplicity, we don't rebuild the full LSEQ tree; we just apply operations
// to an atom array. A robust client would replicate the LSEQ structure.
const initialDoc = {
  atoms: [
    { id: { position: [0], siteId: 0 } },       // BOF marker
    { id: { position: [1 << 16], siteId: 0 } }  // EOF marker (matches the server's boundary)
  ],
  text: ''
};

// Simplified identifier comparison on the client side.
const compareIdentifiers = (id1, id2) => {
  const len1 = id1.position.length;
  const len2 = id2.position.length;
  const minLen = Math.min(len1, len2);
  for (let i = 0; i < minLen; i++) {
    if (id1.position[i] < id2.position[i]) return -1;
    if (id1.position[i] > id2.position[i]) return 1;
  }
  if (len1 < len2) return -1;
  if (len1 > len2) return 1;
  if (id1.siteId < id2.siteId) return -1;
  if (id1.siteId > id2.siteId) return 1;
  return 0;
};

export const useCRDT = (wsUrl) => {
  const [document, setDocument] = useState(initialDoc);
  const [isConnected, setIsConnected] = useState(false);
  const ws = useRef(null);

  const rebuildText = (atoms) => {
    // Sort before slicing: incoming atoms are appended at the end of the
    // array, so BOF/EOF are only guaranteed first and last after sorting.
    return [...atoms]
      .sort((a, b) => compareIdentifiers(a.id, b.id))
      .slice(1, -1) // Exclude BOF/EOF
      .filter(atom => !atom.isTombstone)
      .map(atom => atom.value)
      .join('');
  };

  useEffect(() => {
    if (!wsUrl) return;
    ws.current = new WebSocket(wsUrl);
    ws.current.onopen = () => {
      console.log("WebSocket connected");
      setIsConnected(true);
    };
    ws.current.onclose = () => {
      console.log("WebSocket disconnected");
      setIsConnected(false);
    };
    ws.current.onerror = (error) => {
      console.error("WebSocket error:", error);
    };
    ws.current.onmessage = (event) => {
      const message = JSON.parse(event.data);
      if (message.type === 'initial_state') {
        // This is a simplified bootstrap. A real implementation would receive
        // the full atom array and construct the initial state from it.
        console.warn("Received initial state. Client-side state reset.");
        // For now, we just set the text, which is not CRDT-compliant but demonstrates the connection.
        setDocument(prev => ({ ...prev, text: message.content }));
        return;
      }
      // This is where incoming operations from other clients are applied.
      setDocument(prevDoc => {
        const newAtoms = [...prevDoc.atoms];
        const op = message;
        let opApplied = false;
        if (op.type === "insert") {
          const exists = newAtoms.some(a => compareIdentifiers(a.id, op.atom.id) === 0);
          if (!exists) {
            newAtoms.push(op.atom);
            opApplied = true;
          }
        } else if (op.type === "delete") {
          const atomIndex = newAtoms.findIndex(a => compareIdentifiers(a.id, op.atom.id) === 0);
          if (atomIndex > -1 && !newAtoms[atomIndex].isTombstone) {
            // Replace the atom instead of mutating it in place: the objects
            // are shared with the previous state, and React state must be
            // treated as immutable.
            newAtoms[atomIndex] = { ...newAtoms[atomIndex], isTombstone: true };
            opApplied = true;
          }
        }
        if (opApplied) {
          return { atoms: newAtoms, text: rebuildText(newAtoms) };
        }
        return prevDoc;
      });
    };
    return () => {
      ws.current.close();
    };
  }, [wsUrl]);

  const sendOperation = useCallback((op) => {
    if (ws.current && ws.current.readyState === WebSocket.OPEN) {
      ws.current.send(JSON.stringify(op));
    } else {
      console.error("Cannot send operation: WebSocket is not open.");
      // In a real app, queue this operation for later sending.
    }
  }, []);

  // handleLocalChange needs to generate and send operations based on user
  // input in a textarea. This part is non-trivial: it requires diffing the
  // old and new text to create insert/delete operations.
  const handleLocalChange = (newText) => {
    // A production implementation requires a diffing algorithm here
    // to convert a text change into a series of insert/delete CRDT operations.
    console.warn("handleLocalChange requires a proper diff-to-op implementation.");
    // For demonstration, we just update the text locally.
    setDocument(prev => ({ ...prev, text: newText }));
  };

  return { documentText: document.text, isConnected, handleLocalChange, sendOperation };
};
This hook encapsulates all WebSocket logic. A component can simply use it like const { documentText, isConnected } = useCRDT('ws://localhost:8080/ws'); and bind documentText to a textarea. The most significant challenge, omitted here for focus, is the handleLocalChange function. It cannot simply send the new text; it must compare the previous text with the new text, determine the exact characters that were inserted or deleted, and then ask the Go backend to generate the corresponding CRDT operations.
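As a rough sketch of what that entails on the Go side, the snippet below trims the common prefix and suffix of the old and new text and turns the remainder into LocalInsert/LocalDelete calls. DiffToOps is a hypothetical helper, not part of the code above, and it only handles a single contiguous edit; real editors capture granular input events or use a proper diff algorithm.

// DiffToOps (hypothetical sketch): converts a whole-text change into CRDT
// operations by trimming the common prefix and suffix. Handles only a
// single contiguous edit.
func (t *LSEQTree) DiffToOps(oldText, newText string) []Operation {
	oldR, newR := []rune(oldText), []rune(newText)
	// Length of the common prefix.
	p := 0
	for p < len(oldR) && p < len(newR) && oldR[p] == newR[p] {
		p++
	}
	// Length of the common suffix (not overlapping the prefix).
	s := 0
	for s < len(oldR)-p && s < len(newR)-p && oldR[len(oldR)-1-s] == newR[len(newR)-1-s] {
		s++
	}
	var ops []Operation
	// Remove deleted runes; the visible index stays at p as the text shrinks.
	for i := 0; i < len(oldR)-p-s; i++ {
		ops = append(ops, t.LocalDelete(p))
	}
	// Insert added runes one by one.
	for i, r := range newR[p : len(newR)-s] {
		ops = append(ops, t.LocalInsert(p+i, r))
	}
	return ops
}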
Lingering Issues and Production Hardening
This implementation successfully demonstrates the core principle of using CRDTs to build an AP-compliant collaborative system. However, it is far from production-ready. A pragmatic engineer must acknowledge the remaining gaps.
First, the memory footprint of the server’s LSEQTree is unbounded. Since tombstones are never removed, a document that is edited heavily over a long period will consume ever-increasing amounts of memory. Production systems require a garbage collection or “compaction” strategy for these tombstones, which is a complex process that must be coordinated across replicas.
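As a sketch of what local compaction could look like, assuming a stability oracle exists (the hard, coordination-heavy part that is out of scope here): only tombstones that every replica has acknowledged may be dropped. Both the method and its isStableEverywhere callback are hypothetical.

// Compact (hypothetical sketch) drops tombstones that a stability oracle
// confirms have been seen by every replica. Without that guarantee, removing
// a tombstone can break ordering for late-arriving operations.
func (t *LSEQTree) Compact(isStableEverywhere func(Identifier) bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	kept := t.Atoms[:0] // Filter in place, reusing the backing array.
	for _, a := range t.Atoms {
		if a.IsTombstone && isStableEverywhere(a.ID) {
			continue // Safe to forget: no replica will reference this atom again.
		}
		kept = append(kept, a)
	}
	t.Atoms = kept
}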
Second, the initial state synchronization is naive. Sending the entire rendered string is incorrect; the server must serialize its entire Atoms array (including tombstones) and send it to new clients so they can build a correct local replica. For large documents, this bootstrap process could be slow and needs optimization, perhaps through chunking or a more compact state representation.
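A minimal version of that snapshot might look like the following, assuming JSON is acceptable and encoding/json is imported into the crdt package; a compact binary encoding would serve large documents better.

// Snapshot serializes the full atom array, tombstones included, so a new
// client can reconstruct an exact replica rather than just the visible text.
func (t *LSEQTree) Snapshot() ([]byte, error) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return json.Marshal(t.Atoms)
}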
Third, the client-side CRDT logic is simplified. It doesn’t generate its own operations but relies on the server. A truly partition-tolerant client should be able to generate operations while offline, queue them, and send them upon reconnection. This would require porting the full LSEQ identifier generation logic to JavaScript.
Finally, the linear scan (findInsertIndex) in the Go implementation has O(n) complexity, which is unacceptable for large documents. A production-grade CRDT library would use a more sophisticated data structure, such as a balanced binary search tree or a skip list, to achieve O(log n) insertions and lookups.
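As an interim improvement that keeps the slice, the lookup itself can drop to O(log n) today: the Atoms slice is maintained in identifier order, so binary search via sort.Search applies directly (assuming the standard library sort package is imported). The O(n) cost of shifting slice elements on insertion remains, which is exactly what a tree or skip list would remove.

// findInsertIndexFast is a drop-in replacement for findInsertIndex. Atoms is
// kept sorted by identifier, so binary search finds the first atom greater
// than the new one in O(log n).
func (t *LSEQTree) findInsertIndexFast(atom Atom) int {
	return sort.Search(len(t.Atoms), func(i int) bool {
		return compareIdentifiers(atom.ID, t.Atoms[i].ID) < 0
	})
}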