
AILANG Vision

A language where AI-generated code is easier to debug, replay, and fix.

AILANG provides:

  • Effect boundaries that constrain what code can do
  • Structured traces that show exactly what happened
  • Deterministic execution that enables replay and caching
  • Inline tests as machine-readable specs

What Makes AILANG Different

1. Effects as Capability Boundaries (Not Just Types)

Normal compilers:

  • doAnything() is fine as long as types line up
  • Side-effects are invisible to the type system
  • Logs are ad-hoc; no semantic boundary between "pure logic" and "this talks to the database"

AILANG:

-- The type signature constrains what this function CAN do
let processUser: User -> ! {DB} UserRecord =
  \user. dbWrite(transformUser(user))

-- This CANNOT compile - no DB capability granted
let pureTransform: User -> UserRecord =
  \user. dbWrite(user) -- ERROR: Effect mismatch

Why this matters for AI:

  • The model literally cannot "hallucinate" a network call in a pure function
  • Effect boundaries are machine-checkable constraints, not just documentation
  • The search space for valid implementations is genuinely smaller
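Effect requirements also compose through the call graph: a caller of an effectful function must hold the same capabilities. A minimal sketch in the same illustrative syntax as above, where processUser is the function defined earlier and onboard is a hypothetical caller:

-- Sketch: effects propagate to callers.
-- onboard needs {DB} because it calls processUser, which requires it.
let onboard: User -> ! {DB} UserRecord =
  \user. processUser(user)

-- Dropping {DB} from the signature would be rejected at compile time:
-- let onboardPure: User -> UserRecord =
--   \user. processUser(user) -- ERROR: Effect mismatch, {DB} not granted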

2. Traces Are First-Class (Not Forensic Artifacts)

Normal languages give you:

  • Arbitrary logs scattered throughout code
  • Mutable globals and half-observable state
  • Trace reconstruction that amounts to forensic art

AILANG gives you:

  • A deterministic core plus effect calls as the interaction points
  • Traces structured as: pure steps + {effect invocations with typed inputs/outputs}
  • Sliceable by effect type: "Show only DB writes in this function for this test"

Example trace slice:

Function: aggregateUsers
Step 1: map(transform, users) [pure]
Step 2: DB.read(userTable) [effect: DB]
Step 3: filter(isActive, _) [pure]
Step 4: DB.write(aggregates) [effect: DB]

-- Slice to DB effects only:
DB.read(userTable) -> [User×10]
DB.write(aggregates) -> [Agg×10]

Why this matters: When something goes wrong, you know exactly where to look. The model gets structured feedback ("DB.write called 10 times but expected 5 after dedupe"), not "it crashed somewhere."


3. Inline Tests Are a Spec Surface (Not Magic)

A common critique goes: "If the AI is smart enough to write a good test, it's smart enough to write correct code."

But that misunderstands the purpose:

Inline tests are not "AI spontaneously inventing the spec."

They are:

  1. A spec channel — Humans write/approve tests, or they're generated from API contracts, examples in docs, acceptance criteria
  2. A portable contract — Tests live next to the function, in the same language, which is what the AI sees and optimizes against
  3. Strengthened over time — Today's weak test can be improved by tomorrow's model or human reviewer

let dedupe: List[User] -> List[User] =
  \users. uniqueBy(_.id, users)

test dedupe {
  assert dedupe([{id: 1}, {id: 1}, {id: 2}]) == [{id: 1}, {id: 2}]
  assert len(dedupe(users)) <= len(users) -- property
}

Why this matters: The test isn't magic; it's a machine-readable spec that travels with the code and provides leverage for both humans and future models.
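To make this concrete: if a model shipped a dedupe that simply returned its input unchanged, the inline test would fail with targeted, machine-readable output. The report below is an illustrative sketch of what that feedback could look like, not actual tool output:

test dedupe: FAILED
  assert dedupe([{id: 1}, {id: 1}, {id: 2}]) == [{id: 1}, {id: 2}]
    expected: [{id: 1}, {id: 2}]
    actual:   [{id: 1}, {id: 1}, {id: 2}]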


Better Feedback Loops

AI code generation is iterative. AILANG makes each iteration more productive:

Traditional             AILANG
-----------             ------
Arbitrary logs          Structured effect traces
"Something crashed"     "Effect mismatch at line 42"
Blind retry             Targeted fix

The result: faster convergence to working code.


What Effects Actually Buy You

The effect system isn't just "types for side effects." It's a capability constraint surface.

Example: The Missed Dedupe Problem

A human asks: "Aggregate users from the database and write a summary."

In Python/Go:

def aggregate_users():
    users = db.read_all()  # hidden DB access
    # Oops, AI forgot to dedupe
    summary = compute_summary(users)
    db.write(summary)  # hidden DB write
    log.info("Done")  # ad-hoc logging

Debugging this:

  • Where did it go wrong? Check logs (if you added them)
  • What did the DB see? Unknown without tracing infrastructure
  • Can you replay? Only if you captured all inputs somewhere

In AILANG:

let aggregateUsers: () -> ! {DB, IO} Summary =
  let users = dbReadAll()
  -- AI forgot dedupe here
  let summary = computeSummary(users)
  let _ = dbWrite(summary)
  let _ = print("Done")
  summary

-- Trace output:
-- DB.readAll -> [User×100]
-- DB.write(Summary{count: 100}) -- Wait, why 100? Should be unique count

Debugging this:

  • Effect trace shows exactly what DB operations happened
  • You can see: "read 100 users, wrote 100 aggregates" — the bug is obvious
  • Replay: deterministic, just re-run with same inputs
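
Because the core is deterministic and every effect result is captured in the trace, a recorded run can stand in for the live effects. A minimal sketch, assuming a hypothetical replay helper and recordedTrace value (neither is current API):

-- Hypothetical: substitute recorded effect results for live calls,
-- then re-run the pure core; the output is reproduced exactly.
let replayed = replay(aggregateUsers, recordedTrace)
-- replayed == originalSummary, on every run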

The AI gets structured feedback:

Expected: DB.write(Summary{count: <unique_users>})
Actual:   DB.write(Summary{count: 100})
Hint:     No dedupe operation between DB.readAll and computeSummary
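
Given that hint, the repair is local: dedupe before summarizing. A sketch of the corrected function, reusing uniqueBy from the dedupe example earlier:

let aggregateUsers: () -> ! {DB, IO} Summary =
  let users = uniqueBy(_.id, dbReadAll()) -- dedupe by id before summarizing
  let summary = computeSummary(users)
  let _ = dbWrite(summary)
  let _ = print("Done")
  summary

-- Trace after the fix:
-- DB.readAll -> [User×100]
-- DB.write(Summary{count: <unique_users>}) -- matches the expectation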

Benchmark Results — M-EVAL

We continuously track AI code generation success across 46 benchmarks with Claude, GPT, and Gemini.

Key metrics:

  • Zero-Shot: Code works on first try
  • Final Success: Code works after self-repair
  • Agent Success: Multi-turn agent completes task

The improvement from zero-shot to agent mode shows how structured error feedback helps models self-correct.

See the Benchmark Dashboard for live results, trends, and per-model breakdowns.


Current Capabilities — v0.5.x

AILANG today provides:

  • Algebraic Effects — ! {IO, FS, Net, DB} declares capabilities in types
  • Deterministic Core — pure functions are referentially transparent
  • Structured Traces — effect calls logged with typed inputs/outputs
  • Inline Tests — specs travel with code, machine-readable
  • Go Codegen — compiles to native code for performance

Roadmap — Where We're Going

v0.6: Execution Profiles

Formalize the three execution modes:

Profile          Entry Shape                              Use Case
SimProfile       step(World, Input) -> (World, Output)    Simulations, games, RL
ServiceProfile   handle(Request) -> Response              Microservices, agents
CliProfile       main(args) -> ()                         CLI tools
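
For illustration, a SimProfile entry point would follow the step shape from the table. The sketch below assumes World, Input, and Output types and hypothetical applyInput and observe helpers; none of this is final syntax:

-- Sketch of a SimProfile entry point (illustrative only).
-- A pure step function: the same (world, input) always yields the
-- same result, which is what makes simulation runs replayable.
let step: (World, Input) -> (World, Output) =
  \(world, input).
    let next = applyInput(world, input) -- hypothetical pure transition
    (next, observe(next))               -- hypothetical observation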

v0.7: Deterministic Tooling

AI-friendly code transformation tools:

  • ailang normalize — canonical form for semantic comparison
  • ailang suggest-imports — auto-fix missing imports
  • ailang apply — structured code edits

v0.8: Shared Semantic State

Multi-agent coordination through language-level shared memory:

  • Semantic caching keyed by (problem + types + tests)
  • CAS-based coordination for deterministic updates
  • Effect-tracked caching patterns
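
To sketch the caching idea: the key is derived from the spec surface itself, so two agents solving the same typed, tested problem land on the same cache entry. The record below is purely illustrative; CacheKey and the Hash type are hypothetical names, not current AILANG:

-- Illustrative only: a semantic cache key built from the spec surface.
type CacheKey = {
  problem: Hash, -- hash of the normalized problem statement
  types: Hash,   -- hash of the target function's type and effect signature
  tests: Hash    -- hash of the inline tests attached to the function
}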

What AILANG Deliberately Excludes

AILANG prioritizes machine reasoning over human ergonomics:

  • LSP/IDE servers — AIs use CLI/API, not text editors
  • Multiple syntaxes — one canonical way to express each concept
  • Implicit behaviors — all effects are explicit

These aren't limitations — they're design choices that make the language more predictable and constrainable for AI generation.


Summary

AILANG gives you:

  1. Effect constraints — Models can't generate impossible side effects
  2. Structured traces — See exactly what happened, slice by effect type
  3. Better error signals — Specific feedback, not "it crashed"
  4. Deterministic replay — Same input, same output, every time

AILANG — AI-first programming, done right.