AILANG Vision
A language where AI-generated code is easier to debug, replay, and fix.
AILANG provides:
- Effect boundaries that constrain what code can do
- Structured traces that show exactly what happened
- Deterministic execution that enables replay and caching
- Inline tests as machine-readable specs
What Makes AILANG Different
1. Effects as Capability Boundaries (Not Just Types)
Normal compilers:
- `doAnything()` is fine as long as types line up
- Side effects are invisible to the type system
- Logs are ad-hoc; no semantic boundary between "pure logic" and "this talks to the database"
AILANG:
-- The type signature constrains what this function CAN do
let processUser: User -> ! {DB} UserRecord =
  \user. dbWrite(transformUser(user))

-- This CANNOT compile - no DB capability granted
let pureTransform: User -> UserRecord =
  \user. dbWrite(user) -- ERROR: Effect mismatch
Why this matters for AI:
- The model literally cannot "hallucinate" a network call in a pure function
- Effect boundaries are machine-checkable constraints, not just documentation
- The search space for valid implementations is genuinely smaller
2. Traces Are First-Class (Not Forensic Artifacts)
Normal languages give you:
- Arbitrary logs scattered throughout code
- Mutable globals and half-observable state
- Trace reconstruction that amounts to forensic art
AILANG gives you:
- A deterministic core plus effect calls as the interaction points
- Traces structured as pure steps + {effect invocations with typed inputs/outputs}
- Sliceable by effect type: "Show only DB writes in this function for this test"
Example trace slice:
Function: aggregateUsers
Step 1: map(transform, users) [pure]
Step 2: DB.read(userTable) [effect: DB]
Step 3: filter(isActive, _) [pure]
Step 4: DB.write(aggregates) [effect: DB]
-- Slice to DB effects only:
DB.read(userTable) -> [User×10]
DB.write(aggregates) -> [Agg×10]
Why this matters: When something goes wrong, you know exactly where to look. The model gets structured feedback ("DB.write called 10 times but expected 5 after dedupe"), not "it crashed somewhere."
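As a rough illustration of what consuming such a trace could look like on the tooling side, here is a minimal Go sketch. The `TraceEvent` shape and field names are assumptions for the example, not AILANG's actual trace schema.

```go
// Sketch: slicing a structured trace by effect type.
// The TraceEvent shape is hypothetical; the real trace schema may differ.
package main

import "fmt"

type TraceEvent struct {
	Step   int    // position in the run
	Kind   string // "pure" or "effect"
	Effect string // "DB", "IO", ...; empty for pure steps
	Op     string // operation description
}

// sliceByEffect keeps only the events for one effect type, e.g. "DB".
func sliceByEffect(trace []TraceEvent, effect string) []TraceEvent {
	var out []TraceEvent
	for _, ev := range trace {
		if ev.Kind == "effect" && ev.Effect == effect {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	trace := []TraceEvent{
		{Step: 1, Kind: "pure", Op: "map(transform, users)"},
		{Step: 2, Kind: "effect", Effect: "DB", Op: "read(userTable)"},
		{Step: 3, Kind: "pure", Op: "filter(isActive, _)"},
		{Step: 4, Kind: "effect", Effect: "DB", Op: "write(aggregates)"},
	}
	for _, ev := range sliceByEffect(trace, "DB") {
		fmt.Printf("step %d: DB.%s\n", ev.Step, ev.Op)
	}
}
```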
3. Inline Tests Are a Spec Surface (Not Magic)
A fair-sounding critique: "If the AI is smart enough to write a good test, it's smart enough to write correct code."
But that misunderstands the purpose:
Inline tests are not "AI spontaneously inventing the spec."
They are:
- A spec channel — Humans write/approve tests, or they're generated from API contracts, examples in docs, acceptance criteria
- A portable contract — Tests live next to the function, in the same language, which is what the AI sees and optimizes against
- Strengthened over time — Today's weak test can be improved by tomorrow's model or human reviewer
let dedupe: List[User] -> List[User] =
  \users. uniqueBy(_.id, users)

test dedupe {
  assert dedupe([{id: 1}, {id: 1}, {id: 2}]) == [{id: 1}, {id: 2}]
  assert len(dedupe(users)) <= len(users) -- property
}
Why this matters: The test isn't magic; it's a machine-readable spec that travels with the code and provides leverage for both humans and future models.
Better Feedback Loops
AI code generation is iterative. AILANG makes each iteration more productive:
| Traditional | AILANG |
|---|---|
| Arbitrary logs | Structured effect traces |
| "Something crashed" | "Effect mismatch at line 42" |
| Blind retry | Targeted fix |
The result: faster convergence to working code.
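To make the loop concrete, here is a minimal Go sketch of that iteration. The `generate`, `check`, and `repair` hooks are placeholders standing in for the model and the AILANG toolchain; none of them are real APIs.

```go
// Sketch of a generate / check / repair loop driven by structured feedback.
// The hooks are placeholders, not AILANG APIs.
package feedback

type Result struct {
	OK      bool
	Message string // e.g. "Effect mismatch at line 42: DB capability not granted"
}

// Converge retries until the checker accepts the code or attempts run out.
// A structured Message lets each retry be a targeted fix, not a blind one.
func Converge(
	spec string,
	maxAttempts int,
	generate func(spec string) string,
	check func(code string) Result,
	repair func(code string, r Result) string,
) (code string, ok bool) {
	code = generate(spec)
	for i := 0; i < maxAttempts; i++ {
		r := check(code)
		if r.OK {
			return code, true
		}
		code = repair(code, r)
	}
	return code, false
}
```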
What Effects Actually Buy You
The effect system isn't just "types for side effects." It's a capability constraint surface.
Example: The Missed Dedupe Problem
A human asks: "Aggregate users from the database and write a summary."
In Python/Go:
def aggregate_users():
    users = db.read_all()             # hidden DB access
    # Oops, AI forgot to dedupe
    summary = compute_summary(users)
    db.write(summary)                 # hidden DB write
    log.info("Done")                  # ad-hoc logging
Debugging this:
- Where did it go wrong? Check logs (if you added them)
- What did the DB see? Unknown without tracing infrastructure
- Can you replay? Only if you captured all inputs somewhere
In AILANG:
let aggregateUsers: () -> ! {DB, IO} Summary =
  let users = dbReadAll()
  -- AI forgot dedupe here
  let summary = computeSummary(users)
  let _ = dbWrite(summary)
  let _ = print("Done")
  summary
-- Trace output:
-- DB.readAll -> [User×100]
-- DB.write(Summary{count: 100}) -- Wait, why 100? Should be unique count
Debugging this:
- Effect trace shows exactly what DB operations happened
- You can see: "read 100 users, wrote 100 aggregates" — the bug is obvious
- Replay: deterministic, just re-run with same inputs
The AI gets structured feedback:
Expected: DB.write(Summary{count: <unique_users>})
Actual: DB.write(Summary{count: 100})
Hint: No dedupe operation between DB.read and computeSummary
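One way such feedback could be produced is by diffing the expected effect trace against the observed one. The sketch below is a Go illustration under that assumption; the `EffectCall` shape is hypothetical and this is not AILANG's actual mechanism.

```go
// Sketch: derive structured feedback by diffing expected vs. observed
// effect calls. The EffectCall shape is hypothetical.
package tracecheck

import "fmt"

type EffectCall struct {
	Effect string // "DB", "IO", ...
	Op     string // e.g. "write(Summary{count: 100})"
}

// Diff returns one message per position where the observed effect calls
// diverge from the expected ones.
func Diff(expected, actual []EffectCall) []string {
	var msgs []string
	n := len(expected)
	if len(actual) > n {
		n = len(actual)
	}
	for i := 0; i < n; i++ {
		switch {
		case i >= len(actual):
			msgs = append(msgs, fmt.Sprintf("missing: %s.%s", expected[i].Effect, expected[i].Op))
		case i >= len(expected):
			msgs = append(msgs, fmt.Sprintf("unexpected: %s.%s", actual[i].Effect, actual[i].Op))
		case expected[i] != actual[i]:
			msgs = append(msgs, fmt.Sprintf("expected %s.%s, got %s.%s",
				expected[i].Effect, expected[i].Op, actual[i].Effect, actual[i].Op))
		}
	}
	return msgs
}
```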
Benchmark Results — M-EVAL
We continuously track AI code generation success across 46 benchmarks with Claude, GPT, and Gemini.
Key metrics:
- Zero-Shot: Code works on first try
- Final Success: Code works after self-repair
- Agent Success: Multi-turn agent completes task
The jump from zero-shot to agent mode shows how structured error feedback helps models self-correct.
See the Benchmark Dashboard for live results, trends, and per-model breakdowns.
Current Capabilities — v0.5.x
AILANG today provides:
- Algebraic Effects — `! {IO, FS, Net, DB}` declares capabilities in types
- Deterministic Core — pure functions are referentially transparent
- Structured Traces — effect calls logged with typed inputs/outputs
- Inline Tests — specs travel with code, machine-readable
- Go Codegen — compile to native performance
Roadmap — Where We're Going
v0.6: Execution Profiles
Formalize the three execution modes:
| Profile | Entry Shape | Use Case |
|---|---|---|
| SimProfile | step(World, Input) -> (World, Output) | Simulations, games, RL |
| ServiceProfile | handle(Request) -> Response | Microservices, agents |
| CliProfile | main(args) -> () | CLI tools |
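For intuition, the three entry shapes roughly correspond to the following Go signatures. This is a conceptual sketch only, not the actual codegen output.

```go
// Conceptual Go equivalents of the three profile entry shapes (illustrative only).
package profiles

// SimProfile entry: step(World, Input) -> (World, Output)
type StepFn[World, Input, Output any] func(World, Input) (World, Output)

// ServiceProfile entry: handle(Request) -> Response
type HandleFn[Request, Response any] func(Request) Response

// CliProfile entry: main(args) -> ()
type MainFn func(args []string)
```

The world-in/world-out shape of SimProfile is what keeps simulations replayable: all state flows through the function, so the same inputs always produce the same outputs.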
v0.7: Deterministic Tooling
AI-friendly code transformation tools:
- `ailang normalize` — canonical form for semantic comparison
- `ailang suggest-imports` — auto-fix missing imports
- `ailang apply` — structured code edits
v0.8: Shared Semantic State
Multi-agent coordination through language-level shared memory:
- Semantic caching keyed by (problem + types + tests)
- CAS-based coordination for deterministic updates
- Effect-tracked caching patterns
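A minimal sketch of what a semantic cache with CAS-based coordination might look like, assuming a key derived by hashing the problem statement, type signature, and tests together. The API here is illustrative, not the planned design.

```go
// Sketch: semantic cache keyed by (problem + types + tests) with
// compare-and-swap updates. Key scheme and API are assumptions.
package semcache

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// Key hashes the problem statement, type signature, and tests together,
// so two agents solving the same problem hit the same cache entry.
func Key(problem, types, tests string) string {
	h := sha256.Sum256([]byte(problem + "\x00" + types + "\x00" + tests))
	return hex.EncodeToString(h[:])
}

type Cache struct {
	mu      sync.Mutex
	entries map[string]string
}

func New() *Cache { return &Cache{entries: make(map[string]string)} }

// CompareAndSwap installs next only if the current value equals prev
// (use prev == "" for "no entry yet"). Returns true if the write won.
func (c *Cache) CompareAndSwap(key, prev, next string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.entries[key] != prev {
		return false
	}
	c.entries[key] = next
	return true
}

func (c *Cache) Get(key string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.entries[key]
	return v, ok
}
```

With CAS semantics, the first agent to publish a result wins deterministically, and later agents read a consistent value instead of overwriting each other.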
What AILANG Deliberately Excludes
AILANG prioritizes machine reasoning over human ergonomics:
- ❌ LSP/IDE servers — AIs use CLI/API, not text editors
- ❌ Multiple syntaxes — one canonical way to express each concept
- ❌ Implicit behaviors — all effects are explicit
These aren't limitations — they're design choices that make the language more predictable and constrainable for AI generation.
Summary
AILANG gives you:
- Effect constraints — Models can't generate impossible side effects
- Structured traces — See exactly what happened, slice by effect type
- Better error signals — Specific feedback, not "it crashed"
- Deterministic replay — Same input, same output, every time
Get Involved
- Try it: See Getting Started
- Benchmark it: Run `ailang eval-suite` to test AI code generation
- Contribute: github.com/sunholo-data/ailang
AILANG — AI-first programming, done right.