Ruby 4.0.0 was released on December 25, 2025 -- thirty years after Ruby's first public release on December 21, 1995 -- and it shipped with two headline features: Ruby Box, an experimental namespace isolation mechanism, and ZJIT, a new just-in-time compiler. While Ruby Box generated plenty of discussion around multi-tenant deployments and test isolation, ZJIT is the change that will define Ruby's performance story for the next decade. It is not an incremental improvement. It is a different kind of compiler, built on fundamentally different principles, targeting a higher performance ceiling than its predecessor YJIT was ever designed to reach. (The name carries no specific meaning, according to heise online's analysis of Ruby 4.0 -- it simply stands as the successor to YJIT, internally called the "scientific successor" because its architecture mirrors classic compiler textbooks.)

As of March 2026, Ruby 4.0.2 (released March 16, 2026) is the current stable release. This article covers ZJIT from the ground up: the architectural decisions behind it, how it fits into the Linux runtime, how to enable and profile it, and what the roadmap toward Ruby 4.1 production readiness actually looks like.

Why Build a New Compiler at All?

To understand ZJIT, you need to understand why YJIT, Ruby's previous JIT, stopped being enough. YJIT was developed by a team at Shopify and merged into Ruby 3.1 in December 2021. It delivered real, measurable gains. According to the Rails at Scale blog, YJIT achieved around 2x speedup over the interpreter on representative workloads by Ruby 3.3, peaking near 2.8x on benchmarks like Liquid rendering, and deployment reports from companies like Shopify, Discourse, and Mastodon showed speed improvements as high as 30 percent just from flipping the --yjit flag. By Ruby 3.4, Shopify's own benchmark suite reported a 92 percent speedup over the interpreter on x86-64.

But YJIT is hitting a ceiling. Its core design -- Lazy Basic Block Versioning (LBBV) -- compiles one basic block at a time as code executes, specializing each block for the types it observes at runtime. This is clever and fast to compile, but it limits the optimizer. When you only see one block at a time, you cannot perform the kind of cross-block analysis that enables aggressive constant folding, dead code elimination, or inlining. The compilation unit is just too small.

There is also a contributor problem. YJIT's LBBV architecture is genuinely novel -- novel enough that few engineers outside the core team have experience building or modifying it. This concentrates institutional knowledge dangerously. As the Ruby 4.0 structural analysis on DEV Community put it, YJIT's architecture mirrors nothing taught in standard compiler courses, while ZJIT's SSA-based method compiler mirrors the architecture found in GCC, LLVM, and the JVM's HotSpot. The goal is to make Ruby's JIT infrastructure something that a broader community of systems programmers can actually understand and modify.

In the launch announcement on Rails at Scale (December 24, 2025), the ZJIT team -- Aaron Patterson, Aiden Fox Ivey, Alan Wu, Jacob Denbeaux, Kevin Menard, Max Bernstein, Maxime Chevalier-Boisvert, Randy Stauner, Stan Lo, and Takashi Kokubun -- stated that the twin goals of the new compiler were raising the performance ceiling through a larger compilation unit and SSA IR, and broadening the contributor base by adopting a more traditional method compiler architecture that mirrors textbook designs rather than the novel LBBV approach. As the original upstream proposal on the Ruby issue tracker (bugs.ruby-lang.org, issue #21221) put it: "YJIT is very limited when it comes to optimizations that cross YARV instruction boundaries."

The Architecture: SSA, HIR, and Method-Level Compilation

ZJIT compiles complete methods rather than individual basic blocks. When a method has been called enough times to cross the JIT threshold (30 calls by default), ZJIT compiles the entire method body at once. This larger compilation unit is the prerequisite for every meaningful optimization ZJIT performs.

Static Single Assignment Form

The centerpiece of ZJIT's internal representation is Static Single Assignment (SSA) form. In SSA, every variable is assigned exactly once. If a variable would normally be overwritten, SSA renames it -- you get x_1, then x_2, rather than the same x written twice. This property, which sounds simple, has profound implications for what the compiler can prove about your code.

In standard Ruby bytecode (YARV instructions), a local variable can be assigned in multiple places across multiple control-flow paths. Tracking all the possible values that variable might hold requires complex dataflow analysis. In SSA form, every use of a variable points directly back to exactly one definition. The compiler can trivially answer "what is the value of this variable at this point?" without chasing through a web of possible assignments. This simplifies constant propagation, value numbering, and type inference significantly.
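The renaming idea is easiest to see in a small sketch. The SSA notation in the comments below is illustrative only, not actual HIR output:

```ruby
# A local reassignment as written in Ruby: `x` is bound twice.
def scale(a)
  x = a + 1
  x = x * 2
  x
end

# Conceptual SSA form of the same method (illustrative notation):
#   x_1 = a_0 + 1
#   x_2 = x_1 * 2
#   return x_2
# Every use of x_2 points at exactly one definition -- no dataflow search needed.
```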

ZJIT's High-Level Intermediate Representation (HIR) is where SSA lives. YARV bytecode is first converted to HIR, which preserves Ruby semantics while restructuring the data flow into a graph form. In HIR, jumps have direct pointers to their target basic blocks rather than encoded offsets, and there is no implicit operand stack -- every instruction that uses a value holds a pointer directly to the instruction that produced it. This graph structure is what lets the optimizer reason globally about a method rather than locally about a single block.

Note

You can inspect ZJIT's HIR directly on a Linux system with ./miniruby --zjit --zjit-dump-hir --zjit-call-threshold=1 -e "1 + 1". Setting the call threshold to 1 forces compilation immediately, bypassing the profiling warmup. The output shows the HIR graph before and after each optimization pass. Note: --zjit-dump-hir requires a development build configured with --enable-zjit=dev -- it is not available in the standard release binary.

The Compilation Pipeline

ZJIT's compilation pipeline has three stages. First, YARV bytecode is lifted into HIR. Second, a modular high-level optimizer runs a series of passes over the HIR graph -- type inference, constant folding, branch folding, dead code elimination, and more. Third, the optimized HIR is lowered to a Low-Level IR (LIR) and finally to native machine code via a code generation backend. On Linux x86-64 and arm64/aarch64, this produces native instructions written directly into executable memory.

The optimizer passes are composable and independently testable, which is a significant advantage over YJIT's architecture where optimization is interleaved with code generation. If you add a new optimization pass to ZJIT, you can unit-test it against HIR snapshots in isolation. The project uses cargo-insta for snapshot testing, and the test suite requires cargo-nextest to run each test in its own process -- necessary because CRuby only supports a single boot per process and many of its APIs are not thread-safe.

Type Inference and Guards

Ruby is dynamically typed, which means the compiler cannot know statically that a variable holds an Integer rather than a String. ZJIT handles this with a combination of profiling and guard instructions. Before ZJIT compiles a method, Ruby's interpreter profiles it, recording the types that each operation observes. ZJIT reads that historical type information and generates specialized code with embedded type guards.

A guard instruction -- say, GuardType(Fixnum) -- checks at runtime that the actual value matches the expected type. If the guard passes, the following code can assume the type and use fast, unboxed arithmetic. If the guard fails, ZJIT performs a side-exit: it transfers control back to the interpreter, which handles the unexpected type correctly. In the May 2025 merge announcement, the core team noted that side-exit capability did not exist at the time of ZJIT's initial merge. By the Ruby 4.0 release in December 2025, side exits were fully implemented and used liberally throughout the compiler.
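A minimal sketch of the guard scenario follows; the call counts are illustrative, and `add_one` is a hypothetical method, not anything from the ZJIT codebase:

```ruby
# Profiling observes only Integer arguments, so the compiled code can
# guard on Fixnum and use fast unboxed arithmetic.
def add_one(x)
  x + 1
end

100.times { add_one(1) }  # warms up and compiles with a GuardType(Fixnum)
add_one(2.5)              # guard fails: side-exit, the interpreter handles Float
```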

Pro Tip

On Linux, use --zjit-perf to dump ISEQ (instruction sequence) symbols into /tmp/perf-{PID}.map. This integrates ZJIT's compiled methods with perf for CPU profiling, letting you see exactly which compiled Ruby methods are consuming cycles in a flame graph.

ZJIT on Linux: Platform Support and Build Requirements

ZJIT is supported on Linux (as well as macOS and BSD) on both x86-64 and arm64/aarch64 architectures. Building Ruby with ZJIT requires Rust 1.85.0 or newer. On most Linux distributions, the system Rust package will be behind this requirement, so you will need to install via rustup.

building Ruby 4.0 with ZJIT on Linux
# Install Rust via rustup (system Rust may be too old)
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ rustup update stable
$ rustc --version
rustc 1.85.0 (4d91de4e4 2025-02-17)

# Clone Ruby 4.0 and configure with ZJIT support
$ git clone https://github.com/ruby/ruby.git
$ cd ruby && git checkout v4_0_0
$ ./autogen.sh
$ ./configure --enable-zjit
$ make -j$(nproc)
$ make install

# Or install via mise (recommended for version management)
$ mise install ruby@4.0.0
$ mise use ruby@4.0.0

ZJIT is compiled into the Ruby binary by default in Ruby 4.0 but is not enabled at runtime unless you opt in. There are three ways to enable it:

enabling ZJIT
# 1. Command-line flag
$ ruby --zjit my_script.rb

# 2. Environment variable (useful for Rack/Rails apps)
$ RUBY_ZJIT_ENABLE=1 bundle exec rails server

# 3. Runtime API (enable after startup, e.g. after warmup)
RubyVM::ZJIT.enable

Memory usage will be higher than under the pure interpreter because ZJIT allocates executable memory for generated machine code and maintains additional state for the compilation pipeline. This is a standard tradeoff for any JIT compiler, and ZJIT exposes several command-line options to tune its memory budget and compilation behavior.

ZJIT tuning options
# Cap executable memory at 64 MiB (default: 128 MiB)
$ ruby --zjit --zjit-mem-size=64 my_app.rb

# Raise the compilation threshold (default: 30 calls)
# Useful when processes are short-lived and you want fewer methods compiled
$ ruby --zjit --zjit-call-threshold=50 my_app.rb

# Change how many interpreter runs profile a method before ZJIT compiles it
# (default: 5 profiled calls, separate from the compilation threshold)
$ ruby --zjit --zjit-num-profiles=10 my_app.rb

The distinction between --zjit-call-threshold and --zjit-num-profiles is worth understanding. ZJIT's warmup is actually two-phase: a method must first be called a certain number of times to be considered worth compiling (the call threshold, default 30), and within those calls, ZJIT profiles it for type information across a separate count of interpreter runs (the profile count, default 5). The profiling phase happens first and is embedded within the threshold window. Raising --zjit-num-profiles gives the type inference engine more observed calls to work from, potentially producing tighter type specializations at the cost of delaying compilation. For short-lived processes that handle fewer requests per worker than the default call threshold, lowering --zjit-call-threshold may help ZJIT reach hot methods before the process is recycled.

What ZJIT Optimizes Today

By the time Ruby 4.0 shipped in December 2025, ZJIT had moved far beyond its initial May 2025 state. The Rails at Scale launch post documents the progression in concrete terms. When ZJIT was first merged, it could optimize only fixnum arithmetic and method sends to the main object. By December, the list had grown substantially.

ZJIT now optimizes all sorts of method sends, instance variable reads and writes, attribute accessor and reader/writer methods, struct reads and writes, object allocations, string operations, and optional parameters. It performs constant folding across methods -- because it has a limited inliner that can inline constants, self, and parameters, it can fold the entirety of a two-method addition down to a single constant value at compile time, while still correctly handling cases where those methods are redefined at runtime.
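A hypothetical illustration of that pattern (the names LEFT, RIGHT, left, right, and sum are mine, not from the launch post): with constants, self, and parameters inlinable, a compiler of this design can fold `sum` down to the literal 7 at compile time, while deoptimizing correctly if `left` or `right` is later redefined.

```ruby
# Two trivial methods returning constants...
LEFT  = 3
RIGHT = 4

def left  = LEFT
def right = RIGHT

# ...and an addition across them. With the callees inlined, the whole body
# is a constant expression the optimizer can fold to 7.
def sum = left + right
```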

ZJIT also has a feature analogous to JavaScriptCore's DOMJIT: an inline C method "inliner" that emits HIR representations of known C methods rather than generating actual C function calls. This allows the optimizer to reason about built-in methods like Integer#succ (used to drive Integer#times loops) as if they were Ruby methods, enabling the type inference and constant folding machinery to operate across what would otherwise be opaque C boundaries.

In the same Rails at Scale launch post (December 24, 2025), Max Bernstein reported that by release the team had verified ZJIT against the full Ruby test suite, shadow traffic from a large Shopify application, and the GitHub.com test suite without correctness failures.

ZJIT vs. YJIT: Where Things Stand

YJIT remains the recommended production JIT in Ruby 4.0, and its track record is substantial. The practical proof arrived on Black Friday 2025: Shopify's Ruby on Rails infrastructure processed purchases from 81 million customers, peaking at 117 million requests per minute on application servers. If you are running Ruby on Linux servers for high-traffic workloads, YJIT is where you should be today. ZJIT currently beats the interpreter on all measured benchmarks but has not yet matched YJIT across the board.

The gap is expected and deliberate. ZJIT's team spent much of 2025 building correct foundations -- side exits, guard generation, the full optimization pipeline, register spilling for large functions, C method inlining -- rather than chasing benchmark numbers. The goal was to establish a correct, stable base from which aggressive Ruby-specific optimizations could be layered in.

Warning

The Ruby core team explicitly advises against deploying ZJIT in production in Ruby 4.0. The compiler is new enough that crashes and unexpected performance degradations are possible. Run it in CI, test it locally against your application, and report issues to the Ruby bug tracker or the Shopify/ruby GitHub repository. Production readiness is targeted for Ruby 4.1.

The key architectural difference that will determine ZJIT's long-term ceiling: YJIT's LBBV approach optimizes within the scope of a single basic block with type specialization at the edges. ZJIT's SSA-based method compiler can perform global optimizations across an entire method -- value numbering, loop-invariant code motion, and eventually cross-method inlining -- none of which are tractable in YJIT's model. The bet is that this architectural advantage will compound as the optimizer matures.

There is also a planned capability that has no equivalent in YJIT: saving and reusing compiled code between program executions. This was mentioned explicitly in Maxime Chevalier-Boisvert's RubyKaigi 2025 presentation (held in Matsuyama, Ehime, Japan) as a long-term ZJIT goal. If realized, it would eliminate warmup time entirely for long-running applications, a significant win for Rails servers that currently need time to reach peak JIT performance after a restart.

Should You Enable ZJIT Today? A Practical Decision Guide

The short answer: not in production, but absolutely in CI and staging. Here is a more precise breakdown by scenario.

Enable ZJIT if you are: a Ruby runtime contributor or compiler researcher, a developer running ZJIT in a dedicated CI lane to catch regressions before they ship in a Ruby release, someone stress-testing a new application under various JIT configurations, or someone specifically looking to file bug reports that help the Ruby team. The ZJIT team has explicitly said bug reports from real-world applications are among their highest-value contributions at this stage -- and because the compiler is new, there is a meaningful chance a bug you hit has never been seen before.

Use YJIT instead if you are: running any production workload, operating under SLA constraints, using Ruby on a Windows system (ZJIT does not support Windows), or running Ruby in an environment where executable memory is tightly constrained. YJIT is battle-hardened across hundreds of production deployments including GitHub, Shopify, Discourse, and Mastodon. Its stability profile is well-understood.

Warning

ZJIT and YJIT cannot run simultaneously in the same process. If YJIT is already enabled when you call RubyVM::ZJIT.enable, the call returns false and prints a warning: "Only one JIT can be enabled at the same time." This applies in both directions. If your Rack or Rails boot sequence enables YJIT early (via RUBY_YJIT_ENABLE=1 or RubyVM::YJIT.enable), a later call to RubyVM::ZJIT.enable will be a no-op apart from that warning. Make sure you are not setting both environment variables at once in a test environment.
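A defensive sketch for boot code, assuming the RubyVM::ZJIT API described in this article; `try_enable_zjit` is a hypothetical helper name, and the code degrades gracefully on builds without ZJIT:

```ruby
# Enable ZJIT only when it can actually take effect.
def try_enable_zjit
  return :zjit_unavailable unless defined?(RubyVM::ZJIT)
  return :yjit_already_on  if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?

  RubyVM::ZJIT.enable  # assumed API per this article; see the warning above
  :zjit_enabled
end

try_enable_zjit
```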

Use neither if you are: running short-lived scripts, one-off data migrations, or any process where the overhead of JIT warmup exceeds the time the process actually runs. JIT compilers -- both YJIT and ZJIT -- require a warmup period before their compiled code is executing. For a script that runs for under a second, both JITs add overhead without benefit.

Pro Tip

To compare ZJIT and YJIT against your own codebase without deploying to production, run your full test suite three ways: ruby my_suite.rb (interpreter baseline), ruby --yjit my_suite.rb, and ruby --zjit --zjit-stats=quiet my_suite.rb. The quiet stats flag collects ZJIT telemetry without printing it on each run, so your test output stays clean. Then check RubyVM::ZJIT.runtime_stats at the end of the suite to see what ZJIT compiled, how many side exits it took, and what it left on the table.

One practical note: the --zjit-stats flag itself does not require a special build. The --zjit-stats=quiet variant suppresses output during the run and lets you inspect stats programmatically. The --zjit-dump-hir and --zjit-dump-hir-iongraph flags require a development build (--enable-zjit=dev at configure time) -- they are not available in the standard release binary.

Profiling ZJIT on Linux

ZJIT ships with a statistics mode and several developer-facing flags that are genuinely useful for understanding what the compiler is doing with your code. If you want to go deeper on the Linux side of this, the eBPF tracing guide on this site covers how to instrument running processes without the overhead of traditional ptrace-based tools -- a complementary approach when perf alone isn't granular enough.

ZJIT diagnostic flags
# Enable statistics collection (outputs on exit)
$ ruby --zjit --zjit-stats my_app.rb

# Quiet stats -- collect but suppress output (for production testing)
$ ruby --zjit --zjit-stats=quiet my_app.rb

# Dump HIR for compiled methods (inspect optimizer output)
$ ruby --zjit --zjit-dump-hir my_app.rb

# Dump HIR as Iongraph JSON for visual inspection
$ ruby --zjit --zjit-dump-hir-iongraph my_app.rb
# Then collate the per-function JSON files:
$ jq --slurp --null-input '.functions=inputs | .version=1' \
    /tmp/zjit-iongraph-<PID>/func*.json > ~/ion.json
# Open ion.json at https://mozilla-spidermonkey.github.io/iongraph/

# Linux perf integration -- dump symbols for flame graphs
$ ruby --zjit --zjit-perf my_app.rb &
$ sudo perf record -p $! -g -- sleep 30
$ sudo perf report

# Record side-exit sources for tuning
$ ruby --zjit --zjit-trace-exits my_app.rb

# Log every compiled ISEQ name to a file (useful for coverage audits)
$ ruby --zjit --zjit-log-compiled-iseqs=/tmp/zjit-compiled.log my_app.rb

# Disable ZJIT initially, enable it after manual warmup
$ ruby --zjit-disable my_app.rb
# Then call RubyVM::ZJIT.enable in code when ready

The Iongraph integration is particularly useful. The ZJIT team integrated Mozilla's Iongraph visualization tool into ZJIT's developer workflow in late 2025. The result is a clickable, zoomable graph of every function's HIR at each optimization pass -- you can step through constant folding, type inference, and dead code elimination visually and see exactly what the compiler proved about your code and what it eliminated.

Side-exit analysis has a second, higher-resolution path via StackProf integration. When you run with --zjit-trace-exits, ZJIT records which YARV instruction triggered each side exit. You can then serialize that data to a file and read it with StackProf to get a call-stack-aware flame graph of exit pressure:

exit location analysis with StackProf
# Run with exit tracing enabled
$ ruby --zjit --zjit-trace-exits my_app.rb

# At the end of your program (or in an at_exit hook), dump to file:
RubyVM::ZJIT.dump_exit_locations("zjit_exits.dump")

# Then read it with StackProf to see which methods are causing the exits:
require "stackprof"
data = Marshal.load(File.binread("zjit_exits.dump"))
StackProf::Report.new(data).print_text

The output shows exit counts grouped by call stack, so you can see not just which instruction triggered a side exit but in which call chain it appears most frequently. Methods with high exit counts are good candidates for investigation: check whether they are receiving unexpected types, calling yield or super (currently unoptimized), or hitting a code path ZJIT does not yet handle. The --zjit-trace-exits-sample-rate=N flag lets you sample one exit per N occurrences to reduce overhead when exit volume is high.

What Comes Next: The Road to Ruby 4.1

The core team has been explicit about what ZJIT still needs before it can be declared production-ready and performance-competitive with YJIT. The roadmap items are drawn directly from the Rails at Scale December 2025 launch post -- but the technical depth behind each item is worth unpacking, because these are not incremental fixes. They represent the compiler research problems that will determine whether ZJIT's architectural bet actually pays off.

Shape-Transition Instance Variable Writes

ZJIT currently optimizes instance variable reads and writes in the common case where the object's shape is stable. But setinstancevariable in the shape-transition case -- where assigning to an ivar changes the object's internal shape for the first time -- is not yet optimized. This matters for typical Ruby initialization patterns: when a constructor assigns multiple instance variables in sequence, each assignment potentially triggers a shape transition, and ZJIT currently falls back to the interpreter for these.
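The initialization pattern in question looks like this (a generic example, not taken from the launch post):

```ruby
# Each first-time ivar write transitions the object's shape; these
# transition writes are the case ZJIT does not yet compile.
class Point
  def initialize(x, y)
    @x = x  # shape transition: root shape -> {@x}
    @y = y  # shape transition: {@x} -> {@x, @y}
  end
end
```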

The team has also noted that @a ||= b, a common Rails idiom for memoization, could be optimized more aggressively than a plain assignment using value numbering. The standard lowering of @a ||= b emits a getinstancevariable, a branch on truthiness, and a conditional setinstancevariable. In the common case where @a is read-heavy and written only once, ZJIT's type inference can observe that after the first assignment @a is never nil. With that proof in hand, the compiler can eliminate the conditional branch entirely on subsequent calls to the enclosing method, reducing a three-instruction sequence to a single register load. This is a nontrivial optimization for Rails applications because ||= memoization is used pervasively: current_user, request headers, scoped associations, and many ActiveRecord helper methods all rely on it. The value numbering pass that enables this optimization is the same one that drives constant folding -- the two capabilities share infrastructure, so progress on one accelerates the other.
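For concreteness, here is the idiom the value-numbering optimization targets; `Session` and `load_user` are stand-in names of mine, with `load_user` playing the role of an expensive lookup:

```ruby
class Session
  # Read-heavy, written at most once: after the first call, @current_user
  # is provably non-nil, and the truthiness branch could be folded away.
  def current_user
    @current_user ||= load_user
  end

  private

  def load_user
    "alice"  # placeholder for a database fetch
  end
end
```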

There is a deeper shape-related opportunity that goes beyond the obvious transition case. Ruby's object shape system, introduced in Ruby 3.2, assigns each object a shape identifier based on which instance variables have been set and in what order. ZJIT can use the shape identifier as a compile-time key for ivar slot lookups, eliminating the runtime hash table lookup that the interpreter uses. For objects with a stable, well-known shape -- which describes the overwhelming majority of ActiveRecord model instances -- this means ivar reads compile down to a fixed-offset memory load from the object's internal slot array. The shape-transition case matters primarily at object construction time, which is why optimizing it is particularly high value: a faster initialize means fewer interpreter fallbacks during the warmup period that precedes peak JIT performance.

Register Allocator Rewrite

The most impactful near-term work is a new register allocator. The existing allocator, borrowed from YJIT's backend, works but was designed for YJIT's block-at-a-time model. ZJIT compiles entire methods at once, which generates longer live ranges and more complex interference graphs than YJIT's allocator was designed to handle.

The deeper issue is that the current allocator cannot efficiently represent polymorphic call sites -- cases where a method send sees more than one receiver class. When ZJIT encounters a polymorphic send today, it takes a conservative side-exit. A more capable allocator, combined with type-splitting through the HIR graph, would allow ZJIT to emit specialized code paths for each observed receiver type inline rather than bailing out. This is the mechanism that allows V8 and HotSpot to handle the full spectrum of send polymorphism without constantly falling back to the interpreter.
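To make "polymorphic call site" concrete (a generic example, not from the ZJIT test suite):

```ruby
class Circle
  def initialize(r)
    @r = r
  end

  def area = 3.0 * @r * @r
end

class Square
  def initialize(s)
    @s = s
  end

  def area = @s * @s
end

# The `area` send below observes two receiver classes during profiling --
# the case where ZJIT today takes a conservative side-exit.
shapes = [Circle.new(1), Square.new(2)]
shapes.sum(&:area)
```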

The mechanism in detail: when a call site has observed two receiver types during profiling, ZJIT could emit a two-branch inline cache -- a guard for type A leading to compiled code for type A's method, and a guard for type B leading to compiled code for type B's method, with the interpreter as the fallback for anything else. This is inline caching at the HIR level rather than at the interpreter level, and it is only tractable because ZJIT's SSA graph gives the register allocator global visibility into which values are live at each branch. A block-at-a-time allocator like YJIT's cannot make this determination without re-analyzing every predecessor block. The new allocator would also handle register spilling more intelligently for large methods: rather than spilling eagerly to the stack whenever the register file is full, it can use the interference graph to identify which live ranges have minimal overlap and prefer to keep high-frequency values in registers across the whole method body. For compute-intensive Ruby code -- numeric processing, string transformation pipelines, data structure traversals -- this can substantially reduce memory traffic in the generated machine code.

The team notes that the high-80s to low-90s percent of real-world sends are monomorphic anyway, so the urgency is moderated -- but for Rails codebases with significant ActiveRecord polymorphism, this is the optimization that closes the remaining gap to YJIT in mixed-type method dispatching.

Yield and Super Compilation

Two common bytecode instructions -- invokeblock (used by yield) and invokesuper (used by super) -- are not yet optimized. This matters more than benchmark numbers suggest. Rails callbacks, ActiveSupport::Concern chains, and virtually every well-factored Ruby library use yield as the primary abstraction boundary. When ZJIT falls back to the interpreter on yield, it also discards all the type information it has built up in the surrounding method, because the side-exit resets profiling context. Compiling invokeblock properly means treating the block's type signature as an additional specialization axis -- ZJIT would need to inline the block's HIR into the caller's HIR graph and propagate type information across that boundary, analogous to how HotSpot handles inlined lambdas in Java streams.
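The abstraction boundary at issue looks like this in everyday code; `with_timing` is an illustrative helper of mine:

```ruby
# The `yield` below compiles to invokeblock, which ZJIT does not yet
# optimize -- so control falls back to the interpreter here, discarding
# the type information accumulated in the surrounding method.
def with_timing
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [result, elapsed]
end

with_timing { 2 + 2 }
```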

The invokesuper case is similarly structural. Ruby's method resolution order (MRO) for super calls is linearized at class definition time via C3 linearization, which means the target of a super call is statically knowable in the common case where the inheritance hierarchy has not been modified at runtime. ZJIT could speculatively inline the super target's HIR into the caller with a guard on the class hierarchy version counter -- a technique used in the JVM's inline cache invalidation scheme -- allowing the optimizer to fold away the dispatch overhead entirely in stable class hierarchies, which describes virtually every production Rails application.

General-Purpose Method Inlining

The Rails at Scale December 2025 launch post identifies general-purpose method inlining as a planned capability: once in place, it will reduce polymorphic sends, enable branch folding, and cut overall method send overhead. ZJIT currently inlines only constants, self, and parameters -- a narrow but well-chosen subset that enables constant folding across module-level accessor patterns. The next step is a full inliner. Unlike YJIT's block-level model, ZJIT's SSA HIR is structurally ready for method inlining: because every use of a value points directly to its definition, merging a callee's HIR into a caller's HIR is a well-defined graph operation. The compiler inserts the callee's entry block, replaces parameter references with the caller's argument values, and merges return values at the call site.

The practical consequence is significant. A full inliner eliminates an entire class of sends that would otherwise require type guards and potential side-exits. It also unlocks cross-method constant folding: if a method always returns a literal value under a given type specialization, the inliner reduces the call to that literal, and dead code elimination removes the rest. For Rails applications that chain through many thin wrapper methods -- attribute readers, delegators, simple predicate methods -- this is the optimization that turns the method call overhead from a tax on abstraction into a compile-time zero-cost abstraction, similar to what Rust's monomorphization provides for generic code.
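The thin-wrapper chains described above look like this in practice; `Price` is an invented example, not a benchmark from the launch post:

```ruby
class Price
  def initialize(cents)
    @cents = cents
  end

  # `display` sends to `dollars`, which sends to `cents`. Fully inlined,
  # the chain reduces to one ivar load plus the format call -- three sends
  # collapse into zero.
  def cents   = @cents
  def dollars = cents / 100.0
  def display = format("$%.2f", dollars)
end

Price.new(1250).display
```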

Lazy Frame Writes

Another pending optimization involves lazy frame writes. ZJIT currently flushes local variable state to the VM frame on every effectful operation. This is correct but expensive: it means that every iteration of a hot loop writes its locals back to the heap-allocated frame object, even when nothing outside the compiled code ever reads them. The cases where that reified frame state actually needs to be visible -- exception unwinding, Binding#local_variable_get, set_trace_func -- are rare in production code.

The solution mirrors what LLVM does with alloca promotion via mem2reg: keep values in virtual registers through the HIR graph, and only materialize them to the frame object when the compiler can prove that an operation might observe the frame. In ZJIT's case, this means tagging HIR nodes that could trigger unwinding or binding capture, and emitting frame-flush instructions only at those points. For tight numeric loops with no exception handling, this eliminates memory traffic almost entirely -- the locals live in machine registers for the full duration of the loop body, and the frame is updated only on exit or at explicit safepoints.
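The hot-loop shape this optimization targets can be sketched as follows (a generic example of mine):

```ruby
# `sum` and `i` are never observable from outside the method -- no
# exception handling, no Binding capture, no TracePoint -- so under lazy
# frame writes they could live in machine registers for the whole loop
# instead of being flushed to the VM frame on every iteration.
def checksum(n)
  sum = 0
  i = 0
  while i < n
    sum += i
    i += 1
  end
  sum
end

checksum(10)
```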

Code Caching Between Process Starts

Longer-term, ZJIT is targeting something YJIT does not attempt: persisting compiled machine code across process restarts. Every Rails server restart currently throws away all JIT-compiled code and requires a warmup period -- typically several thousand requests -- before the JIT reaches peak performance. This means that rolling deploys, Kubernetes pod restarts, and any crash-recovery scenario incur a predictable performance trough.

The technical challenge is that ZJIT's compiled code contains absolute memory addresses, type specializations tied to a specific run's class layout, and inline cache entries that reference live Ruby objects. Making this portable across restarts requires either position-independent code generation (shifting from absolute to relative addressing throughout the backend), or a serialization format that rebases addresses on load -- the approach taken by the JVM's class data sharing and V8's code cache. Either path is non-trivial, but the SSA HIR is the right foundation for it: because ZJIT's IR is a clean graph structure rather than YJIT's interleaved code-generation output, the compiled artifacts have a well-defined serializable form.

The practical implications extend beyond raw startup time. The warmup problem is asymmetric: a cold Rails process is not just slower than a warmed-up one, it is also harder on the garbage collector, because until JIT compilation kicks in, hot paths run through the interpreter with higher per-call overhead and elevated allocation pressure. Code caching would make the warmup curve nearly flat. For high-availability deployments using preforking servers like Puma in cluster mode, it would also allow forked workers to inherit already-compiled code from the master process, a capability analogous to what Ruby's Copy-on-Write semantics provide for heap objects.

The code cache itself introduces a new attack surface: a compromised or corrupted cache file could contain arbitrary native code. The serialization format will therefore need integrity verification, likely via a hash of the Ruby bytecode and type profile that produced each cached compilation unit. This is not an unsolved problem. V8's code cache includes a validation hash and recompiles on mismatch rather than executing stale cached code.
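As a rough illustration of that validation idea, here is a minimal Ruby sketch. The method name and scheme are hypothetical, not ZJIT's actual format: it derives a cache key from a method's bytecode via RubyVM::InstructionSequence, so that a cached artifact can be rejected when the code it was compiled from has changed.

```ruby
require "digest"

# Hypothetical sketch: key a cached compilation unit by a hash of the
# bytecode that produced it. A real scheme would also fold in the type
# profile; this shows only the bytecode half.
def cache_key_for(method_obj)
  iseq = RubyVM::InstructionSequence.of(method_obj)
  Digest::SHA256.hexdigest(iseq.to_binary)
end

def add(a, b)
  a + b
end

key = cache_key_for(method(:add))
# A loader would compare this key against the one stored alongside the
# cached machine code, and recompile on mismatch instead of executing.
```

The same compare-then-recompile policy V8 uses falls out naturally: a mismatch is never an error, just a cache miss that costs one warmup compilation.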

Note

ZJIT development discussion happens in a public Zulip workspace at zjit.zulipchat.com. The team uses it for casual technical discussion, design reviews, and onboarding new contributors. If you want to contribute patches or understand the internals more deeply, this is the right place to start.

Contributing to ZJIT

One of ZJIT's explicit design goals is to lower the barrier for outside contributions. Being implemented in Rust, using a conventional SSA IR, and following textbook method compiler architecture means that any systems programmer familiar with Rust and compiler fundamentals can read the code and understand what it is doing. This is a deliberate departure from YJIT, whose LBBV design had few textbook parallels for new contributors to lean on.

The project uses standard Rust tooling. Tests are run via make zjit-test rather than directly via cargo test, because test binaries link against CRuby and the CRuby runtime only supports a single boot per process. Snapshot tests use cargo-insta and can be updated with make zjit-test-update. The ZJIT source lives in the zjit/ directory of the CRuby repository.

The core team asks that contributors with large patches reach out on Zulip or open a GitHub issue before submitting a pull request, specifically to avoid the common open source failure mode where significant work is rejected because it does not fit the project's design direction. Bug reports are especially valuable: the codebase is new enough that many real-world bugs have never been reported.

The Bigger Picture

ZJIT is not a feature you flip on to get faster Rails applications today. It is an investment in Ruby's performance infrastructure for the next several years. The architectural choices made in ZJIT -- SSA form, method-level compilation, a modular optimizer, Rust implementation, Iongraph visualization, Linux perf integration -- are the choices of a team that intends to keep pushing the performance ceiling for a long time.

YJIT will remain the production default through Ruby 4.0 and likely into Ruby 4.1 as well. But the trajectory is clear: once ZJIT's register allocator is complete, yield and super are optimized, and general-purpose inlining is in place, the SSA IR's theoretical advantages over LBBV will start showing up in benchmark numbers. The heise online analysis of Ruby 4.0 put it plainly: ZJIT, Modular GC, and improved Ractors are the infrastructure for the next decade, not just the next release. ZJIT earned its internal nickname of "scientific successor" because its architecture follows classic compiler textbooks, which makes it genuinely easier to understand, extend, and contribute to than its predecessor.

For Linux engineers running Ruby in production: the current stable release is Ruby 4.0.2 (released March 16, 2026). Install it, test your application with --zjit-stats=quiet, file any crashes or regressions you find, and keep an eye on Ruby 4.1. The upgrade path from YJIT to ZJIT will be a single flag change. The compiler will be ready before you know it.

Sources