We finished migrating ATONE's replication layer to IRIS in January. It took seven months, one near-rollback, and the realization — about six weeks in — that we were going to have to build tooling Epic hadn't shipped yet. This is the writeup we'd have wanted before we started.
First, context. ATONE is a dense-area MMORPG targeting 70 concurrent players per zone with full character animation, physics-driven movement (Mover 2.0), server-authoritative abilities, and handcrafted Nanite geometry in the field of view. Our legacy replication layer worked, but it didn't have headroom. We were spending too much CPU on the send path, and our bandwidth-per-player numbers were creeping up every time the design team added interactions.
IRIS promises three things we wanted: push-model replication (you tell it when data changes, instead of it polling every frame), NetObjectGroups for declarative filtering, and a cleaner separation between replication policy and data layout. The pitch matched our problem. That's how we ended up here.
The decision was easier than the migration
We made the call in May 2025. The reasoning was short: we were going to outgrow legacy replication within a year of live operations, and migrating while still in production was cheaper than migrating after launch.
We gave ourselves four months. We took seven. The extra three came from two things we didn't budget: tooling gaps and Mover interaction. I'll get to both.
What migrated cleanly
A decent amount of our replication was already push-ish in practice — we had discipline around MarkPropertyDirty and we weren't doing much bNetDirty-style full-object replication. The migration path for those objects was straightforward: annotate the replicated properties, remove the old registration, verify with the replication graph inspector.
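For readers who haven't worked with push-model replication, here's a toy sketch of the difference in plain C++. None of this is Unreal or Iris API; it just models why the send path gets cheaper when mutators report changes (in the spirit of MarkPropertyDirty) instead of the system scanning every property every tick.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Actor {
    int Health = 100;
    int LastSent = 100;
    bool bInDirtyList = false;
};

// Push model: the mutation site reports the change. Nothing scans.
void SetHealth(std::vector<Actor>& Actors, std::vector<std::size_t>& DirtyList,
               std::size_t Index, int Value) {
    Actor& A = Actors[Index];
    if (A.Health != Value) {
        A.Health = Value;
        if (!A.bInDirtyList) {          // dedupe repeated writes per tick
            A.bInDirtyList = true;
            DirtyList.push_back(Index);
        }
    }
}

// Legacy-style poll: compare every property of every actor, every tick.
std::size_t PollSendPath(std::vector<Actor>& Actors) {
    std::size_t Work = 0;
    for (Actor& A : Actors) {
        ++Work;                         // paid even when nothing changed
        if (A.Health != A.LastSent) A.LastSent = A.Health;
    }
    return Work;
}

// Push send path: walk only what mutators reported, then clear.
std::size_t PushSendPath(std::vector<Actor>& Actors,
                         std::vector<std::size_t>& DirtyList) {
    std::size_t Work = 0;
    for (std::size_t Index : DirtyList) {
        ++Work;
        Actors[Index].LastSent = Actors[Index].Health;
        Actors[Index].bInDirtyList = false;
    }
    DirtyList.clear();
    return Work;
}
```

With 1,000 actors and one change, the poll pass does 1,000 comparisons and the push pass does one write. Code that already had dirty-marking discipline is code where the mutation sites are already known, which is exactly why those objects migrated cleanly.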
Objects that were property-heavy but behaviorally simple — inventory items, static actors, most NPC state — went through in about three weeks. The engineers on this path were bored by week two. That's a good sign.
What didn't
The harder path was anything involving custom FastArraySerializer usage, complex RPC ordering, or gameplay ability system state. We had all three.
Sharp edge #1: debugging what isn't replicating
The first thing that surprised us is how much harder it is to debug replication gaps on IRIS than on the legacy system. Legacy had 15+ years of Stack Overflow threads, a replication graph you could visualize, and well-known failure modes. IRIS is newer. When an object stopped replicating on our staging server, our first debugging sessions were genuinely painful.
The specific failure looked like this: an ability state would replicate fine in isolation, then silently stop replicating once the player crossed a zone boundary. The NetObjectGroup filtering was doing exactly what we'd told it to do — we just hadn't understood what we'd told it to do.
We built an internal tool that hooks IRIS's group membership at runtime and prints which groups a given UObject currently belongs to, along with the filter predicates that matched. It took one engineer about three weeks. Every partner project we do on IRIS now ships with this tool in the first week.
If you're starting a migration, write this tool before you migrate anything non-trivial. Trying to debug group membership through log output alone will cost you more time than writing the tracer does.
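To show what we mean, here's the conceptual shape of the tracer in plain C++. Every name here is hypothetical; the real tool hooks the engine's group bookkeeping, while this sketch just shows the query we needed answered: which groups does this object belong to right now, and which filter predicate put it there?

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct TraceEntry {
    std::string Group;
    std::string MatchedPredicate;   // human-readable reason for membership
};

class MembershipTracer {
public:
    void RecordJoin(const std::string& Object, const std::string& Group,
                    const std::string& Predicate) {
        Members[Object].push_back({Group, Predicate});
    }

    void RecordLeave(const std::string& Object, const std::string& Group) {
        auto& Entries = Members[Object];
        Entries.erase(std::remove_if(Entries.begin(), Entries.end(),
                          [&](const TraceEntry& E) { return E.Group == Group; }),
                      Entries.end());
    }

    // The question log output could not answer for us.
    std::vector<TraceEntry> GroupsOf(const std::string& Object) const {
        auto It = Members.find(Object);
        return It == Members.end() ? std::vector<TraceEntry>{} : It->second;
    }

private:
    std::map<std::string, std::vector<TraceEntry>> Members;
};
```

The value isn't the data structure; it's having join/leave events and the matched predicate captured at the moment membership changes, so a "stopped replicating at the zone boundary" bug becomes a one-query lookup instead of a log archaeology session.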
Sharp edge #2: Mover 2.0 state at 70 players is too much
Mover 2.0 is, legitimately, great. The physics-driven movement it enables is core to how ATONE plays. It also replicates a lot of state per player, because the simulation is authoritative and the client needs enough to predict and rewind without visible hitching.
The naive approach — replicate the full FMoverDefaultSyncState for every player at our standard replication rate — works at 10 players. At 30 players it starts to eat into our per-tick budget. At 70 players it breaks it.
We ended up writing a delta codec. The short version:
- Every player has a known-last-acked Mover state on the server
- The server ships only the components that changed since that ack, with a one-byte component mask
- Every Nth replication frame, we force a full anchor state to absorb any acknowledgment drift
- The client applies deltas against the last anchor, and falls back on the anchor if deltas desync
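To make those bullets concrete, here's a minimal sketch of the wire layout in plain C++. It models the sync state as eight fixed int32 components, which is an assumption for illustration; the real codec works on quantized Mover fields, and all names here are ours, not engine API. Eight components is what makes the one-byte mask sufficient.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int NumComponents = 8;
using MoverState = std::array<int32_t, NumComponents>;

// Delta packet layout: [1-byte component mask][changed components in order].
std::vector<uint8_t> EncodeDelta(const MoverState& Acked, const MoverState& Current) {
    std::vector<uint8_t> Packet(1, 0);
    for (int I = 0; I < NumComponents; ++I) {
        if (Current[I] != Acked[I]) {
            Packet[0] |= uint8_t(1u << I);          // mark component I as present
            const auto* Bytes = reinterpret_cast<const uint8_t*>(&Current[I]);
            Packet.insert(Packet.end(), Bytes, Bytes + sizeof(int32_t));
        }
    }
    return Packet;
}

// Client side: apply a delta against the last anchor / last-acked state.
MoverState DecodeDelta(const MoverState& Anchor, const std::vector<uint8_t>& Packet) {
    MoverState Out = Anchor;
    std::size_t Cursor = 1;
    for (int I = 0; I < NumComponents; ++I) {
        if (Packet[0] & (1u << I)) {
            int32_t Value;
            std::memcpy(&Value, Packet.data() + Cursor, sizeof(int32_t));
            Out[I] = Value;
            Cursor += sizeof(int32_t);
        }
    }
    return Out;
}

// Server send loop: force a full anchor every AnchorInterval frames so
// acknowledgment drift can never compound past one interval.
bool ShouldSendAnchor(uint32_t Frame, uint32_t AnchorInterval) {
    return Frame % AnchorInterval == 0;
}
```

When nothing moved, the packet is one byte; when two components changed, it's nine. The hard part, as noted below, wasn't this layout but proving it stays correct through rollback and out-of-order acks.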
The codec saved us about 60% on movement bandwidth at target CCU. The total implementation was six weeks, most of which was the correctness work around rollback and jitter — not the codec itself.
The lesson isn't “write a delta codec for Mover.” The lesson is: Mover's replication cost is load-sensitive in ways that prototype-scale testing won't surface. Test at target CCU early, not late, or budget for the rewrite.
Sharp edge #3: NetObjectGroups are the design, not an implementation detail
This one took us longest to internalize. On the legacy system, interest management was something you added on top of replication — a distance check here, an IsNetRelevantFor override there. On IRIS, NetObjectGroups are interest management, and the group topology you choose effectively is the design of your replication.
We initially modeled groups one-to-one with our old relevance system: a group per zone, a group per squad. It worked. It also had ugly behavior at zone boundaries — objects would flicker in and out of replication as players crossed edges, and our tooling couldn't tell us why.
The group topology you pick is your replication design. Treat it like architecture, not implementation.
We redesigned around hysteresis-aware groups: players are in a group as long as they're within a generous radius, and they only leave when they're clearly past the boundary. It added complexity to the group membership logic, but it eliminated the boundary flicker entirely.
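A minimal sketch of the idea, assuming a simple radial zone test (our real predicates are richer, and these names are illustrative). The invariant that kills the flicker: the leave radius is strictly larger than the join radius, so a player oscillating near the boundary never toggles membership.

```cpp
#include <cassert>
#include <cmath>

struct HysteresisZone {
    float CenterX = 0.0f, CenterY = 0.0f;
    float JoinRadius = 100.0f;    // must come inside this to join the group
    float LeaveRadius = 130.0f;   // must be clearly past this to leave it

    // Returns the player's membership for the next tick, given the current one.
    bool Update(bool bCurrentlyMember, float X, float Y) const {
        const float Dist = std::hypot(X - CenterX, Y - CenterY);
        if (!bCurrentlyMember) {
            return Dist <= JoinRadius;
        }
        return Dist <= LeaveRadius;   // existing members keep membership in the band
    }
};
```

A player at distance 120 stays in the group if they were already in it, but doesn't join if they weren't. That 30-unit band is the added complexity we mentioned; it's also the entire fix.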
What we'd tell a studio starting today
If we were advising a partner studio on an IRIS migration today — and we increasingly are — this is the compressed version.
- Build the group-membership tracer first. Do not try to debug gaps with log output.
- Test at target CCU before you migrate anything Mover-adjacent. Movement replication breaks at scale, not in isolation.
- Design your group topology like architecture. Don't map it one-to-one from your old relevance system.
- Add hysteresis at group boundaries. Objects flickering in and out of replication at the edges is the symptom that you're missing it.
- Keep the legacy system compilable for a month after you “finish.” We found three bugs by diffing behavior against legacy during the stabilization period. We almost didn't keep the escape hatch — glad we did.
Where we are now
ATONE runs fully on IRIS in production. Movement bandwidth is down 60%. Server tick cost is down around 35%. Our debugging tooling is in better shape than we had on legacy — because we had to build it deliberately. And we have a migration playbook we can bring to partner studios, which wasn't a thing when we started.
The migration was worth it. It was also harder than we estimated, and most of the unexpected cost was in tooling and correctness work, not in the mechanical migration itself. If you're planning the same move, plan for those two.
If you're in the middle of (or thinking about) an IRIS migration, we do tech audits and hands-on migration work as a technical service. Details here, or mail us directly.