Alignerr WorldSim Eval — $90/hr AI evaluation project for software engineers

Alignerr's WorldSim Eval: An AI Gig Worker's Dream — Long-Term Stability and $90/hr

Joshua Drake·Breaking Even·May 26, 2026·Updated June 9, 2026

TL;DR: Alignerr is reportedly onboarding software engineers for a long-term, full-time project called WorldSim that pays $90/hr. The work involves evaluating Claude Code output against human preference — comparing AI-generated coding responses, analyzing diffs, checking correctness, writing evidence-backed justifications. Workers get two attempts at the eval. Those who pass reportedly receive $100 for the eval plus access to production work. The project appears to have started in March and they're ramping up hard right now.

The emails started going out in March. "$90/hr. Long-term. Full-time. Several months." From Alignerr, a platform known for high ceilings and empty project boards in equal measure.

People noticed. Reddit threads filled up. Discord channels lit up. And the same question kept surfacing: should I take the second attempt?

Here's what's been reported so far.

What WorldSim Appears to Be

WorldSim is an Alignerr project that reportedly involves evaluating AI-generated coding responses — specifically, output from Claude Code. Workers are said to be comparing multiple responses, analyzing code diffs, verifying correctness, identifying reasoning flaws, and writing up conclusions backed by real evidence.

The recruitment emails describe it as work for "a leading AI lab." They don't name Anthropic directly. But the project is called WorldSim, the eval runs on Labelbox, and the work involves reviewing Claude Code output against human preference. The dots are there to connect.

This doesn't appear to be the usual Alignerr "rate some chatbot responses and move on" project. Reports indicate they're looking for experienced software engineers with strong technical judgment — people who can read a code diff and assess not just whether it's correct, but whether the reasoning behind it holds up.

The reported rate — $90/hr — reflects that. This appears to be the highest publicly advertised rate on the platform, and it's not hourly in the "we'll give you 3 hours this week" sense. The emails explicitly say full-time, long-term, several months minimum.

The Selection Process

Getting in reportedly isn't a single eval. There are four steps, and Alignerr hasn't been shy about calling the process "rigorous."

Step 1: WorldSim Eval. Workers log into their Alignerr dashboard, go to Projects, and find "WorldSim Eval." Read the instructions, acknowledge them, and start. This is the coding evaluation — reviewing AI-generated responses and writing up analysis. Reviews are said to come back in 24–48 hours.

Step 2: Zara AI Interview. Those who pass the eval reportedly get invited to complete the Zara AI interview. Anyone who's been on Alignerr knows Zara — it's their AI interviewer. Instructions come by email.

Step 3: Background check. Required. The emails reference the "nature and confidentiality" of the work, which tracks with the client being a major AI lab.

Step 4: Onboarding. Selected candidates are reportedly onboarded to the customer's platform directly — not working inside Labelbox for production.

That's a longer pipeline than has been seen on other Alignerr projects. Four steps, with a background check, for contract work. That says something about what's on the other side.

The Second-Attempt Problem

Here's where it gets messy.

Workers reportedly get two shots at the WorldSim eval. Both attempts are the same single task, loaded as two labels with one submission each. Alignerr has been emailing aggressively to get people to take the eval. Pass, and you'll reportedly receive $100 for your evaluation work plus access to production tasks. Fail the first time, try again. Fail the second, thank you for your time, you're done.

The confusion is about what happens when you fail the first time.

One report put it this way: "if you got accepted, you will be invited to the production phase. If it failed, you won't receive any notification." And this is usually how things go down. No rejection email. No "you didn't pass." No "here's your second chance." Just silence. Then, if a retry is allowed, a re-attempt email arrives weeks later. By that point you've been sitting in the dark, not knowing whether you were rejected or whether your submission is still in the queue. Alignerr purgatory, if you will.

The reattempt process isn't intuitive. Because the eval is identical, there has been confusion over whether to edit your original answer or submit an entirely new one. People have recommended both. That's a judgment call. You don't get any feedback on why you didn't pass the first eval, and because it is the same question, you have to work out for yourself where the first attempt went wrong. If you let it, that guessing game could easily 🤯.

If you fail the second time, there will be no confusion. You will know :(

Why This Project Has People's Attention

The AI gig economy has a well-documented consistency problem. Projects spin up, workers do good work, the queue dries up, and then comes the wait. Alignerr has been particularly susceptible to this because of its project-by-project structure. Workers report going months between active gigs.

WorldSim appears to be different for a few reasons:

Long-term commitment. "Several months" minimum, full-time hours. That's not a two-week labeling sprint.
The rate appears to be real. $90/hr isn't a "Per Finished Hour" number that collapses when you do the math. It's reported as straight hourly for cognitive work.
The client appears serious. Four-step selection process with a background check. That level of investment in the pipeline suggests they plan to use it.
It reportedly started in March. This isn't a just-announced project with no track record. It's been running for three months and they're scaling up, which suggests the early cohort is working and the work is ongoing.

For anyone who's been treating Alignerr as a secondary income source — something that pays well when it's active but can't be relied on as a baseline — this could be the project that changes that equation. If the work is as consistent as advertised, $90/hr at full-time hours is a different conversation entirely.
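To put rough numbers on that, here is a minimal back-of-the-envelope sketch. Only the $90/hr rate comes from the recruitment emails. The 40-hour week, the 13-week stretch standing in for "several months," and the per-finished-hour example (one credited hour for 1.5 hours of actual work) are all illustrative assumptions, not reported figures. Everything is gross pay before taxes.

```python
# Reported WorldSim rate in USD per hour (from the recruitment emails).
HOURLY_RATE = 90.0


def gross_pay(rate, hours_per_week, weeks):
    """Gross pay for straight hourly work: every hour worked is paid."""
    return rate * hours_per_week * weeks


def effective_rate(nominal_rate, credited_hours, actual_hours):
    """Effective hourly rate when pay is based on credited time
    rather than the time actually spent (assumed model of a
    per-finished-hour scheme)."""
    return nominal_rate * credited_hours / actual_hours


# Assumption: "full-time" means 40 hours per week.
weekly = gross_pay(HOURLY_RATE, 40, 1)
# Assumption: "several months" taken as 13 weeks, roughly three months.
quarter = gross_pay(HOURLY_RATE, 40, 13)
# Hypothetical per-finished-hour case: 1 credited hour takes 1.5 real hours.
pfh = effective_rate(HOURLY_RATE, 1.0, 1.5)

print(f"Straight hourly, one week: ${weekly:,.0f}")
print(f"Straight hourly, 13 weeks: ${quarter:,.0f}")
print(f"Per-finished-hour, effective rate: ${pfh:.2f}/hr")
```

Under those assumptions, straight hourly works out to $3,600 a week and $46,800 over 13 weeks, while the same nominal $90 under the hypothetical per-finished-hour case drops to an effective $60/hr. Swap in your own hours to see how the picture changes.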

What Workers Are Saying Before Starting the Eval

Based on community discussion, a few things keep coming up:

Read the instructions completely before starting. There's reportedly only one task per attempt. The eval appears to be about quality of analysis, not speed. Evidence-backed conclusions, not gut reactions.

Code review skills reportedly matter more than coding skills. Workers aren't writing code — they're evaluating it. Can you read a diff and explain why one approach is better than another? Can you identify when an AI's reasoning sounds right but is actually flawed? That seems to be the skill set.

Check the stage column in Labelbox after submitting. Workers report that if it says "production," you passed. If it still says "evaluation," you're either waiting for review or you didn't make it. That column appears more reliable than waiting for an email.

Don't expect instant turnaround. The stated timeline is 24–48 hours for eval review. In practice, with the volume of people reportedly taking this, it could stretch longer.

There's a Discord channel. Check it before you take or retake the eval. It could have updates you need.

Frequently Asked Questions

What is the Alignerr WorldSim eval? It's reported to be a coding evaluation where workers review and compare AI-generated coding responses. The work involves analyzing code diffs, verifying correctness, identifying reasoning flaws, and writing evidence-backed conclusions. It's described as the first step in the selection process for a long-term $90/hr software engineering project.

How much does the Alignerr WorldSim project pay? The reported rate is $90/hr. The role is described as full-time, long-term (several months minimum), remote contract work.

Do you get paid for the WorldSim eval? Reports indicate $100 if you pass. Workers who fail both attempts reportedly don't receive eval compensation.

How many attempts do you get on the WorldSim eval? Two. If you fail the first time, a second attempt reportedly becomes available. The timing of when workers are notified about the second attempt has been inconsistent.

What happens if you fail the WorldSim eval? Workers who fail the second attempt report receiving a notification. After a failed first attempt, communication has been unclear: reports indicate no notification at all, followed by a general email weeks later offering re-attempts.

What AI model does the WorldSim project evaluate? The work reportedly involves evaluating Claude Code output against human preference. The client is described as "a leading AI lab."

When did the WorldSim project start? The project appears to have launched in March 2026. As of late May 2026, Alignerr is actively ramping up and onboarding new cohorts.

What qualifications do you need? Reports indicate they're looking for experienced software engineers with strong technical judgment, critical thinking, and the ability to write objective evaluations. A background check is described as part of the process.

For a full breakdown of Alignerr, including rates, projects, and the reality of working on the platform, read the 2026 Alignerr review. If you're already on the platform and wondering about payment timing, see the payment schedule breakdown. And if you're comparing this to other platforms, the tier list has the full picture.

Joshua Drake has worked on AI training platforms for over four years, tracking earnings, sentiment data, and platform stability across Outlier, DataAnnotation, Alignerr, and others. He has a degree in data analytics and runs this site, breakingeven.online, along with the sentiment analysis it uses to make sense of an industry that often operates in the shadows.