Wilbur Suero

Originally published at wilburhimself.github.io

Set-Based Updates in Rails: 4 Hours to 8 Seconds

I once inherited a background job to deactivate stale users. In production, a job processing 50,000 users that should have taken a minute was taking over 4 hours, consuming 2GB of RAM, and frequently timing out.

The culprit was a classic Rails performance pitfall: the N+1 update loop. The code looked innocent:

# Find users whose last login was more than 90 days ago
users_to_deactivate = User.where("last_login_at < ?", 90.days.ago)

users_to_deactivate.each do |user|
  # This runs one UPDATE query for every single user
  user.update(active: false)
end

This generates a flood of queries, hammering the database with thousands of individual transactions:

-- The N+1 Update Hell
SELECT "users".* FROM "users" WHERE (last_login_at < '2025-11-30...');
UPDATE "users" SET "active" = false, "updated_at" = '...' WHERE "users"."id" = 1;
UPDATE "users" SET "active" = false, "updated_at" = '...' WHERE "users"."id" = 2;
-- ... 49,998 more UPDATE statements

This approach is not just inefficient; it's hostile to your database. There is a much better way.

A Note on Indexes
All performance advice in this post assumes your query conditions are indexed. The User.where("last_login_at < ?", ...) query is only fast if you have a database index on the last_login_at column. Without it, the SELECT itself will be slow.

You can check this with User.where("last_login_at < ?", 90.days.ago).explain. If you see a Seq Scan (Sequential Scan), you need an index:

# db/migrate/YYYYMMDDHHMMSS_add_index_to_users_last_login_at.rb
add_index :users, :last_login_at
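
Once the index exists, re-run the explain call; the plan should switch from a Seq Scan to an index scan (the exact output varies by adapter, Rails version, and how much of the table matches):

User.where("last_login_at < ?", 90.days.ago).explain
# => EXPLAIN ... Index Scan using index_users_on_last_login_at on users ...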

The Power of update_all

Instead of pulling 50,000 records into memory, we can tell the database to update the entire set in a single command. Active Record's update_all constructs one SQL UPDATE statement and sends it directly to the database.

Let's refactor the slow job:

count = User.where("last_login_at < ?", 90.days.ago)
            .update_all(active: false, updated_at: Time.current)

Rails.logger.info "Deactivated #{count} users."

This generates one beautiful, efficient SQL query and returns the number of rows affected. The result: 4 hours becomes 8 seconds.
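
For comparison with the flood above, this is (roughly) the single statement it sends:

-- One set-based UPDATE instead of 50,000 round trips
UPDATE "users" SET "active" = FALSE, "updated_at" = '...' WHERE (last_login_at < '2025-11-30...');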

The Sharp Edges of update_all

This performance comes with a critical trade-off: update_all bypasses most of the ActiveRecord lifecycle.

  • It ignores validations.
  • It skips callbacks. This will break any business logic in after_save or after_commit, like sending notifications or invalidating caches.
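
To make that second point concrete, here's a hypothetical callback that update would fire and update_all silently skips:

class User < ApplicationRecord
  # Fires after user.update(active: false) commits, but NEVER after
  # User.where(...).update_all(active: false)
  after_commit :notify_account_status_change, if: :saved_change_to_active?

  def notify_account_status_change
    # AccountMailer is a stand-in for whatever side effect you hang here
    AccountMailer.status_changed(self).deliver_later
  end
end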

The updated_at Trap

The most insidious side effect is that update_all does not automatically update the updated_at timestamp. This will break any downstream system that relies on it for cache invalidation, audit trails, or synchronization.

Imagine a UI that caches a user's profile:

Rails.cache.fetch("user-#{user.id}-#{user.updated_at.to_i}") { ... }

If you use update_all(active: false), updated_at remains unchanged. The cache key stays the same, and your UI continues to show the user as active until the cache expires. The fix is to set it manually, as shown in the example above.
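
You can watch it fail in a console session (timestamps hypothetical):

user.updated_at
# => 2025-10-01 12:00:00 UTC
User.where(id: user.id).update_all(active: false)
user.reload.updated_at
# => 2025-10-01 12:00:00 UTC, unchanged, so the cache key above never rotates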

The Middle Ground: update_columns

update_columns is for when you already have an object in memory and need to perform a targeted update while bypassing callbacks and validations. It's a scalpel for a single record.

user = User.find_by(email: "some_user@example.com")
# ... some complex logic ...

# Now, update just one attribute without triggering callbacks.
user.update_columns(login_attempts: 0)

Like update_all, update_columns skips the updated_at timestamp; include it in the hash explicitly if downstream systems depend on it. And it is not for bulk operations: using it in a loop brings you right back to the N+1 problem.
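
If the timestamp matters for your caches or audit trails, set it in the same call:

user.update_columns(login_attempts: 0, updated_at: Time.current)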

War Story #2: The Death-by-a-Thousand-Increments

I once debugged a Rails app where a popular blog post page was taking 20 seconds to load. The culprit? The controller was incrementing view counts like this for every recommended article on the page:

# ANTI-PATTERN: 50 posts on the page = 50 UPDATE queries
recommended_posts.each { |post| post.increment!(:view_count) }

Each increment! call issues its own UPDATE, one round trip per record. The fix was to switch to update_all with a SQL fragment, which dropped the response time to 200ms.

# GOOD: One query to increment all counters
Post.where(id: recommended_posts.map(&:id))
    .update_all("view_count = view_count + 1")
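
If you'd rather avoid the hand-written SQL fragment, Rails ships a purpose-built helper that produces essentially the same single query: update_counters accepts an array of ids and even COALESCEs NULL counters to zero.

# Same effect, no raw SQL string
Post.update_counters(recommended_posts.map(&:id), view_count: 1)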

Going Deeper: Safe Heterogeneous Updates with SQL CASE

update_all is great for applying the same change to many records. But what if you need to update a set of records, each with a different value, like reordering items in a playlist?

⚠️ Security Warning: SQL Injection
Building raw SQL queries with string interpolation is extremely dangerous. The following example demonstrates a safe approach using sanitize_sql_array. Never inject raw user input directly into SQL strings.

# Use case: Reorder a list of tasks from user input
# updates = { "1" => 1, "2" => 2, "3" => 3 } where key is task_id and value is new position
def self.reorder_tasks(updates)
  # Build the CASE statement using sanitize_sql_array for each condition
  case_sql = updates.map do |id, position|
    # Ensure id and position are integers to prevent injection
    ActiveRecord::Base.sanitize_sql_array(["WHEN ? THEN ?", id.to_i, position.to_i])
  end.join(" ")

  # Construct the final, safe update statement
  update_sql = [
    "position = CASE id #{case_sql} END, updated_at = :now",
    { now: Time.current }
  ]

  Task.where(id: updates.keys).update_all(update_sql)
end
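
Usage is one call; keys are task ids, values are the new positions (sample data from the docstring above):

# Task 1 goes to position 1, task 2 to position 2, task 3 to position 3
Task.reorder_tasks({ "1" => 1, "2" => 2, "3" => 3 })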

This is the most performant way to handle complex batch updates, combining the power of raw SQL with the safety of Rails' sanitization.

Production Considerations

Locking and Concurrency

A long-running update_all can lock many rows, blocking other requests. This can cause timeouts in a high-traffic application. It's often better to process records in batches to reduce lock duration.

Processing in Batches

For very large updates, use in_batches to break the work into smaller chunks. This runs more queries but keeps transactions short and locks minimal.

# Process 50,000 users in batches of 5,000
User.where("last_login_at < ?", 90.days.ago)
    .in_batches(of: 5000) do |batch|
  batch.update_all(active: false, updated_at: Time.current)
  sleep(0.1) # Optional: yield to other processes
end

The Decision Framework

  • update: The default for single-record updates where callbacks and validations must run.
  • update_columns: A scalpel for single-record updates where callbacks and validations must be skipped.
  • update_all: The workhorse for bulk-updating many records to the same value.
  • update_all with CASE: The specialist tool for bulk-updating many records to different values.

Avoid set-based updates entirely when you need to trigger side effects (like sending emails) for each updated record or when you need to run model validations.

Conclusion

The next time you write .each { |record| record.update(...) }, pause. Ask yourself: am I processing complex business logic for each record, or am I simply updating a set of records based on a condition?

If you're just changing data, you're in set-based territory. Choosing the right tool for the job is the mark of a seasoned developer. Your database will thank you. And when your background job finishes in 8 seconds instead of timing out after 4 hours, you will too.

Top comments (2)

Mamoor Ahmad

Great post. The "pause before you .each" advice is something every Rails dev needs to hear at least once.

The CASE statement approach for heterogeneous updates is a gem. I've seen so many codebases that either don't know it exists or are scared of raw SQL in Rails. The sanitize_sql_array pattern makes it safe, and it's worth showing people that.

One thing I'd add: if you're on Postgres, upsert_all is another powerful option when you need bulk inserts or upserts without callbacks. Same philosophy: skip the ORM loop, let the database do what it's good at.

Also worth mentioning: update_all doesn't return updated records, just the count.
So if you need to do something with the affected rows after the update (like log them or push to a queue), you'll need a separate query or a different approach.

The 4 hours → 8 seconds story is a great hook. Everyone has that one job they're afraid to look at. 😅
🙌 🙌 👏 👏

Wilbur Suero

Thanks for the additions, both are spot-on and worth calling out explicitly.

On upsert_all: You're right that it deserves mention. It's the Postgres-native tool for the "insert or update" pattern, and it sidesteps the classic race condition you get when you do find_or_create_by followed by update. One query, atomic, no callbacks. The trade-off is the same as update_all: you lose the ActiveRecord lifecycle, so it's a performance tool, not a default. I've used it for syncing external API data into Rails ... thousands of records where callbacks would be pure overhead.
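
For the curious, a minimal sketch of that sync (assumes Rails 7+ for record_timestamps and a unique index on users.external_id; api_records stands in for whatever your API client returns):

rows = api_records.map { |r| { external_id: r[:id], name: r[:name] } }
# One atomic round trip: INSERT ... ON CONFLICT (external_id) DO UPDATE
User.upsert_all(rows, unique_by: :external_id, record_timestamps: true)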

On the return value: Absolutely. update_all returns the count, not the objects. That's critical for verification and logging, and I should've been clearer about it in the post. If you need the actual records post-update, you're back to a SELECT, which is fine: it's still two queries instead of N+1, and the SELECT is cheap since the data's already hot in the database's cache. The pattern I use:

ids = User.where("last_login_at < ?", 90.days.ago).pluck(:id)
count = User.where(id: ids).update_all(active: false, updated_at: Time.current)
Rails.logger.info "Deactivated #{count} users: #{ids.join(', ')}"

Pluck is fast, and you've got the IDs for downstream work (queue jobs, logging, whatever).

Appreciate the push on both points. The 4-hour job is real, and yes ... everyone has that one Sidekiq worker they're secretly terrified will break production if they touch it. 😁