Two comments. Same subreddit. Same topic. Posted minutes apart.
One has +847 karma. It sits at the top, gets gilded, spawns a thread of 200 replies. The person who wrote it gets DMs. Their next post gets upvoted reflexively because people remember the name.
The other has +1. It exists somewhere below the fold, unseen by 99% of readers. Same words, different fate.
That number, Reddit karma, is one of the most consequential invisible forces on the internet. It shapes what millions of people read, what opinions gain traction, whose voices carry weight in a community. And most Reddit users have a vague sense of how it works: upvotes good, downvotes bad, net score determines visibility.
What almost nobody knows is what's actually happening inside Reddit's infrastructure every time someone clicks that arrow.
The score you see? It's a lie. Not a malicious one. A mathematically necessary, deliberately constructed approximation, and the algorithm behind it is one of the most elegant ideas in distributed systems.
This is that story.
## What Reddit Karma Actually Is
Let's start simple, because the simple version matters.
Every post and comment on Reddit has a score. That score is calculated as:
```
Score = Upvotes - Downvotes
```
A post with 1,200 upvotes and 200 downvotes shows a score of +1,000. A comment with 50 upvotes and 80 downvotes shows -30. Reddit uses this score to rank content: higher scores float up, lower scores sink. In highly active subreddits, the first few minutes of score accumulation can determine whether a post reaches the front page or disappears forever.
The score also feeds into account-level karma, which affects posting privileges, community credibility, and how other users perceive you before they've read a single word you've written.
Simple math. Massive consequences.
Now let's build the system that computes it.
## The Simple Version (That Works Fine Until It Doesn't)
Imagine Reddit in its early days. A few thousand users. One web server. One database.
The schema for tracking votes is straightforward:
```sql
CREATE TABLE post_scores (
    post_id BIGINT PRIMARY KEY,
    score   INT DEFAULT 0
);
```
When someone upvotes:

```sql
UPDATE post_scores SET score = score + 1 WHERE post_id = ?;
```

When someone downvotes:

```sql
UPDATE post_scores SET score = score - 1 WHERE post_id = ?;
```

A user loads the page, you return the score. Done.
This works beautifully for a small community. Your database handles concurrent updates with row-level locking. Reads are fast. Writes are fast. The score is always accurate.
And then the site grows.
## The Moment Everything Gets Complicated
Reddit today serves over 50 million daily active users. Popular posts on r/worldnews or r/AskReddit can accumulate thousands of votes in minutes. During major news events, that number spikes even higher.
One database server cannot handle that write throughput. So you do what every scaling team eventually does: you distribute.
You spin up multiple database replicas. You deploy servers in the US, Europe, and Asia so votes from each region get handled locally. Latency drops. Throughput goes up. Everything seems fine.
Until two servers try to update the same post's score at the same time.
Here's the problem in its simplest form. The US server reads the score: 142. The EU server reads the score: 142. A user in the US clicks upvote, so the US server wants to write 143. Simultaneously, a user in Europe clicks downvote, so the EU server wants to write 141.
Both writes land. One of them overwrites the other. A real vote disappears.
At small scale this is a rounding error. At Reddit scale, with thousands of concurrent votes per second on a single viral post, you're losing a meaningful number of real votes from real users. Users who clicked an arrow and expected their opinion to count.
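To see the lost update concretely, here's a toy sketch in Go. It's deliberately sequential so the bad interleaving always happens; the two "servers" are just two stale reads of the same variable:

```go
package main

import "fmt"

func main() {
	// Both "servers" read the same stored score before either writes.
	score := 142
	usRead := score // US server reads 142
	euRead := score // EU server reads 142

	// US applies an upvote based on its (now stale) read.
	score = usRead + 1 // score becomes 143

	// EU applies a downvote based on ITS stale read, clobbering the upvote.
	score = euRead - 1 // score becomes 141

	// One upvote plus one downvote should net out to 142. The upvote vanished.
	fmt.Println(score) // prints 141
}
```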
Let's be honest: this is not a hypothetical. This is the exact class of problem that has caused production incidents at companies you use every day.
## The Approaches That Don't Work
The instinct is to reach for coordination. Make the servers talk to each other before writing.
Distributed locks: before updating the score, acquire a lock on that post's row across all replicas. Only one server writes at a time. Others wait.
This works. It's also slow. Lock acquisition across data centres separated by hundreds of milliseconds of network latency adds that latency to every single vote. During peak traffic, lock contention turns your vote system into a queue. Users notice.
Two-phase commit (2PC): a coordinator server proposes the write, all participants acknowledge, then the coordinator commits. Guarantees consistency across all nodes.
Also works. Also slow. 2PC requires multiple round trips between servers. It assumes all participants are available; if any node is down or unreachable during the commit phase, the transaction blocks. In a system operating across multiple continents, "all participants available" is never guaranteed.
Both approaches trade availability for consistency. That's a valid trade-off in some systems. For a vote counter on a social platform, where a few hundred milliseconds of delay on every click would be immediately noticeable and wildly unacceptable, it's not.
There had to be a better way.
## A Different Question
The distributed systems community eventually asked a different question.
Instead of "how do we make all servers agree before writing?" β what if we asked: "what constraints can we place on the data so that disagreements become impossible?"
This line of thinking led to CRDTs: Conflict-free Replicated Data Types. Data structures specifically designed so that concurrent updates on multiple replicas can always be merged correctly, without coordination, without locks, without consensus rounds.
The key insight: if you design your data structure such that the merge operation is always unambiguous (no matter what order updates arrive, no matter how long replicas were disconnected), you get eventual consistency for free.
I covered the simplest CRDT, the G-Counter, in Episode 1 of this series. The short version: each replica keeps its own local count, only ever increments it, and merge means "take the maximum per replica." YouTube's view counter works this way. There's no conflict possible, because a count that can only grow is always unambiguous: the higher number is always more up to date.
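For reference, here's a minimal sketch of a G-Counter with the exact API the PN-Counter code below leans on (ReplicaId, Zero, Inc, Value, Merge). Treat it as a recap of Episode 1's idea, not a canonical implementation:

```go
// ReplicaId identifies one server's slot in the counter.
type ReplicaId string

// GCounter: one grow-only count per replica.
type GCounter struct {
	counts map[ReplicaId]int64
}

// Zero returns an empty counter.
func Zero() GCounter {
	return GCounter{counts: map[ReplicaId]int64{}}
}

// Inc returns a copy with this replica's slot incremented by one.
func (g GCounter) Inc(r ReplicaId) GCounter {
	next := make(map[ReplicaId]int64, len(g.counts)+1)
	for k, v := range g.counts {
		next[k] = v
	}
	next[r]++
	return GCounter{counts: next}
}

// Value sums every replica's slot.
func (g GCounter) Value() int64 {
	var total int64
	for _, v := range g.counts {
		total += v
	}
	return total
}

// Merge keeps, for each replica, the larger of the two observed counts.
func Merge(a, b GCounter) GCounter {
	out := make(map[ReplicaId]int64, len(a.counts))
	for k, v := range a.counts {
		out[k] = v
	}
	for k, v := range b.counts {
		if v > out[k] {
			out[k] = v
		}
	}
	return out
}
```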
But G-Counters have an obvious limitation.
They only go up.
## Why Upvotes Alone Aren't Enough
Reddit's score isn't a count of upvotes. It's upvotes minus downvotes. You need both directions.
A G-Counter can track upvotes fine. But downvotes require decrement. And decrement is where things get complicated, because "subtract one" from a shared counter across distributed replicas brings back exactly the ambiguity we were trying to avoid.
Two servers both see the score as 50. Both receive a downvote simultaneously. Both want to write 49. Same lost vote problem as before, just through a different door.
So: we need increment and decrement, without coordination, with correct merging across replicas.
Here's where things get interesting.
## The Insight: You Never Actually Decrement
This is the moment the whole thing clicks.
What if you never decremented anything? What if every downvote was secretly an increment, just to a different counter?
Go back to the lemonade stand mental model. You and your friends are tracking the cash register. Money comes in from customers. Money goes out for supplies. You need to track both.
You already know the G-Counter trick: give everyone a notebook where they can only add tally marks. Works for money coming in.
For money going out, the solution is almost embarrassingly simple: give everyone two notebooks.
- A green notebook for every dollar that came in
- A red notebook for every dollar that went out
Neither notebook ever loses a mark. Both only ever grow. At the end of the day, you sync the green notebooks (take max per person), sync the red notebooks (take max per person), and do one subtraction:
```
Green total - Red total = what's in the register
```
That's a PN-Counter. Positive-Negative Counter.
Two G-Counters. One subtraction at read time. No coordination. No locks.
The "decrement" you thought you needed was never a real operation. It was always just an increment to a second counter that you subtract when you read the value. The counter doesn't go down β your accounting does.
## The Code
```go
// A PN-Counter is just two G-Counters: one for increments, one for decrements.
type PNCounter struct {
	inc GCounter // every upvote lands here
	dec GCounter // every downvote lands here
}

func NewPNCounter() PNCounter {
	return PNCounter{
		inc: Zero(),
		dec: Zero(),
	}
}
```
Two G-Counters. That's the entire data structure.
Reading the score:
```go
// Value computes the score. The subtraction happens here, and only here.
func (p PNCounter) Value() int64 {
	return p.inc.Value() - p.dec.Value()
}
```
The subtraction only happens here β at read time, when someone loads the page. Never during updates. Never during merges.
Upvoting:
```go
// Inc records an upvote: bump this replica's slot in the increment counter.
func (p PNCounter) Inc(r ReplicaId) PNCounter {
	return PNCounter{
		inc: p.inc.Inc(r),
		dec: p.dec,
	}
}
```
Downvoting:
```go
// Dec records a downvote. It never subtracts; it increments the decrement counter.
func (p PNCounter) Dec(r ReplicaId) PNCounter {
	return PNCounter{
		inc: p.inc,
		dec: p.dec.Inc(r),
	}
}
```
Look at Dec. It doesn't subtract anything. It calls .Inc() on the decrement counter.
A downvote is an upvote to the wrong notebook.
If you showed this to someone without context, they'd think it was a bug. It's the entire algorithm.
Merging two replicas:
```go
// MergePN runs the G-Counter merge twice: once per half.
func MergePN(a, b PNCounter) PNCounter {
	return PNCounter{
		inc: Merge(a.inc, b.inc),
		dec: Merge(a.dec, b.dec),
	}
}
```
Run the G-Counter merge twice: once on the increment counters, once on the decrement counters. Same "take the maximum per replica" logic. Nothing new to prove. Everything PN-Counter needs, it inherits from G-Counter.
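Merge order doesn't matter, which is the whole point. A quick sanity check, assuming the definitions above live in one package (the function name and replica names are illustrative):

```go
// MergeOrderDoesNotMatter shows MergePN giving the same answer in either order.
func MergeOrderDoesNotMatter() (int64, int64) {
	a := NewPNCounter().Inc("US").Inc("US") // two upvotes recorded on US
	b := NewPNCounter().Dec("EU")           // one downvote recorded on EU

	// Both values are 1: (2 upvotes) - (1 downvote), regardless of merge order.
	return MergePN(a, b).Value(), MergePN(b, a).Value()
}
```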
## What This Looks Like on Reddit's Servers
Let's trace a viral comment across three servers: US, EU, Asia.
T=0: Comment just posted. All counters at zero.
```
US:   inc={US:0}, dec={US:0} → score: 0
EU:   inc={EU:0}, dec={EU:0} → score: 0
Asia: inc={As:0}, dec={As:0} → score: 0
```
T=1: Activity floods in. US gets 8 upvotes. EU gets 3 upvotes and 5 downvotes. Asia gets 2 downvotes. All happening simultaneously, all independent.
```
US:   inc={US:8}, dec={US:0} → local score:  8
EU:   inc={EU:3}, dec={EU:5} → local score: -2
Asia: inc={As:0}, dec={As:2} → local score: -2
```
T=2: US and EU sync.
```
inc = Merge({US:8}, {EU:3}) = {US:8, EU:3}
dec = Merge({US:0}, {EU:5}) = {US:0, EU:5}
Score = (8+3) - (0+5) = 6
```
US and EU now agree: score is +6.
T=3: Asia reconnects after a brief network partition and syncs.
```
inc = {US:8, EU:3, As:0} → total: 11
dec = {US:0, EU:5, As:2} → total: 7
Score = 11 - 7 = 4
```
Final score: +4. Every upvote counted. Every downvote counted. No vote lost during the partition. No coordination during any of the voting itself.
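If you want to check the arithmetic yourself, here's the same trace as a runnable snippet using the types defined above (replica names are illustrative, and it assumes the PN-Counter and G-Counter definitions live in the same package):

```go
package main

import "fmt"

func main() {
	us, eu, asia := NewPNCounter(), NewPNCounter(), NewPNCounter()

	// T=1: independent local activity, no coordination.
	for i := 0; i < 8; i++ {
		us = us.Inc("US") // 8 upvotes land on the US server
	}
	for i := 0; i < 3; i++ {
		eu = eu.Inc("EU") // 3 upvotes on the EU server...
	}
	for i := 0; i < 5; i++ {
		eu = eu.Dec("EU") // ...and 5 downvotes
	}
	for i := 0; i < 2; i++ {
		asia = asia.Dec("As") // 2 downvotes on the Asia server
	}

	// T=2: US and EU sync.
	merged := MergePN(us, eu)
	fmt.Println(merged.Value()) // 6

	// T=3: Asia reconnects after the partition and syncs.
	fmt.Println(MergePN(merged, asia).Value()) // 4
}
```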
This is the lie in the title: the +4 you see wasn't computed from one authoritative source. It was assembled from fragments of state scattered across three continents and merged after the fact.
It's an approximation. A consistent, mathematically guaranteed approximation, but an approximation nonetheless.
## The Trade-Off Worth Naming
PN-Counter inherits G-Counter's main cost: every sync ships full state. Two maps, one entry per replica, across every gossip round. For tens of replicas this is fine. For thousands of edge nodes, the bandwidth adds up.
The more important trade-off is eventual consistency. When you're actively voting on a hot post, the score you see is the best approximation your nearest server has right now. Somewhere in Europe, a server might briefly disagree. They'll converge (they always do), but there's a window where different users see slightly different scores.
For Reddit, this is entirely acceptable. Nobody's life depends on whether the karma score reads 1,042 or 1,047 for 200 milliseconds. The system is highly available, survives network partitions cleanly, and handles write throughput that would collapse a coordinated system.
The trade is: give up exact, instantaneous consistency in exchange for scale and availability. Reddit made that trade. Most social platforms do.
## The Problem PN-Counter Doesn't Solve
Here's where it gets interesting again, and where the next problem is quietly waiting.
PN-Counter has no floor. No ceiling. Nothing prevents the score from going to -50,000 if enough downvotes arrive. And nothing stops it from going to +10,000,000 either.
For Reddit karma, that's fine. But consider:

- A distributed rate limiter that needs to cap requests at 1,000 per minute. The counter must not go above 1,000, enforced across all replicas.
- An inventory system where stock can't go below zero. You can't sell items you don't have.
- A per-user karma floor: some subreddits prevent karma from dropping below a certain value to protect new users from pile-ons.
These are bounded operations. And PN-Counter, with its two unbounded G-Counters, cannot enforce bounds without coordination.
You'd think: just check the current value before decrementing, and reject the operation if you'd go below the floor.
Try that across distributed replicas and you immediately reintroduce the problem we just spent this entire article solving. Two replicas both see the counter at 1, one step above a floor of 0. Both receive a decrement. Both check locally: "1 is above 0, I can proceed." Both decrement. After they merge, the counter reads -1 everywhere. The floor was violated.
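Here's that broken guard as code, using the PNCounter from above (DecIfAboveFloor is a hypothetical helper for illustration, not part of any CRDT library):

```go
// DecIfAboveFloor is the naive, broken approach: check locally, then decrement.
// Each replica's local check can pass even when the merged result breaches the floor.
func DecIfAboveFloor(p PNCounter, r ReplicaId, floor int64) (PNCounter, bool) {
	if p.Value()-1 < floor {
		return p, false // refuse: locally this looks unsafe
	}
	return p.Dec(r), true // proceed: locally this looks safe
}
```

Run it on two replicas that both start from a synced state with Value() == 1: both local checks pass, both decrement, and the merged counter reads -1.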
This is the problem the Bounded Counter CRDT was designed for. It's the next piece in this series, and it's the most complex CRDT we'll cover, because enforcing limits without coordination requires a genuinely different approach.
The short version: you can't do it with just max and subtraction. You need to distribute the permission to decrement itself.
But that's next time.
## The Thread Running Through All of This
Two episodes in, the pattern is clear.
Every CRDT solves a hard distributed problem by finding a constraint that eliminates ambiguity. G-Counter said: make it grow-only, and max is always correct. PN-Counter said: never decrement; maintain two grow-only counters and subtract at read time.
The constraint is never a workaround. It's the design.
When you next open Reddit and see a karma score, you're looking at the output of two grow-only counters being summed and subtracted on a server near you, a server that may not have talked to its peers in the last few hundred milliseconds and doesn't need to.
The number is an approximation. But it's a principled one, built on math that guarantees eventual correctness without sacrificing the availability and performance that make Reddit usable at scale.
That's not a lie. That's engineering.
## What's Next
This is Episode 2 of a series on CRDTs, the data structures powering Google Docs, Redis, Riak, and distributed infrastructure at scale.
- Episode 1: G-Counter - why YouTube's view count is always approximate
- Episode 2: PN-Counter - Reddit karma and the two-notebook trick (you're here)
- Episode 3: G-Set - grow-only sets, and where they appear in the wild
- Episode 4: LWW Register - when "latest timestamp wins" is good enough
- Episode 5: OR-Set - the elegant fix for the "deleted but still there" problem
- Episode 6: Map CRDT - composing CRDTs into richer structures
- Episode 7: The full picture - when to reach for CRDTs and when to walk away
Subscribe to the newsletter to get each episode as it drops.
If you're building distributed systems and want to think through architecture decisions with someone who has spent years deep in Go and distributed infrastructure β I do 1:1 sessions on Topmate.