Eastern Dev

We Deleted 77% of Our Code and Got Better Results: NeuralBridge V10

The brutal truth about why simpler is better in AI API reliability.


The Wake-Up Call

We built NeuralBridge to solve one problem: AI API calls fail, and developers need automatic recovery without babysitting.

After 9 versions and 13,000 lines of code, we made a disturbing discovery.

Our "smartest" version was actually our worst.


The V9 Bayesian Experiment: When More Complexity Hurts

In V9, we implemented Bayesian inference for fault diagnosis. It sounded brilliant:

  • Probabilistic fault classification
  • Prior probabilities updated with new evidence
  • Theoretically optimal decision-making

The results?

Version        Timeout Recovery  Invalid Model  Total
V8.2 (Rules)   100%              100%           100%
V9 (Bayesian)  99.3%             100%           99.7%

V9 was measurably worse than V8.2's simple if-else statements.

The "smart" Bayesian approach added overhead that actually hurt performance. After 600 real API calls testing both approaches, the evidence was undeniable.


The Numbers That Changed Everything

100 Rounds of Real API Calls (Zero Mocks)

Fault Type     Strategy       Recovery Rate  Avg Latency
Timeout        SimpleRetry    83%            3,941ms
Timeout        LiteLLM        87%            4,928ms
Timeout        V8.2 Flywheel  98%            2,211ms
Invalid Model  SimpleRetry    0%             3,901ms
Invalid Model  LiteLLM        100%           5,363ms
Invalid Model  V8.2 Flywheel  100%           5,239ms

Key insight: out of 100 timeout rounds, V8.2 recovers 11 more than LiteLLM (98 vs. 87) in less than half the latency.

The rule-based flywheel isn't just more reliable; it's over 2x faster.


What We Learned

1. Network Effect Flywheel

Every customer hits unique failure patterns. When one customer solves a problem, that solution helps everyone.

Customer A hits error "TPM limit at 2:30 AM UTC"
   ↓
Solution discovered and stored
   ↓
Customer B hits similar error at 3:00 AM UTC
   ↓
Instant recovery from knowledge base

This is why we crawl GitHub Issues, Stack Overflow, and Status Pages. They're free ammunition for our flywheel.
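The flow above can be sketched as a shared knowledge base keyed on normalized error signatures, so that Customer B's 3:00 AM error matches Customer A's 2:30 AM solution. This is a minimal illustration; the `KnowledgeBase` class, its `signature` normalization, and the strategy name are hypothetical, not NeuralBridge's actual API:

```python
import re

class KnowledgeBase:
    """Hypothetical shared store mapping error signatures to proven strategies."""

    def __init__(self):
        self._solutions = {}

    @staticmethod
    def signature(error_message: str) -> str:
        # Strip volatile details (numbers, timestamps) so "TPM limit at 2:30 AM"
        # and "TPM limit at 3:00 AM" collapse to the same signature.
        return re.sub(r"\d+", "N", error_message.lower())

    def store(self, error_message: str, strategy: str) -> None:
        self._solutions[self.signature(error_message)] = strategy

    def lookup(self, error_message: str):
        return self._solutions.get(self.signature(error_message))

kb = KnowledgeBase()
# Customer A's discovered solution is stored once...
kb.store("TPM limit at 2:30 AM UTC", "dynamic_backoff")
# ...and Customer B's similar error resolves instantly from the shared knowledge:
print(kb.lookup("TPM limit at 3:00 AM UTC"))  # dynamic_backoff
```

The same normalization step is what lets scraped GitHub Issues and Stack Overflow error strings feed the flywheel: unseen-but-similar failures hit an existing entry instead of an empty cache.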

2. The Dual-Flywheel Architecture

We separated two concerns that shouldn't be mixed:

  • Training Flywheel (Offline): Parallel strategy testing, can be slow, discovers optimal solutions
  • Execution Flywheel (Real-time): Must be fast, just looks up pre-tested strategies
# Training (can take seconds)
fault → test 5 strategies in parallel → store best in knowledge base

# Execution (must be milliseconds)
fault → lookup knowledge base → execute best strategy immediately

The execution path has zero ML, zero statistics, just rules. Because rules are fast.
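A minimal sketch of what a rules-only execution path can look like, assuming the training flywheel has already populated a strategy table. The `diagnose` and `route` functions and the table contents are illustrative, not the actual NeuralBridge internals:

```python
# Pre-tested strategies, discovered offline by the training flywheel (assumed).
STRATEGY_TABLE = {
    "timeout": "dynamic_backoff",
    "invalid_model": "immediate_switch_model",
    "invalid_param": "param_fix",
}

def diagnose(error: Exception) -> str:
    # Plain if/else classification: no statistics, no model inference.
    msg = str(error).lower()
    if isinstance(error, TimeoutError) or "timed out" in msg:
        return "timeout"
    if "model" in msg and ("not found" in msg or "does not exist" in msg):
        return "invalid_model"
    return "invalid_param"

def route(error: Exception) -> str:
    # O(1) dict lookup of the pre-tested strategy; milliseconds, not seconds.
    return STRATEGY_TABLE[diagnose(error)]

print(route(TimeoutError("Request timed out after 30s")))  # dynamic_backoff
```

Everything expensive (parallel strategy testing) happens offline; the hot path is string matching plus a dictionary lookup.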

3. Real Failure Data > Theoretical Models

We scraped:

  • 50+ GitHub Issues (openai-python, anthropic-sdk)
  • 30+ Stack Overflow questions
  • 15+ OpenAI status incidents
  • 8+ Anthropic status incidents

Real error messages → Real patterns → Real strategies

No synthetic data. No simulated failures. Every strategy was tested against actual API errors.


V10: The Rebuild

What We Removed

Component              Lines   Reason
Bayesian inference     800     Slower, worse accuracy than rules
MDP/POMDP planning     1,200   Overhead unjustified by results
Complex retry budgets  600     Better to have simple backoff
Unused providers       400     Maintenance burden, no value
Total removed          ~3,000  Better performance

What We Kept

Strategy                Why
param_fix               Solves 40% of param errors
immediate_switch_model  Instant recovery for bad models
dynamic_backoff         Actually works, unlike the fancy alternatives
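As an illustration of the kind of logic behind a strategy like dynamic_backoff, here is a generic exponential-backoff-with-jitter sketch. The function signature and defaults are assumptions for this example, not NeuralBridge's implementation:

```python
import random
import time

def dynamic_backoff(call, max_attempts=4, base_delay=0.5, cap=8.0):
    """Retry `call` on timeouts with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted, surface the error
            # Exponential growth with jitter avoids synchronized retry storms.
            delay = min(cap, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))

# Hypothetical flaky call that succeeds on the third attempt:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("Request timed out")
    return "ok"

print(dynamic_backoff(flaky))  # ok
```

The whole strategy is a dozen lines, which is part of the point: it is easy to test against 100 rounds of real API calls and easy to reason about when it fails.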

Result

V9:  ~13,000 lines
V10: ~2,200 lines
Reduction: 77%

The code fits in your head now.


The Brutal Honest Status

What We've Validated ✅

  • 2 fault types tested: timeout, invalid_model
  • 600 real API calls across 100 rounds
  • V8.2 vs LiteLLM comparison complete

What We Haven't Tested Yet ❌

  • Rate limiting scenarios (no live 429s during testing)
  • Quota exceeded (billing edge cases)
  • Connection errors (network instability)

Our Current Reality

  • 0 paying customers
  • Open source only (GitHub blocked in China, can't even push code)
  • Building in public

We're not pretending to be production-ready. We're showing you the data and letting you decide.


Why We Published This

GitHub is blocked in China. We can't push code or build a community there.

So we're using content marketing to reach developers who might benefit from what we've learned:

  1. Simpler can be better - The data proves it
  2. Real testing > theoretical optimization - 600 calls, zero mocks
  3. Network effects work - Every failure you solve makes the system smarter for everyone

The Architecture That Made It Possible

┌──────────────────────────────────────────────────────────────┐
│                       NeuralBridge V10                       │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────────┐     ┌─────────────────┐                 │
│  │   Training      │────▶│   Knowledge     │                 │
│  │   Flywheel      │     │   Base          │                 │
│  │  (Offline/slow) │     │  (Strategies)   │                 │
│  └─────────────────┘     └─────────────────┘                 │
│           │                       │                          │
│           ▼                       ▼                          │
│  ┌────────────────────────────────────────────────────────┐  │
│  │             Execution Flywheel (Real-time)             │  │
│  │  ┌──────────┐  ┌──────────┐  ┌────────────────────┐    │  │
│  │  │Diagnoser │─▶│Strategy  │─▶│ Executor           │    │  │
│  │  │(Rules)   │  │Router    │  │  switch_model      │    │  │
│  │  │          │  │(Lookup)  │  │  fix_params        │    │  │
│  │  │          │  │          │  │  retry_with_delay  │    │  │
│  │  └──────────┘  └──────────┘  └────────────────────┘    │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Try It Yourself

from flywheel import quick_recover

error = TimeoutError("Request timed out after 30s")
result = quick_recover(error)

if result.success:
    print(f"Recovered with: {result.strategy_used}")
    print(f"Latency: {result.latency_ms}ms")

Or integrate directly:

import openai

from flywheel import NeuralBridgeV10

engine = NeuralBridgeV10(api_key="your-key")

try:
    response = openai.ChatCompletion.create(**params)
except Exception as e:
    # Auto-heal and retry
    fixed_params = engine.recover_sync(e, params)
    response = openai.ChatCompletion.create(**fixed_params)

The Honest Numbers (One More Time)

Metric                 Value
Code reduction         77% (13,000 → 2,200 lines)
Timeout recovery       98% (vs LiteLLM's 87%)
Speed improvement      2x faster (2,211ms vs 4,928ms)
Fault types validated  2
Real API calls         600
Paying customers       0

We deleted 77% of our code and got better results. That's the story.


NeuralBridge is open source. We're building in public and sharing what we learn. No hype, just data.

Tags: ai, api, reliability, selfhealing, openai, llm, error-handling


Published 2024-07-07 | Last updated: 2024-07-07
