MinBapE

Posted on May 11

[Lime #2] How To Search Music

#dotnet

What I worked on

I designed and implemented the music search pipeline for Lime.

Music search sounds simple. Take a query, return results.

But there was more to think about than expected.

- Users should be able to search for music that isn't in Lime's internal DB yet
- Calling external APIs on every search request is slow and costly
- If an external API fails, search shouldn't break entirely
- Saving every external result directly to the DB would pollute it with data nobody cares about

To satisfy these constraints, I split search into two separate flows.

The search API returns internal DB results and cached candidates first.

External provider searches are handled asynchronously through background Jobs.

What I built

- Internal DB search for Artists, Albums, and Tracks
- SearchCandidate model and candidate caching
- SearchJob enqueueing and ExternalSearchWorker (background processing)
- External provider interface with MusicBrainz and Spotify implementations
- Per-provider Rate Limiter
- Merging internal DB results with external candidates, with deduplication
- SearchCandidate → Artist / Album / Track Import API
- ExternalMusicIds (cross-platform ID linking)
- ExternalGenreTags (storing provider genre tags as-is)
- Graceful degradation when providers fail

The full flow

When a user types a search query, here's what happens.

GET /search?keyword=radiohead

1. Normalize the keyword
2. Search internal DB for Artists, Albums, Tracks (parallel)
3. Query SearchCandidate cache for candidates (parallel)
4. Enqueue SearchJobs per external provider (async — does not block)
5. Merge internal results with cached candidates
6. Return response

The key point is that Step 4 does not block the response.

The user gets internal results and previously cached candidates immediately.

Background Jobs handle the external search, and the next request will see those results.
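To make the non-blocking part concrete, here is a minimal sketch of the request path. It is not Lime's actual code: SearchFlowSketch, its three delegates, and the trim-plus-lowercase normalization are all assumptions made for illustration, and the merge step is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed record SearchResponseSketch(
    IReadOnlyList<string> InternalResults,
    IReadOnlyList<string> CachedCandidates);

// Hypothetical stand-in for the search endpoint's service layer.
public sealed class SearchFlowSketch
{
    private readonly Func<string, Task<IReadOnlyList<string>>> _searchInternalDb;
    private readonly Func<string, Task<IReadOnlyList<string>>> _readCandidateCache;
    private readonly Action<string> _enqueueSearchJobs;

    public SearchFlowSketch(
        Func<string, Task<IReadOnlyList<string>>> searchInternalDb,
        Func<string, Task<IReadOnlyList<string>>> readCandidateCache,
        Action<string> enqueueSearchJobs)
    {
        _searchInternalDb = searchInternalDb;
        _readCandidateCache = readCandidateCache;
        _enqueueSearchJobs = enqueueSearchJobs;
    }

    public async Task<SearchResponseSketch> SearchAsync(string keyword)
    {
        // Step 1: normalization (assumed here to be trim + lowercase).
        var query = keyword.Trim().ToLowerInvariant();

        // Steps 2 and 3: start both lookups before awaiting either one.
        var internalTask = _searchInternalDb(query);
        var cacheTask = _readCandidateCache(query);

        // Step 4: only record the jobs; no external provider is called here.
        _enqueueSearchJobs(query);

        await Task.WhenAll(internalTask, cacheTask);

        // Steps 5 and 6: merging is omitted in this sketch.
        return new SearchResponseSketch(await internalTask, await cacheTask);
    }
}
```

The request never awaits a provider call, so its latency depends only on the internal DB and the candidate cache.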

The API looks like this:

GET  /search?keyword=...
POST /search/import/{candidateId}
GET  /search/albums/{albumId}

SearchCandidate: not permanent data, just candidates

My first instinct was to save external provider results directly into the Artist, Album, and Track tables.

But thinking it through, that's a problem.

A user searching "radiohead" shouldn't cause dozens of MusicBrainz albums to become permanent Lime records.

Only music the user actually wants to review should become permanent data.

So external search results are first stored as SearchCandidate.

SearchCandidate
  - provider          (MusicBrainz, Spotify...)
  - providerEntityId  (the provider's own ID for this entity)
  - resultType        (Artist, Album, Track)
  - title
  - artistName
  - coverImageUrl
  - releaseDate
  - expiresAt         (cache TTL: 24 hours)
  - rawJson           (original response payload)

expiresAt exists because this is a cache.

After 24 hours, results are treated as stale and re-fetched on the next search.

Only when a user picks a specific candidate does it get promoted to permanent data.
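A minimal sketch of the cache semantics, assuming the TTL is applied when the candidate is written. CandidateSketch, Create, and IsStale are hypothetical names, and only a few of the fields listed above are included.

```csharp
using System;

// Hypothetical, trimmed-down version of SearchCandidate.
public sealed record CandidateSketch(
    string Provider,
    string ProviderEntityId,
    string ResultType,
    string Title,
    DateTimeOffset ExpiresAt)
{
    // The 24-hour TTL comes from the post; applying it at write time is an assumption.
    public static readonly TimeSpan Ttl = TimeSpan.FromHours(24);

    public static CandidateSketch Create(
        string provider, string providerEntityId, string resultType,
        string title, DateTimeOffset fetchedAt)
    {
        return new CandidateSketch(provider, providerEntityId, resultType, title, fetchedAt + Ttl);
    }

    // A stale candidate should be ignored and re-fetched by a new SearchJob.
    public bool IsStale(DateTimeOffset now) => now >= ExpiresAt;
}
```

Passing the current time in as a parameter keeps the staleness check easy to test without waiting a day.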

SearchJob: decoupling external search from the request

If the search API called external providers directly, two problems would follow.

- External APIs like MusicBrainz have rate limits (1 request per second)
- A slow or failing external API would make the entire search endpoint slow

So I moved external search into SearchJob entries and let a background ExternalSearchWorker process them.

SearchJob
  - normalizedQuery  (normalized search term)
  - provider         (MusicBrainz, Spotify...)
  - resultType       (Artist, Album, Track)
  - status           (Pending, Running, Completed, Failed)
  - startedAt
  - completedAt
  - failedReason

If a Job with the same (normalizedQuery, provider, resultType) already exists, a duplicate isn't created.

ExternalSearchWorker runs every 5 seconds.

1. Fetch Pending Jobs
2. Mark each Job as Running
3. Call the corresponding provider
4. Save results as SearchCandidates
5. Mark the Job as Completed
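The deduplication rule and one worker pass can be sketched with an in-memory queue. This is an illustration under assumptions, not the real worker: JobQueueSketch stands in for the SearchJob table, the provider call is a delegate, and failure handling and the 5-second timer are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum JobStatus { Pending, Running, Completed, Failed }

public sealed class SearchJobSketch
{
    public SearchJobSketch(string normalizedQuery, string provider, string resultType)
    {
        NormalizedQuery = normalizedQuery;
        Provider = provider;
        ResultType = resultType;
    }

    public string NormalizedQuery { get; }
    public string Provider { get; }
    public string ResultType { get; }
    public JobStatus Status { get; set; } = JobStatus.Pending;
}

// Hypothetical in-memory stand-in for the SearchJob table.
public sealed class JobQueueSketch
{
    private readonly Dictionary<(string, string, string), SearchJobSketch> _jobs = new();

    // Returns false when a job with the same (query, provider, type) already exists.
    public bool TryEnqueue(string normalizedQuery, string provider, string resultType)
    {
        var key = (normalizedQuery, provider, resultType);
        if (_jobs.ContainsKey(key)) return false;
        _jobs.Add(key, new SearchJobSketch(normalizedQuery, provider, resultType));
        return true;
    }

    // One worker pass: Pending -> Running -> Completed. Returns the number of jobs processed.
    public int RunOnce(Action<SearchJobSketch> callProviderAndSaveCandidates)
    {
        var pending = _jobs.Values.Where(j => j.Status == JobStatus.Pending).ToList();
        foreach (var job in pending)
        {
            job.Status = JobStatus.Running;
            callProviderAndSaveCandidates(job);
            job.Status = JobStatus.Completed;
        }
        return pending.Count;
    }
}
```

In a real database the same rule would more likely be a unique constraint on the three columns, so that concurrent requests cannot both insert the same job.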

Provider abstraction and Rate Limiter

Just as OAuth providers were abstracted behind an interface in the auth feature, external music sources sit behind one too.

internal interface IExternalMusicProvider
{
    string ProviderName { get; }

    Task<IReadOnlyList<ExternalProviderResult>> SearchArtistsAsync(string query, CancellationToken ct);
    Task<IReadOnlyList<ExternalProviderResult>> SearchAlbumsAsync(string query, CancellationToken ct);
    Task<IReadOnlyList<ExternalProviderResult>> SearchTracksAsync(string query, CancellationToken ct);

    Task<IReadOnlyList<GenreTagResult>> LookupTagsAsync(...) => Task.FromResult(...);
    Task<ReleaseDetailResult?> LookupReleaseDetailAsync(...) => Task.FromResult<ReleaseDetailResult?>(null);
}

LookupTagsAsync and LookupReleaseDetailAsync have default implementations that return empty results (null in the case of release details).

Not every provider needs to support genre tag lookups or detailed metadata fetching.

Adding Apple Music later means creating an AppleProvider and registering it with DI.

ExternalSearchWorker looks up providers by name, so a new provider can be added without modifying the worker.
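Name-based lookup could look roughly like this. It is a sketch under assumptions: IMusicProviderSketch is a trimmed stand-in for IExternalMusicProvider, and ProviderResolverSketch and the two provider classes are hypothetical. In ASP.NET Core, registering several implementations of one interface lets a constructor receive all of them as an IEnumerable, which is what the resolver relies on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed stand-in for the post's IExternalMusicProvider (search methods omitted).
public interface IMusicProviderSketch
{
    string ProviderName { get; }
}

public sealed class MusicBrainzProviderSketch : IMusicProviderSketch
{
    public string ProviderName => "MusicBrainz";
}

// Adding a provider means adding a class like this and registering it with DI.
public sealed class AppleProviderSketch : IMusicProviderSketch
{
    public string ProviderName => "Apple";
}

public sealed class ProviderResolverSketch
{
    private readonly Dictionary<string, IMusicProviderSketch> _byName;

    public ProviderResolverSketch(IEnumerable<IMusicProviderSketch> providers)
    {
        // Case-insensitive keys are an assumption, not something the post specifies.
        _byName = providers.ToDictionary(p => p.ProviderName, StringComparer.OrdinalIgnoreCase);
    }

    public IMusicProviderSketch Resolve(string providerName)
    {
        return _byName.TryGetValue(providerName, out var provider)
            ? provider
            : throw new KeyNotFoundException("No provider registered as '" + providerName + "'.");
    }
}
```

The worker only ever sees the job's provider name, so nothing in it has to change when a new class is registered.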

Rate Limiter

MusicBrainz allows 1 request per second.

Exceed that and you get 429 responses.

ProviderRateLimiter uses a SemaphoreSlim to enforce the minimum interval between calls.

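internal sealed class ProviderRateLimiter : IDisposable
{
    private readonly SemaphoreSlim _semaphore = new(1, 1);
    private readonly TimeSpan _minInterval;
    private DateTime _lastAcquired = DateTime.MinValue;

    public ProviderRateLimiter(TimeSpan minInterval) => _minInterval = minInterval;

    public async Task WaitAsync(CancellationToken ct)
    {
        // Callers queue on the semaphore, so only one at a time measures the gap.
        await _semaphore.WaitAsync(ct);
        try
        {
            // Wait out whatever remains of the minimum interval since the last call.
            var elapsed = DateTime.UtcNow - _lastAcquired;
            if (elapsed < _minInterval)
                await Task.Delay(_minInterval - elapsed, ct);

            _lastAcquired = DateTime.UtcNow;
        }
        finally
        {
            _semaphore.Release();
        }
    }

    public void Dispose() => _semaphore.Dispose();
}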

Each provider can have its own rate limit policy, managed by a singleton ProviderRateLimiterRegistry keyed by provider name.
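A registry keyed by provider name might be sketched as follows. The names RateLimiterRegistrySketch and LimiterStub, the interval table, and the default interval are assumptions. To keep the sketch short, the stub only carries the interval where the real limiter would also expose WaitAsync.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Minimal stand-in for ProviderRateLimiter: it only carries the configured interval.
public sealed class LimiterStub
{
    public LimiterStub(TimeSpan minInterval) => MinInterval = minInterval;
    public TimeSpan MinInterval { get; }
}

// Hypothetical registry, intended to be registered as a singleton.
public sealed class RateLimiterRegistrySketch
{
    private readonly IReadOnlyDictionary<string, TimeSpan> _intervals;
    private readonly TimeSpan _defaultInterval;
    private readonly ConcurrentDictionary<string, LimiterStub> _limiters =
        new ConcurrentDictionary<string, LimiterStub>(StringComparer.OrdinalIgnoreCase);

    public RateLimiterRegistrySketch(
        IReadOnlyDictionary<string, TimeSpan> intervals, TimeSpan defaultInterval)
    {
        _intervals = intervals;
        _defaultInterval = defaultInterval;
    }

    // Every caller asking for the same provider gets the same limiter instance.
    public LimiterStub For(string providerName)
    {
        return _limiters.GetOrAdd(providerName, name =>
            new LimiterStub(_intervals.TryGetValue(name, out var interval)
                ? interval
                : _defaultInterval));
    }
}
```

Sharing one limiter per provider matters because the minimum interval only holds if every call to that provider goes through the same instance.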

Merging search results

SearchMerger combines internal DB results with SearchCandidate results.

The rules are straightforward.

1. Internal DB results go in first
2. External candidates are appended (up to 10 total per type) if not already present
3. Duplicates are detected by title + artist name

Using ExternalMusicIds for deduplication would be more precise, but comparing title and artist name is sufficient for now.

Internal DB results always come first.

If the same album exists in both internal and external results, only the internal one stays.
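The merge rules can be sketched in a few lines. This is an illustration, not the actual SearchMerger: SearchItemSketch and SearchMergeSketch are hypothetical, the comparison is assumed to ignore case and surrounding whitespace, and the cap of 10 is assumed to apply to the combined list for one result type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record SearchItemSketch(string Title, string ArtistName, bool IsInternal);

public static class SearchMergeSketch
{
    public const int MaxPerType = 10;

    // Merges the results for a single result type (for example, Albums).
    public static List<SearchItemSketch> Merge(
        IEnumerable<SearchItemSketch> internalResults,
        IEnumerable<SearchItemSketch> candidates)
    {
        var seen = new HashSet<(string, string)>();
        var merged = new List<SearchItemSketch>();

        // Internal results are visited first, so they win any duplicate.
        foreach (var item in internalResults.Concat(candidates))
        {
            if (merged.Count >= MaxPerType) break;
            if (seen.Add(KeyOf(item))) merged.Add(item);
        }
        return merged;
    }

    // Duplicate key: title + artist name, normalized (the normalization is an assumption).
    private static (string, string) KeyOf(SearchItemSketch item)
    {
        return (item.Title.Trim().ToLowerInvariant(), item.ArtistName.Trim().ToLowerInvariant());
    }
}
```

Because the set is filled in visiting order, a candidate whose key matches an internal result is silently skipped.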

Import: promoting a candidate to permanent data

When a user selects a candidate, the client calls POST /search/import/{candidateId}.

This promotes a SearchCandidate into Lime's permanent Artist, Album, or Track records.

1. Look up the SearchCandidate by candidateId
2. Check ExternalMusicIds to see if it's already been imported
3. If yes, return the existing internal ID (prevent duplicate imports)
4. If no, find or create Artist → Album → Track in order
5. Save the platform ID in ExternalMusicIds
6. Save genre tags in ExternalGenreTags
7. If it's an album, enqueue a metadata enrichment Job

"Find or create" is the important phrase here.

The same artist or album might already exist in Lime.

The service searches by name first, and only creates a new record if nothing matches.

For example, when importing a Radiohead album, if Radiohead already exists in Lime, the import links to that existing artist rather than creating a duplicate.
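"Find or create" reduces to a lookup followed by a conditional insert. The sketch below uses an in-memory dictionary in place of the Artist table. ArtistStoreSketch, the integer IDs, and the case-insensitive name match are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the Artist table.
public sealed class ArtistStoreSketch
{
    private readonly Dictionary<string, int> _idsByName =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    private int _nextId = 1;

    public int Count => _idsByName.Count;

    // Returns the existing artist's ID, or creates a record when nothing matches.
    public int FindOrCreate(string name)
    {
        var key = name.Trim();
        if (_idsByName.TryGetValue(key, out var existingId)) return existingId;

        var newId = _nextId++;
        _idsByName.Add(key, newId);
        return newId;
    }
}
```

Matching by name alone can merge two different artists who share a name, which is one reason a provider-ID match is the more precise option when it is available.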

ExternalMusicIds: linking platform IDs

This table connects MusicBrainz album IDs with Lime's internal Album IDs.

ExternalMusicId
  - provider          (MusicBrainz, Spotify...)
  - providerEntityId  (the provider's ID for this entity)
  - entityType        (Artist, Album, Track)
  - internalId        (Lime's internal ID)

Thanks to this table, when the same album is later fetched from Spotify, it can be linked to the existing Lime record instead of creating a duplicate.

It's the connective tissue that solves the problem: same music, different ID on every platform.

Duplicate imports are also prevented here.

If the same (provider, providerEntityId, entityType) already exists, the existing internal ID is returned as-is.
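The duplicate-import check can be sketched as a map from the three-part key to the internal ID. ExternalKeySketch, ExternalIdMapSketch, and the integer ID type are hypothetical. The real table lives in the database.

```csharp
using System;
using System.Collections.Generic;

// Records compare by value, so two keys with equal fields are the same dictionary key.
public sealed record ExternalKeySketch(string Provider, string ProviderEntityId, string EntityType);

// Hypothetical in-memory stand-in for the ExternalMusicIds table.
public sealed class ExternalIdMapSketch
{
    private readonly Dictionary<ExternalKeySketch, int> _internalIds =
        new Dictionary<ExternalKeySketch, int>();

    // Returns the already-linked internal ID, or links the one produced by the callback.
    public int GetOrLink(ExternalKeySketch key, Func<int> findOrCreateInternal)
    {
        if (_internalIds.TryGetValue(key, out var existing)) return existing;

        var internalId = findOrCreateInternal();
        _internalIds.Add(key, internalId);
        return internalId;
    }
}
```

The callback only runs on a first import, so importing the same candidate twice never reaches the find-or-create step again. Two providers can also end up pointing at the same internal ID.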

ExternalGenreTags: store genres verbatim

Lime doesn't try to define its own canonical genre taxonomy.

If MusicBrainz says "alternative rock", that's what gets stored.

If Spotify says "indie", "indie" gets stored.

ExternalGenreTag
  - entityType     (Artist, Album, Track)
  - entityId       (Lime's internal ID)
  - provider       (MusicBrainz, Spotify...)
  - tagName        (house, alternative rock, ambient...)
  - sourceLevel    (Artist, Album, Track, ReleaseGroup, Video)
  - providerEntityId
  - fetchedAt

The same (entity, provider, tagName) combination is never stored twice.

On the frontend, the plan is to display genres with their source: "MusicBrainz: alternative rock, indie".
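One way to produce that display string is to group stored tags by provider. This is only a sketch of the idea: GenreTagSketch and GenreDisplaySketch are hypothetical names, and the real formatting may well live in the frontend instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record GenreTagSketch(string Provider, string TagName);

public static class GenreDisplaySketch
{
    // Produces one line per provider, for example "MusicBrainz: alternative rock, indie".
    public static List<string> Format(IEnumerable<GenreTagSketch> tags)
    {
        return tags
            .GroupBy(t => t.Provider)
            .Select(g => g.Key + ": " + string.Join(", ", g.Select(t => t.TagName).Distinct()))
            .ToList();
    }
}
```

Keeping tags per provider means conflicting labels from two sources can be shown side by side rather than reconciled.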

The tricky part: when do external results become permanent?

The question I thought about the most during this work was:

When and how should external search results become permanent data?

The options I considered were:

A. Save immediately when search results come in
B. Save when the user selects something (synchronous in the search API)
C. Keep in cache only; promote to permanent data on selection

A is simple to implement, but useless data accumulates fast.

B means the search API has to wait for external API responses before it can reply.

C is what I went with.

SearchCandidate is a cache. After 24 hours it's stale.

Only when a user decides "I want to review this album" does it get promoted to permanent data.

Search stays fast. Data promotion happens after selection.

That separation is the core of this design.

A provider failure is not a search failure

MusicBrainz being unavailable shouldn't break search.

ExternalSearchWorker handles provider failures by case:

ProviderRateLimitException   -> mark Job Failed ("Rate limit exceeded")
ProviderUnavailableException -> mark Job Failed ("Provider unavailable")
any other exception          -> mark Job Failed (exception message)

All failures are recorded at the Job level.
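The mapping from exception to failedReason fits in a single switch expression. The two exception classes below are bare stand-ins for the ones named above, and JobFailureSketch is a hypothetical helper.

```csharp
using System;

// Bare stand-ins for the exception types named in the post.
public sealed class ProviderRateLimitException : Exception { }
public sealed class ProviderUnavailableException : Exception { }

public static class JobFailureSketch
{
    // Returns the text to store in the Job's failedReason.
    public static string Describe(Exception ex)
    {
        return ex switch
        {
            ProviderRateLimitException => "Rate limit exceeded",
            ProviderUnavailableException => "Provider unavailable",
            _ => ex.Message
        };
    }
}
```

A worker would call this from a catch block around the provider call, set the Job to Failed, and move on to the next Job, so one provider's failure never reaches the search request.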

The search response communicates this via an externalSearchStatus field.

From the user's perspective, internal DB results and cached candidates are always returned first.

If external search failed, the status is visible in the response.

Album enrichment

When an album is imported, detailed metadata — cover image, tracklist, genres — isn't fetched immediately.

Trying to fetch everything at import time would slow down that API call.

So immediately after import, an enrichment Job is enqueued for AlbumEnrichmentWorker to process.

Enrichment includes:

- Fetching cover art from Cover Art Archive
- Filling in release date
- Saving tracklist and track numbers
- Collecting genre tags per provider

Search stays fast. Details come after selection.

This same principle showed up in both search and enrichment.

Summary

This work established the core search pipeline for Lime.

Here's the full picture:

Search API
  -> Internal DB search (parallel)
  -> Cached external candidates (parallel)
  -> Enqueue external SearchJobs (async, non-blocking)
  -> Merge results + deduplicate + cap at 10 per type

Background Worker
  -> Process SearchJobs
  -> Call provider (rate-limited)
  -> Save results as SearchCandidates

Import API
  -> SearchCandidate → Artist, Album, Track
  -> Link ExternalMusicIds
  -> Save ExternalGenreTags
  -> Enqueue enrichment Job for albums

The search pipeline turned out to be more than "take a query, return results".

Thinking through response latency, API rate limits, data consistency, and provider failure handling made the flow considerably longer than expected.

Next up is wiring the review feature into this search foundation.

Leaving a rating on an imported track — that's what Lime is for.
