Gowtham Potureddi
Cisco Data Engineering Interview Questions

Cisco data engineering interview questions are Python end-to-end with a fundamentals-and-decorators edge: four primitives — {v: k for k, v in d.items()} dict-comprehension key-value inversion, status-filter dicts via {k: v for k, v in d.items() if v['status'] == 'active'}, function-timing decorators built with functools.wraps and time.perf_counter(), and the greedy comparator-sort sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a+b > b+a else 1)) that produces the maximum concatenated string. The framings are everyday DE Python — invert a lookup dict, filter records by metadata flag, time a slow function via decorator, build the biggest string by concatenating chunks in the right order.

This guide walks through the four topic clusters Cisco actually tests, each with a detailed topic explanation, per-sub-topic explanation with a worked example and its solution, and an interview-style problem with a full solution that explains why it works. The mix matches the curated 4-problem Cisco set (3 easy, 1 medium, no hard) — a Python-fundamentals hub that rewards dict-comprehension fluency, decorator awareness, and a single non-obvious sorting trick. There is no SQL in this loop; every minute of preparation should go to Python.



Top Cisco data engineering interview topics

From the Cisco data engineering practice set, the four numbered sections below follow this topic map (one row per H2):

| # | Topic (sections 1–4) | Why it shows up at Cisco |
| --- | --- | --- |
| 1 | Python dict comprehensions for key-value inversion | Dictionary Key-Value Inversion — {v: k for k, v in d.items()}, plus duplicate-value handling. |
| 2 | Python hash tables and status filters | Dictionary Status Filter — {k: v for k, v in d.items() if v['status'] == 'active'} over nested dict values. |
| 3 | Python decorators and functools.wraps for function timing | Decorator Timing — @functools.wraps decorator pattern with time.perf_counter() and *args/**kwargs forwarding. |
| 4 | Python sorting and greedy for the maximum concatenated substring | Maximum Substring (MEDIUM) — sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a+b > b+a else 1)) then ''.join(...). |

Python-fundamentals framing rule: Cisco's prompts are everyday data-engineering Python — dict inversions, status filters, function timing, max-concat strings. The interviewer is grading whether you map each business framing to the right primitive: invert lookup → dict comprehension; filter records → if cond clause inside comprehension; instrument a slow function → decorator + perf_counter; biggest concatenated string → greedy comparator sort, not lexicographic. State the mapping out loud.
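As a sketch, here are the four mappings side by side (codes, records, and the timed name are illustrative stand-ins, not problems from the Cisco set):

```python
import functools
import time
from functools import cmp_to_key

# 1. Invert a lookup -> dict comprehension
codes = {'us': 1, 'in': 91}
inverted = {v: k for k, v in codes.items()}          # {1: 'us', 91: 'in'}

# 2. Filter records -> if clause inside the comprehension
records = {1: {'status': 'active'}, 2: {'status': 'inactive'}}
active = {k: v for k, v in records.items() if v.get('status') == 'active'}

# 3. Instrument a slow function -> decorator + perf_counter
def timed(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        print(f"{fn.__name__} took {time.perf_counter() - t0:.4f} s")
        return result
    return inner

# 4. Biggest concatenated string -> greedy comparator sort, not lexicographic
chunks = ['9', '30', '3']
biggest = ''.join(sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1)))
# biggest == '9330'
```

Each of the four sections below drills one of these primitives in depth.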


1. Python Dict Comprehensions for Key-Value Inversion

Dict comprehensions and inverted lookups in Python for data engineering

"Invert a dict so values become keys and keys become values" is the canonical dict-comprehension interview prompt at Cisco. The mental model: {v: k for k, v in d.items()} swaps each pair in one line; the result is a new dict mapping the original values to their original keys. Same primitive powers any "flip a lookup table" or "build a reverse index" question — invert a country-code map, build a name → id dict from an id → name dict.

Pro tip: Dict-inversion silently overwrites duplicates. If two original keys share a value, only one survives in the inverted dict — and which one is insertion-order-dependent (last wins in Python 3.7+). State the duplicate-handling rule out loud; the interviewer will probe it.

Dict comprehension syntax: {expr_k: expr_v for ... in ...}

The dict-comprehension invariant: {key_expr: value_expr for x in iter} builds a dict by evaluating both expressions for each element in the iterable. Reading order: key_expr, then value_expr, then for x in iter. Execution order: scan iter, evaluate both expressions, insert into the dict.

  • {k: v for k, v in d.items()} — copy of d (no transformation).
  • {k: v * 2 for k, v in d.items()} — copy with doubled values.
  • {k: v for k, v in d.items() if v > 0} — filter clause; only positive values pass.
  • Set comprehension — {expr for ... in ...} (no colon) — different syntax, different output type.

Worked example. Build a square-roots dict from a list of perfect squares.

| input | output |
| --- | --- |
| [1, 4, 9, 16] | {1: 1, 4: 2, 9: 3, 16: 4} |

Worked-example solution.

nums = [1, 4, 9, 16]
sqrts = {n: int(n ** 0.5) for n in nums}
print(sqrts)
# {1: 1, 4: 2, 9: 3, 16: 4}

Rule of thumb: if the loop body is "compute key, compute value, insert into dict," it's a dict comprehension; reach for it before for n in nums: d[n] = ....

Inversion: swap k and v in the comprehension head

The inversion invariant: {v: k for k, v in d.items()} swaps each pair. The original key becomes the new value; the original value becomes the new key. Both halves must be hashable (the new key especially — original values that are lists or dicts will fail with TypeError).

  • {v: k for k, v in d.items()} — straight inversion.
  • {v: [k for k, v2 in d.items() if v2 == v] for v in set(d.values())} — grouping form; preserves all duplicate keys.
  • Hashability check — original values must be hashable; lists/dicts will raise.
  • d.items() — yields (key, value) tuples; perfect for unpacking in the comprehension head.

Worked example. Invert a small lookup dict.

| input | output |
| --- | --- |
| {'a': 1, 'b': 2, 'c': 3} | {1: 'a', 2: 'b', 3: 'c'} |

Worked-example solution.

d = {'a': 1, 'b': 2, 'c': 3}
inv = {v: k for k, v in d.items()}
print(inv)
# {1: 'a', 2: 'b', 3: 'c'}

Rule of thumb: {v: k for k, v in d.items()} is the canonical one-liner; if the interviewer asks for "all keys per value" instead, switch to the grouping form.
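A minimal sketch of that grouping form on a dict with a duplicate value:

```python
d = {'a': 1, 'b': 2, 'c': 1}

# Grouping form: one list of original keys per distinct value.
# O(N^2): the inner list comprehension rescans d.items() for each value.
grouped = {v: [k for k, v2 in d.items() if v2 == v] for v in set(d.values())}
print(grouped)   # {1: ['a', 'c'], 2: ['b']} (key order may vary)
```

The defaultdict(list) loop in the next sub-topic does the same job in a single O(N) pass.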

Duplicate-value handling: last-wins vs preserve-all-via-list

The duplicate invariant: when two original keys share a value, the inverted dict can only keep one — the last one inserted wins (Python 3.7+) or is implementation-defined on older versions. To preserve all original keys per value, group into a defaultdict(list).

  • Last-wins behavior — {v: k for k, v in d.items()} silently drops earlier duplicates.
  • Preserve-all form — defaultdict(list) + loop, or the grouping comprehension over set(d.values()).
  • set(d.values()) — distinct values; useful when you need to know "are there duplicates?".
  • Worked-example: with {'a': 1, 'b': 2, 'c': 1}, the simple inversion returns {1: 'c', 2: 'b'} (loses 'a'); the grouping form returns {1: ['a', 'c'], 2: ['b']}.

Worked example. Demonstrate both behaviors on {'a': 1, 'b': 2, 'c': 1}.

| approach | output |
| --- | --- |
| {v: k for k, v in d.items()} | {1: 'c', 2: 'b'} (last wins) |
| defaultdict(list) group | {1: ['a', 'c'], 2: ['b']} |

Worked-example solution.

from collections import defaultdict

d = {'a': 1, 'b': 2, 'c': 1}

# Last-wins inversion
inv_simple = {v: k for k, v in d.items()}
print(inv_simple)        # {1: 'c', 2: 'b'}

# Preserve-all grouping
inv_group = defaultdict(list)
for k, v in d.items():
    inv_group[v].append(k)
print(dict(inv_group))    # {1: ['a', 'c'], 2: ['b']}

Rule of thumb: always ask the interviewer about duplicate semantics before inverting; "what should happen if two keys share a value?" is the question that earns points.

Common beginner mistakes

  • Trying to invert a dict whose values are unhashable (lists, dicts) — TypeError on dict-key creation.
  • Using {k: v} instead of {v: k} — that's a copy, not an inversion.
  • Building a list of (v, k) tuples and converting via dict(...) — works but verbose; comprehension is one line.
  • Assuming "first wins" on duplicate values — Python 3.7+ is "last wins" by insertion order.
  • Forgetting .items() and iterating for k in d — gets keys only; you need both.
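The unhashable-value failure from the first bullet, and the usual fix (converting list values to tuples is one reasonable convention, not the only one):

```python
d = {'a': [1, 2], 'b': [3]}   # values are lists -> unhashable

try:
    inv = {v: k for k, v in d.items()}
except TypeError as e:
    print(e)   # unhashable type: 'list'

# Fix: convert each value to a hashable type before using it as a key
inv = {tuple(v): k for k, v in d.items()}
print(inv)     # {(1, 2): 'a', (3,): 'b'}
```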

Python Interview Question on Dictionary Key-Value Inversion

Given a dict d whose values are hashable, return a new dict where each value becomes a key mapping to the original key. If two original keys share a value, the lexicographically smallest original key should be kept.

def invert_min_key(d):
    ...

Solution — using a min-aware loop over d.items()

def invert_min_key(d):
    inv = {}
    for k, v in d.items():
        if v not in inv or k < inv[v]:
            inv[v] = k
    return inv

Step-by-step trace (input: {'b': 1, 'a': 1, 'c': 2}):

| step | k | v | inv before | action | inv after |
| --- | --- | --- | --- | --- | --- |
| 1 | b | 1 | {} | 1 not in inv → set | {1: 'b'} |
| 2 | a | 1 | {1: 'b'} | 'a' < 'b' → replace | {1: 'a'} |
| 3 | c | 2 | {1: 'a'} | 2 not in inv → set | {1: 'a', 2: 'c'} |

  1. Initialize — empty inv.
  2. Row ('b', 1) — 1 not in inv; insert inv[1] = 'b'.
  3. Row ('a', 1) — 1 already in inv with value 'b'; 'a' < 'b' lexicographically, so replace.
  4. Row ('c', 2) — 2 not in inv; insert inv[2] = 'c'.
  5. Return — {1: 'a', 2: 'c'}.

Output:

| return value |
| --- |
| {1: 'a', 2: 'c'} |

Why this works — concept by concept:

  • d.items() — yields (key, value) pairs in insertion order; required to access both halves.
  • v not in inv — first-touch detection; the value hasn't been seen yet, so we insert directly.
  • k < inv[v] — lexicographic comparison on already-stored key; replaces it only if the current key is smaller, enforcing the min-key tie-break.
  • Single-pass — one walk over the dict; no second loop, no auxiliary structures beyond the result.
  • Hashability — original values must be hashable to become keys; the prompt guarantees this.
  • O(N) time, O(K) space — N original entries, K distinct values in the result.

PYTHON
Topic — dictionary
Dictionary problems

Practice →

PYTHON
Cisco — dictionary
Cisco dictionary problems

Practice →


2. Python Hash Tables and Status Filters

Filter-comprehensions over nested dict values in Python for data engineering

"Filter a dict-of-dicts by a status field" is the canonical hash-table interview prompt at Cisco. The mental model: {k: v for k, v in d.items() if v['status'] == 'active'} keeps only entries whose nested status matches. Same primitive powers any "filter records by metadata flag" question — keep enabled users, drop archived orders, surface pending jobs.

Pro tip: Always guard against missing keys in the predicate. v['status'] raises KeyError if any record lacks the field; v.get('status') == 'active' is the safer form. State the missing-key rule with the interviewer — "drop missing-status records" or "treat missing as inactive" — both are reasonable but they need to be explicit.

Filter-only comprehension: {k: v for k, v in d.items() if cond}

The filter invariant: the if cond clause runs after unpacking but before insertion; only entries where cond is truthy are kept. The expression slots k and v are unchanged from the original — this is a filter, not a map.

  • {k: v for k, v in d.items() if cond} — filter-only.
  • {k: f(v) for k, v in d.items() if cond} — filter-then-map.
  • Multiple if clauses — chain if cond1 if cond2; both must be truthy.
  • Negative filter — if not cond for "drop matching" semantics.

Worked example. Keep only positive entries.

| input | output |
| --- | --- |
| {'a': 1, 'b': -2, 'c': 3, 'd': 0} | {'a': 1, 'c': 3} |

Worked-example solution.

d = {'a': 1, 'b': -2, 'c': 3, 'd': 0}
positive = {k: v for k, v in d.items() if v > 0}
print(positive)
# {'a': 1, 'c': 3}

Rule of thumb: filter-comprehensions read top-to-bottom: "for each pair, if condition, keep it." Reach for them before for k in list(d): if not cond: del d[k] (which mutates).
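The chained-if and negative-filter bullets above, sketched on the same dict as the worked example:

```python
d = {'a': 1, 'b': -2, 'c': 3, 'd': 0}

# Chained if clauses: both must be truthy (equivalent to a single `and`)
small_positive = {k: v for k, v in d.items() if v > 0 if v < 3}
print(small_positive)   # {'a': 1}

# Negative filter: "drop matching" semantics
non_negative = {k: v for k, v in d.items() if not v < 0}
print(non_negative)     # {'a': 1, 'c': 3, 'd': 0}
```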

Nested-attribute predicates: v['status'] == 'active'

The nested-predicate invariant: inside the if clause, v is the inner dict; access fields with v['key'] or v.get('key'). The .get form is safer when the field is optional; use [] only when the schema guarantees it.

  • v['status'] — direct access; raises KeyError if missing.
  • v.get('status') — returns None if missing.
  • v.get('status') == 'active' — defensive equality check.
  • v.get('status', 'unknown') == 'active' — explicit default for missing.

Worked example. Filter users dict by status.

| input | output |
| --- | --- |
| {1: {'name': 'Alice', 'status': 'active'}, 2: {'name': 'Bob', 'status': 'inactive'}, 3: {'name': 'Carol', 'status': 'active'}} | {1: {...Alice...}, 3: {...Carol...}} |

Worked-example solution.

users = {
    1: {'name': 'Alice', 'status': 'active'},
    2: {'name': 'Bob', 'status': 'inactive'},
    3: {'name': 'Carol', 'status': 'active'},
}
active = {k: v for k, v in users.items() if v.get('status') == 'active'}
print(active)
# {1: {'name': 'Alice', 'status': 'active'}, 3: {'name': 'Carol', 'status': 'active'}}

Rule of thumb: .get for optional fields, [] for required ones; mixing them is a sign of unclear thinking about the data shape.

Combining filter and projection: return only certain fields per match

The combined invariant: the value_expr slot can transform v per match; common patterns are projecting a subset of fields, computing a derived value, or wrapping with a label. Keep the projection inside the comprehension; don't post-process with a second loop.

  • {k: v['name'] for k, v in d.items() if cond} — project a single field.
  • {k: {'name': v['name'], 'role': v.get('role', 'guest')} for k, v in d.items() if cond} — pick a subset of fields.
  • {k: f(v) for k, v in d.items() if cond} — call a transform function.
  • Nested comprehensions for re-shaping — pair with .items() of the inner dict.

Worked example. Project active users to {id: name}.

| input | output |
| --- | --- |
| (same as above) | {1: 'Alice', 3: 'Carol'} |

Worked-example solution.

active_names = {
    k: v['name']
    for k, v in users.items()
    if v.get('status') == 'active'
}
print(active_names)
# {1: 'Alice', 3: 'Carol'}

Rule of thumb: one comprehension can express "filter + project" cleanly; resist the temptation to break it into two passes.
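A sketch of that filter-plus-reshape pattern; the role field is an illustrative addition to the users shape used earlier:

```python
users = {
    1: {'name': 'Alice', 'status': 'active', 'role': 'admin'},
    2: {'name': 'Bob', 'status': 'inactive', 'role': 'viewer'},
}

# One comprehension: filter on status, then re-shape each matching inner
# dict with a nested comprehension that keeps only the wanted fields.
slim = {
    k: {field: val for field, val in v.items() if field in ('name', 'role')}
    for k, v in users.items()
    if v.get('status') == 'active'
}
print(slim)   # {1: {'name': 'Alice', 'role': 'admin'}}
```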

Common beginner mistakes

  • Using v['status'] without .get and crashing on schema-missing records.
  • Writing {k: v if cond for ...} — without an else that's a SyntaxError; the filter clause goes after the for (... if cond), while if ... else before the for is a conditional expression on the value, not a filter.
  • Mutating d in place during iteration — RuntimeError: dictionary changed size during iteration.
  • Comparing v.get('status') is 'active' — works for short interned strings but is unreliable; use ==.
  • Forgetting that comprehension creates a new dict; the original d is unchanged.

Python Interview Question on Dictionary Status Filter

Given a dict-of-dicts where each value contains at least a status key, return a new dict containing only entries whose status is 'active'. Records missing the status field should be dropped (not raise).

def filter_active(d):
    ...

Solution — using a filter dict-comprehension with .get

def filter_active(d):
    return {k: v for k, v in d.items() if v.get('status') == 'active'}

Step-by-step trace (input: 4 user records):

| id | name | status |
| --- | --- | --- |
| 1 | Alice | active |
| 2 | Bob | inactive |
| 3 | Carol | active |
| 4 | Dave | (missing) |

  1. Iterate — (1, {...Alice, active}), (2, {...Bob, inactive}), (3, {...Carol, active}), (4, {...Dave, no status}).
  2. Test predicate per pair:
    • id 1: v.get('status') returns 'active'; 'active' == 'active' → True → keep.
    • id 2: 'inactive' == 'active' → False → drop.
    • id 3: True → keep.
    • id 4: v.get('status') returns None; None == 'active' → False → drop.
  3. Build result dict — {1: {...}, 3: {...}}.

Output:

| id | name | status |
| --- | --- | --- |
| 1 | Alice | active |
| 3 | Carol | active |

Why this works — concept by concept:

  • d.items() — yields (key, value) pairs over the outer dict.
  • v.get('status') — safe access; returns None for missing field instead of KeyError.
  • == 'active' — None == 'active' is False, so missing-field records drop cleanly.
  • Filter clause if ... — runs after unpacking, before insertion; failed predicates produce no output entry.
  • New dict — comprehension constructs a new dict; the original is untouched, no mutation gotcha.
  • O(N) time, O(K) space — N original entries, K matching entries in the output.

PYTHON
Topic — hash table
Hash-table problems

Practice →

PYTHON
Cisco — hash table
Cisco hash-table problems

Practice →


3. Python Decorators and functools.wraps for Function Timing

Decorator wrappers and high-resolution timing in Python for data engineering

"Build a decorator that times a function and prints elapsed seconds" is Cisco's signature decorator interview prompt. The mental model: a decorator is a function that takes a function and returns a wrapper; the wrapper captures t0 = perf_counter() before calling the wrapped function, captures t1 after, and prints or logs the difference. The same pattern powers any cross-cutting instrumentation — logging, retry, caching, auth checks.

Diagram of a function wrapped by a @timed decorator that captures perf_counter before and after the call and prints the elapsed seconds.

Pro tip: Always wrap with @functools.wraps(fn). Without it, the wrapped function's __name__, __doc__, and __module__ are clobbered to the wrapper's, which breaks introspection (Sphinx docs, debuggers, decorator stacks). It's a one-line ritual that interviewers expect to see.

Decorator skeleton: def decorator(fn): @wraps(fn) def inner(*a, **kw): ... return inner

The decorator invariant: a decorator is a higher-order function that takes a callable and returns a new callable with the same signature. The wrapper accepts *args and **kwargs to forward whatever the original function accepts.

  • Outer function — takes fn, the function being decorated.
  • @functools.wraps(fn) — preserves __name__, __doc__, etc.
  • Inner function — accepts *args, **kwargs to be call-signature-agnostic.
  • Call the original — result = fn(*args, **kwargs).
  • Return the result — return value must propagate through the wrapper.

Worked example. Skeleton decorator that prints 'before' and 'after'.

import functools

def announce(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        print('before')
        result = fn(*args, **kwargs)
        print('after')
        return result
    return inner

@announce
def add(a, b):
    return a + b

print(add(2, 3))
# before
# after
# 5

Rule of thumb: every wrapper accepts *args, **kwargs unless you're writing a single-purpose decorator for a known signature.

time.perf_counter() for high-resolution elapsed seconds

The timing invariant: perf_counter() returns a monotonic float measuring seconds since an arbitrary start point; differences between two calls give elapsed time with sub-microsecond resolution on most platforms. Use it for benchmarks; never use time.time() (which can jump backward across NTP adjustments).

  • time.perf_counter() — monotonic, high-resolution.
  • time.time() — wall-clock; can move backward; avoid for elapsed.
  • time.process_time() — CPU time only; use for compute-bound profiling.
  • time.perf_counter_ns() — nanosecond integer; same monotonic source.

Worked example. Time time.sleep(0.05).

import time

t0 = time.perf_counter()
time.sleep(0.05)
t1 = time.perf_counter()
print(f"elapsed = {t1 - t0:.4f} s")
# elapsed = 0.0501 s  (approximately)

Rule of thumb: perf_counter for elapsed; time.time for "what is the current wall clock"; never confuse them.

Preserving metadata with functools.wraps

The metadata invariant: @functools.wraps(fn) copies __name__, __doc__, __wrapped__, and __module__ from fn to the wrapper. Without it, decorated functions look like the wrapper to introspection, which breaks debugging tools and stacked decorators.

  • @functools.wraps(fn) — official one-liner.
  • functools.update_wrapper(inner, fn) — equivalent imperative form.
  • Without wraps — add.__name__ becomes 'inner' instead of 'add'.
  • __wrapped__ — wraps sets this so callers can recover the original.

Worked example. Compare __name__ with and without wraps.

import functools

def bad(fn):
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

def good(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

@bad
def f(): pass
@good
def g(): pass

print(f.__name__)   # 'inner'   (bad — clobbered)
print(g.__name__)   # 'g'       (good — preserved)

Rule of thumb: always reach for @functools.wraps; the one-line cost is always smaller than the debugging cost of forgetting it.
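One consequence worth knowing: because wraps sets __wrapped__, callers can recover and call the original function. A minimal sketch:

```python
import functools

def good(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

@good
def g(x):
    """Double x."""
    return x * 2

# wraps copies metadata and stores the original under __wrapped__
print(g.__name__)          # 'g'
print(g.__doc__)           # 'Double x.'
print(g.__wrapped__(21))   # 42 — calls the original, bypassing the wrapper
```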

Common beginner mistakes

  • Forgetting @functools.wraps — silently breaks introspection.
  • Wrapping with a fixed signature like def inner(a, b) — fails on functions with different arities; always use *args, **kwargs.
  • Using time.time() for elapsed — non-monotonic; can yield negative deltas.
  • Forgetting to return result from the wrapper — decorated function returns None.
  • Capturing t0 outside the wrapper — measures decoration time, not call time.

Python Interview Question on Decorator Timing

Write a decorator timed that, when applied to any function, prints "<funcname> took <seconds> s" after each call and returns the function's original result. Preserve the wrapped function's name and docstring.

@timed
def slow_query(...):
    ...

Solution — using functools.wraps and time.perf_counter

import functools
import time

def timed(fn):
    @functools.wraps(fn)
    def inner(*args, **kwargs):
        t0 = time.perf_counter()
        result = fn(*args, **kwargs)
        t1 = time.perf_counter()
        print(f"{fn.__name__} took {t1 - t0:.4f} s")
        return result
    return inner

Step-by-step trace (decorate and call a function):

@timed
def slow_query(n):
    """Pretend to query a database."""
    time.sleep(0.05)
    return n * 2

result = slow_query(7)
  1. Decoration — slow_query = timed(slow_query); slow_query now refers to the wrapper, but __name__ and __doc__ are preserved by wraps.
  2. Call slow_query(7) — control enters inner with args=(7,), kwargs={}.
  3. t0 = time.perf_counter() — capture start.
  4. result = fn(*args, **kwargs) — invokes the original slow_query(7); time.sleep(0.05); returns 14.
  5. t1 = time.perf_counter() — capture end.
  6. print(f"{fn.__name__} took {t1 - t0:.4f} s") — prints "slow_query took 0.0501 s" (approximately).
  7. return result — wrapper returns 14 to the caller.

Output:

| stdout | return value |
| --- | --- |
| slow_query took 0.0501 s | 14 |

Why this works — concept by concept:

  • Outer def timed(fn) — takes the wrapped function and returns a new callable; the canonical decorator skeleton.
  • @functools.wraps(fn) — preserves slow_query.__name__, __doc__, __module__; without it, the print line would say inner took ....
  • *args, **kwargs — accepts any signature; the decorator works on functions of any arity.
  • time.perf_counter() — monotonic high-resolution timer; safe for elapsed measurements.
  • fn(*args, **kwargs) — invokes the original function with the forwarded arguments.
  • return result — propagates the return value; without it, callers get None and silently break.
  • O(1) overhead per call — two perf_counter reads + one print + one delegated call.

PYTHON
Topic — decorator
Decorator problems

Practice →

PYTHON
Topic — higher-order functions
Higher-order-function problems

Practice →


4. Python Sorting and Greedy for the Maximum Concatenated Substring

Greedy comparator-sort for max concatenation in Python for data engineering

"Given a list of digit chunks, concatenate them in the order that produces the maximum numeric string" is Cisco's signature MEDIUM problem (#287 Maximum Substring). The mental model: lexicographic sort fails because '9' sorts after '12' alphabetically but 9 is greater than 12 numerically; instead, sort with a custom comparator that asks "does a + b come before b + a when concatenated?". The pair (a, b) should go in whichever order produces the larger concatenation. This greedy comparator extends to a total order; sorting then concatenating yields the maximum.

Diagram showing a list of digit chunks compared pairwise via a+b versus b+a then sorted into the order that produces the maximum concatenated string.

Pro tip: The comparator a + b > b + a is the classic "greedy on pairs implies global optimum" trick — and it's not obvious from first principles. Memorize it: when concatenating chunks, you compare their string-concat results, not the chunks themselves.

Why lexicographic sort fails: '9' > '12' but 912 > 129

The lexicographic-fail invariant: string comparison is character-by-character; '9' > '12' because '9' > '1' at index 0. But concatenation order matters numerically: 9 + 12 = '912' (= 912) is larger than 12 + 9 = '129' (= 129). Lexicographic sort puts '12' before '9', producing '129...' — wrong.

  • '9' > '12' — string comparison is True (character-by-character).
  • int('912') > int('129') — also True; concat order matters.
  • sorted(['9', '12']) — ['12', '9']; descending: ['9', '12'] → concat = '912' (correct here!).
  • Counter-example — ['3', '30']. Lexicographic descending: ['30', '3'] → '303'. But '330' is bigger.

Worked example. Lexicographic descending fails on ['3', '30'].

| sort | concat | numeric |
| --- | --- | --- |
| sorted(['3', '30'], reverse=True) → ['30', '3'] | '303' | 303 |
| custom-comparator → ['3', '30'] | '330' | 330 |

Worked-example solution.

chunks = ['3', '30']
print(sorted(chunks, reverse=True))     # ['30', '3'] — wrong order
# Reason: '3' < '30' lexicographically; reverse puts '30' first
# but '3' + '30' = '330' is bigger than '30' + '3' = '303'.

Rule of thumb: when ordering strings for concatenation, never trust lexicographic sort; use the pairwise concatenation comparator.

Custom key: cmp_to_key(lambda a, b: -1 if a+b > b+a else 1)

The comparator invariant: functools.cmp_to_key(cmp) adapts a two-argument comparator to a key= callable for sorted. The comparator returns negative if a should come before b, positive if b should come before a, zero if equal.

  • from functools import cmp_to_key — the import.
  • cmp_to_key(lambda a, b: -1 if a+b > b+a else (1 if a+b < b+a else 0)) — full three-way.
  • Shorthand — cmp_to_key(lambda a, b: -1 if a+b > b+a else 1) works when ties are acceptable.
  • sorted(chunks, key=cmp_to_key(...)) — applies the comparator pairwise.

Worked example. Sort ['9', '30', '34', '5', '3'].

Pairwise comparisons:

  • '9' vs '30': '930' > '309' → 9 first.
  • '9' vs '5': '95' > '59' → 9 first.
  • '5' vs '34': '534' > '345' → 5 first.
  • '34' vs '3': '343' > '334' → 34 first.
  • '3' vs '30': '330' > '303' → 3 first.

After full sort: ['9', '5', '34', '3', '30']. Concat: '9534330'.

Worked-example solution.

from functools import cmp_to_key

chunks = ['9', '30', '34', '5', '3']
ordered = sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1))
print(ordered)
# ['9', '5', '34', '3', '30']

Rule of thumb: a + b > b + a is the comparator that always works for "max concatenation"; memorize the pairwise form.
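The full three-way comparator from the bullet list, written out as a named function (returning 0 on ties keeps the sort stable for equal-concatenation pairs like '3' and '33'):

```python
from functools import cmp_to_key

def concat_cmp(a, b):
    # Negative -> a first; positive -> b first; 0 -> keep original order (stable)
    if a + b > b + a:
        return -1
    if a + b < b + a:
        return 1
    return 0

chunks = ['9', '30', '34', '5', '3']
ordered = sorted(chunks, key=cmp_to_key(concat_cmp))
print(ordered)   # ['9', '5', '34', '3', '30']
```

In an interview the lambda shorthand is fine; the named function is easier to talk through.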

Concatenation after sort: ''.join(sorted_chunks)

The join invariant: after the comparator-sort produces the right order, ''.join(...) concatenates them into the final string. str.join is O(N) total for N chunks; faster and clearer than chained +=.

  • ''.join(seq) — concatenate strings in seq with no separator.
  • '-'.join(seq) — with separator (different problem).
  • sum(seq, '') — doesn't even run: Python raises TypeError: sum() can't sum strings and points you at ''.join instead.
  • Edge case — if all chunks are '0', the result is '000...'; convention often says return '0' instead. Check the prompt.

Worked example. Apply join to the sorted list.

| sorted | concat |
| --- | --- |
| ['9', '5', '34', '3', '30'] | '9534330' |

Worked-example solution.

ordered = ['9', '5', '34', '3', '30']
result = ''.join(ordered)
print(result)
# '9534330'

Rule of thumb: ''.join is the only correct way to concatenate a list of strings in Python; never use a += loop.

Common beginner mistakes

  • Sorting lexicographically with reverse=True and assuming it works — fails on ['3', '30'].
  • Forgetting from functools import cmp_to_keyNameError.
  • Returning a list instead of a string — read the prompt carefully.
  • Mutating the input — use sorted(...) (returns new list) instead of .sort() (in place).
  • Mixing string chunks and integer chunks — convert to strings first; the comparator depends on string concatenation.
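The last bullet's fix is one line — normalize to strings before sorting (a sketch assuming non-negative integer inputs):

```python
from functools import cmp_to_key

raw = [9, 30, 34, 5, 3]          # integer chunks from upstream
chunks = [str(c) for c in raw]   # the comparator depends on string concatenation

ordered = sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1))
print(''.join(ordered))          # '9534330'
```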

Python Interview Question on the Maximum Concatenated Substring

Given a list of non-negative integer chunks (as strings), return the largest possible numeric string formed by concatenating them in some order.

def max_concat(chunks):
    ...

Solution — using cmp_to_key with the a + b > b + a comparator

from functools import cmp_to_key

def max_concat(chunks):
    if not chunks:
        return ''
    ordered = sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1))
    result = ''.join(ordered)
    # Handle edge case where all inputs are zero
    return '0' if result.lstrip('0') == '' else result

Step-by-step trace (input: ['9', '30', '34', '5', '3']):

  1. Empty check — list non-empty, continue.
  2. Build comparator — cmp_to_key(lambda a, b: -1 if a + b > b + a else 1).
  3. Pairwise sort — applies the comparator across pairs:
    • '9' + '30' = '930' vs '30' + '9' = '309' → 9 first.
    • '9' + '34' = '934' vs '34' + '9' = '349' → 9 first.
    • '5' + '34' = '534' vs '34' + '5' = '345' → 5 first.
    • '34' + '3' = '343' vs '3' + '34' = '334' → 34 first.
    • '3' + '30' = '330' vs '30' + '3' = '303' → 3 first.
  4. Sorted result — ['9', '5', '34', '3', '30'].
  5. Join — '9534330'.
  6. Zero-strip check — '9534330'.lstrip('0') is '9534330', non-empty, return as-is.

Output:

| return value |
| --- |
| '9534330' |

Why this works — concept by concept:

  • Pairwise comparator — the a + b > b + a test on strings answers "if I had only these two chunks, which order produces the bigger concatenation"; this local greedy choice extends to a global total order.
  • cmp_to_key adapter — Python's sorted takes a key= callable, not a cmp= callable; cmp_to_key bridges the API gap.
  • String concatenation comparison — comparing string concatenations bypasses the integer-vs-string mismatch; works for any chunk length.
  • ''.join(ordered)O(N) total concatenation; correct and idiomatic.
  • Zero-strip edge case — ['0', '00'] would produce '000'; convention says return '0' for all-zero inputs.
  • O(N log N · L) time — N log N comparisons, each an O(L) string concat where L is the average chunk length; O(N · L) space for the sorted output.

PYTHON
Topic — sorting
Sorting problems

Practice →

PYTHON
Topic — greedy
Greedy problems

Practice →


Tips to Crack Cisco Data Engineering Interviews

SQL is not on this loop — skip it

The curated Cisco practice set is 4 Python, 0 SQL. Spending interview-prep time on SQL window functions for a Cisco data-engineering interview is a misallocation; spend it on dict comprehensions, status filters, decorators, and the greedy comparator-sort. If a SQL question appears, treat it as a different role family rather than the canonical loop covered here.

Drill the four Cisco Python primitives, in order

The four primitives in the curated Cisco Python practice set are dict-comprehension key-value inversion (#91), status-filter dict comprehension (#96), @functools.wraps decorator with time.perf_counter (#97), and cmp_to_key greedy comparator-sort (#287). Each maps to a specific module: vanilla dict for #91 and #96, functools + time for #97, functools.cmp_to_key for #287. Memorize the imports as muscle memory.

Decorator fluency is a Cisco signature

#97 Decorator Timing is the make-or-break easy-tier round at Cisco. Candidates who can't fluently write @functools.wraps, *args, **kwargs, and time.perf_counter lose the decorator round. Drill the decorator practice page and the higher-order functions practice page until you can write the timing decorator from memory in under two minutes.

The maximum-substring greedy is the medium-tier ceiling

#287 Maximum Substring is the only MEDIUM, and the trick is recognizing that lexicographic sort fails. Candidates who reach for sorted(chunks, reverse=True) lose the round; the right move is cmp_to_key(lambda a, b: -1 if a+b > b+a else 1). Drill the greedy practice page and the sorting practice page until the comparator pattern is automatic.
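A two-chunk counterexample makes the failure concrete (the input is my choice):

```python
from functools import cmp_to_key

chunks = ["3", "30"]

# Lexicographic reverse sort: '30' > '3' as strings, so '30' comes
# first and we get '303' — the wrong answer.
naive = "".join(sorted(chunks, reverse=True))

# Pairwise comparator asks which concatenation is bigger:
# '3' + '30' = '330'  >  '30' + '3' = '303', so '3' belongs first.
greedy = "".join(
    sorted(chunks, key=cmp_to_key(lambda a, b: -1 if a + b > b + a else 1))
)
```

Here `naive` is `'303'` while `greedy` is `'330'` — a one-line demo that the comparator, not plain reverse sort, is the winning move.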

Easy-tier discipline matters

Three of the four Cisco problems are EASY-tier. Easy at Cisco does not mean trivial; it means the interviewer expects zero hesitation, idiomatic Python, and an articulated invariant. An EASY answer delivered with stuttering or an off-by-one is graded more harshly than a MEDIUM answer with the same flaw, because the EASY signal is supposed to be "this candidate writes Python fluently." The Cisco easy practice page is the right warmup.

Where to practice on PipeCode

Start with the Cisco practice page for the curated 4-problem set. After that, drill the matching topic pages: dictionary, hash-table, decorator, higher-order-functions, sorting, greedy. The interview courses page bundles the Python course if you want a structured curriculum. For broad coverage, browse by topic.

Communication and approach under time pressure

Talk through the invariant first, the brute force second, and the optimal third — even if you go straight to the optimal in code. Interviewers grade process as much as the final answer. Leave 5 minutes at the end of each problem for an edge-case sweep: empty dict, missing fields, all-zero inputs, decorator on a function that raises. The most common "almost passed" failure mode is correct happy-path code that crashes on edge cases — a 30-second sweep prevents it.
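That edge-case sweep can be rehearsed as a few quick assertions (the record data is hypothetical; the raising-decorator case is covered by putting the timing in a finally block, as discussed above):

```python
# Empty dict: inversion should yield an empty dict, not raise.
assert {v: k for k, v in {}.items()} == {}

# Missing field: .get('status') returns None, so the record is silently skipped.
records = {"h1": {"ip": "10.0.0.1"}}
assert {k: v for k, v in records.items() if v.get("status") == "active"} == {}

# All-zero input: '0' + '00' joins to '000'; collapse to the canonical '0'.
joined = "".join(["0", "00"])
assert ("0" if set(joined) == {"0"} else joined) == "0"
```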


Frequently Asked Questions

What is the Cisco data engineering interview process like?

Cisco's data engineering interview opens with a recruiter screen, then a technical phone screen with one Python coding problem, then an onsite (or virtual onsite) of three to four rounds: two Python coding rounds (dict comprehensions, status filters, decorators, greedy sorting), one data-modeling or system-design discussion (telemetry pipelines, network analytics), and one behavioral round. The coding rounds are heavy on the patterns covered in this guide.

What Python topics does Cisco test for data engineers?

Cisco Python interviews concentrate on four primitives that map directly to the curated 4-problem set: dict-comprehension key-value inversion (#91), filter dict-comprehension over nested status fields (#96), @functools.wraps decorator with time.perf_counter (#97), and cmp_to_key greedy comparator-sort (#287). Memorize the imports: from functools import wraps, cmp_to_key; import time; vanilla dict for the comprehensions.

How hard are Cisco data engineering interview questions?

The Cisco set is 3 easy + 1 medium + 0 hard. The MEDIUM problem (#287 Maximum Substring) hinges on recognizing that lexicographic sort fails and switching to the a + b > b + a greedy comparator. The three EASY problems test fundamentals: dict comprehension, filter comprehension, and decorator-with-functools.wraps. Stuttering on any EASY problem signals lack of fluency.

Are decorators and dict comprehensions common in Cisco interviews?

Yes — three of four Cisco problems test these patterns explicitly. #91 and #96 are dict comprehensions; #97 is decorators. If you cannot fluently write {v: k for k, v in d.items()}, {k: v for k, v in d.items() if v.get('status') == 'active'}, and a @functools.wraps decorator skeleton with *args, **kwargs, you cannot pass three of the four Cisco rounds. Drill these patterns until they are reflexive.

How many Cisco practice problems should I solve before the interview?

Solve all 4 problems on the Cisco practice page end-to-end — untimed first, then timed at 25 minutes per problem. After that, broaden to 20 to 30 additional Python problems spread across the matching topic pages: dictionary, hash-table, decorator, higher-order-functions, sorting, greedy. The Cisco Python practice page and the Cisco easy practice page are the right surfaces for the final week of prep.


Start practicing Cisco data engineering problems
