
Efficiently Finding Duplicates in a List Using Python

Mr. Data Bugger

Finding duplicates in a list is a common problem in programming. A straightforward but inefficient way is to use nested loops, resulting in an O(n²) time complexity. However, Python provides a much faster solution using sets, reducing the time complexity to O(n).

In this blog, we'll first explore the brute-force approach and then move on to the optimized solution using sets.


Brute-Force Approach: Nested Loops (O(n²))


The simplest way to find duplicates is by comparing each element with every other element using two loops.


def find_number(arr):
    # Compare each element with every element after it: O(n^2) pairs in the worst case
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j]:
                return f"Number found: {arr[i]}"
    return f"Number not found in {arr}"

# Example Run
print(find_number([1, 2, 3, 2, 5]))  # Output: "Number found: 2"

  • Time Complexity: O(n²) – For each element, we scan the remaining elements.

  • Space Complexity: O(1) – No extra data structures are used.

For large lists, this approach becomes very slow since it checks each number against every other number.
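The quadratic growth is easy to quantify: on a duplicate-free list of length n, the nested loops make n(n−1)/2 comparisons. A small helper (not part of the original code, added here for illustration) makes the scaling concrete:

```python
def count_comparisons(n):
    """Worst-case comparisons made by the nested loops on a
    duplicate-free list of length n: n * (n - 1) / 2, i.e. O(n^2)."""
    return n * (n - 1) // 2

print(count_comparisons(10))      # 45
print(count_comparisons(100))     # 4950
print(count_comparisons(10_000))  # 49995000
```

Multiplying the list length by 10 multiplies the work by roughly 100, which is why the brute-force version falls over on large inputs.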


Optimized Approach: Using a Set (O(n))


A more efficient solution uses a set, allowing O(1) lookups and O(1) insertions on average.


Optimized Code

def find_number(arr):
    num_set = set()  # Create an empty set for fast lookups

    for num in arr:
        if num in num_set:
            return f"Number found: {num}"  
        num_set.add(num)  # Store unique numbers

    return f"Number not found in {arr}"

# Example Run
print(find_number([2, 3, 4, 7]))  # Output: "Number not found in [2, 3, 4, 7]"
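If you need every duplicate rather than just the first one, a collections.Counter gives a similarly O(n) sketch. This is a variation on the set approach above, not part of the original code, and the function name is ours:

```python
from collections import Counter

def find_all_duplicates(arr):
    """Return every value that appears more than once, in O(n) time."""
    counts = Counter(arr)  # one pass to tally occurrences
    return [num for num, count in counts.items() if count > 1]

print(find_all_duplicates([1, 2, 3, 2, 5, 3]))  # [2, 3]
print(find_all_duplicates([2, 3, 4, 7]))        # []
```

Counter is backed by a dict, so it keeps first-seen order and the whole pass stays linear.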

Why is Searching in a Set O(1)?


A set in Python is implemented as a hash table, which provides constant-time lookups (O(1)) in most cases.

How Does it Work?


Hashing the Element:

  • When an element is inserted into a set, Python calculates its hash value using the hash() function.

  • Example: hash(5) returns an integer (in CPython, small integers hash to themselves) that determines where 5 is stored internally. Hashes are not guaranteed to be unique – different values can share one.
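A quick look at hash() in action. The concrete values shown are CPython implementation details, so treat them as illustrative rather than guaranteed by the language:

```python
# In CPython, small integers hash to themselves:
print(hash(5))  # 5

# Strings hash to an integer that is stable within one process
# (but varies between runs due to hash randomization):
print(isinstance(hash("hello"), int))  # True

# Values that compare equal always share a hash -- this invariant
# is what makes set and dict lookups work across types:
print(hash(5) == hash(5.0))  # True, because 5 == 5.0
```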


Indexing Using Hash Values:

  • The set uses this hash value as an index in an internal array.

  • If no collision occurs, retrieving or checking an element happens instantly.


Handling Collisions:

  • In rare cases, different values may produce the same hash.

  • Python resolves this with open addressing – probing for another free slot – while other hash-table designs use chaining (storing multiple values at one index).

  • Even with collisions, lookups remain O(1) on average.
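Collision handling can be demonstrated with a toy class (hypothetical, for illustration only) that forces every instance into the same hash bucket. Even then, the set stores and finds each distinct value correctly by falling back on equality checks:

```python
class FixedHash:
    """Toy class whose instances all share one hash value, forcing collisions."""
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance lands in the same bucket

    def __eq__(self, other):
        return isinstance(other, FixedHash) and self.value == other.value

s = {FixedHash(1), FixedHash(2), FixedHash(3)}
print(len(s))             # 3 -- all stored despite identical hashes
print(FixedHash(2) in s)  # True -- probing plus __eq__ finds it
```

With every element colliding like this, lookups degrade toward O(n) – the worst case mentioned below – which is why real hash functions spread values across buckets.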


Time Complexity Summary

  • Average Case: O(1) (Fast because of hash table indexing).

  • Worst Case (Many Collisions): O(n) (Extremely rare).


Unlike lists (O(n) search), sets use hash tables to provide near-instant lookups (O(1) time complexity).
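The list-versus-set gap is easy to measure with timeit. Exact timings depend on your machine; the ratio is what matters:

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
target = n - 1  # worst case for the list: the element is at the very end

list_time = timeit.timeit(lambda: target in data_list, number=100)
set_time = timeit.timeit(lambda: target in data_set, number=100)

print(f"list lookup: {list_time:.5f}s")  # scans ~100,000 elements per call
print(f"set lookup:  {set_time:.5f}s")   # one hash computation per call
```

On a typical machine the set lookup is several orders of magnitude faster, and the gap widens as n grows, since the list scan is O(n) while the set lookup stays O(1).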


Complexity Analysis

  • Time Complexity: O(n) – We iterate through the list once, and each lookup & insertion in a set takes O(1).

  • Space Complexity: O(n) – In the worst case (no duplicates), we store all elements in the set.


Example Runs

Case 1: Duplicate Found

print(find_number([1, 2, 3, 2, 5]))  
# Output: "Number found: 2"

Case 2: No Duplicates

print(find_number([2, 3, 4, 7]))  
# Output: "Number not found in [2, 3, 4, 7]"

Why is This Better?

Approach                     | Time Complexity | Space Complexity | Best For
-----------------------------|-----------------|------------------|---------------
Brute-Force (Nested Loops)   | O(n²)           | O(1)             | Small lists
Optimized (Using a Set)      | O(n)            | O(n)             | Large datasets

The set-based approach is significantly faster, making it the preferred method for large datasets.
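A worst-case head-to-head run backs this up. The two functions below are copies of the versions shown earlier, renamed only so they can coexist in one script, and the input places the sole duplicate at the very end (timings are machine-dependent):

```python
import time

def find_number_bruteforce(arr):
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j]:
                return f"Number found: {arr[i]}"
    return f"Number not found in {arr}"

def find_number_set(arr):
    num_set = set()
    for num in arr:
        if num in num_set:
            return f"Number found: {num}"
        num_set.add(num)
    return f"Number not found in {arr}"

# Worst case: the only duplicate is the final element
arr = list(range(3_000)) + [2_999]

start = time.perf_counter()
result_bf = find_number_bruteforce(arr)
bf_time = time.perf_counter() - start

start = time.perf_counter()
result_set = find_number_set(arr)
set_time = time.perf_counter() - start

print(result_bf, f"({bf_time:.4f}s)")   # ~4.5 million comparisons
print(result_set, f"({set_time:.4f}s)")  # ~3,000 set operations
```

Both return the same answer; the set version simply gets there after one linear pass instead of millions of pairwise comparisons.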


Final Thoughts

  • Brute-force methods work for small inputs but scale poorly.

  • Using a set allows O(1) lookups, making the solution O(n) in time complexity.

  • Python’s set implementation uses hash tables, making searches extremely fast.

  • Always optimize algorithms when working with large data to improve performance.


Key Takeaway: Use sets for faster duplicate detection in lists!



©2021 by MLTUTOR. 
