
Efficiently Finding Duplicates in a List Using Python

Mr. Data Bugger

Finding duplicates in a list is a common problem in programming. A straightforward but inefficient way is to use nested loops, resulting in an O(n²) time complexity. However, Python provides a much faster solution using sets, reducing the time complexity to O(n).

In this blog, we'll first explore the brute-force approach and then move on to the optimized solution using sets.


Brute-Force Approach: Nested Loops (O(n²))


The simplest way to find duplicates is by comparing each element with every other element using two loops.


def find_number(arr):
    # Compare each element with every element after it: O(n^2) pairs in the worst case
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j]:
                return f"Number found: {arr[i]}"
    return f"Number not found in {arr}"

# Example Run
print(find_number([1, 2, 3, 2, 5]))  # Output: "Number found: 2"

  • Time Complexity: O(n²) – For each element, we scan the remaining elements.

  • Space Complexity: O(1) – No extra data structures are used.

For large lists, this approach becomes very slow since it checks each number against every other number.
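The quadratic growth is easy to quantify: on a duplicate-free list of length n, the nested loops make n(n−1)/2 comparisons. A small helper (not part of the original code, added here for illustration) makes the scaling concrete:

```python
def count_comparisons(n):
    """Worst-case comparisons made by the nested loops on a
    duplicate-free list of length n: n * (n - 1) / 2, i.e. O(n^2)."""
    return n * (n - 1) // 2

print(count_comparisons(10))      # 45
print(count_comparisons(100))     # 4950
print(count_comparisons(10_000))  # 49995000
```

Multiplying the list length by 10 multiplies the work by roughly 100, which is why the brute-force version falls over on large inputs.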


Optimized Approach: Using a Set (O(n))


A more efficient solution uses a set, allowing O(1) lookups and O(1) insertions on average.


Optimized Code

def find_number(arr):
    num_set = set()  # Create an empty set for fast lookups

    for num in arr:
        if num in num_set:
            return f"Number found: {num}"  
        num_set.add(num)  # Store unique numbers

    return f"Number not found in {arr}"

# Example Run
print(find_number([2, 3, 4, 7]))  # Output: "Number not found in [2, 3, 4, 7]"
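If you need every duplicate rather than just the first one, a collections.Counter gives a similarly O(n) sketch. This is a variation on the set approach above, not part of the original code, and the function name is ours:

```python
from collections import Counter

def find_all_duplicates(arr):
    """Return every value that appears more than once, in O(n) time."""
    counts = Counter(arr)  # one pass to tally occurrences
    return [num for num, count in counts.items() if count > 1]

print(find_all_duplicates([1, 2, 3, 2, 5, 3]))  # [2, 3]
print(find_all_duplicates([2, 3, 4, 7]))        # []
```

Counter is backed by a dict, so it keeps first-seen order and the whole pass stays linear.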

Why is Searching in a Set O(1)?


A set in Python is implemented as a hash table, which provides constant-time lookups (O(1)) in most cases.

How Does it Work?


Hashing the Element:

  • When an element is inserted into a set, Python calculates its hash value using the hash() function.

  • Example: hash(5) returns an integer (in CPython, small integers hash to themselves) that determines where 5 is stored internally. Hashes are not guaranteed to be unique – different values can share one.
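A quick look at hash() in action. The concrete values shown are CPython implementation details, so treat them as illustrative rather than guaranteed by the language:

```python
# In CPython, small integers hash to themselves:
print(hash(5))  # 5

# Strings hash to an integer that is stable within one process
# (but varies between runs due to hash randomization):
print(isinstance(hash("hello"), int))  # True

# Values that compare equal always share a hash -- this invariant
# is what makes set and dict lookups work across types:
print(hash(5) == hash(5.0))  # True, because 5 == 5.0
```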


Indexing Using Hash Values:

  • The set uses this hash value as an index in an internal array.

  • If no collision occurs, retrieving or checking an element happens instantly.


Handling Collisions:

  • In rare cases, different values may produce the same hash.

  • Python resolves this with open addressing – probing for another free slot – while other hash-table designs use chaining (storing multiple values at one index).

  • Even with collisions, lookups remain O(1) on average.
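Collision handling can be demonstrated with a toy class (hypothetical, for illustration only) that forces every instance into the same hash bucket. Even then, the set stores and finds each distinct value correctly by falling back on equality checks:

```python
class FixedHash:
    """Toy class whose instances all share one hash value, forcing collisions."""
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 42  # every instance lands in the same bucket

    def __eq__(self, other):
        return isinstance(other, FixedHash) and self.value == other.value

s = {FixedHash(1), FixedHash(2), FixedHash(3)}
print(len(s))             # 3 -- all stored despite identical hashes
print(FixedHash(2) in s)  # True -- probing plus __eq__ finds it
```

With every element colliding like this, lookups degrade toward O(n) – the worst case mentioned below – which is why real hash functions spread values across buckets.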


Time Complexity Summary

  • Average Case: O(1) (Fast because of hash table indexing).

  • Worst Case (Many Collisions): O(n) (Extremely rare).


Unlike lists (O(n) search), sets use hash tables to provide near-instant lookups (O(1) time complexity).
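The list-versus-set gap is easy to measure with timeit. Exact timings depend on your machine; the ratio is what matters:

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
target = n - 1  # worst case for the list: the element is at the very end

list_time = timeit.timeit(lambda: target in data_list, number=100)
set_time = timeit.timeit(lambda: target in data_set, number=100)

print(f"list lookup: {list_time:.5f}s")  # scans ~100,000 elements per call
print(f"set lookup:  {set_time:.5f}s")   # one hash computation per call
```

On a typical machine the set lookup is several orders of magnitude faster, and the gap widens as n grows, since the list scan is O(n) while the set lookup stays O(1).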


Complexity Analysis

  • Time Complexity: O(n) – We iterate through the list once, and each lookup & insertion in a set takes O(1).

  • Space Complexity: O(n) – In the worst case (no duplicates), we store all elements in the set.


Example Runs

Case 1: Duplicate Found

print(find_number([1, 2, 3, 2, 5]))  
# Output: "Number found: 2"

Case 2: No Duplicates

print(find_number([2, 3, 4, 7]))  
# Output: "Number not found in [2, 3, 4, 7]"

Why is This Better?

Approach                     | Time Complexity | Space Complexity | Best For
-----------------------------|-----------------|------------------|---------------
Brute-Force (Nested Loops)   | O(n²)           | O(1)             | Small lists
Optimized (Using a Set)      | O(n)            | O(n)             | Large datasets

The set-based approach is significantly faster, making it the preferred method for large datasets.
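A worst-case head-to-head run backs this up. The two functions below are copies of the versions shown earlier, renamed only so they can coexist in one script, and the input places the sole duplicate at the very end (timings are machine-dependent):

```python
import time

def find_number_bruteforce(arr):
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j]:
                return f"Number found: {arr[i]}"
    return f"Number not found in {arr}"

def find_number_set(arr):
    num_set = set()
    for num in arr:
        if num in num_set:
            return f"Number found: {num}"
        num_set.add(num)
    return f"Number not found in {arr}"

# Worst case: the only duplicate is the final element
arr = list(range(3_000)) + [2_999]

start = time.perf_counter()
result_bf = find_number_bruteforce(arr)
bf_time = time.perf_counter() - start

start = time.perf_counter()
result_set = find_number_set(arr)
set_time = time.perf_counter() - start

print(result_bf, f"({bf_time:.4f}s)")   # ~4.5 million comparisons
print(result_set, f"({set_time:.4f}s)")  # ~3,000 set operations
```

Both return the same answer; the set version simply gets there after one linear pass instead of millions of pairwise comparisons.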


Final Thoughts

  • Brute-force methods work for small inputs but scale poorly.

  • Using a set allows O(1) lookups, making the solution O(n) in time complexity.

  • Python’s set implementation uses hash tables, making searches extremely fast.

  • Always optimize algorithms when working with large data to improve performance.


Key Takeaway: Use sets for faster duplicate detection in lists!



©2021 by MLTUTOR. 
