Remove Duplicates From an Array in JavaScript

If you already understand the basics of JavaScript arrays, it’s time to take your skills to the next level with more advanced topics. In this series of tutorials, you’ll explore intermediate-level topics for programming with arrays in JavaScript.

In this post we are going to deal with a commonly needed piece of functionality for arrays in JavaScript: how to remove duplicates. There are multiple ways of removing array duplicates, and I’ll show you each one in this tutorial.

You’ll learn how to remove duplicates from an array of primitives, and how to remove duplicates from an array of objects the right way. 

Here’s a summary of the methods we’ll cover:

Method | Benefits | Drawbacks
Set | O(n) performance | only works with primitive arrays
Array.reduce | simple implementation | O(n²) performance
Array.filter | simple implementation | O(n²) performance
nested loops | doesn’t use modern JavaScript features | O(n²) performance, more error-prone
hashing | O(n) performance | only works with primitive arrays
hashing with custom keys | O(n) performance, works with complex types | more complicated implementation

Using Set to Remove Duplicates From an Array

This is one of our most-loved methods for removing duplicates from an array. First, you should know what a Set is: a Set is a data structure introduced in ES6 that can only store unique values. The moment you convert an array into a Set, all duplicates are removed. Using a Set to remove duplicates involves two stages:

  1. Create a new Set using the values of the array. 
  2. Convert the Set back into an array. For this, you can use the spread operator or the Array.from function.

const array = [1, 1, 2, 3, 2, 2, 1]

// Step 1: create a Set from the array, which drops all duplicates
const newSet = new Set(array)

// Step 2: convert the Set back into an array
const uniqueArray = [...newSet]
// or: const uniqueArray = Array.from(new Set(array))

console.log(uniqueArray) // [1, 2, 3]

Time Complexity of Using Set

Creating a Set from an existing array has time complexity O(n), i.e. it is proportional to the length of the array. That’s because real-world JavaScript engines implement Set with an efficient data structure such as a hash table. This is much faster than many of the other methods for removing duplicates from an array.

A Drawback of Using Set

You can use Set only on primitive values. A Set compares objects by reference, not by their contents, so it does not work when you want to remove duplicates from an array of objects. If you want an efficient method to remove duplicates from an array of objects, scroll to the bottom of this post.
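
Here’s a quick illustration of the problem (just a throwaway snippet; the objects variable is only for demonstration):

const objects = [{ id: 1 }, { id: 1 }]

// The two objects have identical contents but are different references,
// so the Set keeps both of them.
console.log([...new Set(objects)]) // [{ id: 1 }, { id: 1 }]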

Using filter to Remove Duplicates From an Array

filter is one of the oldest methods for removing duplicates from an array. Before we look at how it works, you need to be aware of the indexOf method: indexOf returns the index of the first occurrence of an item in an array. We will use this behavior inside the filter callback.

  1. Iterate over every element in the array using the filter method. filter returns a new array containing only the elements for which a helper function returns true.
  2. filter calls the helper function with each item and its index. Inside it, we use indexOf to check whether the item occurs earlier in the array, which is the case when indexOf returns an index different from the one passed to the helper function.
  3. If indexOf finds an earlier instance of the item, return false from the helper function. This tells filter not to include the current item in the filtered array.
  4. Otherwise, the item is not a duplicate, and the helper function returns true.

const array = [1, 2, 1, 4, 2, 3];

const newArray = array.filter((item, index) => {
  // keep the item only if this is its first occurrence in the array
  return array.indexOf(item) === index;
});

console.log(newArray) // [1, 2, 4, 3]

Here is a table to help you understand how the logic works:

item | index | indexOf(item) | Result
1 | 0 | 0 | true
2 | 1 | 1 | true
1 | 2 | 0 | false
4 | 3 | 3 | true
2 | 4 | 1 | false
3 | 5 | 5 | true

Time Complexity of Using Filter

The time complexity of removing duplicates using filter is O(n²), because indexOf has to scan the array for every element. For short arrays this is an easy way to remove duplicates, but as the array becomes longer it gets much more expensive than using a Set.
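
If you want to see the difference for yourself, here is a rough benchmark sketch (the array size, variable names, and timer labels are purely illustrative, and results will vary between machines and JavaScript engines):

// 100,000 random integers with lots of duplicates
const largeArray = Array.from({ length: 100000 }, () => Math.floor(Math.random() * 1000));

console.time('filter + indexOf');
const viaFilter = largeArray.filter((item, index) => largeArray.indexOf(item) === index);
console.timeEnd('filter + indexOf');

console.time('Set');
const viaSet = [...new Set(largeArray)];
console.timeEnd('Set');

On inputs like this, the Set version typically finishes far faster than the filter version.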

Using reduce to Remove Duplicates From an Array

Another interesting method for removing duplicates from an array is reduce. We could use the indexOf function here as well, but instead we are going to try another method called includes.

reduce can be rather tricky if you are unaware of how it works. So, let us help you by breaking down the entire process. The reduce helper function receives an accumulator and the current item. For each item, we can choose to add it to the accumulator or skip it. In our case, the pseudocode is as follows.

  1. Iterate through every item in the array using reduce.
  2. The helper function is passed two parameters: the accumulator (the array of unique items so far) and the current item.
  3. If the accumulator already contains the current item, return the accumulator unmodified. We use the includes function to perform this check.
  4. Otherwise, add the item to the accumulator and return it.

const array = [1, 1, 2, 4, 2, 3];

const newArray = array.reduce((unique, item) => {
  // add the item to the accumulator only if it isn't already there
  return unique.includes(item) ? unique : [...unique, item];
}, []);

console.log(newArray) // [1, 2, 4, 3]

Time Complexity of Using Reduce

The time complexity of removing duplicates using reduce is O(n²). In general, running reduce over an array takes O(n) time. However, we call Array.includes for each element inside the reduce helper function, which raises the overall time complexity to O(n²).
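
If you like the reduce style but want to avoid the repeated includes scans, one possible variant (not part of the recipe above, just a sketch) is to accumulate into a Set instead of an array:

const array = [1, 1, 2, 4, 2, 3];

// Set.add returns the Set itself, so it can be used directly as the accumulator
const uniqueSet = array.reduce((unique, item) => unique.add(item), new Set());
const newArray = [...uniqueSet];

console.log(newArray) // [1, 2, 4, 3]

Since Set lookups and insertions are O(1) on average, this keeps the overall work at O(n), at the cost of reintroducing Set’s limitation to primitive values.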

Using Nested Loops to Remove Duplicates From an Array

Now, we are going to look at a classic approach that uses a forEach loop together with the find function (which performs the inner loop) to remove duplicates.

Pseudo Code for Using Nested Loops

  1. Create a new array called unique, to store values without duplicates.
  2. Iterate through the array using the forEach method. 
  3. Use the find method within each forEach iteration to check whether the current item is already present in the unique array. If it is not, push it.

const array = [1, 3, 4, 1, 2, 3, 4, 1]

function removeDuplicates(inputArray) {
  const unique = [];
  inputArray.forEach(item => {
    // find returns undefined when no match exists; comparing against
    // undefined also handles falsy items such as 0 correctly
    const isFound = unique.find(inputItem => item === inputItem) !== undefined;
    if (!isFound)
      unique.push(item);
  });
  return unique;
}

console.log(removeDuplicates(array)) // [1, 3, 4, 2]

Time Complexity of Using Nested Loops

The removeDuplicates method calls unique.find() for each element in the input array. Again, this makes the overall time complexity O(n²).

Using Hashes to Remove Duplicates From an Array

As you’ve seen, other than the Set technique, all the ways of removing duplicates from an array have O(n²) time complexity. Not great for larger datasets.

Now, we are going to do something interesting: let us write a custom function that removes duplicates with a time complexity of O(n). We are going to take advantage of the fact that JavaScript objects are effectively hash maps, so the time complexity of accessing any key on an object is O(1). Here is the pseudocode:

  1. Create a temporary object to track existing items and an array to collect unique items.
  2. Iterate through the array using forEach.
  3. Test if the current item has been previously found by checking whether it exists as a key in the temporary object. 
  4. If not, create one! Then push the item into the unique array.

function removeDuplicates(inputArray){
  const unique = [];
  // Object.create(null) gives a plain key store with no inherited properties
  const obj = Object.create(null);
  inputArray.forEach(item => {
    // checking for the key itself (rather than a truthy value) also handles
    // falsy items such as 0 or an empty string
    if (!(item in obj)) {
      unique.push(item);
      obj[item] = true;
    }
  });
  return unique;
}
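
For example, calling it on the same array we used in the previous section:

console.log(removeDuplicates([1, 3, 4, 1, 2, 3, 4, 1])) // [1, 3, 4, 2]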

Time Complexity of Using Hashes

The time complexity of removing duplicates this way is O(n), much better than the other techniques that use only the array methods. Of course, just like with using Set, this method only works on primitive values, not objects. One caveat: object keys are always strings, so values such as 1 and '1' would be treated as the same item.

Remove Duplicates From an Array of Objects

What if you need to remove duplicates from an array of objects? This is a common use case not addressed well by the other methods.

It’s not hard to do, though. We’ll use the efficient hash-based function above, with one small change.

function removeDuplicatesByKey(inputArray, keyFunction){
  const unique = [];
  const obj = Object.create(null);
  inputArray.forEach(item => {
    // derive a key for the item and use it for the duplicate check
    const key = keyFunction(item);
    if (!(key in obj)) {
      unique.push(item);
      obj[key] = true;
    }
  });
  return unique;
}

This uses the same hash-based comparison test to determine whether each item is unique. But this time, we’ll use a custom key to track the items. If two items have the same key, they can be considered the same. The keyFunction parameter is what makes that possible. It’s a helper function that takes an item and returns a key for that item.

What should you use as the key? It depends on the specific data you’re working with, but there are a couple of common options.

Using an Id Field as the Key

This is probably the most common solution. If the objects have a uniquely identifying field, such as an id, we can simply use that as the key.

const array = [{id: 1, name: 'a'}, {id: 2, name: 'b'}, {id: 1, name: 'a2'}]

const idKey = (item) => item.id;
const uniqueArray = removeDuplicatesByKey(array, idKey);

console.log(uniqueArray) // [{id: 1, name: 'a'}, {id: 2, name: 'b'}]

Using JSON.stringify as the Key

You can also use JSON.stringify(). This is a more generic solution, and it doesn’t require the data items to have unique ids. The idea is to use the JSON representation of each item as its key. This effectively performs a deep comparison of each item’s structure and only treats items as equal if they have the same properties and values. Keep in mind that JSON.stringify is sensitive to property order, so {x: 1, y: 1} and {y: 1, x: 1} would produce different keys.

const array = [{x: 1, y: 1}, {x: 2, y: 1}, {x: 1, y: 2}, {x: 1, y: 1}]

const jsonKey = (item) => JSON.stringify(item);
const uniqueArray = removeDuplicatesByKey(array, jsonKey);

console.log(uniqueArray) // [{x: 1, y: 1}, {x: 2, y: 1}, {x: 1, y: 2}]

Conclusion

In this post we saw a number of ways to remove duplicates from a JavaScript array. The methods that use hashing have much better performance when arrays get large. However, it is also possible to remove duplicates using only the built-in Array methods.

Finally, if you need to remove duplicates from an array of objects, you can define a key for each item and use that to filter duplicates out of the list.

