Published on

7. List, set, and dictionary in Pyhon

Authors

7. If you have to choose between a list, set, and a dictionary to store 10 million integers, what will you use? Bear in mind that you would later like to query the frequency of a number within the dataset.

The problem:

  • Store 10 million integers.
  • Later query the frequency of any number efficiently.

Option 1: List

  • Stores duplicates, keeps order.
  • To find frequency: list.count(x)O(n) time.
  • With 10 million integers, frequency queries would be very slow.
  • Memory: stores all values including duplicates (could be large).

Option 2: Set

  • Stores only unique elements (no duplicates).
  • No frequency info is stored, so you cannot query frequency.
  • Good for membership tests (is x present?), but not frequency.
  • Memory: smaller than list if many duplicates.

Option 3: Dictionary (or collections.Counter)

  • Keys: integers; Values: frequency counts.
  • Frequency queries: O(1) average time.
  • Stores frequency directly, so you get what you need efficiently.
  • Memory: overhead for dictionary, but usually acceptable for 10 million entries (especially if many duplicates).

Recommendation:

Use a dictionary or collections.Counter to store frequencies.

Example:

from collections import Counter

data = [...]  # your 10 million integers list
freq = Counter(data)

# Query frequency of a number:
print(freq[42])  # O(1) time

Summary:

Data StructureSupports Frequency Query?Query TimeMemory Use
ListYes (with count())O(n) (slow)High (stores all values)
SetNoN/ALower (unique values only)
DictionaryYesO(1) (fast)Higher (stores counts)

Final answer:

Use a dictionary (Counter) to efficiently store and query frequencies.