
Data Structures

Ruy Diaz edited this page Aug 13, 2016 · 7 revisions
  • Contiguous: Array, matrix, heap, hash table
  • Linked: lists, trees, graphs
  • Containers
    • Retrieval independent of content
      • Stacks
      • Queues

Dictionaries

Retrieve data by content (search by key)

| Operation | Unsorted array | Sorted array | Singly unsorted | Doubly unsorted | Singly sorted | Doubly sorted |
|-----------|----------------|--------------|-----------------|-----------------|---------------|---------------|
| Search    | O(n)  | O(log n) | O(n)  | O(n) | O(n)  | O(n) |
| Insert    | O(1)  | O(n)     | O(1)  | O(1) | O(n)  | O(n) |
| Delete    | O(1)¹ | O(n)     | O(n)² | O(1) | O(n)² | O(1) |
| Succ      | O(n)  | O(n)     | O(n)  | O(n) | O(1)  | O(1) |
| Pred      | O(n)  | O(n)     | O(n)  | O(n) | O(n)² | O(1) |
| Min       | O(n)  | O(1)     | O(n)  | O(n) | O(1)  | O(1) |
| Max       | O(n)  | O(1)     | O(n)  | O(n) | O(1)³ | O(1) |

¹ Swap A[x] with A[n] for constant-time deletion

² Must spend linear time finding the predecessor

³ Maintain a separate pointer to the tail of the list, updated on every insert/delete: constant time on doubly linked lists; for singly linked lists, deleting the tail needs an extra linear sweep to update the pointer
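Footnote 1's constant-time delete can be sketched in JavaScript (`deleteAt` is an illustrative name, not from the original):

```javascript
// Sketch: O(1) delete from an unsorted array by overwriting the
// doomed element with the last one, then shrinking. Order is not
// preserved, which is fine for an unsorted array.
function deleteAt(arr, i) {
  arr[i] = arr[arr.length - 1]; // overwrite with the last element
  arr.pop();                    // drop the (now duplicated) tail
  return arr;
}
```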

  • Possible data structures:
    • Unsorted arrays or linked lists: bad for dictionaries larger than 50–100 items
    • Sorted linked lists or arrays: not worth the effort unless eliminating duplicates or enabling binary search (arrays only); appropriate only with few insertions/deletions
    • Hash table: for a moderate to large number of keys (100 to 10M)
      • The number of buckets should roughly match the number of items expected in the table
      • With open addressing, about 30% more
      • Choosing a prime number of buckets minimizes the danger of a bad hash function
    • Binary search trees
      • random search - insert node at leaf where it can be found, no rebalancing
      • balanced tree - use rotation to restructure
    • B-trees - for data sets that don't fit into main memory (more than 1M items)

Trees

  • Full binary tree: every non-leaf node has exactly two children; one with n leaves has 2n - 1 nodes
  • Balanced trees - depth of subtrees will not vary by more than a certain amount
  • Traversal: in-order, pre-order, post-order
  • To check whether two trees are identical, a single traversal (in-, pre-, or post-order) is not sufficient; must combine in-order with either pre- or post-order
    • Alternatively, insert a special character marking when a left or right child is null, which makes a single traversal sufficient
     Same in-order and pre-order traversals, but not identical trees:
      3    3
     /      \
    3        3
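A sketch of the null-marker idea in JavaScript (the node shape `{key, left, right}` and the `'#'` marker are assumptions for illustration):

```javascript
// A single pre-order traversal IS sufficient once null children are
// marked explicitly (here with '#'), because the markers pin down
// the tree's shape as well as its keys.
function serialize(node) {
  if (node == null) return '#';
  return node.key + ',' + serialize(node.left) + ',' + serialize(node.right);
}

function sameTree(a, b) {
  return serialize(a) === serialize(b);
}
```

The two trees above serialize to `3,3,#,#,#` and `3,#,3,#,#`, so they compare unequal even though their plain traversals match.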
    

Binary Search tree

  • left, right, parent, item
  • Search is O(height): h = log n for a balanced tree, h = n for a worst-case insertion order (e.g. already-sorted input)
    • A randomly built tree has expected height O(log n)
  • Search
  • Insert
  • Delete
    • 0 children: clear pointer to node
    • 1 child: point the deleted node's parent directly to its child
    • 2 children: relabel the node with the key of its successor (the min of the right subtree), then remove the successor node (which has at most one child)
  • Max/Min
  • Pred/Succ
  • How does binary tree handle identity?

  • Can be used for dictionaries and priority queue

  • Binary search tree property: if node y is in left subtree of x, key[y] ≤ key[x]; if node y is in right subtree of x, key[x] ≤ key[y]

    • Corollary: all nodes in the left subtree of x are ≤ x, and all nodes in the right subtree are ≥ x
    Invalid BST
       20
      /  \
    10   30
      \
       25
    
  • Inorder tree walk - print out keys in sorted order

    inorder = (node)->
      if node != nil
        inorder(node.left)
        print node.key
        inorder(node.right)
    
  • Search

    searchRecursive = (node, k) ->
      return node if !node? || node.key == k
      if k < node.key
        searchRecursive(node.left, k)
      else
        searchRecursive(node.right, k)
    # More efficient on most computers: avoids function-call overhead
    searchIterative = (node, k) ->
      while node? && node.key != k
        if k < node.key
          node = node.left
        else
          node = node.right
      node
    
    treeSuccessor = (x) -> # O(h)
      if x.right?
        treeMinimum(x.right)
      else
        y = x.parent
        while y? && y.right == x
          x = y
          y = y.parent
        return y
    
    treeInsert = (tree, z)->  # O(h)
      y = nil
      x = tree.root
      while x?
        y = x
        if z.key < x.key
          x = x.left
        else
          x = x.right
      z.parent = y
      if y == nil
        tree.root = z
      else if z.key < y.key
        y.left = z
      else
        y.right = z
    
    treeInsertRecursive = (node, key, parent = null) ->
      if !node?
        # returning the new node lets the caller relink it,
        # which substitutes for passing node.left "as a pointer"
        return { key: key, left: null, right: null, parent: parent }
      if key < node.key
        node.left = treeInsertRecursive(node.left, key, node)
      else
        node.right = treeInsertRecursive(node.right, key, node)
      node
    
    treeDelete = (tree, z) -> # O(h)
      if z.left == nil || z.right == nil # Zero or One child
        y = z
      else # Two children
        y = treeSuccessor(z)
      if y.left?
        x = y.left
      else
        x = y.right
      if x? # y has a child x: splice y out by linking x upward
        x.parent = y.parent
    
      if y.parent == nil
        tree.root = x
      else if y == y.parent.left
        y.parent.left = x
      else
        y.parent.right = x
    
      if y != z
        z.key = y.key
        # copy satellite data into z
    
      return y
    
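The BST property above is global, not just parent-child: a bounds-propagating check makes that concrete (a sketch, assuming nodes shaped `{key, left, right}`; `isValidBST` is an illustrative name):

```javascript
// Every node must lie within (min, max) bounds inherited from ALL
// of its ancestors, not merely satisfy the local comparison with
// its parent. This is what rules out the "25 under 20's left
// subtree" tree shown earlier.
function isValidBST(node, min = -Infinity, max = Infinity) {
  if (node == null) return true;
  if (node.key < min || node.key > max) return false;
  return isValidBST(node.left, min, node.key) &&
         isValidBST(node.right, node.key, max);
}
```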

Red-Black trees

  • balanced, guarantees O(lg n) operations
  • extra bit of storage per node (colour: red/black)
  • Properties:
    • every node is either red or black
    • root is black
    • every leaf (nil) is black
    • if a node is red, both children are black
    • for each node, all paths from node to descendant leaves have same number of black nodes
  • Pointer structure adjusted through rotations which preserve the properties.
    • Left rotate assumes right child is not nil
    • Right rotate assumes left child is not nil
    • Rotation pivots around the link from x to y: y becomes the new root of the subtree, x becomes y's left child, and y's left child becomes x's right child
          x
         / \
        A   y
           / \
          B   C

        Left-Rotate(T, x)  # O(1)

          y
         / \
        x   C
       / \
      A   B
    
  • After inserting (just like a regular BST insert) with the new node z coloured red, run an auxiliary RB-Insert-Fixup to recolour nodes and perform rotations
    • Case 1: z's uncle is red: recolour the parent and uncle black and the grandparent red, then repeat with the grandparent as z
    • Case 2: z's uncle is black and z is a right child: left-rotate z's parent; z moves down to where its former parent was (the former parent becomes z's left child), transforming to case 3
    • Case 3: z's uncle is black and z is a left child: colour the parent black and the grandparent red, then right-rotate the grandparent
  • After deleting, run an auxiliary RB-Delete-Fixup if the spliced-out node y was black
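The left rotation described above can be sketched as (nodes shaped `{key, left, right, parent}` and a tree shaped `{root}` are assumptions; Right-Rotate is symmetric):

```javascript
// Left-Rotate(T, x): y (x's right child, assumed non-nil) becomes
// the subtree root, x becomes y's left child, and y's old left
// subtree becomes x's right subtree. O(1): only pointers change.
function leftRotate(tree, x) {
  const y = x.right;
  x.right = y.left;                       // B moves under x
  if (y.left != null) y.left.parent = x;
  y.parent = x.parent;                    // y takes x's old position
  if (x.parent == null) tree.root = y;
  else if (x === x.parent.left) x.parent.left = y;
  else x.parent.right = y;
  y.left = x;                             // x becomes y's left child
  x.parent = y;
}
```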

Heaps

  • An array that can be viewed as a nearly complete binary tree
  • Tree is filled on all levels except possibly the lowest, which is filled from the left
  • A heap is represented by an array and two attributes: length and heap-size. (heap-size <= length)
  • Root of the tree is the first element in the array
  • Indices of parent and children are computed by:
    • parent = (i)-> Math.floor((i+1)/2) - 1
    • leftChild = (i) -> 2*(i+1) - 1
    • rightChild = (i) -> 2*(i+1) # == leftChild(i) + 1
  • Two types of heaps: max-heap and min-heap
    • max-heap - A[parent(i)] >= A[i]; largest element stored at root
    • min-heap - A[parent(i)] <= A[i]; smallest element stored at root
  • The height of the heap is ⌊lg n⌋
  • Elements A[(⌊n/2⌋+1)..n] are all leaves of the tree
  • Operations on heap are O(h) = O(lgn)
    • maxHeapify/minHeapify - maintain heap-property (O(lgn)). Worst case is when the last row is exactly half full
    • buildMaxHeap/buildMinHeap - produce a heap from an unsorted input (O(n))
    • heapsort - sort in place. (O(nlgn))
    • insert/extractMax/increaseKey/max - allow heap to be used as priority queue (O(lgn))
  • maxHeapify(A, i) - float value down to right place
maxHeapify = (A,i) ->
  l = left(i)
  r = right(i)
  largest = if l <= heapSize && A[l] > A[i] then l else i
  if r <= heapSize && A[r] > A[largest]
    largest = r
  if largest != i
    swap(A[i], A[largest])
    maxHeapify(A, largest)
  • buildMaxHeap(A) - The leaves are each a 1-element heap; go through each remaining node (bottom-up) and call maxHeapify on it
buildMaxHeap = (A) ->
  heapSize = A.length
  for i in [Math.floor(A.length/2)..1]
    maxHeapify(A, i-1)
  • heapsort(A) - since the max element is at the root, we know its final position (A[n]) and can swap it there. Decreasing heap-size by 1 lets a single maxHeapify on the new root restore the max-heap over the remaining elements
heapsort = (A) ->
  buildMaxHeap(A)
  for i in [A.length-1..1] # 0-based, matching buildMaxHeap
    swap(A[0], A[i])       # max goes to its final position
    heapSize--
    maxHeapify(A, 0)
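A runnable 0-based JavaScript version of the heap pseudocode above (a sketch; heap-size is passed explicitly instead of being a shared variable):

```javascript
// Float A[i] down until the max-heap property holds below i.
// left/right follow the 0-based index formulas given earlier.
function maxHeapify(A, i, heapSize) {
  const l = 2 * i + 1, r = 2 * i + 2;
  let largest = (l < heapSize && A[l] > A[i]) ? l : i;
  if (r < heapSize && A[r] > A[largest]) largest = r;
  if (largest !== i) {
    [A[i], A[largest]] = [A[largest], A[i]];
    maxHeapify(A, largest, heapSize);
  }
}

function heapsort(A) {
  // build max-heap: slots floor(n/2)..n-1 are already leaves
  for (let i = Math.floor(A.length / 2) - 1; i >= 0; i--)
    maxHeapify(A, i, A.length);
  // repeatedly move the max into its final slot and shrink the heap
  for (let i = A.length - 1; i >= 1; i--) {
    [A[0], A[i]] = [A[i], A[0]];
    maxHeapify(A, 0, i);
  }
  return A;
}
```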

Priority Queue

  • Type of heap
  • max-priority/min-priority
  • Allow retrieval by highest priority (as opposed to insertion order, like a stack or queue)
  • If application will perform no insertions after initial query, no need for an explicit priority queue, just sort by priority and proceed top to bottom. If mixing insertions, deletions and queries, a priority queue is needed
  • Insert
  • Find Min/ Find Max
  • Delete Min/ Delete Max
  • Increase-key(S,x,k) (from new book only?)
| Operation | Unsorted array | Sorted array | Balanced tree |
|-----------|----------------|--------------|---------------|
| Insert    | O(1)  | O(n)  | O(log n) |
| Find Min  | O(1)¹ | O(1)¹ | O(1)¹    |
| Del Min   | O(n)  | O(1)² | O(log n) |

¹ Store a pointer to the min entry and update it on insert. On delete, search for the new min (linear or O(log n) time, respectively)

² Keep the array in descending order, so the min is always the last element and deleting it requires no re-arranging

Available data structures:

  • Sorted array or list - efficient to identify max/min and delete it. Maintaining order is slow. Only suitable with few insertions
  • Binary heaps - support insertion and extract-min efficiently. The right choice when an upper bound on the number of items is known in advance; this weakness is mitigated by dynamic arrays
  • Binary search tree - smallest element is always leftmost leaf, largest is rightmost. Most appropriate when you need other dictionary operations, or unbounded key range and don't know max priority queue size in advance
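A minimal array-backed binary min-heap priority queue, sketched in JavaScript (class and method names are illustrative):

```javascript
// 0-based binary min-heap supporting the insert / find-min /
// delete-min operations from the table above.
class MinPQ {
  constructor() { this.a = []; }
  insert(key) {                    // O(log n): float the new key up
    this.a.push(key);
    let i = this.a.length - 1;
    while (i > 0) {
      const p = Math.floor((i - 1) / 2);
      if (this.a[p] <= this.a[i]) break;
      [this.a[p], this.a[i]] = [this.a[i], this.a[p]];
      i = p;
    }
  }
  findMin() { return this.a[0]; }  // O(1): min is at the root
  deleteMin() {                    // O(log n): sink the last key
    const min = this.a[0];
    const last = this.a.pop();
    if (this.a.length > 0) {
      this.a[0] = last;
      let i = 0;
      for (;;) {
        const l = 2 * i + 1, r = 2 * i + 2;
        let s = i;
        if (l < this.a.length && this.a[l] < this.a[s]) s = l;
        if (r < this.a.length && this.a[r] < this.a[s]) s = r;
        if (s === i) break;
        [this.a[i], this.a[s]] = [this.a[s], this.a[i]];
        i = s;
      }
    }
    return min;
  }
}
```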

Hash tables

  • A mathematical function on the key determines a (large) integer, which maps to a position in an array
  • Under reasonable assumptions, search is O(1) (but can be O(n) under some circumstances)
  • O(1) in average case
  • Element k is stored in h(k) (hash function)
  • Hash function reduces the range of array indices that need to be handled (the size m of the possible outcomes of the hash function)

Hash functions

  • Division method
    • E.g. h(k) = k mod m
    • A prime not too close to an exact power of 2 is often a good choice for m
  • Multiplication method
    • h(k) = ⌊m (kA mod 1)⌋, with 0 < A < 1; kA mod 1 is the fractional part of kA
    • m is usually a power of 2
  • Universal hashing
    • Choose a random hashing function from a carefully designed class of functions at the beginning of execution
    • Guarantees no single input will always evoke worst-case
  • Perfect hashing
    • Elements hashing to slot j are re-hashed into a secondary hash table; each slot uses different constants for its hash function
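The division and multiplication methods can be sketched as (the choice A = (√5 − 1)/2 is Knuth's suggestion; both m values in the test are illustrative):

```javascript
// Division method: h(k) = k mod m.
// m should be a prime not too close to an exact power of 2.
function divisionHash(k, m) {
  return k % m;
}

// Multiplication method: h(k) = floor(m * (k*A mod 1)).
// The value of m is not critical here, so it is usually a power of 2.
function multiplicationHash(k, m) {
  const A = (Math.sqrt(5) - 1) / 2; // ≈ 0.618..., Knuth's suggestion
  const frac = (k * A) % 1;         // fractional part of k*A
  return Math.floor(m * frac);
}
```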

Collisions

  • Chaining (buckets of linked lists) - devotes considerable memory to pointers, which could instead be used to make the table larger
  • Open addressing - find desired location, if occupied, find next available open slot. Deletions complicated
| Operation | Hash table (expected) | Hash table (worst case) |
|-----------|-----------------------|-------------------------|
| Search    | O(n/m) | O(n)   |
| Insert    | O(1)   | O(1)   |
| Delete    | O(1)   | O(1)   |
| Succ      | O(n+m) | O(n+m) |
| Pred      | O(n+m) | O(n+m) |
| Min       | O(n+m) | O(n+m) |
| Max       | O(n+m) | O(n+m) |

Often the best option for a dictionary
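A minimal chaining hash table sketch in JavaScript (non-negative integer keys are assumed, so the bucket index is just `key % m`; names are illustrative):

```javascript
// m buckets, each an array of [key, value] pairs. With n items the
// expected chain length is n/m, hence O(1 + n/m) expected search.
class ChainedHashTable {
  constructor(m = 13) {            // 13: a small prime bucket count
    this.m = m;
    this.buckets = Array.from({ length: m }, () => []);
  }
  bucket(key) { return this.buckets[key % this.m]; }
  insert(key, value) { this.bucket(key).push([key, value]); }
  search(key) {                    // scan only one chain
    for (const [k, v] of this.bucket(key)) if (k === key) return v;
    return undefined;
  }
  delete(key) {
    const b = this.bucket(key);
    const i = b.findIndex(([k]) => k === key);
    if (i >= 0) b.splice(i, 1);
  }
}
```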

Direct Address table

  • One array position for every possible key. Element with key k is stored in slot k
  • Lots of wasted space if few keys are used
  • If universe of keys is large, might be impractical or impossible
  • O(1) in worst case
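A direct-address table sketch over a small illustrative universe (names and the universe size are assumptions):

```javascript
// One slot per possible key in U = {0, ..., U_SIZE-1}.
// Every operation indexes the array directly: O(1) worst case.
const U_SIZE = 256; // illustrative universe size
const table = new Array(U_SIZE).fill(null);

const daInsert = (item) => { table[item.key] = item; };
const daSearch = (key) => table[key];
const daDelete = (item) => { table[item.key] = null; };
```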

Pointers and Objects

  • Multiple arrays (one for key, one for next, one for prev)
    • Variable pointing to Head of list
    • Variable pointing to head of free
  • Single array with known length (3 elements per object: next, key, prev). Can be used for heterogeneous objects, but it's more complex (store each object's length?)
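The multiple-array representation with a free list can be sketched as (sizes and names are illustrative; `-1` plays the role of nil):

```javascript
// A doubly linked list stored in three parallel arrays, with the
// free list threaded through the unused slots via next[].
const N = 8;
const key = new Array(N), next = new Array(N), prev = new Array(N);
let head = -1; // head of the list
let free = 0;  // head of the free list
for (let i = 0; i < N; i++) next[i] = (i + 1 < N) ? i + 1 : -1;

function allocate() {
  const x = free;       // take the first free slot
  free = next[x];
  return x;
}

function listInsert(k) { // push a key onto the front of the list
  const x = allocate();
  key[x] = k;
  next[x] = head;
  prev[x] = -1;
  if (head !== -1) prev[head] = x;
  head = x;
  return x;
}
```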

Trees

Binary tree

  • Three fields: parent, left, right
  • Pointer to root of tree

Rooted tree with unbounded branches

  • Fixed number of arrays not viable, lots of wasted space on a sparse tree
  • parent, left-child, right sibling
  • pointer to root of tree

Graphs

Data Structures:

  • Adjacency matrices

    • Impractical for graphs with more than 1k vertices (1M entries)
    • Make sense only for small or very dense (complete) graphs
    • Better suited for all-pairs shortest path algorithms
    • Better suited for algorithms that repeatedly ask is (i,j) in G. Most graph algos can be designed to eliminate the queries
  • Adjacency lists

    • Best for most purposes
    • Can better handle satellite data
    • Structure
      • edge (x,y) represented by edgenode with vertex (y), weight and next (edgenode)
      • graph: array of edgenodes, array of outdegrees, number of vertices, number of edges, isDirected (a directed graph includes an edge only from x to y, not from y to x). Converting between the two representations costs only linear time
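The adjacency-list structure above can be sketched in JavaScript (names follow the bullet above; this is an illustration, not a full library):

```javascript
// Each vertex owns a chain of edgenodes {y, weight, next}.
// An undirected edge (x,y) is stored twice, once per endpoint,
// but counted once in nedges.
class Graph {
  constructor(nvertices, directed = false) {
    this.edges = new Array(nvertices).fill(null); // edgenode chains
    this.degree = new Array(nvertices).fill(0);   // outdegree per vertex
    this.nvertices = nvertices;
    this.nedges = 0;
    this.directed = directed;
  }
  insertEdge(x, y, weight = 1, secondPass = false) {
    this.edges[x] = { y, weight, next: this.edges[x] }; // push edgenode
    this.degree[x]++;
    if (!this.directed && !secondPass)
      this.insertEdge(y, x, weight, true); // mirror the edge once
    else
      this.nedges++;
  }
}
```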
