
Data Structures

Ruy Diaz edited this page Aug 13, 2016 · 7 revisions
  • Contiguous: Array, matrix, heap, hash table
  • Linked: lists, trees, graphs
  • Containers
    • Retrieval independent of content
      • Stacks
      • Queues

Dictionaries

Retrieve data by content (search by key)

| Operation | Unsorted array | Sorted array | Singly unsorted | Doubly unsorted | Singly sorted | Doubly sorted |
|-----------|----------------|--------------|-----------------|-----------------|---------------|---------------|
| Search    | O(n)  | O(log n) | O(n)  | O(n) | O(n)  | O(n) |
| Insert    | O(1)  | O(n)     | O(1)  | O(1) | O(n)  | O(n) |
| Delete    | O(1)¹ | O(n)     | O(n)² | O(1) | O(n)² | O(1) |
| Succ      | O(n)  | O(n)     | O(n)  | O(n) | O(1)  | O(1) |
| Pred      | O(n)  | O(n)     | O(n)  | O(n) | O(n)² | O(1) |
| Min       | O(n)  | O(1)     | O(n)  | O(n) | O(1)  | O(1) |
| Max       | O(n)  | O(1)     | O(n)  | O(n) | O(1)³ | O(1) |

¹ Swap A[x] with A[n] for constant-time deletion

² Must spend linear time finding the predecessor

³ Maintain a separate pointer to the tail of the list, updated on every insert/delete: constant time on doubly linked lists; for singly linked lists, deleting the tail needs an extra linear sweep to update the pointer
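Footnote 1's constant-time delete can be sketched in JavaScript (`deleteAt` is an illustrative name, not from the original):

```javascript
// Sketch: O(1) delete from an unsorted array by overwriting the
// doomed element with the last one, then shrinking. Order is not
// preserved, which is fine for an unsorted array.
function deleteAt(arr, i) {
  arr[i] = arr[arr.length - 1]; // overwrite with the last element
  arr.pop();                    // drop the (now duplicated) tail
  return arr;
}
```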

  • Possible data structures:
    • Unsorted arrays or linked lists: bad for dictionaries larger than 50–100 items
    • Sorted linked lists or arrays: not worth the effort unless eliminating duplicates or enabling binary search (arrays only); appropriate only with few insertions/deletions
    • Hash table: for a moderate to large number of keys (100 to 10M)
      • The number of buckets should roughly match the number of items expected in the table
      • With open addressing, about 30% more
      • Choosing a prime number of buckets minimizes the danger of a bad hash function
    • Binary search trees
      • random search - insert node at leaf where it can be found, no rebalancing
      • balanced tree - use rotation to restructure
    • B-trees - for data sets that don't fit into main memory (more than 1M items)

Trees

  • Full binary tree: every non-leaf node has exactly two children; one with n leaves has 2n - 1 nodes
  • Balanced trees - depth of subtrees will not vary by more than a certain amount
  • Traversal: in-order, pre-order, post-order
  • To check whether two trees are identical, a single traversal (in-, pre-, or post-order) is not sufficient; must combine in-order with either pre- or post-order
    • Alternatively, insert a special character marking when a left or right child is null, which makes a single traversal sufficient
     Same in-order and pre-order traversals, but not identical trees:
      3    3
     /      \
    3        3
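A sketch of the null-marker idea in JavaScript (the node shape `{key, left, right}` and the `'#'` marker are assumptions for illustration):

```javascript
// A single pre-order traversal IS sufficient once null children are
// marked explicitly (here with '#'), because the markers pin down
// the tree's shape as well as its keys.
function serialize(node) {
  if (node == null) return '#';
  return node.key + ',' + serialize(node.left) + ',' + serialize(node.right);
}

function sameTree(a, b) {
  return serialize(a) === serialize(b);
}
```

The two trees above serialize to `3,3,#,#,#` and `3,#,3,#,#`, so they compare unequal even though their plain traversals match.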
    

Binary Search tree

  • left, right, parent, item
  • Search is O(height): h = log n for a balanced tree, h = n for a worst-case insertion order (e.g. already-sorted input)
    • A randomly built tree has expected height O(log n)
  • Search
  • Insert
  • Delete
    • 0 children: clear pointer to node
    • 1 child: point the deleted node's parent directly to its child
    • 2 children: relabel the node with the key of its successor (the min of the right subtree), then remove the successor node (which has at most one child)
  • Max/Min
  • Pred/Succ
  • How does binary tree handle identity?

  • Can be used for dictionaries and priority queue

  • Binary search tree property: if node y is in left subtree of x, key[y] ≤ key[x]; if node y is in right subtree of x, key[x] ≤ key[y]

    • Corollary: all nodes in the left subtree of x are ≤ x, and all nodes in the right subtree are ≥ x
    Invalid BST
       20
      /  \
    10   30
      \
       25
    
  • Inorder tree walk - print out keys in sorted order

    inorder = (node)->
      if node != nil
        inorder(node.left)
        print node.key
        inorder(node.right)
    
  • Search

    searchRecursive = (node, k) ->
      return node if !node? || node.key == k
      if k < node.key
        searchRecursive(node.left, k)
      else
        searchRecursive(node.right, k)
    # More efficient on most computers: avoids function-call overhead
    searchIterative = (node, k) ->
      while node? && node.key != k
        if k < node.key
          node = node.left
        else
          node = node.right
      node
    
    treeSuccessor = (x) -> # O(h)
      if x.right?
        treeMinimum(x.right)
      else
        y = x.parent
        while y? && y.right == x
          x = y
          y = y.parent
        return y
    
    treeInsert = (tree, z)->  # O(h)
      y = nil
      x = tree.root
      while x?
        y = x
        if z.key < x.key
          x = x.left
        else
          x = x.right
      z.parent = y
      if y == nil
        tree.root = z
      else if z.key < y.key
        y.left = z
      else
        y.right = z
    
    treeInsertRecursive = (node, key, parent = null) ->
      if !node?
        # returning the new node lets the caller relink it,
        # which substitutes for passing node.left "as a pointer"
        return { key: key, left: null, right: null, parent: parent }
      if key < node.key
        node.left = treeInsertRecursive(node.left, key, node)
      else
        node.right = treeInsertRecursive(node.right, key, node)
      node
    
    treeDelete = (tree, z) -> # O(h)
      if z.left == nil || z.right == nil # Zero or One child
        y = z
      else # Two children
        y = treeSuccessor(z)
      if y.left?
        x = y.left
      else
        x = y.right
      if x? # y has a child x: splice y out by linking x upward
        x.parent = y.parent
    
      if y.parent == nil
        tree.root = x
      else if y == y.parent.left
        y.parent.left = x
      else
        y.parent.right = x
    
      if y != z
        z.key = y.key
        # copy satellite data into z
    
      return y
    
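The BST property above is global, not just parent-child: a bounds-propagating check makes that concrete (a sketch, assuming nodes shaped `{key, left, right}`; `isValidBST` is an illustrative name):

```javascript
// Every node must lie within (min, max) bounds inherited from ALL
// of its ancestors, not merely satisfy the local comparison with
// its parent. This is what rules out the "25 under 20's left
// subtree" tree shown earlier.
function isValidBST(node, min = -Infinity, max = Infinity) {
  if (node == null) return true;
  if (node.key < min || node.key > max) return false;
  return isValidBST(node.left, min, node.key) &&
         isValidBST(node.right, node.key, max);
}
```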

Red-Black trees

  • balanced, guarantees O(lg n) operations
  • extra bit of storage per node (colour: red/black)
  • Properties:
    • every node is either red or black
    • root is black
    • every leaf (nil) is black
    • if a node is red, both children are black
    • for each node, all paths from node to descendant leaves have same number of black nodes
  • Pointer structure adjusted through rotations which preserve the properties.
    • Left rotate assumes right child is not nil
    • Right rotate assumes left child is not nil
    • Rotation pivots around the link from x to y: y becomes the new root of the subtree, x becomes y's left child, and y's left child becomes x's right child
          x
         / \
        A   y
           / \
          B   C

        Left-Rotate(T, x)  # O(1)

          y
         / \
        x   C
       / \
      A   B
    
  • After inserting (just like a regular BST insert) with the new node z coloured red, run an auxiliary RB-Insert-Fixup to recolour nodes and perform rotations
    • Case 1: z's uncle is red: recolour the parent and uncle black and the grandparent red, then repeat with the grandparent as z
    • Case 2: z's uncle is black and z is a right child: left-rotate z's parent; z moves down to where its former parent was (the former parent becomes z's left child), transforming to case 3
    • Case 3: z's uncle is black and z is a left child: colour the parent black and the grandparent red, then right-rotate the grandparent
  • After deleting, run an auxiliary RB-Delete-Fixup if the spliced-out node y was black
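The left rotation described above can be sketched as (nodes shaped `{key, left, right, parent}` and a tree shaped `{root}` are assumptions; Right-Rotate is symmetric):

```javascript
// Left-Rotate(T, x): y (x's right child, assumed non-nil) becomes
// the subtree root, x becomes y's left child, and y's old left
// subtree becomes x's right subtree. O(1): only pointers change.
function leftRotate(tree, x) {
  const y = x.right;
  x.right = y.left;                       // B moves under x
  if (y.left != null) y.left.parent = x;
  y.parent = x.parent;                    // y takes x's old position
  if (x.parent == null) tree.root = y;
  else if (x === x.parent.left) x.parent.left = y;
  else x.parent.right = y;
  y.left = x;                             // x becomes y's left child
  x.parent = y;
}
```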

Heaps

  • An array that can be viewed as a nearly complete binary tree
  • Tree is filled on all levels except possibly the lowest, which is filled from the left
  • A heap is represented by an array and two attributes: length and heap-size. (heap-size <= length)
  • Root of the tree is the first element in the array
  • Indices of parent and children are computed by:
    • parent = (i)-> Math.floor((i+1)/2) - 1
    • leftChild = (i) -> 2*(i+1) - 1
    • rightChild = (i) -> 2*(i+1) # == leftChild(i) + 1
  • Two types of heaps: max-heap and min-heap
    • max-heap - A[parent(i)] >= A[i]; largest element stored at root
    • min-heap - A[parent(i)] <= A[i]; smallest element stored at root
  • The height of the heap is ⌊lg n⌋
  • Elements A[(⌊n/2⌋+1)..n] are all leaves of the tree
  • Operations on heap are O(h) = O(lgn)
    • maxHeapify/minHeapify - maintain heap-property (O(lgn)). Worst case is when the last row is exactly half full
    • buildMaxHeap/buildMinHeap - produce a heap from an unsorted input (O(n))
    • heapsort - sort in place. (O(nlgn))
    • insert/extractMax/increaseKey/max - allow heap to be used as priority queue (O(lgn))
  • maxHeapify(A, i) - float value down to right place
maxHeapify = (A,i) ->
  l = left(i)
  r = right(i)
  largest = if l <= heapSize && A[l] > A[i] then l else i
  if r <= heapSize && A[r] > A[largest]
    largest = r
  if largest != i
    swap(A[i], A[largest])
    maxHeapify(A, largest)
  • buildMaxHeap(A) - The leaves are each a 1-element heap; go through each remaining node (bottom-up) and call maxHeapify on it
buildMaxHeap = (A) ->
  heapSize = A.length
  for i in [Math.floor(A.length/2)..1]
    maxHeapify(A, i-1)
  • heapsort(A) - since the max element is at the root, we know its final position (A[n]) and can swap it there. Decreasing heap-size by 1 lets a single maxHeapify on the new root restore the max-heap over the remaining elements
heapsort = (A) ->
  buildMaxHeap(A)
  for i in [A.length-1..1] # 0-based, matching buildMaxHeap
    swap(A[0], A[i])       # max goes to its final position
    heapSize--
    maxHeapify(A, 0)
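A runnable 0-based JavaScript version of the heap pseudocode above (a sketch; heap-size is passed explicitly instead of being a shared variable):

```javascript
// Float A[i] down until the max-heap property holds below i.
// left/right follow the 0-based index formulas given earlier.
function maxHeapify(A, i, heapSize) {
  const l = 2 * i + 1, r = 2 * i + 2;
  let largest = (l < heapSize && A[l] > A[i]) ? l : i;
  if (r < heapSize && A[r] > A[largest]) largest = r;
  if (largest !== i) {
    [A[i], A[largest]] = [A[largest], A[i]];
    maxHeapify(A, largest, heapSize);
  }
}

function heapsort(A) {
  // build max-heap: slots floor(n/2)..n-1 are already leaves
  for (let i = Math.floor(A.length / 2) - 1; i >= 0; i--)
    maxHeapify(A, i, A.length);
  // repeatedly move the max into its final slot and shrink the heap
  for (let i = A.length - 1; i >= 1; i--) {
    [A[0], A[i]] = [A[i], A[0]];
    maxHeapify(A, 0, i);
  }
  return A;
}
```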

Priority Queue

  • Type of heap
  • max-priority/min-priority
  • Allow retrieval by highest priority (as opposed to insertion order, like a stack or queue)
  • If application will perform no insertions after initial query, no need for an explicit priority queue, just sort by priority and proceed top to bottom. If mixing insertions, deletions and queries, a priority queue is needed
  • Insert
  • Find Min/ Find Max
  • Delete Min/ Delete Max
  • Increase-key(S,x,k) (from new book only?)
| Operation | Unsorted array | Sorted array | Balanced tree |
|-----------|----------------|--------------|---------------|
| Insert    | O(1)  | O(n)  | O(log n) |
| Find Min  | O(1)¹ | O(1)¹ | O(1)¹    |
| Del Min   | O(n)  | O(1)² | O(log n) |

¹ Store a pointer to the min entry and update it on insert. On delete, search for the new min (linear or O(log n) time, respectively)

² Keep the array in descending order, so the min is always the last element and deleting it requires no re-arranging

Available data structures:

  • Sorted array or list - efficient to identify max/min and delete it. Maintaining order is slow. Only suitable with few insertions
  • Binary heaps - support insertion and extract-min efficiently. The right choice when an upper bound on the number of items is known in advance; this weakness is mitigated by dynamic arrays
  • Binary search tree - smallest element is always leftmost leaf, largest is rightmost. Most appropriate when you need other dictionary operations, or unbounded key range and don't know max priority queue size in advance
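A minimal array-backed binary min-heap priority queue, sketched in JavaScript (class and method names are illustrative):

```javascript
// 0-based binary min-heap supporting the insert / find-min /
// delete-min operations from the table above.
class MinPQ {
  constructor() { this.a = []; }
  insert(key) {                    // O(log n): float the new key up
    this.a.push(key);
    let i = this.a.length - 1;
    while (i > 0) {
      const p = Math.floor((i - 1) / 2);
      if (this.a[p] <= this.a[i]) break;
      [this.a[p], this.a[i]] = [this.a[i], this.a[p]];
      i = p;
    }
  }
  findMin() { return this.a[0]; }  // O(1): min is at the root
  deleteMin() {                    // O(log n): sink the last key
    const min = this.a[0];
    const last = this.a.pop();
    if (this.a.length > 0) {
      this.a[0] = last;
      let i = 0;
      for (;;) {
        const l = 2 * i + 1, r = 2 * i + 2;
        let s = i;
        if (l < this.a.length && this.a[l] < this.a[s]) s = l;
        if (r < this.a.length && this.a[r] < this.a[s]) s = r;
        if (s === i) break;
        [this.a[i], this.a[s]] = [this.a[s], this.a[i]];
        i = s;
      }
    }
    return min;
  }
}
```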

Hash tables

  • A mathematical function on the key determines a (large) integer, which maps to a position in an array
  • Under reasonable assumptions, search is O(1) (but can be O(n) under some circumstances)
  • O(1) in average case
  • Element k is stored in h(k) (hash function)
  • Hash function reduces the range of array indices that need to be handled (the size m of the possible outcomes of the hash function)

Hash functions

  • Division method
    • E.g. h(k) = k mod m
    • A prime not too close to an exact power of 2 is often a good choice for m
  • Multiplication method
    • h(k) = ⌊m (kA mod 1)⌋, with 0 < A < 1; kA mod 1 is the fractional part of kA
    • m is usually a power of 2
  • Universal hashing
    • Choose a random hashing function from a carefully designed class of functions at the beginning of execution
    • Guarantees no single input will always evoke worst-case
  • Perfect hashing
    • Elements hashing to slot j are re-hashed into a secondary hash table; each slot uses different constants for its hash function
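The division and multiplication methods can be sketched as (the choice A = (√5 − 1)/2 is Knuth's suggestion; both m values in the test are illustrative):

```javascript
// Division method: h(k) = k mod m.
// m should be a prime not too close to an exact power of 2.
function divisionHash(k, m) {
  return k % m;
}

// Multiplication method: h(k) = floor(m * (k*A mod 1)).
// The value of m is not critical here, so it is usually a power of 2.
function multiplicationHash(k, m) {
  const A = (Math.sqrt(5) - 1) / 2; // ≈ 0.618..., Knuth's suggestion
  const frac = (k * A) % 1;         // fractional part of k*A
  return Math.floor(m * frac);
}
```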

Collisions

  • Chaining (buckets of linked lists) - devotes considerable memory to pointers, which could instead be used to make the table larger
  • Open addressing - find desired location, if occupied, find next available open slot. Deletions complicated
| Operation | Hash table (expected) | Hash table (worst case) |
|-----------|-----------------------|-------------------------|
| Search    | O(n/m) | O(n)   |
| Insert    | O(1)   | O(1)   |
| Delete    | O(1)   | O(1)   |
| Succ      | O(n+m) | O(n+m) |
| Pred      | O(n+m) | O(n+m) |
| Min       | O(n+m) | O(n+m) |
| Max       | O(n+m) | O(n+m) |

Often the best option for a dictionary
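A minimal chaining hash table sketch in JavaScript (non-negative integer keys are assumed, so the bucket index is just `key % m`; names are illustrative):

```javascript
// m buckets, each an array of [key, value] pairs. With n items the
// expected chain length is n/m, hence O(1 + n/m) expected search.
class ChainedHashTable {
  constructor(m = 13) {            // 13: a small prime bucket count
    this.m = m;
    this.buckets = Array.from({ length: m }, () => []);
  }
  bucket(key) { return this.buckets[key % this.m]; }
  insert(key, value) { this.bucket(key).push([key, value]); }
  search(key) {                    // scan only one chain
    for (const [k, v] of this.bucket(key)) if (k === key) return v;
    return undefined;
  }
  delete(key) {
    const b = this.bucket(key);
    const i = b.findIndex(([k]) => k === key);
    if (i >= 0) b.splice(i, 1);
  }
}
```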

Direct Address table

  • One array position for every possible key. Element with key k is stored in slot k
  • Lots of wasted space if few keys are used
  • If universe of keys is large, might be impractical or impossible
  • O(1) in worst case
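A direct-address table sketch over a small illustrative universe (names and the universe size are assumptions):

```javascript
// One slot per possible key in U = {0, ..., U_SIZE-1}.
// Every operation indexes the array directly: O(1) worst case.
const U_SIZE = 256; // illustrative universe size
const table = new Array(U_SIZE).fill(null);

const daInsert = (item) => { table[item.key] = item; };
const daSearch = (key) => table[key];
const daDelete = (item) => { table[item.key] = null; };
```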

Pointers and Objects

  • Multiple arrays (one for key, one for next, one for prev)
    • Variable pointing to Head of list
    • Variable pointing to head of free
  • Single array with known length (3 elements per object: next, key, prev). Can be used for heterogeneous objects, but it's more complex (store each object's length?)
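The multiple-array representation with a free list can be sketched as (sizes and names are illustrative; `-1` plays the role of nil):

```javascript
// A doubly linked list stored in three parallel arrays, with the
// free list threaded through the unused slots via next[].
const N = 8;
const key = new Array(N), next = new Array(N), prev = new Array(N);
let head = -1; // head of the list
let free = 0;  // head of the free list
for (let i = 0; i < N; i++) next[i] = (i + 1 < N) ? i + 1 : -1;

function allocate() {
  const x = free;       // take the first free slot
  free = next[x];
  return x;
}

function listInsert(k) { // push a key onto the front of the list
  const x = allocate();
  key[x] = k;
  next[x] = head;
  prev[x] = -1;
  if (head !== -1) prev[head] = x;
  head = x;
  return x;
}
```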

Trees

Binary tree

  • Three fields: parent, left, right
  • Pointer to root of tree

Rooted tree with unbounded branches

  • Fixed number of arrays not viable, lots of wasted space on a sparse tree
  • parent, left-child, right sibling
  • pointer to root of tree

Graphs

Data Structures:

  • Adjacency matrices

    • Impractical for graphs with more than 1k vertices (1M entries)
    • Make sense only for small or very dense (complete) graphs
    • Better suited for all-pairs shortest path algorithms
    • Better suited for algorithms that repeatedly ask is (i,j) in G. Most graph algos can be designed to eliminate the queries
  • Adjacency lists

    • Best for most purposes
    • Can better handle satellite data
    • Structure
      • edge (x,y) represented by edgenode with vertex (y), weight and next (edgenode)
      • graph: array of edgenodes, array of outdegrees, number of vertices, number of edges, isDirected (a directed graph includes an edge only from x to y, not from y to x). Converting between the two representations costs only linear time
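The adjacency-list structure above can be sketched in JavaScript (names follow the bullet above; this is an illustration, not a full library):

```javascript
// Each vertex owns a chain of edgenodes {y, weight, next}.
// An undirected edge (x,y) is stored twice, once per endpoint,
// but counted once in nedges.
class Graph {
  constructor(nvertices, directed = false) {
    this.edges = new Array(nvertices).fill(null); // edgenode chains
    this.degree = new Array(nvertices).fill(0);   // outdegree per vertex
    this.nvertices = nvertices;
    this.nedges = 0;
    this.directed = directed;
  }
  insertEdge(x, y, weight = 1, secondPass = false) {
    this.edges[x] = { y, weight, next: this.edges[x] }; // push edgenode
    this.degree[x]++;
    if (!this.directed && !secondPass)
      this.insertEdge(y, x, weight, true); // mirror the edge once
    else
      this.nedges++;
  }
}
```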
