Data Structures
- Contiguous: arrays, matrices, heaps, hash tables
- Linked: lists, trees, graphs
- Containers
  - Retrieval independent of content
  - Stacks
  - Queues
Dictionaries
- Retrieve data by content (search by key)
| Operation | Unsorted array | Sorted array | Singly linked (unsorted) | Doubly linked (unsorted) | Singly linked (sorted) | Doubly linked (sorted) |
|---|---|---|---|---|---|---|
| Search | O(n) | O(log n) | O(n) | O(n) | O(n) | O(n) |
| Insert | O(1) | O(n) | O(1) | O(1) | O(n) | O(n) |
| Delete | O(1)¹ | O(n) | O(n)² | O(1) | O(n)² | O(1) |
| Succ | O(n) | O(1) | O(n) | O(n) | O(1) | O(1) |
| Pred | O(n) | O(1) | O(n) | O(n) | O(n)² | O(1) |
| Min | O(n) | O(1) | O(n) | O(n) | O(1) | O(1) |
| Max | O(n) | O(1) | O(n) | O(n) | O(1)³ | O(1) |
¹ Swap A[x] with A[n] for constant-time deletion
² Must spend linear time finding the predecessor
³ Maintain a separate pointer to the tail of the list, updated on every insert/delete. This is constant time on doubly linked lists; on singly linked lists, deleting the tail costs an extra linear sweep to reset the pointer.
- Possible data structures:
  - Unsorted arrays or linked lists: bad for dictionaries larger than 50 or 100 items
  - Sorted arrays or linked lists: not worth the effort unless eliminating duplicates; binary search applies only to sorted arrays. Only appropriate with few insertions/deletions
  - Hash tables: for a moderate to large number of keys (100 to 10M)
    - The number of buckets should be about the number of items expected to be put in the table
    - With open addressing, about 30% more buckets than items
    - Selecting a prime number of buckets minimizes the dangers of a bad hash function
- Binary search trees
  - Random insertion order: insert each node at the leaf where the search for it ends, with no rebalancing
  - Balanced trees: use rotations to restructure
  - B-trees: for data sets that don't fit into main memory (more than 1M items)
- Full binary tree: every non-leaf node has exactly two children (a full tree with n leaves has 2n - 1 nodes)
- Balanced trees: the depths of subtrees will not vary by more than a certain amount
- Traversal: in-order, pre-order, post-order
- To check whether two trees are identical, a single traversal (in-, pre-, or post-order) is not sufficient; combine in-order with either pre- or post-order
- With duplicate keys even that fails, so insert a special character to mark each null left or right child (serialization sketch below)

Same in-order and pre-order traversals, but the trees are not identical:

```
    3          3
   /            \
  3              3
```
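A minimal sketch of the null-marker idea (assuming plain node objects with key/left/right; serialize and identicalTrees are hypothetical helper names). With the markers, a single pre-order serialization determines the tree, even with duplicate keys:

```coffeescript
# Serialize in pre-order, emitting '#' for each null child. With the
# markers, the string uniquely identifies both the shape and the keys.
serialize = (node) ->
  return '#' unless node?
  "#{node.key},#{serialize(node.left)},#{serialize(node.right)}"

identicalTrees = (a, b) -> serialize(a) == serialize(b)
```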
- Node fields: left, right, parent, item
- Search is O(height); h = log n for a balanced tree, h = n for a worst-case insertion order
- A randomly built tree has expected height O(log n)
- Search
- Insert
- Delete
  - 0 children: clear the pointer to the node
  - 1 child: point the node's parent to the node's child
  - 2 children: relabel the node with the key of its successor (the min of the right subtree), then remove the successor node (which has at most one child)
- Max/Min
- Pred/Succ
- How does a binary tree handle identical keys?
- Can be used for dictionaries and priority queues
- Binary search tree property: if node y is in the left subtree of x, key[y] ≤ key[x]; if node y is in the right subtree of x, key[x] ≤ key[y]
  - Corollary: all nodes in the left subtree of x have keys ≤ key[x], and all nodes in the right subtree have keys ≥ key[x] (the validity check sketched below relies on this)
Invalid BST (25 ≥ 10 locally, but 25 > 20 and sits in 20's left subtree):

```
      20
     /  \
   10    30
     \
      25
```
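A sketch of checking the corollary directly (isValidBST is a hypothetical helper; assumes nodes with key/left/right). Comparing each node only with its immediate children would wrongly accept the tree above:

```coffeescript
# Valid iff every key respects the bounds inherited from all ancestors.
isValidBST = (node, lo = -Infinity, hi = Infinity) ->
  return true unless node?
  return false unless lo <= node.key <= hi
  isValidBST(node.left, lo, node.key) and
    isValidBST(node.right, node.key, hi)
```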
- Inorder tree walk - print out the keys in sorted order

```coffeescript
inorder = (node) ->
  if node?
    inorder(node.left)
    console.log node.key
    inorder(node.right)
```
Search, successor, insert, and delete (all O(h)):

```coffeescript
searchRecursive = (node, k) ->
  return node if not node? or node.key == k
  if k < node.key
    searchRecursive(node.left, k)
  else
    searchRecursive(node.right, k)

# More efficient on most computers (why?)
searchIterative = (node, k) ->
  while node? and node.key != k
    node = if k < node.key then node.left else node.right
  node

treeMinimum = (x) ->      # leftmost node of the subtree rooted at x
  x = x.left while x.left?
  x

treeSuccessor = (x) ->    # O(h)
  return treeMinimum(x.right) if x.right?
  # Otherwise climb until we leave a left subtree
  y = x.parent
  while y? and x == y.right
    x = y
    y = y.parent
  y

treeInsert = (tree, z) -> # O(h)
  y = null
  x = tree.root
  while x?
    y = x
    x = if z.key < x.key then x.left else x.right
  z.parent = y
  if not y?
    tree.root = z         # tree was empty
  else if z.key < y.key
    y.left = z
  else
    y.right = z

# Recursive insert: there are no pointers-to-pointers here, so return the
# (possibly new) subtree root and reassign it in the caller.
treeInsertRecursive = (node, key, parent = null) ->
  if not node?
    return {key: key, left: null, right: null, parent: parent}
  if key < node.key
    node.left = treeInsertRecursive(node.left, key, node)
  else
    node.right = treeInsertRecursive(node.right, key, node)
  node

treeDelete = (tree, z) -> # O(h)
  if not z.left? or not z.right?
    y = z                 # zero or one child: splice out z itself
  else
    y = treeSuccessor(z)  # two children: splice out the successor
  x = if y.left? then y.left else y.right
  x.parent = y.parent if x?          # link y's child (if any) upward
  if not y.parent?
    tree.root = x
  else if y == y.parent.left
    y.parent.left = x
  else
    y.parent.right = x
  z.key = y.key if y != z # copy the successor's key (and satellite data) into z
  y
```
Red-Black Trees
- Balanced; guarantees O(lg n) operations
- One extra bit of storage per node (colour: red/black)
- Properties:
- every node is either red or black
- root is black
- every leaf (nil) is black
- if a node is red, both children are black
- for each node, all paths from node to descendant leaves have same number of black nodes
- The pointer structure is adjusted through rotations, which preserve the properties (Left-Rotate sketched below)
  - Left rotate assumes the right child is not nil
  - Right rotate assumes the left child is not nil
  - A rotation pivots around the link from x to y: y becomes the new root of the subtree, x becomes the left child of y, and y's left child becomes x's right child
```
Left-Rotate(T, x)   # O(1)

    x                    y
   / \                  / \
  A   y      ==>       x   C
     / \              / \
    B   C            A   B
```
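A sketch of Left-Rotate in the notes' CoffeeScript style (assuming nodes with left/right/parent and a tree object with a root field); Right-Rotate is symmetric:

```coffeescript
leftRotate = (tree, x) ->   # O(1); assumes x.right is not null
  y = x.right               # y will become the new subtree root
  x.right = y.left          # y's left subtree (B) becomes x's right subtree
  y.left.parent = x if y.left?
  y.parent = x.parent       # link y to x's old parent
  if not x.parent?
    tree.root = y
  else if x == x.parent.left
    x.parent.left = y
  else
    x.parent.right = y
  y.left = x                # put x on y's left
  x.parent = y
```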
- After inserting (just like a regular BST insert) and colouring the new node red, run an auxiliary RB-Insert-Fixup to recolour nodes and perform rotations (fixup sketched after the cases)
  - Case 1: z's uncle is red - recolour the parent and uncle black and the grandparent red, then repeat with the grandparent as z
  - Case 2: z's uncle y is black and z is a right child - left-rotate z's parent; z's former parent becomes z's left child, transforming to case 3
  - Case 3: z's uncle y is black and z is a left child - recolour z's parent black and the grandparent red, then right-rotate the grandparent
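A sketch of the fixup loop (following CLRS, with null children treated as black; assumes leftRotate/rightRotate as above and a color field on each node):

```coffeescript
rbInsertFixup = (tree, z) ->   # z was just inserted and coloured red
  while z.parent?.color == 'red'
    if z.parent == z.parent.parent.left
      y = z.parent.parent.right            # z's uncle
      if y?.color == 'red'                 # case 1: recolour and move up
        z.parent.color = 'black'
        y.color = 'black'
        z.parent.parent.color = 'red'
        z = z.parent.parent
      else
        if z == z.parent.right             # case 2: rotate into case 3
          z = z.parent
          leftRotate(tree, z)
        z.parent.color = 'black'           # case 3: recolour, then rotate
        z.parent.parent.color = 'red'
        rightRotate(tree, z.parent.parent)
    else
      # mirror image: left and right exchanged
      y = z.parent.parent.left
      if y?.color == 'red'
        z.parent.color = 'black'
        y.color = 'black'
        z.parent.parent.color = 'red'
        z = z.parent.parent
      else
        if z == z.parent.left
          z = z.parent
          rightRotate(tree, z)
        z.parent.color = 'black'
        z.parent.parent.color = 'red'
        leftRotate(tree, z.parent.parent)
  tree.root.color = 'black'
```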
- After deleting, run an auxiliary RB-Delete-Fixup if the spliced-out node y was black
Heaps
- An array that can be viewed as a nearly complete binary tree
- The tree is filled on all levels except possibly the lowest, which is filled from the left
- A heap is represented by an array and two attributes: length and heap-size (heap-size <= length)
- Root of the tree is the first element in the array
- Indices of the parent and children (0-indexed) are computed by:

```coffeescript
parent     = (i) -> Math.floor((i + 1) / 2) - 1
leftChild  = (i) -> 2 * (i + 1) - 1
rightChild = (i) -> 2 * (i + 1)   # leftChild(i) + 1
```
- Two types of heaps: max-heap and min-heap
- max-heap - A[parent(i)] >= A[i]; largest element stored at root
- min-heap - A[parent(i)] <= A[i]; smallest element stored at root
- The height of the heap is ⌊lg n⌋
- Elements A[(⌊n/2⌋+1)..n] are all leaves of the tree
- Operations on a heap are O(h) = O(lg n)
  - maxHeapify/minHeapify - maintain the heap property (O(lg n)). Worst case is when the last row is exactly half full
  - buildMaxHeap/buildMinHeap - produce a heap from an unsorted input (O(n))
  - heapsort - sort in place (O(n lg n))
  - insert/extractMax/increaseKey/max - allow the heap to be used as a priority queue (O(lg n))
- maxHeapify(A, i) - float the value down to its right place

```coffeescript
heapSize = 0   # shared by the heap routines below

maxHeapify = (A, i) ->
  l = leftChild(i)
  r = rightChild(i)
  largest = if l < heapSize and A[l] > A[i] then l else i
  if r < heapSize and A[r] > A[largest]
    largest = r
  if largest != i
    [A[i], A[largest]] = [A[largest], A[i]]   # swap
    maxHeapify(A, largest)
```

- buildMaxHeap(A) - each leaf is already a one-element heap; go through each remaining node (bottom-up) and call maxHeapify on it
```coffeescript
buildMaxHeap = (A) ->
  heapSize = A.length
  for i in [Math.floor(A.length / 2) - 1 .. 0] by -1
    maxHeapify(A, i)
```

- heapsort(A) - since the max element is at the root, we know its final position (A[n]) and can swap it there. Decreasing heap-size by 1 lets the remaining elements be turned back into a max-heap by calling maxHeapify on the new root
```coffeescript
heapsort = (A) ->
  buildMaxHeap(A)
  for i in [A.length - 1 .. 1] by -1
    [A[0], A[i]] = [A[i], A[0]]   # move the current max into its final position
    heapSize--
    maxHeapify(A, 0)
```

Priority Queues
- A type of heap
- max-priority/min-priority
- Allow retrieval in terms of which item has the highest priority (as opposed to insertion order, like a stack or queue)
- If the application will perform no insertions after the initial query, there is no need for an explicit priority queue: just sort by priority and proceed top to bottom. If mixing insertions, deletions, and queries, a priority queue is needed (heap-based sketches below)
- Insert
- Find Min/ Find Max
- Delete Min/ Delete Max
- Increase-key(S,x,k) (from new book only?)
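Sketches of the heap-based versions, using the 0-indexed helpers and heapSize from the heap section above (maxHeapInsert grows the array, so a dynamic array is assumed):

```coffeescript
heapMaximum = (A) -> A[0]                 # O(1)

heapExtractMax = (A) ->                   # O(lg n)
  throw new Error('heap underflow') if heapSize < 1
  max = A[0]
  A[0] = A[heapSize - 1]
  heapSize--
  maxHeapify(A, 0)
  max

heapIncreaseKey = (A, i, key) ->          # O(lg n): float the key up
  throw new Error('new key is smaller') if key < A[i]
  A[i] = key
  while i > 0 and A[parent(i)] < A[i]
    [A[i], A[parent(i)]] = [A[parent(i)], A[i]]
    i = parent(i)

maxHeapInsert = (A, key) ->               # O(lg n)
  heapSize++
  A[heapSize - 1] = -Infinity
  heapIncreaseKey(A, heapSize - 1, key)
```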
| Operation | Unsorted array | Sorted array | Balanced tree |
|---|---|---|---|
| Insert | O(1) | O(n) | O(log n) |
| Find Min | O(1)¹ | O(1)¹ | O(1)¹ |
| Del Min | O(n) | O(1)² | O(log n) |
¹ Store a pointer to the min entry and update it on insert. On delete, search again for the new min (linear for arrays, O(log n) for the tree)
² Order the array descending, so the min is always the last element and deleting it requires no re-arranging
Available data structures:
- Sorted array or list - efficient to identify the min/max and delete it, but maintaining order makes insertion slow. Only suitable with few insertions
- Binary heaps - support insertion and extract-min efficiently. The right choice when an upper bound on the number of items is known; this weakness is mitigated by dynamic arrays
- Binary search tree - the smallest element is always the leftmost node, the largest the rightmost. Most appropriate when you also need other dictionary operations, or when keys are unbounded and you don't know the maximum priority queue size in advance
Hash Tables
- A mathematical function on the key determines a (large) integer, which is mapped to a position in the array
- Under reasonable assumptions search is O(1), but it can be O(n) in some circumstances
- O(1) in the average case
- The element with key k is stored in slot h(k), where h is the hash function
- The hash function reduces the range of array indices that need to be handled to m, the number of possible outcomes of the hash function
- Division method
  - E.g. h(k) = k mod m
  - A prime not too close to an exact power of 2 is often a good choice for m
- Multiplication method
  - h(k) = ⌊m (kA mod 1)⌋, with 0 < A < 1; kA mod 1 is the fractional part of kA
  - m is usually a power of 2 (its exact value is not critical with this method); both methods are sketched below
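Sketches of both methods for non-negative integer keys (the constant A below is Knuth's suggested value, an assumption, not from the notes):

```coffeescript
divisionHash = (k, m) -> k % m   # pick m prime, away from powers of 2

A = (Math.sqrt(5) - 1) / 2       # about 0.618; Knuth's suggested constant
multiplicationHash = (k, m) ->
  frac = (k * A) % 1             # fractional part of k*A
  Math.floor(m * frac)
```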
- Universal hashing
  - Choose a random hash function from a carefully designed class of functions at the beginning of execution
  - Guarantees no single input will always evoke worst-case behaviour
- Perfect hashing
  - Elements hashing to slot j are re-hashed into a secondary hash table; each slot uses different constants for its hash function
- Chaining (buckets of linked lists) - devotes considerable memory to pointers, which could instead be used to make the table larger (sketched below)
- Open addressing - compute the desired location; if occupied, probe for the next available open slot. Deletions are complicated
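A minimal chaining sketch (ChainedHashTable is a hypothetical class; uses divisionHash from above):

```coffeescript
class ChainedHashTable
  constructor: (@m = 101) ->                 # 101 buckets: a small prime
    @buckets = ([] for _ in [0...@m])

  insert: (key, value) ->                    # O(1): prepend to the chain
    @buckets[divisionHash(key, @m)].unshift [key, value]

  search: (key) ->                           # expected O(n/m)
    for [k, v] in @buckets[divisionHash(key, @m)]
      return v if k == key
    undefined
```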
| Operation (n items, m buckets) | Hash table (expected) | Hash table (worst case) |
|---|---|---|
| Search | O(n/m) | O(n) |
| Insert | O(1) | O(1) |
| Delete | O(1) | O(1) |
| Succ | O(n+m) | O(n+m) |
| Pred | O(n+m) | O(n+m) |
| Min | O(n+m) | O(n+m) |
| Max | O(n+m) | O(n+m) |
Often the best option for a dictionary
Direct-Address Tables
- One array position for every possible key; the element with key k is stored in slot k
- Lots of wasted space if few keys are used
- If the universe of keys is large, it might be impractical or impossible
- O(1) in the worst case (sketched below)
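A direct-addressing sketch (hypothetical class; assumes keys are integers in 0...universeSize):

```coffeescript
class DirectAddressTable
  constructor: (universeSize) ->
    @slots = new Array(universeSize)   # one slot per possible key

  search: (k) -> @slots[k]             # O(1) worst case
  insert: (x) -> @slots[x.key] = x
  remove: (x) -> @slots[x.key] = undefined
```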
Implementing Pointers with Arrays
- Multiple arrays (one for key, one for next, one for prev); indices act as pointers
- A variable points to the head of the list
- A variable points to the head of the free list
- Alternatively, a single array with known object length (3 elements per object: next, key, prev). Can be used for heterogeneous objects, but it's more complex (store the length?) (see the sketch below)
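A sketch of the multiple-array representation with a free list (ArrayList is a hypothetical class; -1 plays the role of a nil pointer):

```coffeescript
class ArrayList
  constructor: (capacity) ->
    @key  = new Array(capacity)
    @next = new Array(capacity)
    @prev = new Array(capacity)
    @head = -1
    @free = 0                          # all cells start on the free list
    @next[i] = i + 1 for i in [0...capacity - 1]
    @next[capacity - 1] = -1

  allocate: ->                         # pop a cell off the free list
    throw new Error('out of space') if @free == -1
    x = @free
    @free = @next[x]
    x

  insert: (k) ->                       # push a new key onto the list front
    x = @allocate()
    @key[x] = k
    @next[x] = @head
    @prev[x] = -1
    @prev[@head] = x if @head != -1
    @head = x
```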
Representing Rooted Trees
- Binary trees: three fields per node: parent, left, right
  - A pointer to the root of the tree
  - A fixed number of child fields per node is not viable for unbounded branching, and wastes space on a sparse tree
- Left-child, right-sibling representation: parent, left-child, and right-sibling fields handle any branching factor (sketched below)
  - A pointer to the root of the tree
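A left-child, right-sibling sketch (makeNode and eachChild are hypothetical helpers): each node uses constant space yet supports any number of children:

```coffeescript
makeNode = (key) ->
  key: key
  parent: null
  leftChild: null       # leftmost child
  rightSibling: null    # next child of the same parent

eachChild = (x, visit) ->   # walk x's children via sibling links
  c = x.leftChild
  while c?
    visit(c)
    c = c.rightSibling
```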
Graphs
Data structures:
- Adjacency matrices
  - Impractical for graphs with more than 1k vertices (1M entries)
  - Make sense only for small or very dense (complete) graphs
  - Better suited for all-pairs shortest path algorithms
  - Better suited for algorithms that repeatedly ask "is (i, j) in G?". Most graph algorithms can be designed to eliminate such queries
- Adjacency lists
  - Best for most purposes
  - Can better handle satellite data
  - Structure (edge insertion sketched below)
    - Edge (x, y) is represented by an edgenode holding the vertex (y), a weight, and a next pointer (to another edgenode)
    - Graph: an array of edgenodes (one list per vertex), an array with the outdegree of each vertex, the number of vertices, the number of edges, and an isDirected flag (a directed graph stores edge (x, y) only in x's list, not in y's)
- Converting between the two representations costs only linear time
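A sketch of edge insertion for this representation (field names follow the notes; insertEdge itself is an assumed helper name):

```coffeescript
insertEdge = (graph, x, y, weight, directed) ->
  graph.edges[x] =                  # prepend a new edgenode to x's list
    y: y
    weight: weight
    next: graph.edges[x]
  graph.degree[x] += 1
  if directed
    graph.nedges += 1
  else
    insertEdge(graph, y, x, weight, true)  # undirected: also add (y, x)
```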