Asymptotic Analysis
Introduction
Algorithm: A finite sequence of “well-defined” instructions to solve a given computational problem.
Algorithm design is an art that demands a lot of creativity, intuition, and perseverance.
There is generally a trade-off between simplicity and efficiency of algorithms (i.e., efficient algorithms are generally more complex)
We don’t want to analyze running time of an algorithm using a wall clock because that depends on the machine, programming language, etc. Instead, we use mathematical analysis.
Word-RAM Model
A word is the basic storage unit of RAM; a word is a collection of a few bytes. Each input item (number, name) is stored in binary format.
RAM can be viewed as a huge 1D array of words.
Any arbitrary location of RAM can be accessed in approximately the same time, irrespective of its position in the array (hence the name “Random Access”).
Data as well as the program reside fully in RAM (otherwise you’d have to load each instruction from disk, which is super inefficient - think of CS2106).
Each arithmetic or logical operation (+, -, *, /, AND, OR, NOT) involving a constant number of words takes a constant number of CPU cycles. (However, note that exponentiation takes logarithmic time if done right)
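As an aside on that last point, here is a minimal sketch of exponentiation by repeated squaring, which uses only about log₂(n) multiplications instead of n − 1. The function name `fast_pow` and the multiplication counter are illustrative additions, not part of the model:

```python
def fast_pow(a, n):
    """Compute a**n with ~log2(n) multiplications (repeated squaring) instead of n - 1."""
    result = 1
    mults = 0                 # multiplication counter, only to illustrate the log n bound
    while n > 0:
        if n & 1:             # lowest bit of the exponent is set
            result *= a
            mults += 1
        a *= a                # square the base
        mults += 1
        n >>= 1               # move to the next bit of the exponent
    return result, mults

value, mults = fast_pow(3, 1000)
print(value == 3**1000, mults)   # True 16  (vs. 999 multiplications for the naive loop)
```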
So, an instruction is executed by fetching it first from RAM, decoding it, fetching operands, performing arithmetic or logical operations, and then storing the results back into RAM. Each instruction takes a few clock cycles to be executed (about a few nanoseconds)
In this model, we count the number of instructions executed in the Word-RAM model as a measure of the time taken for a program to run (so we abstract away the actual running time). There is an assumption here that all operations (+, *, etc.) take the same time, although multiplication is obviously much harder (and hence takes longer) than addition.
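For instance, a rough sketch of how this counting works for a trivial loop; the per-line counts are approximate and the function `array_sum` is just an example, but the point is that the count depends only on the input size n, not on any machine:

```python
def array_sum(a):
    total = 0             # 1 instruction
    for x in a:           # the loop body runs n = len(a) times
        total += x        # ~2 instructions per iteration: load a word, add it
    return total          # 1 instruction
# Total: roughly 2n + 2 word-RAM instructions, independent of any particular machine.
```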
The reason we analyze the algorithm running time mathematically is that it is independent of the machine, hardware, etc. We are only concerned with the analysis of algorithms, not the running time on actual machines.
Running Time
Running time depends on the size of input. So, we parameterize the running time by the size of the input. Generally, we seek upper bounds on the running time, because well, everybody likes a guarantee (we want to know for sure that no matter what input, our algorithm cannot do worse than a specific amount of time)
There are 3 kinds of analysis depending on how the inputs are fed to the algorithm:
- Worst case: T(n) = maximum number of instructions of an algorithm on any input of size n (we assume that an adversary will try to find the worst possible input for our program such that it takes the longest time)
- Average case: T(n) = expected number of instructions of an algorithm over all inputs of size n. We need some assumption of statistical distribution of inputs to calculate this properly.
- Best case: T(n) = minimum number of instructions of an algorithm on an input of size n (we can just cheat with some naive slow algorithm that works fast on some input)
Example: the worst case of bubble sort (reverse-sorted array) is ≈ n² instructions, while the best case is ≈ n instructions (already-sorted array).
Example: the best case, worst case, and average case of merge sort are all ≈ n log n instructions.
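Below is a minimal sketch (not a reference implementation from the notes) of bubble sort with an early-exit flag and a comparison counter; it makes the ≈ n² vs. ≈ n gap above concrete:

```python
def bubble_sort_count(arr):
    """Sort a copy of arr with bubble sort and return the number of comparisons made."""
    a = list(arr)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:        # a full pass with no swaps => already sorted, stop early
            break
    return comparisons

n = 1000
print(bubble_sort_count(range(n)))          # best case (already sorted): n - 1 = 999
print(bubble_sort_count(range(n, 0, -1)))   # worst case (reverse-sorted): n(n-1)/2 = 499500
```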
Note that for average-case analysis there is no randomness in the algorithm itself. The uncertainty lies in the input, NOT the algorithm. This is different from the expected time complexity of randomised algorithms, where the expectation is taken over the distribution of the randomness internal to the algorithm/program.
Comparing Efficiency
When given two algorithms with their running times, we only compare for asymptotically large values of input size because time complexity really matters only for large-sized input (for small inputs, even a very inefficient algorithm will terminate in reasonable time).
Different machines/languages, etc. have different running time. So, we don’t measure actual run-time.
We estimate the rate of growth of runtime by asymptotic analysis.
Asymptotic Notations
Note that O(f(n)),Ω(f(n)), etc. are sets of functions, i.e., they represent a family of functions that are asymptotically upper bounded (or lower bounded) by f(n).
Big-O notation (Upper Bound): We write f(n)=O(g(n)) if there exist constants c>0, n0>0 such that 0≤f(n)≤cg(n) for all n≥n0
In set notation, O(g(n))={f(n): ∃c>0, n0>0 s.t. ∀n≥n0, 0≤f(n)≤cg(n)}
(So, actually f(n)=O(g(n)) is abuse of notation! We should be writing f(n)∈O(g(n)) since O(g(n)) is a set of functions of which f(n) is a member)
Big-Ω notation (Lower Bound): We write f(n)=Ω(g(n)) if there exist constants c>0, n0>0 such that 0≤cg(n)≤f(n) for all n≥n0
Big-Θ (Tight Bound): Θ(g(n))={f(n):∃c1,c2,n0>0 s.t. ∀n≥n0,0≤c1g(n)≤f(n)≤c2g(n)}
Note: Θ(g(n))=O(g(n))∩Ω(g(n)) (proof in the Appendix)
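As a worked example of how the constants are chosen (my own numbers, not from the notes), consider showing that 3n² + 5n ∈ Θ(n²):

```latex
\[
\begin{aligned}
&\text{Claim: } 3n^2 + 5n \in \Theta(n^2).\\
&\text{Upper bound: for } n \ge 5,\ 5n \le n^2,\ \text{so } 3n^2 + 5n \le 4n^2
  \quad (\text{take } c_2 = 4,\ n_0 = 5).\\
&\text{Lower bound: for all } n \ge 1,\ 3n^2 + 5n \ge 3n^2
  \quad (\text{take } c_1 = 3).\\
&\text{Hence } 0 \le 3n^2 \le 3n^2 + 5n \le 4n^2 \ \text{for all } n \ge 5,
  \ \text{i.e. } 3n^2 + 5n \in \Theta(n^2).
\end{aligned}
\]
```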
There are 2 other notations - little-o and little-ω (omega) - which are even stricter definitions of their counterparts:
- Little-o: o(g(n))={f(n):∀c>0,∃n0>0 s.t. ∀n≥n0,0≤f(n)<cg(n)} (i.e., g is strictly faster-growing than f)
- Little-ω: ω(g(n))={f(n): ∀c>0, ∃n0>0 s.t. ∀n≥n0, 0≤cg(n)<f(n)} (i.e., g is strictly slower-growing than f)
There are some important points to note about these little-o and little-omega definitions:
- You need to find an n0 for every positive constant c. In other words, you need some relation between n0 and c so that, given any c, you can produce a suitable n0. It is a universal statement in terms of c, so one example of c and n0 cannot be used to prove that a function is o(g(n)) or ω(g(n)). However, if you can find one value of c for which no n0 satisfies the above condition, then we can say that it is not o(g(n)) or ω(g(n)). (A worked example follows this list.)
- Notice the strict inequality between cg(n) and f(n) as opposed to the non-strict inequality for Big-O and Big-Ω.
- It should be obvious that:
- f(n)∈o(g(n))⟹f(n)∈O(g(n)) (if you can find an n0 for every positive value of c, you can pick any combination to satisfy the Big-O definition)
- f(n)∈ω(g(n))⟹f(n)∈Ω(g(n)) (same as above)
- If f(n)∈o(g(n)) or f(n)∈ω(g(n)), then f(n)∉Θ(g(n)) (since g(n) grows either strictly faster or strictly slower than f(n), it does not grow at the same rate as f(n), which is a requirement for f(n) to be tightly bounded by g(n)).
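For instance (my own example), showing n ∈ o(n²) directly from the definition by giving n0 as a function of c:

```latex
\[
\begin{aligned}
&\text{Claim: } n \in o(n^2).\\
&\text{Given any } c > 0,\ \text{choose } n_0 = \tfrac{2}{c}.\\
&\text{Then for all } n \ge n_0:\quad cn \ge 2 > 1 \implies cn^2 > n,\\
&\text{so } 0 \le n < c\,n^2,\ \text{and the definition holds for every } c > 0.
\end{aligned}
\]
```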
Limits
Assume f(n),g(n)>0. Then, the following results are true:
- lim_{n→∞} f(n)/g(n) = 0 ⟹ f(n)∈o(g(n))
- lim_{n→∞} f(n)/g(n) < ∞ ⟹ f(n)∈O(g(n))
- 0 < lim_{n→∞} f(n)/g(n) < ∞ ⟹ f(n)∈Θ(g(n))
- lim_{n→∞} f(n)/g(n) > 0 ⟹ f(n)∈Ω(g(n))
- lim_{n→∞} f(n)/g(n) = ∞ ⟹ f(n)∈ω(g(n))
We prove the first one using the epsilon-delta definition of limits:
- Since lim_{n→∞} f(n)/g(n) = 0, by definition, we have:
- For all ϵ>0, there exists δ>0 such that f(n)/g(n) < ϵ for n>δ (as you increase n beyond δ, the ratio goes below ϵ)
- Setting c=ϵ and n0=δ, we have:
- For all c>0, there exists n0>0 such that f(n)/g(n) < c for n>n0
- Hence, for all c>0, there exists n0>0 such that f(n)<cg(n) for n>n0.
- So, by definition, f(n)∈o(g(n))
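These rules are easy to sanity-check symbolically, e.g. with sympy (assuming sympy is available; the example functions below are arbitrary choices of mine):

```python
from sympy import symbols, limit, log, oo

n = symbols('n', positive=True)

# lim f/g = 0      =>  f ∈ o(g):   n*log(n) grows strictly slower than n^2
print(limit(n * log(n) / n**2, n, oo))          # 0

# 0 < lim f/g < ∞  =>  f ∈ Θ(g):   3n^2 + 5n grows at the same rate as n^2
print(limit((3 * n**2 + 5 * n) / n**2, n, oo))  # 3

# lim f/g = ∞      =>  f ∈ ω(g):   2^n grows strictly faster than n^10
print(limit(2**n / n**10, n, oo))               # oo
```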
Other Important Properties
Transitivity
Should be obvious intuitively (e.g., if g upper bounds f, and h upper bounds g, then obviously h upper bounds f). All of the following are true because <, >, ≤, ≥ are transitive. (The Big-O case is written out right after this list.)
- f(n)=Θ(g(n))∧g(n)=Θ(h(n))⟹f(n)=Θ(h(n))
- f(n)=O(g(n))∧g(n)=O(h(n))⟹f(n)=O(h(n))
- f(n)=Ω(g(n))∧g(n)=Ω(h(n))⟹f(n)=Ω(h(n))
- f(n)=o(g(n))∧g(n)=o(h(n))⟹f(n)=o(h(n))
- f(n)=ω(g(n))∧g(n)=ω(h(n))⟹f(n)=ω(h(n))
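For concreteness, here is the Big-O case (the other cases are analogous):

```latex
\[
\begin{aligned}
&\text{If } f(n) \le c_1 g(n)\ \forall n \ge n_1
  \ \text{ and }\ g(n) \le c_2 h(n)\ \forall n \ge n_2,\\
&\text{then } f(n) \le c_1 c_2\, h(n)\ \forall n \ge \max(n_1, n_2),
  \ \text{ i.e. } f(n) = O(h(n)) \text{ with } c = c_1 c_2,\ n_0 = \max(n_1, n_2).
\end{aligned}
\]
```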
Reflexivity
- f(n)=Θ(f(n))
- f(n)=O(f(n))
- f(n)=Ω(f(n))
In contrast, note that f(n)∉o(f(n)) and f(n)∉ω(f(n)) because f cannot grow strictly faster or strictly slower than itself. (So, if f(n)=Θ(g(n)), then f cannot be o(g(n)) or ω(g(n)) → here, take g(n) = f(n) itself.)
Symmetry
- f(n)=Θ(g(n))⟺g(n)=Θ(f(n)) (if f grows at the same rate as g, then g grows at the same rate as f → since equality is symmetrical)
Complementarity
- f(n)=O(g(n))⟺g(n)=Ω(f(n)) (saying that f is upper bounded by g is the same as saying that g is lower bounded by f; a one-line derivation follows this list)
- f(n)=o(g(n))⟺g(n)=ω(f(n)) (if f grows faster than g, then g grows slower than f)
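The first equivalence follows by reusing the same constants:

```latex
\[
0 \le f(n) \le c\,g(n)\ \ \forall n \ge n_0
\iff
0 \le \tfrac{1}{c}\,f(n) \le g(n)\ \ \forall n \ge n_0,
\]
```

so f(n)=O(g(n)) witnessed by (c, n0) exactly when g(n)=Ω(f(n)) witnessed by (1/c, n0).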
Appendix
Useful Mathematical Facts
Properties of logarithms and exponents
- a^x = m ⟺ x = log_a(m) → definition of logarithm
- log_a(mn) = log_a(m) + log_a(n) → product property
- log_a(m/n) = log_a(m) − log_a(n) → quotient property
- log(m^n) = n·log(m) → power property
- log_a(b) = 1/log_b(a)
- log_b(a) = log_c(a) × log_b(c) = log_c(a)/log_c(b) → change of base property
- a^(log_a(x)) = x → number raised to log
- log_a(a) = 1
- log_a(1) = 0
- log_a(1/b) = −log_a(b)
- a^(log_b(x)) = x^(log_b(a)) (when x, a > 0) → can take log on both sides to verify it’s true
- a^m · a^n = a^(m+n)
- a^m / a^n = a^(m−n)
- 1/a^m = a^(−m)
- (a^m)^n = a^(mn)
- (ab)^m = a^m · b^m
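A quick numerical sanity check of a few of these identities (arbitrary sample values of mine; floating-point, so math.isclose is used for comparison):

```python
import math

a, b, m, x = 2.0, 3.0, 5.0, 7.0

# change of base: log_b(a) = log_c(a) / log_c(b)  (c = e here)
print(math.isclose(math.log(a, b), math.log(a) / math.log(b)))       # True

# a^(log_a(x)) = x
print(math.isclose(a ** math.log(x, a), x))                          # True

# a^(log_b(x)) = x^(log_b(a))
print(math.isclose(a ** math.log(x, b), x ** math.log(a, b)))        # True

# log_a(m^n) = n * log_a(m), with n = 4
print(math.isclose(math.log(m**4, a), 4 * math.log(m, a)))           # True
```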
Arithmetic series:
a + (a+d) + (a+2d) + ⋯ + (a+(n−1)d) = (n/2)(2a + (n−1)d)
1 + 2 + 3 + ⋯ + n = n(n+1)/2 = Θ(n²)
Geometric series:
a + ar + ar² + ⋯ + ar^(n−1) = a(r^n − 1)/(r − 1)
a + ar + ar² + ⋯ = a/(1−r) if |r| < 1
Sum of Squares:
1² + 2² + ⋯ + n² = n(n+1)(2n+1)/6 = Θ(n³)
Sum of Cubes:
1³ + 2³ + 3³ + ⋯ + n³ = (n(n+1)/2)² = Θ(n⁴)
Harmonic Series:
1/1 + 1/2 + 1/3 + ⋯ + 1/n = Θ(log n)
Stirling’s approximation: (n/e)^n ≤ n! ≤ n^n
log 1 + log 2 + log 3 + ⋯ + log n = Θ(n log n)
Other important series:
1/2 + 2/2² + 3/2³ + ⋯ + n/2^n ≤ Σ_{i=1}^{∞} i/2^i = 2 = O(1)
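These closed forms are easy to spot-check numerically; a throwaway script with n = 1000 (chosen arbitrarily):

```python
import math

n = 1000

# 1 + 2 + ... + n = n(n+1)/2
assert sum(range(1, n + 1)) == n * (n + 1) // 2

# 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6
assert sum(i * i for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6

# 1^3 + 2^3 + ... + n^3 = (n(n+1)/2)^2
assert sum(i**3 for i in range(1, n + 1)) == (n * (n + 1) // 2) ** 2

# harmonic series grows like log n
print(sum(1 / i for i in range(1, n + 1)), math.log(n))   # ~7.49 vs ~6.91

# sum_{i>=1} i / 2^i is bounded by 2
print(sum(i / 2**i for i in range(1, n + 1)))             # ~2.0
```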
Proof of Θ(g(n))=O(g(n))∩Ω(g(n))
Since all the above terms are sets (families of functions), we have to show that any arbitrary element of the LHS is also in the RHS, and vice versa. (Recall that proving that 2 sets are equal is the same as showing that each set is a subset of the other)
- So, let f(n)∈Θ(g(n)) be an arbitrary element of Θ(g(n)).
- Then, by definition, 0≤c1g(n)≤f(n)≤c2g(n) for some constants c1,c2>0 and for all n>n0 for some n0.
- The above inequality can be rewritten as:
0≤c1g(n)≤f(n) for all n>n0 and 0≤f(n)≤c2g(n) for all n>n0
- By the definitions of Big-Omega and Big-O notation, we can write this as: f(n)∈Ω(g(n)) and f(n)∈O(g(n)), i.e., f(n)∈O(g(n))∩Ω(g(n)).
- So, we have proved one direction of the equality.
- Now, we show that if f(n) (an arbitrary but particular function) is O(g(n)) and Ω(g(n)), then f(n)∈Θ(g(n)).
- Since f(n)∈O(g(n)), by definition: there exist constants c1, n1 such that ∀n>n1, 0≤f(n)≤c1·g(n)
- Similarly, since f(n)∈Ω(g(n)), by definition: there exist constants c2, n2 such that ∀n>n2, 0≤c2·g(n)≤f(n)
- We pick n0=max(n1,n2) (to ensure the above inequalities hold simultaneously, we have to pick the intersection of n>n1 and n>n2, which is given by n>max(n1,n2)).
- Then, combining the above inequalities, we get: ∀n>n0, 0≤c2·g(n)≤f(n)≤c1·g(n), which means that f(n)∈Θ(g(n)) by definition of Big-Theta.
- Hence, Θ(g(n))=O(g(n))∩Ω(g(n))